Sunday, March 3, 2013



Once the flat initialization is done, you are ready to begin training the acoustic models for the base, or "context-independent" (CI), phones. This step is called CI training. In CI training, the flat-initialized models are re-estimated with the forward-backward re-estimation algorithm, known as the Baum-Welch algorithm. This is an iterative process, so you have to run several "passes" of Baum-Welch re-estimation over your training data. Each of these passes, or iterations, results in a slightly better set of models for the CI phones. However, since the objective function maximized in each pass is the likelihood of the training data, too many iterations would ultimately produce models that fit the training data very closely. You usually do not want this, because overfitted models tend to generalize poorly to new data. Typically, 5-8 iterations of Baum-Welch are sufficient for getting good estimates of the CI models.
You can automatically determine the number of iterations you need by looking at the total likelihood of the training data at the end of each iteration and deciding on a "convergence ratio" of likelihoods. This is the relative change in the total likelihood from the previous iteration to the current one. As the models become more closely fitted to the training data in each iteration, the training data likelihood typically increases monotonically, so the convergence ratio is a small positive number. It becomes smaller and smaller as the iterations progress, since each time the current models differ a little less from the previous ones. Convergence ratios are data and task specific, but typical values at which you may stop the Baum-Welch iterations for CI training range from 0.1 down to 0.001. When the models are variance-normalized, the convergence ratios are much smaller.
The executable used to run a Baum-Welch iteration is called "bw". It takes the following example arguments for training continuous CI models:
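The stopping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the convergence ratio is taken to be the relative change in total log-likelihood, (current - previous) / |previous|, which matches the description of a small positive number that shrinks over the iterations. The function names, the threshold of 0.01, and the log-likelihood values are all made up for the example and are not part of the training tools.

```python
def convergence_ratio(prev_loglik, curr_loglik):
    """Relative change in total log-likelihood between two iterations (assumed definition)."""
    return (curr_loglik - prev_loglik) / abs(prev_loglik)


def iterations_until_converged(logliks, threshold=0.01, max_iters=8):
    """Return the 1-based iteration at which training would stop.

    logliks[i] is the total training-data log-likelihood after iteration i + 1.
    Training stops at the first iteration whose convergence ratio drops below
    the threshold, or at max_iters if that never happens.
    """
    limit = min(len(logliks), max_iters)
    for i in range(1, limit):
        if convergence_ratio(logliks[i - 1], logliks[i]) < threshold:
            return i + 1
    return limit


# Made-up log-likelihood totals, for illustration only.
example = [-1000.0, -800.0, -760.0, -755.0, -754.5]
print(iterations_until_converged(example))
```

In practice the threshold would be tuned to the data and task, within the range of values quoted above.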
-moddeffn model definition file for CI phones
-ts2cbfn this flag should be set to ".cont." if you are training continuous models, and to ".semi." if you are training semi-continuous models, without the double quotes
-mixwfn name of the file in which the mixture-weights from the previous iteration are stored. Full path must be provided
-mwfloor Floor value for the mixture weights. Any number below the floor value is set to the floor value.
-tmatfn name of the file in which the transition matrices from the previous iteration are stored. Full path must be provided
-meanfn name of the file in which the means from the previous iteration are stored. Full path must be provided
-varfn name of the file in which the variances from the previous iteration are stored. Full path must be provided
-dictfn Dictionary
-fdictfn Filler dictionary
-ctlfn control file
-part You can split the training into N equal parts. If there are M utterances in your control file, this enables you to run the training separately on each (M/N)th part. This flag specifies which of these parts you currently want to train on. For example, if your total number of parts is 3, this flag can take one of the values 1, 2 or 3
-npart number of parts in which you have split the training
-cepdir directory where your feature files are stored
-cepext the extension that comes after the name listed in the control file. For example, you may have a file called a/b/c.d and may have listed a/b/c in your control file. Then this flag must be given the argument "d", without the double quotes or the dot before it
-lsnfn name of the transcript file
-accumdir Intermediate results from each part of your training will be written in this directory. If you have T means to estimate, then the size of the mean buffer from the current part of your training will be T*4 bytes (say). There will likewise be a variance buffer, a buffer for mixture weights, and a buffer for transition matrices
-varfloor minimum variance value allowed
-topn number of Gaussians to consider when computing the likelihood of each state. For example, if you have models with 8 Gaussians per state and topn is 4, then the 4 most likely Gaussians are used.
-abeam forward beamwidth
-bbeam backward beamwidth
-agc automatic gain control
-cmn cepstral mean normalization
-varnorm variance normalization
-meanreest mean re-estimation
-varreest variance re-estimation
-2passvar Setting this flag to "yes" lets bw use the previous means in the estimation of the variance. The current variance is then estimated as E[(x - prev_mean)^2]. If this flag is set to "no", the current estimate of the means is used to estimate the variances. This requires estimating the variance as E[x^2] - (E[x])^2, an unstable estimator that sometimes results in negative estimates of the variance due to arithmetic imprecision
-tmatreest re-estimate transition matrices or not
-feat feature configuration
-ceplen length of basic feature vector
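The instability mentioned under -2passvar can be seen in a small numerical sketch. This is not the bw implementation: it is a plain Python illustration with made-up data (a small spread around a very large offset), and the function names are hypothetical. It compares the one-pass form E[x^2] - (E[x])^2 with the centered form E[(x - prev_mean)^2], where prev_mean stands in for a mean that is already known before the pass.

```python
def one_pass_variance(xs):
    """E[x^2] - (E[x])^2, using the mean computed from the same data."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def two_pass_variance(xs, prev_mean):
    """E[(x - prev_mean)^2], using a mean known before this pass."""
    return sum((x - prev_mean) ** 2 for x in xs) / len(xs)


# Illustrative data: offsets with variance 0.0125, shifted by a large constant.
offsets = [0.0, 0.1, 0.2, 0.3]
data = [1e9 + d for d in offsets]
true_var = 0.0125

print(one_pass_variance(data))
print(two_pass_variance(data, prev_mean=1e9 + 0.15))
```

With this data the one-pass result is dominated by rounding error, because it subtracts two numbers near 1e18, while the centered form stays close to 0.0125. The centered form is only as good as prev_mean, which is why it relies on the means from the previous iteration being reasonable.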
If you have run the training in many parts, or even if you have run the training in one part, the executable for Baum-Welch described above generates only intermediate buffer(s). The final model parameters, namely the means, variances, mixture-weights and transition matrices, have to be estimated using the values stored in these buffers. This is done by the executable called "norm", which takes the following arguments:
-accumdir Intermediate buffer directory
-feat feature configuration
-mixwfn name of the file in which you want to write the mixture weights. Full path must be provided
-tmatfn name of the file in which you want to write the transition matrices. Full path must be provided
-meanfn name of the file in which you want to write the means. Full path must be provided
-varfn name of the file in which you want to write the variances. Full path must be provided
-ceplen length of basic feature vector
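Conceptually, the split between bw and norm is an accumulate-then-normalize scheme: each part contributes partial sums, and the final parameters are obtained by adding the sums from all parts and dividing. The sketch below shows the idea for the mean of a single one-dimensional Gaussian. The function names and the tuple-based buffer layout are hypothetical and do not reflect the actual buffer file format; the posteriors are made-up weights standing in for the state occupation probabilities computed by Baum-Welch.

```python
def accumulate_part(frames, posteriors):
    """Stand-in for one bw part: weighted sum of frames and total weight."""
    numerator = sum(g * x for g, x in zip(posteriors, frames))
    denominator = sum(posteriors)
    return numerator, denominator


def normalize(buffers):
    """Stand-in for norm: combine the per-part buffers into one mean."""
    total_num = sum(num for num, _ in buffers)
    total_den = sum(den for _, den in buffers)
    return total_num / total_den


# Two hypothetical parts of the training data.
part1 = accumulate_part([1.0, 2.0], [0.5, 0.5])
part2 = accumulate_part([3.0, 4.0], [1.0, 1.0])
print(normalize([part1, part2]))
```

Because only sums are stored, the result is the same whether the data is processed in one part or in many, which is what makes it possible to run the parts separately.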
If you have not re-estimated one of the model parameters in the bw step, then the corresponding flag must be omitted from the arguments given to the norm executable. Otherwise, the executable will try to read a non-existent buffer from the buffer directory and will fail. Thus, if you set -meanreest to "no" in the arguments for bw, then the flag -meanfn must not be given in the arguments for norm. This is useful mostly during adaptation. Iterating bw and norm finally results in the CI models. The iterations can be stopped once the likelihood of the training data converges. The model parameters computed by norm in the final iteration are then used to initialize the models for context-dependent phones (triphones) with untied states. This is the next major step of the training process. We refer to the process of training triphone HMMs with untied states as "CD untied training".
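The rule about omitting norm flags for parameters that bw did not re-estimate can be sketched as a small helper that assembles the norm argument list. Only flags listed above are used. The helper name build_norm_args, the file paths, and the example values for -feat and -ceplen are assumptions made for illustration, and the helper only builds the list without running anything.

```python
def build_norm_args(accumdir, feat, ceplen, mixwfn, tmatfn, meanfn, varfn,
                    meanreest="yes", varreest="yes", tmatreest="yes"):
    """Build a norm argument list, skipping outputs that bw did not re-estimate."""
    args = ["norm",
            "-accumdir", accumdir,
            "-feat", feat,
            "-ceplen", str(ceplen),
            "-mixwfn", mixwfn]
    # Pass an output flag only if the matching bw re-estimation flag was "yes".
    if tmatreest == "yes":
        args += ["-tmatfn", tmatfn]
    if meanreest == "yes":
        args += ["-meanfn", meanfn]
    if varreest == "yes":
        args += ["-varfn", varfn]
    return args


# Hypothetical paths and values, for illustration only.
args = build_norm_args(
    accumdir="/path/to/bwaccumdir",
    feat="1s_c_d_dd",
    ceplen=13,
    mixwfn="/path/to/ci/mixture_weights",
    tmatfn="/path/to/ci/transition_matrices",
    meanfn="/path/to/ci/means",
    varfn="/path/to/ci/variances",
    meanreest="no",
)
print(" ".join(args))
```

Keeping the re-estimation settings and the norm outputs in one place like this avoids the mismatch that would make norm look for a buffer that was never written.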
