TRAINING CONTINUOUS MODELS
Once the flat initialization is done, you are ready to begin training the acoustic models
for the base phones, also called "context-independent" or CI phones. This
step is called CI training. In CI training, the flat-initialized models
are re-estimated through the forward-backward re-estimation algorithm
called the Baum-Welch algorithm. This is an iterative re-estimation
process, so you have to run many "passes" of the Baum-Welch re-estimation
over your training data. Each of these passes, or iterations, results in a
slightly better set of models for the CI phones. However, since the
objective function maximized in each of these passes is the likelihood of
the training data, too many iterations would ultimately produce models that
overfit the training data and may generalize poorly to unseen data. You
usually do not want this to happen. Typically, 5-8 iterations of Baum-Welch
are sufficient for
getting good estimates of the CI models. You can automatically determine the
number of iterations that you need by looking at the total likelihood of the
training data at the end of the first iteration and deciding on a
"convergence ratio" of likelihoods. This is simply the ratio of the
total likelihood in the current iteration to that of the previous iteration.
As the models get more and more fitted to the training data in each
iteration, the training data likelihoods typically increase monotonically.
The convergence ratio is therefore a small positive number. The convergence
ratio becomes smaller and smaller as the iterations progress, since each
time the current models are a little less different from the previous ones.
Convergence ratios are data and task specific, but typical values at which
you may stop the Baum-Welch iterations for your CI training
range from 0.1 to 0.001. When the models are variance-normalized, the convergence ratios are much smaller.
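For instance, here is a minimal shell sketch of this stopping check,
assuming you have already pulled the total likelihoods of two successive
iterations out of the bw output (the likelihood values and the 0.01
threshold below are made up):

    # Hypothetical total log-likelihoods from iterations N-1 and N;
    # substitute the values from your own bw logs.
    prev_lik=-105234.7
    cur_lik=-104897.2
    # Convergence ratio: |current - previous| / |current|
    ratio=$(awk -v p="$prev_lik" -v c="$cur_lik" \
        'BEGIN { d = c - p; if (d < 0) d = -d;
                 a = c; if (a < 0) a = -a; print d / a }')
    echo "convergence ratio: $ratio"
    # Stop iterating once the ratio drops below your chosen threshold.
    awk -v r="$ratio" 'BEGIN { exit !(r < 0.01) }' && echo "converged; stop iterating"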
The executable used to run a Baum-Welch iteration is called "bw", and takes the
following arguments for training continuous CI models (an example invocation
follows the table):
FLAG | DESCRIPTION |
-moddeffn | Model definition file for the CI phones |
-ts2cbfn | This flag should be set to ".cont." if you are training continuous models, and to ".semi." if you are training semi-continuous models, without the double quotes |
-mixwfn | Name of the file in which the mixture weights from the previous iteration are stored. The full path must be provided |
-mwfloor | Floor value for the mixture weights. Any number below the floor value is set to the floor value |
-tmatfn | Name of the file in which the transition matrices from the previous iteration are stored. The full path must be provided |
-meanfn | Name of the file in which the means from the previous iteration are stored. The full path must be provided |
-varfn | Name of the file in which the variances from the previous iteration are stored. The full path must be provided |
-dictfn | Dictionary |
-fdictfn | Filler dictionary |
-ctlfn | Control file |
-part | You can split the training into N equal parts. If there are M utterances in your control file, this enables you to run the training separately on each (M/N)th part. This flag specifies which of those parts to train on currently. For example, if the total number of parts is 3, this flag can take the value 1, 2, or 3 |
-npart | Number of parts into which you have split the training |
-cepdir | Directory where your feature files are stored |
-cepext | The extension that comes after the name listed in the control file. For example, you may have a file called a/b/c.d and may have listed a/b/c in your control file. Then this flag must be given the argument "d", without the double quotes or the dot before it |
-lsnfn | Name of the transcript file |
-accumdir | Directory in which the intermediate results from each part of your training are written. If you have T means to estimate, then the mean buffer from the current part of your training will occupy on the order of T*4 bytes. There are likewise buffers for the variances, the mixture weights, and the transition matrices |
-varfloor | Minimum variance value allowed |
-topn | Number of Gaussians to consider when computing the likelihood of each state. For example, if you have models with 8 Gaussians per state and topn is 4, then the 4 most likely Gaussians are used |
-abeam | Forward beamwidth |
-bbeam | Backward beamwidth |
-agc | Automatic gain control |
-cmn | Cepstral mean normalization |
-varnorm | Variance normalization |
-meanreest | Mean re-estimation |
-varreest | Variance re-estimation |
-2passvar | Setting this flag to "yes" makes bw use the means from the previous iteration when estimating the variances; the current variance is then estimated as E[(x - prev_mean)^2]. If this flag is set to "no", the current estimates of the means are used, which requires estimating the variance as E[x^2] - (E[x])^2, an unstable estimator that sometimes yields negative variance estimates due to arithmetic imprecision |
-tmatreest | Whether or not to re-estimate the transition matrices |
-feat | Feature configuration |
-ceplen | Length of the basic feature vector |
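As an illustration, here is what a single bw pass might look like for
continuous CI models. The flags are exactly those described in the table
above, but every path, filename, and numeric value below is hypothetical;
substitute the ones from your own setup:

    bw \
     -moddeffn  model_architecture/ci.mdef \
     -ts2cbfn   .cont. \
     -mixwfn    model_parameters/ci_cont/mixture_weights \
     -mwfloor   1e-08 \
     -tmatfn    model_parameters/ci_cont/transition_matrices \
     -meanfn    model_parameters/ci_cont/means \
     -varfn     model_parameters/ci_cont/variances \
     -dictfn    lists/train.dic \
     -fdictfn   lists/filler.dic \
     -ctlfn     lists/train.ctl \
     -part      1 \
     -npart     1 \
     -cepdir    feature_files \
     -cepext    mfc \
     -lsnfn     lists/train.lsn \
     -accumdir  bwaccumdir/iter01_part1 \
     -varfloor  1e-04 \
     -topn      4 \
     -abeam     1e-90 \
     -bbeam     1e-40 \
     -agc       none \
     -cmn       current \
     -varnorm   no \
     -meanreest yes \
     -varreest  yes \
     -2passvar  no \
     -tmatreest yes \
     -feat      1s_c_d_dd \
     -ceplen    13

To run the pass in, say, 3 parts in parallel, you would launch three such
commands with -npart 3 and -part 1, 2, and 3, each writing to its own
-accumdir.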
Whether you have run the training in one part or in many, the bw
executable described above generates only intermediate buffer(s). The final
model parameters, namely the means, variances, mixture weights, and
transition matrices, have to be estimated from the values stored in these
buffers. This is done by the executable called "norm", which takes the
following arguments (an example invocation follows the table):
FLAG | DESCRIPTION |
-accumdir | Intermediate buffer directory |
-feat | Feature configuration |
-mixwfn | Name of the file in which you want to write the mixture weights. The full path must be provided |
-tmatfn | Name of the file in which you want to write the transition matrices. The full path must be provided |
-meanfn | Name of the file in which you want to write the means. The full path must be provided |
-varfn | Name of the file in which you want to write the variances. The full path must be provided |
-ceplen | Length of the basic feature vector |
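Continuing the hypothetical setup from the bw example above, the
corresponding norm run might look like this:

    norm \
     -accumdir bwaccumdir/iter01_part1 \
     -feat     1s_c_d_dd \
     -mixwfn   model_parameters/ci_cont/mixture_weights \
     -tmatfn   model_parameters/ci_cont/transition_matrices \
     -meanfn   model_parameters/ci_cont/means \
     -varfn    model_parameters/ci_cont/variances \
     -ceplen   13

If you ran bw in several parts, norm must see the buffers from every part;
in typical setups each part writes to its own buffer directory, and all of
those directories are passed to norm through -accumdir.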
If you have not re-estimated some of the model parameters in the bw step, then
the corresponding flag must be omitted from the arguments given to the
norm executable. Otherwise the executable will try to read a non-existent
buffer from the buffer directory and will fail. Thus if you have
set -meanreest to "no" in the arguments for bw, then the flag -meanfn must
not be given in the arguments for norm. This is useful mostly during adaptation.
Iterations of bw and norm finally result in the CI models. The iterations
can be stopped once the likelihood of the training data converges.
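To make the overall flow concrete, here is a skeletal driver for the CI
training stage. run_bw, run_norm, and get_total_likelihood are hypothetical
helpers standing in for the bw invocation, the norm invocation, and whatever
mechanism you use to extract an iteration's total likelihood from the bw
output:

    # Run up to 8 Baum-Welch passes, stopping early on convergence.
    threshold=0.01        # hypothetical convergence threshold
    prev_lik=
    for iter in 1 2 3 4 5 6 7 8; do
        run_bw "$iter"    # one bw run per part, as sketched above
        run_norm "$iter"  # pool the buffers into new model files
        cur_lik=$(get_total_likelihood "$iter")
        if [ -n "$prev_lik" ]; then
            ratio=$(awk -v p="$prev_lik" -v c="$cur_lik" \
                'BEGIN { d = c - p; if (d < 0) d = -d;
                         a = c; if (a < 0) a = -a; print d / a }')
            awk -v r="$ratio" -v t="$threshold" \
                'BEGIN { exit !(r < t) }' && break
        fi
        prev_lik=$cur_lik
    done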
The model parameters computed by norm in the final iteration are then used
to initialize the models for context-dependent phones (triphones) with
untied states. This is the next major step of the training process. We
refer to the process of training triphone HMMs with untied states as
"CD untied training".