TRAINING CONTINUOUS MODELS
First, a list of all triphones possible in the vocabulary is generated
from the dictionary. To get this complete list of triphones from the
dictionary, it is first necessary to write the list of phones in the
following format:
phone1 0 0 0 0
phone2 0 0 0 0
phone3 0 0 0 0
phone4 0 0 0 0
...
The phonelist used for the CI training must be used to generate this, and the order in which the phones are listed must be the same. Next, a temporary dictionary is generated, which has all words except the filler words (words enclosed in ++()++ ). The entry
SIL SIL
must be added to this temporary dictionary, and the dictionary must be sorted in alphabetical order. The program "quick_count" provided with the SPHINX-III package can now be used to generate the list of all possible triphones from the temporary dictionary. It takes the following arguments:
FLAG | DESCRIPTION |
-q | mandatory flag to tell quick_count to consider all word pairs while constructing triphone list |
-p | formatted phonelist |
-b | temporary dictionary |
-o | output triphone list |
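The two input files for quick_count can be prepared by hand, but a short script may be less error-prone. The following Python sketch is only an illustration: the function names are hypothetical, filler words are assumed to be exactly those whose first field starts and ends with "++", and Python's default string ordering is assumed to be an acceptable "alphabetical order".

```python
def format_phone_list(phones):
    """Return the CI phones in the 'phone 0 0 0 0' format, preserving order."""
    return [f"{phone} 0 0 0 0" for phone in phones]


def make_temp_dictionary(dict_lines):
    """Drop filler words, add the 'SIL SIL' entry, and sort the result.

    Assumption: a filler word is one whose first field is enclosed in '++'.
    Duplicate entries are removed so that 'SIL SIL' appears only once.
    """
    entries = set()
    for line in dict_lines:
        line = line.strip()
        if not line:
            continue
        word = line.split()[0]
        if word.startswith("++") and word.endswith("++"):
            continue  # skip filler words
        entries.add(line)
    entries.add("SIL SIL")
    return sorted(entries)
```

The phone order passed to format_phone_list should be the order used in the CI phonelist, since the function does not reorder its input.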
A typical output from quick_count looks like this:
AA(AA,AA)s 1
AA(AA,AE)b 1
AA(AA,AO)1 1
AA(AA,AW)e 1
The "1" in AA(AA,AO)1 indicates that this is a word-internal triphone. This is a carryover from Sphinx-II. The output from quick_count now has to be rewritten into the following format:
AA AA AA s
AA AA AE b
AA AA AO i
AA AA AW e
This can be done by simply replacing "(", ",", and ")" in the output of quick_count with a space and printing only the first four columns. While doing so, all instances of " 1" must be replaced by " i". The list of CI phones must then be prepended to the resulting file in the following format:
AA - - -
AE - - -
AO - - -
AW - - -
..
..
AA AA AA s
AA AA AE b
AA AA AO i
AA AA AW e
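The reformatting and the prepending of the CI phones described above can also be sketched in Python. This is a minimal illustration, not part of the SPHINX-III package: the function name is hypothetical, and a missing position marker is assumed to mean a word-internal triphone, in the same way as a "1".

```python
import re

# Matches quick_count entries such as "AA(AA,AO)1" or "AA(AA,AE)b".
TRIPHONE_RE = re.compile(r"([^()]+)\(([^,]+),([^)]+)\)(\S*)")


def quick_count_to_phone_list(quick_count_lines, ci_phones):
    """Build the phone list: CI phones first, then the reformatted triphones."""
    result = [f"{phone} - - -" for phone in ci_phones]
    for line in quick_count_lines:
        fields = line.split()
        if not fields:
            continue
        match = TRIPHONE_RE.fullmatch(fields[0])
        if match is None:
            continue  # not a triphone entry; ignored in this sketch
        base, left, right, position = match.groups()
        if position in ("1", ""):
            position = "i"  # word-internal triphone
        result.append(f"{base} {left} {right} {position}")
    return result
```

Parsing the entry into named parts, rather than substituting characters in place, means a digit inside a phone name is left untouched.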
For example, if the output of quick_count is stored in a file named "quick_count.out", the following perl command will generate the phone list in the desired form:
perl -nae '$F[0] =~ s/\(|\)|\,/ /g; $F[0] =~ s/1/i/g; print $F[0]; if ($F[0] =~ /\s+$/){print "i"}; print "\n"' quick_count.out
The above list of triphones (and phones) is converted to the model definition file that lists all possible triphones from the dictionary. The program used for this is "mk_model_def", with the following arguments:
FLAG | DESCRIPTION |
-moddeffn | model definition file with all possible triphones (alltriphones_mdef) to be written |
-phonelstfn | list of all triphones |
-n_state_pm | number of states per HMM |
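Next, the number of times each triphone in this model definition file occurs in the training corpus is counted with the program "param_cnt", which takes the following arguments:
FLAG | DESCRIPTION |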
-moddeffn | model definition file with all possible triphones(alltriphones_mdef) |
-ts2cbfn | takes the value ".cont." if you are building continuous models |
-ctlfn | control file corresponding to your training transcripts |
-lsnfn | transcript file for training |
-dictfn | training dictionary |
-fdictfn | filler dictionary |
-paramtype | write "phone" here, without the double quotes |
-segdir | /dev/null |
param_cnt writes the counts to stdout, so its output must be redirected to a file:
(param_cnt [arguments] > triphone_count_file) >&! LOG
Here is an example of the output of this program:
+GARBAGE+ - - - 98
+LAUGH+ - - - 29
SIL - - - 31694
AA - - - 0
AE - - - 0
...
AA AA AA s 1
AA AA AE s 0
AA AA AO s 4
The final number in each row shows the number of times that particular triphone (or filler phone) has occurred in the training corpus. Note that if all possible triphones of a CI phone are listed in the all_triphones.mdef, the CI phone itself will have 0 counts, since all instances of it will have been mapped to a triphone.
This list of counted triphones is used to shortlist the triphones that have occurred at least a minimum number (threshold) of times. The shortlisted triphone list has the same format as the triphone list used to generate the all_triphones.mdef, and the formatted list of CI phones has to be included in it as before. So, in the earlier example, if a threshold of 4 were used, we would obtain the shortlisted triphone list as
AA - - -
AE - - -
AO - - -
AW - - -
..
..
AA AA AO s
..
The threshold is adjusted such that the total number of triphones above the threshold is less than the maximum number of triphones that the system can train (or that you wish to train). It is good to train as many triphones as possible. The maximum number of triphones may, however, depend on the memory available on your machine. The logistics related to this are described in the beginning of this manual.
Note that thresholding is usually done to reduce the number of triphones, so that the resulting models will be small enough to fit in the computer's memory. If this is not a problem, then the threshold can be set to a smaller number. If a triphone occurs too few times, however (i.e., if the threshold is too small), there will not be enough data to train the HMM state distributions properly. This would lead to poorly estimated CD untied models, which in turn may affect the decision trees which are to be built using these models in the next major step of the training.
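The shortlisting step can be sketched in Python as follows. The function names are hypothetical, and the sketch makes three assumptions that the text does not state outright: a triphone is kept when its count is greater than or equal to the threshold (as the example with a threshold of 4 suggests), every context-independent entry (including filler phones) is kept regardless of its count, and the search for a threshold starts at 1.

```python
def parse_counts(count_lines):
    """Split param_cnt-style lines into CI entries and (triphone, count) pairs."""
    ci_entries, triphone_counts = [], []
    for line in count_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        entry, count = " ".join(fields[:4]), int(fields[4])
        if fields[1] == "-":
            ci_entries.append(entry)  # CI or filler phone: always kept
        else:
            triphone_counts.append((entry, count))
    return ci_entries, triphone_counts


def shortlist_triphones(count_lines, threshold):
    """Return the CI phones followed by the triphones seen >= threshold times."""
    ci_entries, triphone_counts = parse_counts(count_lines)
    kept = [entry for entry, count in triphone_counts if count >= threshold]
    return ci_entries + kept


def pick_threshold(count_lines, max_triphones):
    """Smallest threshold that keeps fewer than max_triphones triphones."""
    if max_triphones < 1:
        raise ValueError("max_triphones must be at least 1")
    _, triphone_counts = parse_counts(count_lines)
    threshold = 1
    while sum(count >= threshold for _, count in triphone_counts) >= max_triphones:
        threshold += 1
    return threshold
```

The output of shortlist_triphones has the same layout as the phone list given to mk_model_def, with the counts dropped.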
A model definition file is now created to include only these shortlisted triphones. This is the final model definition file to be used for the CD untied training. The reduced triphone list is then converted to the model definition file using mk_model_def with the following arguments:
FLAG | DESCRIPTION |
-moddeffn | model definition file for CD untied training |
-phonelstfn | list of shortlisted triphones |
-n_state_pm | number of states per HMM |
As an example, if your phone list contains the five phones
SIL B AE T AX
and you specify that you want to build three-state HMMs for each of these phones, and if you have one utterance listed in your transcript file:
<s> BAT A TAB </s>
for which your dictionary and fillerdict entries are:
Fillerdict:
<s> SIL
</s> SIL
Dictionary:
A AX
BAT B AE T
TAB T AE B
then your CD-untied model-definition file will look like this:
# Generated by /mk_model_def on Thu Aug 10 14:57:15 2000
0.3
5 n_base
7 n_tri
48 n_state_map
36 n_tied_state
15 n_tied_ci_state
5 n_tied_tmat
#
# Columns definitions
#base lft  rt  p attrib tmat  ...state id's ...
SIL   -    -   - filler 0     0  1  2  N
AE    -    -   - n/a    1     3  4  5  N
AX    -    -   - n/a    2     6  7  8  N
B     -    -   - n/a    3     9  10 11 N
T     -    -   - n/a    4     12 13 14 N
AE    B    T   i n/a    1     15 16 17 N
AE    T    B   i n/a    1     18 19 20 N
AX    T    T   s n/a    2     21 22 23 N
B     SIL  AE  b n/a    3     24 25 26 N
B     AE   SIL e n/a    3     27 28 29 N
T     AE   AX  e n/a    4     30 31 32 N
T     AX   AE  b n/a    4     33 34 35 N
The # lines are simply comments. The rest of the variables mean the following:
n_base : no. of CI phones (also called "base" phones), 5 here
n_tri : no. of triphones, 7 in this case
n_state_map : total no. of HMM states (emitting and non-emitting). Sphinx appends an extra terminal non-emitting state to every HMM; hence for 5+7 phones, each specified by the user to be modeled by a 3-state HMM, this number will be 12 phones * 4 states = 48
n_tied_state : no. of states of all phones after state-sharing is done. We do not share states at this stage. Hence this number is the same as the total number of emitting states, 12*3 = 36
n_tied_ci_state : no. of states for your CI phones after state-sharing is done. The CI states are not shared, now or later. This number is thus again the total number of emitting CI states, 5*3 = 15
n_tied_tmat : the total number of transition matrices is always the same as the total number of CI phones being modeled. All triphones for a given phone share the same transition matrix. This number is thus 5.
Columns definitions: the following columns are defined:
base : name of each phone
lft : left-context of the phone (- if none)
rt : right-context of the phone (- if none)
p : position of a triphone. Four position markers are supported:
b = word-beginning triphone
e = word-ending triphone
i = word-internal triphone
s = single-word triphone
attrib : attribute of phone. In the phone list, if the phone is "SIL", or if the phone is enclosed by "+", as in "+BANG+", it is interpreted as a non-speech event. These phones are also called "filler" phones, and the attribute "filler" is assigned to each such phone. The base phones and the triphones have no special attributes, and hence are labelled "n/a", standing for "no attribute".
tmat : the id of the transition matrix associated with the phone
state id's : the ids of the HMM states associated with any phone. This list is terminated by an "N", which stands for a non-emitting state. No id is assigned to it. However, it exists, and is listed.
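The triphones and header counts in this example can be checked with a short Python sketch. It is an illustration only, not the logic of mk_model_def: the function names are hypothetical, the utterance is assumed to begin and end with a filler word, and filler phones are assumed to act only as context and never to receive triphones of their own.

```python
def utterance_triphones(words, dictionary, fillers):
    """Return the set of (base, left, right, position) triphones in an utterance."""
    sequence = []  # (phone, position marker); marker is None for filler phones
    for word in words:
        if word in fillers:
            sequence.extend((phone, None) for phone in fillers[word])
            continue
        phones = dictionary[word]
        for index, phone in enumerate(phones):
            if len(phones) == 1:
                marker = "s"  # single-word triphone
            elif index == 0:
                marker = "b"  # word-beginning triphone
            elif index == len(phones) - 1:
                marker = "e"  # word-ending triphone
            else:
                marker = "i"  # word-internal triphone
            sequence.append((phone, marker))
    triphones = set()
    for index in range(1, len(sequence) - 1):
        phone, marker = sequence[index]
        if marker is not None:
            left, right = sequence[index - 1][0], sequence[index + 1][0]
            triphones.add((phone, left, right, marker))
    return triphones


def mdef_header_counts(n_base, n_tri, n_state_pm):
    """Header counts for an untied model definition, following the text above."""
    n_phones = n_base + n_tri
    return {
        "n_base": n_base,
        "n_tri": n_tri,
        "n_state_map": n_phones * (n_state_pm + 1),  # one extra non-emitting state
        "n_tied_state": n_phones * n_state_pm,
        "n_tied_ci_state": n_base * n_state_pm,
        "n_tied_tmat": n_base,
    }
```

For the utterance "<s> BAT A TAB </s>" with the dictionary and fillerdict entries above, utterance_triphones yields the seven triphones listed in the example file, and mdef_header_counts(5, 7, 3) gives the header values 48, 36, 15 and 5.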