BEFORE YOU TRAIN
THE SET OF BASE AND HIGHER-ORDER FEATURE VECTORS
- The set of difference vectors, in which the component-wise difference between *some* succeeding and preceding vector(s) gives an estimate of the slope or trend at the current time instant. Each difference vector is an "extension" of the current vector. These are called "delta" features; a more appropriate name would be "trend" features.
- The set of difference vectors of difference vectors. The component-wise difference between the succeeding and preceding "delta" vectors is an "extension" of the current vector. These are called "double delta" features.
- The set of difference vectors in which the component-wise difference between the n-th succeeding and n-th preceding vectors is an "extension" of the current vector. These are called "long-term delta" features; they differ from the "delta" features only in that they capture trends over a longer window of time.
- The vector composed of the first element of the current vector and the first elements of some of the above "extension" vectors. This is called the "power" feature, and its dimensionality is less than or equal to the total number of feature types you consider.
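The "extension" vectors described above can be sketched in a few lines of Python. This is only an illustration, not the Sphinx front-end code: the function names (delta, double_delta, power_feature), the window parameters n and m, and the choice to repeat the first and last frames at the edges of the utterance are all assumptions made for the example.

```python
def delta(frames, n=1):
    """Component-wise difference between the n-th succeeding and n-th preceding frame.

    frames is a list of equal-length lists of floats, one per time instant.
    Edge handling is an assumption: indices are clamped, so the first and
    last frames are reused where no neighbour exists.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        ahead = frames[min(t + n, last)]
        behind = frames[max(t - n, 0)]
        out.append([a - b for a, b in zip(ahead, behind)])
    return out


def double_delta(frames, n=1, m=1):
    """Difference vectors of difference vectors: a delta (window m) of deltas (window n)."""
    return delta(delta(frames, n), m)


def power_feature(frames, n=1, m=1):
    """First element of the current vector and of its delta and double-delta extensions."""
    d = delta(frames, n)
    dd = double_delta(frames, n, m)
    return [[c[0], dv[0], ddv[0]] for c, dv, ddv in zip(frames, d, dd)]


if __name__ == "__main__":
    # Four toy 2-dimensional frames (made-up numbers, not real cepstra).
    frames = [[0.0, 1.0], [1.0, 3.0], [4.0, 6.0], [9.0, 10.0]]
    print(delta(frames, 1))          # short-term trend
    print(delta(frames, 2))          # "long-term delta": same idea, wider window
    print(double_delta(frames, 1, 1))
    print(power_feature(frames, 1, 1))
```

A "long-term delta" is just delta() called with a larger n, which is why the same helper serves both feature types in this sketch.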
In semi-continuous models, it is usual practice to keep the identities of the base vectors and their "extension" vectors separate. Each such set is called a "feature stream". You must specify how many feature streams you want to use in your semi-continuous models and how you want them arranged. The feature-set options currently supported by Sphinx are:
c/1..L-1/,d/1..L-1/,c/0/d/0/dd/0/,dd/1..L-1/ : read this as
cepstra/second through last components,
deltacepstra/second through last components,
cepstra/first component deltacepstra/first component doubledeltacepstra/first component,
doubledeltacepstra/second through last components
This is a 4-stream feature vector used mostly in semi-continuous models. There is no particular advantage to this arrangement - any permutation would give you the same models, with parameters written in different orders.
Here's something that's not obvious from the notation used for the 4-stream feature set: the dimensionality of the 4-stream feature vector is 12 cepstra + 24 deltas + 3 power terms + 12 double deltas (51 components in all).
The deltas are computed as the difference between the cepstra two frames removed on either side of the current frame (12 of these), followed by the difference between the cepstra four frames removed on either side of the current frame (12 of these). The power stream uses the first component of the two-frames-removed deltas, computed using C0.
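To make the 12 + 24 + 3 + 12 layout concrete, here is a minimal sketch that splits 13-dimensional cepstral frames (C0 to C12) into the four streams. It is not the Sphinx implementation. The function names, the dictionary keys, and the clamping at utterance edges are hypothetical, and the double deltas are assumed to be the difference between the two-frames-removed delta vectors one frame on either side of the current frame, which the text above does not spell out.

```python
def frame_difference(frames, n):
    """frames[t+n] - frames[t-n], component-wise, with indices clamped at the edges."""
    last = len(frames) - 1
    return [
        [a - b for a, b in zip(frames[min(t + n, last)], frames[max(t - n, 0)])]
        for t in range(len(frames))
    ]


def four_streams(cepstra):
    """Split frames of [C0, C1, ..., C12] into the four feature streams.

    Returns one dict per frame; the key names are illustrative only.
    """
    short = frame_difference(cepstra, 2)   # cepstra two frames removed on either side
    long_ = frame_difference(cepstra, 4)   # cepstra four frames removed on either side
    ddelta = frame_difference(short, 1)    # assumption: delta of the short deltas
    feats = []
    for c, ds, dl, dd in zip(cepstra, short, long_, ddelta):
        feats.append({
            "cepstra": c[1:],               # 12: C1..C12
            "deltas": ds[1:] + dl[1:],      # 24: short-term then long-term deltas
            "power": [c[0], ds[0], dd[0]],  # 3: C0, delta C0, double-delta C0
            "double_deltas": dd[1:],        # 12
        })
    return feats


if __name__ == "__main__":
    # Synthetic data: 10 frames, component k at time t is t*t*(k+1).
    cepstra = [[float(t * t * (k + 1)) for k in range(13)] for t in range(10)]
    frame = four_streams(cepstra)[5]
    for name, stream in frame.items():
        print(name, len(stream))
```

Because each stream is stored under its own key, reordering the streams changes only the order in which parameters are written, not their values.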
(more to come....)