The backward gesture starts with the right hand elevated and the palm toward the face. The gesture then consists of repeated movements of the palm of the right hand toward the body.
The down gesture starts with the palm of the right hand parallel to the floor, followed by repeated vertical movement of the palm of the right hand.
The forward gesture is the opposite of the backward gesture. The gesture starts with the right hand elevated as in the backward gesture, but with the back of the hand toward the face. The gesture then consists of repeated movements of the hand away from the body.
The left gesture starts with the right hand elevated at approximately shoulder level, and the palm of the hand is turned perpendicular to the floor and toward the body. The palm is then swept repeatedly from left to right.
The release gesture starts with the right hand elevated at approximately shoulder level, and the hand starts in a fist. The fist is then opened to a hand shaped like it is grasping a large ball. (Think of throwing something from your clasped hand.)
The right gesture is the opposite of the left gesture. The right gesture starts with the right hand elevated at approximately shoulder level, and the palm of the hand is turned perpendicular to the floor and away from the body. The palm is then swept repeatedly from right to left.
The stop gesture is the "international symbol" stop gesture. The hand is held above shoulder level, with the palm of the hand visible. The hand does not move.
The up gesture is the opposite of the down gesture. The gesture starts with the back of the right hand parallel to the floor, followed by repeated vertical movement of the palm of the right hand.
As stated before, the data was recorded on video tape using a production-grade S-VHS camera. The video data was digitized from video tape, and the even fields were used for visual feature extraction. Hence the feature vector rate was determined by the frame rate: 30 Hz, or one feature vector every 33.3 milliseconds. Again, as mentioned in the speech details, the video sequence endpoints were segmented by hand (i.e., the start and end times for digitization were determined by my visual analysis of the video sequence).
To create feature vectors that parameterize the video sequences, a concept similar to speech analysis was used. Instead of spectral features, temporal features of the view of the hands were exploited. In other words, the hand's motion over a period of time and the visual shape of the hand describe the gesture. For example, the stop gesture seen in Figure 9 presents much more hand surface area over time than the up gesture seen in Figure 10, in which the visible surface area varies temporally. Hence, temporal derivatives of location, along with center of mass measurements, were used as data for gesture analysis. These derivatives of location (velocity and acceleration) emphasize relative movements, which do not depend on position in the scene. This relative measurement provides robust detection no matter where the subject is in the display.
Also, a simple measurement of the distance from the hand to the head was used to attempt to distinguish gestures in which the hand looks the same. The idea is that although an up gesture (Figure 10) can appear spatially very similar to a forward gesture (Figure 5), the up gesture is typically performed further from the body.
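The derivative features described above can be sketched with simple finite differences. This is only an illustrative sketch, not the analyzer used in this work: the function names, the use of backward differences, and the feature layout `(vx, vy, ax, ay, distance)` are all assumptions. Only the 30 Hz rate and the choice of velocity, acceleration, and hand-to-head distance come from the text.

```python
import math

FRAME_RATE_HZ = 30.0       # feature vector rate stated in the text
DT = 1.0 / FRAME_RATE_HZ   # about 33.3 ms between feature vectors


def differences(track, dt=DT):
    """Backward differences of a list of (x, y) samples, one per frame."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def motion_features(hand, head, dt=DT):
    """Per-frame features from hand and head centroid tracks.

    Hypothetical layout: (vx, vy, ax, ay, hand_head_distance).
    The first two frames are dropped because a backward-difference
    acceleration needs three position samples.
    """
    velocity = differences(hand, dt)           # entry k is the velocity at frame k + 1
    acceleration = differences(velocity, dt)   # entry i is the acceleration at frame i + 2
    features = []
    for i, (ax, ay) in enumerate(acceleration):
        frame = i + 2
        vx, vy = velocity[i + 1]               # velocity at the same frame
        distance = math.dist(hand[frame], head[frame])
        features.append((vx, vy, ax, ay, distance))
    return features
```

Because every feature is a difference between positions, shifting both tracks by the same offset leaves the output unchanged, which is the position independence the text relies on.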
A separate program was developed as a gesture analyzer; it performs the
same step, feature vector creation, as HCopy in HTK. It was created as a project for a Computer
Vision class, and a complete write-up can be found in [1].
Essentially, the hand and head are tracked using a "blob tracking"
algorithm, which segments an image by skin color.
In other words, the centroids of the skin regions are calculated to determine
the velocities and accelerations of the blobs, as well as the major and minor axes
of a tilted constant-irradiance ellipse. The centroid
is calculated using the standard second-order equations:
[Equations (1) through (4): equation images not recoverable.]
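As an illustration of the blob measurements named above, the following is a minimal sketch of one standard image-moment formulation: the centroid from first-order moments, and the ellipse axes and tilt from second-order central moments. It is an assumption that this matches the analyzer described in [1]; the function name `blob_ellipse` and the list-of-rows mask format are hypothetical, and skin-color segmentation is assumed to have already produced the binary mask.

```python
import math


def blob_ellipse(mask):
    """Centroid, semi-axes, and tilt of the ellipse matching a binary blob.

    `mask` is a list of rows of 0/1 values, 1 marking a skin pixel
    (hypothetical input format). Returns ((cx, cy), major, minor, tilt),
    with tilt in radians measured from the x axis in image coordinates.
    """
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask contains no blob pixels")
    n = len(pixels)

    # First-order moments give the centroid (center of mass).
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    # Second-order central moments, normalized by the blob area.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n

    # Eigenvalues of the 2x2 moment matrix [[mu20, mu11], [mu11, mu02]].
    spread = math.hypot(mu20 - mu02, 2.0 * mu11)
    lam_major = (mu20 + mu02 + spread) / 2.0
    lam_minor = max((mu20 + mu02 - spread) / 2.0, 0.0)

    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4 along it.
    major = 2.0 * math.sqrt(lam_major)
    minor = 2.0 * math.sqrt(lam_minor)
    tilt = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), major, minor, tilt
```

Tracking the returned centroid from frame to frame yields the location track from which velocities and accelerations can be derived.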
For each command gesture, a single test vector was chosen from the total vector set; these chosen vectors were denoted as test vectors. See Table 2.2 for the number of training versus test vectors. The prototype HMM was trained using the training vectors set aside for each gesture.
| | Backward | Down | Forward | Left |
| Train | 15 | 11 | 19 | 11 |
| Test | 1 | 1 | 1 | 1 |
| Total | 16 | 12 | 20 | 12 |
| | Release | Right | Stop | Up |
| Train | 15 | 11 | 18 | 15 |
| Test | 1 | 1 | 1 | 1 |
| Total | 16 | 12 | 19 | 16 |
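The split in Table 2.2 amounts to holding out one recording per gesture. A minimal sketch follows, using the totals from the table. The text does not say how the test vector was chosen, so the random selection, the integer recording ids, and the function name `hold_out_one` are all assumptions.

```python
import random

# Total number of vectors per gesture, taken from Table 2.2.
TOTALS = {"backward": 16, "down": 12, "forward": 20, "left": 12,
          "release": 16, "right": 12, "stop": 19, "up": 16}


def hold_out_one(recording_ids, rng):
    """Return (train, test) with exactly one held-out test recording.

    Random choice is an assumption; the original selection rule is not stated.
    """
    ids = list(recording_ids)
    if len(ids) < 2:
        raise ValueError("need at least two recordings to hold one out")
    test_id = rng.choice(ids)
    train = [i for i in ids if i != test_id]
    return train, [test_id]


rng = random.Random(0)  # fixed seed so the split is repeatable
splits = {gesture: hold_out_one(range(total), rng)
          for gesture, total in TOTALS.items()}

for gesture, (train, test) in splits.items():
    print(f"{gesture:9s} train={len(train):2d} test={len(test)}")
```

With one test vector per gesture, the recognition result for each command rests on a single example, so per-gesture accuracy can only be 0% or 100%.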