MRI VOWELS IMAGE DATABASE
-----------------------------------------------------------------

Contents______________________________________
A. Image Files
B. Outline Files
C. Speech Files
D. Documentation
E. Software
F. Known Image Problems

----------------------------------------
A. IMAGE FILES
----------------------------------------

Image files are stored in directories of the form

    /////.

    = [ cal | f1 | f2 | m1 | m2 | m2_2 | m3 ];
    = [ jpg | mr ];
    = [ aa | ae | ah | eh | er | ey | ih | iy | ow | uh | uw | _loc | _palate | _cast | _castpal | (others) ];
    = [ a | c | s ];
    = [ 01 .. 81 ];

1. Speaker:
    /cal  - calibration images
    /f1   - images and speech of speaker F1, 3mm slice thickness
    /f2   - images and speech of speaker F2, 3mm slice thickness
    /m1   - images and speech of speaker M1, 3mm slice thickness
    /m2   - images and speech of speaker M2, 3mm slice thickness
    /m2_2 - images of speaker M2, collected with a 2mm slice thickness
    /m3   - images and speech of speaker M3, 3mm slice thickness

2. Format:
    mr  - the native GE scanner image format. The MR image format uses 16 bits to code each intensity level, giving a greater dynamic range than 8-bit image formats such as JPG. MR images can be converted to raw 8-bit grayscale images using the program /src/ge2raw.c, or they can be read using the MATLAB routine /src/matlab/geread.m.
    jpg - since some users will not want to deal with image format conversions, we also provide a version of the database with JPG-format images, which can be read by most image manipulation programs.

3. Series:
    Series labels that start with an underscore (_) denote auxiliary image series. Common auxiliary series include:
    _loc     - Sagittal locator images
    _palate  - Images of the subject at rest, wearing an artificial palate
    _cast    - Images of a dental cast submerged in water
    _castpal - Images of the dental cast wearing the artificial palate
    Other series labels are phone labels.
    Most are from the ARPABET phonetic alphabet; some are extensions of that alphabet. See the file 'prompts.txt' for more information.

4. Plane:
    /a - Axial
    /c - Coronal
    /s - Sagittal

5. Number:
    Two-digit image number, from 01 to 81.

----------------------------------------------
B. OUTLINE FILES
----------------------------------------------

The outline file

    //otl///.otl

contains outline information for the image file

    /////.

The OTL file format consists of a text header, followed by multi-channel little-endian short integer data. Most OTL files contain three channels: I, J, and C. I and J give the position of a point on the outline of some region of interest, and C is a color code. These are coded as follows:

    I = H.offset + (2^15/H.npixels_x) * X
    J = H.offset + (2^15/H.npixels_y) * Y

(X, Y) are the coordinates of a point on the outline of the region of interest, in pixel coordinate space. H.offset gives the offset from zero; the default is 0.5. H.npixels_x and H.npixels_y are the size of the original image in the X and Y directions; the default values are (256, 256).

    C = H.offset + (2^15/128) * color_code

color_code is the ASCII code for one of the following:
    'm' (magenta) -- Magenta points mark the gingival margins and the peak of the palatal vault.
    'c' (cyan)    -- Cyan is usually (but not always) used to outline the tongue.
    'g' (green)   -- Green is usually (but not always) used to outline the non-tongue borders of the vocal tract. Axial images use green for all vocal tract borders, including the tongue.
    'y' (yellow)  -- Yellow is occasionally used for non-tongue parts of the vocal tract.

TOOLS FOR READING OTL FILES__________________________

These files can be manipulated in several ways:
    1 - The MATLAB routines /src/matlab/otlread.m and /src/matlab/otlwrite.m.
    2 - The program /src/otl2txt.c.
    3 - Header information can be read using the NIST SPHERE library, which can be found, e.g., on the TIMIT CD-ROM.
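To make the I/J/C coding formulas above concrete, here is a minimal C sketch that inverts them: X = (I - H.offset) * H.npixels_x / 2^15, Y = (J - H.offset) * H.npixels_y / 2^15, and color_code = (C - H.offset) * 128 / 2^15. It is not a replacement for otlread.m or otl2txt.c. It assumes (1) the caller has already skipped the text header and read H.offset, H.npixels_x and H.npixels_y from it, and (2) the three channels are interleaved per point as I, J, C. Both are assumptions, as are the names OtlPoint, le_int16, otl_decode_point and otl_decode_data, which are hypothetical.

```c
#include <stddef.h>

/* Hypothetical container for one decoded outline point. */
typedef struct {
    double x;    /* X coordinate, in pixels */
    double y;    /* Y coordinate, in pixels */
    char color;  /* ASCII color code, e.g. 'm', 'c', 'g', 'y' */
} OtlPoint;

/* Interpret two bytes as a little-endian signed 16-bit integer
   (portable: avoids implementation-defined narrowing conversions). */
static int le_int16(const unsigned char *b) {
    unsigned u = (unsigned)b[0] | ((unsigned)b[1] << 8);
    return u < 32768u ? (int)u : (int)u - 65536;
}

/* Invert I = offset + (2^15/nx)*X, J = offset + (2^15/ny)*Y,
   and C = offset + (2^15/128)*color_code. */
static OtlPoint otl_decode_point(int i, int j, int c,
                                 double offset, int nx, int ny) {
    OtlPoint p;
    p.x = (i - offset) * nx / 32768.0;
    p.y = (j - offset) * ny / 32768.0;
    /* 2^15/128 = 256; round to the nearest integer so that a stored
       value truncated from the fractional offset still decodes exactly.
       Color codes are positive, so adding 0.5 and truncating rounds. */
    p.color = (char)(int)((c - offset) / 256.0 + 0.5);
    return p;
}

/* Decode interleaved (I, J, C) triples (6 bytes per point) from the data
   section that follows the header. Returns the number of points written. */
static size_t otl_decode_data(const unsigned char *data, size_t nbytes,
                              double offset, int nx, int ny,
                              OtlPoint *out, size_t max_points) {
    size_t n = 0;
    while (n < max_points && (n + 1) * 6 <= nbytes) {
        const unsigned char *t = data + n * 6;
        out[n] = otl_decode_point(le_int16(t), le_int16(t + 2),
                                  le_int16(t + 4), offset, nx, ny);
        n++;
    }
    return n;
}
```

With the default header values, a data section would be decoded with otl_decode_data(data, nbytes, 0.5, 256, 256, points, max_points). Because the offset is fractional while the stored values are integers, decoded X and Y may differ from the original coordinates by a small fraction of a pixel.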
One useful piece of information in the OTL file header is the location of each image in an absolute 3D coordinate system. This information is also provided in the MR image file header, but many users will find the copy in the OTL header more convenient. It is stored in the following OTL header fields:

    top_left_hand_corner___1     - R/L coordinate of the TLHC of the image
    top_left_hand_corner___2     - A/P coordinate of the TLHC of the image
    top_left_hand_corner___3     - S/I coordinate of the TLHC of the image
    top_right_hand_corner___1    - R/L coordinate of the TRHC of the image
    ...
    bottom_right_hand_corner___1 - R/L coordinate of the BRHC of the image
    ...
    slice_thickness              - thickness of the image slice in the direction perpendicular to the plane given by the TLHC, TRHC, and BRHC.

---------------------------------------
C. SPEECH WAVEFORM FILES
---------------------------------------

Speech data is contained in the files

    /voice//.

    = [ prone | sitting ];
    = [ 1 | 2 | 3 | 4 ];
    = wav;

1. Posture:
    All waveforms were recorded in an acoustically quiet room using an omnidirectional microphone (most were recorded with a Shure 58). Utterances were recorded to DAT tape, and later re-digitized at a 16kHz sampling rate with 16 bits/sample resolution.

    Most subjects began by lying prone on a couch, with the microphone hanging 20cm above the lips. Subjects were asked to sustain each phone for as long as they comfortably could. Each phone was repeated 3 times. Subjects were then asked to sit up on the couch and repeat each phone 3-5 times in a normal tone of voice.

2. Format:
    The speech file format depends on which version of the database you have.

    The HTML/JPG/AU database contains speech files in the AU file format. Files are sampled at 8000Hz and stored in mu-law compressed format. If you have this version of the database, you should be able to listen to a speech file by clicking on its name in your web browser.
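    For users who want to process the AU files directly rather than through a web browser, the following C sketch parses the standard 24-byte Sun AU header (magic ".snd", then big-endian 32-bit data offset, data size, encoding, sample rate and channel count) and expands 8-bit mu-law samples (AU encoding 1) to 16-bit linear PCM using the standard G.711 mu-law decoding rule. It assumes the whole file has been read into memory; the names AuHeader, au_parse_header, ulaw_to_linear and au_decode_mulaw are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the fixed 24-byte Sun AU header (all big-endian). */
typedef struct {
    uint32_t data_offset, data_size, encoding, sample_rate, channels;
} AuHeader;

static uint32_t be_u32(const unsigned char *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Returns 0 on success, -1 if the buffer is too short or not an AU file. */
static int au_parse_header(const unsigned char *buf, size_t len, AuHeader *h) {
    if (len < 24 || be_u32(buf) != 0x2e736e64u) /* ".snd" */
        return -1;
    h->data_offset = be_u32(buf + 4);
    h->data_size   = be_u32(buf + 8);
    h->encoding    = be_u32(buf + 12);
    h->sample_rate = be_u32(buf + 16);
    h->channels    = be_u32(buf + 20);
    return h->data_offset < 24 ? -1 : 0;
}

/* G.711 mu-law to 16-bit linear: complement the byte, then rebuild the
   magnitude from the 3-bit exponent and 4-bit mantissa (bias 0x84). */
static int16_t ulaw_to_linear(uint8_t u) {
    int sign, exponent, mantissa, magnitude;
    u = (uint8_t)~u;
    sign = u & 0x80;
    exponent = (u >> 4) & 0x07;
    mantissa = u & 0x0F;
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -magnitude : magnitude);
}

/* Decode mu-law sample bytes after the header into out[];
   returns the number of samples written (0 if not mu-law). */
static size_t au_decode_mulaw(const unsigned char *buf, size_t len,
                              const AuHeader *h, int16_t *out, size_t max_out) {
    size_t avail, n, i;
    if (h->encoding != 1 || h->data_offset > len)
        return 0;
    avail = len - h->data_offset;
    if ((size_t)h->data_size < avail)
        avail = h->data_size;
    n = avail < max_out ? avail : max_out;
    for (i = 0; i < n; i++)
        out[i] = ulaw_to_linear(buf[h->data_offset + i]);
    return n;
}
```

    For the files described here, one would expect au_parse_header to report sample_rate == 8000 and encoding == 1.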
    The MR/SPHERE database contains no JPG images and no AU audio files. Instead, speech files are stored in the NIST SPHERE file format, which consists of a 1024-byte ASCII header followed by little-endian short integer samples at a sampling rate of 16kHz.

-------------------------------------------
D. DOCUMENTATION
-------------------------------------------

Documentation for each speaker includes the files

    /doc/profile.txt - Demographic information
    /doc/series.dat  - MRI image collection protocol and image numbers

In addition, the top-level directory contains the files

    /doc/prompts.txt - Information about the speech sounds imaged
    /doc/imagefaq    - A copy of the Medical Image Format FAQ, from the internet newsgroup alt.image.medical.

--------------------------------------------
E. SOFTWARE
--------------------------------------------

1. C-Language Software
    The following programs are provided. They are designed for Unix systems, but can probably be modified to work under other operating systems:

    /src/ge2raw.c  - Read DPCM-compressed GE SIGNA image files, dither and quantize the 16-bit image intensities down to 8 bits per pixel using a user-selected black/white range, and output a raw 256x256-pixel image file with 8 bits per pixel.
    /src/otl2txt.c - Read an OTL file, and write out the X and Y coordinates and color of each outline point as an ASCII table. The user may choose whether or not to output the OTL header information.

2. MATLAB Software
    The directory /src/matlab contains a copy of the CTMRedit tool (Hasegawa-Johnson and Cha, "CTMRedit: a Matlab-Based ROI Editor with Simultaneous Interpolated Zooming and Registration of Three Orthogonal Planes," 1999 BMES/EMBS Conference, Atlanta, GA, October 1999). This tool is designed to accomplish the following tasks:

    - Display MR or CT images, including a main image and two locator images. Images are smoothly interpolated, and the position of the main image on each locator image is marked with a locator line.
    - Edit regions of interest (ROIs), either manually or using an included automatic region-growing algorithm. ROIs may be labeled in up to six colors. ROI points are not constrained to pixel boundaries; since images are smoothly interpolated, manual segmentation typically has an accuracy of about 0.25 pixels.
    - Interpolate ROIs between image slices, creating a three-dimensional surface or solid. Linear and shape-based interpolation methods are currently supported.
    - The tool is easily extensible and platform-independent. CTMRedit has been tested on Linux, Solaris, and Windows 98.

    The software is designed so that you can run it directly from the CD-ROM. Start MATLAB on your own computer, add the directory [CDROM]:/src/matlab to your MATLAB path, and type

    >> cd [CDROM]:/m2/mr/aa/c
    >> CTMRedit

    Running from the CD-ROM is somewhat slower than running from the hard drive. In particular, image loading will be much faster if you copy the file [CDROM]:/src/matlab/geread_i.c to a directory on your hard drive that is also on your MATLAB path, change to that directory, and type

    >> mex geread_i.c

----------------------------------------
F. KNOWN IMAGE PROBLEMS
----------------------------------------

Speaker f2_____________

Gathering all of the images for a coronal image stack typically requires 40-50 seconds of imaging time, depending on the imaging software used. Gathering all of the images for an axial image stack usually requires even more time, depending on the number of images required. For this reason, a coronal image stack was typically collected in two breath-holds, and an axial image stack in three breath-holds. The procedure was as follows:

    1. The subject holds his or her breath, and the first set of images is collected; e.g., if two breath-holds are required, images 1, 3, 5, 7, ... are collected.
    2. There is a pause, and the subject is instructed to breathe freely.
    3. The subject holds his or her breath again, and the second set of images is collected.
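The interleaving in step 1 can be written as a simple rule. The C sketch below assumes that with n breath-holds, image k (numbered from 1) was acquired during breath-hold ((k - 1) mod n) + 1. This matches the two-breath-hold example (images 1, 3, 5, ... in the first breath-hold), but the ordering used for three breath-holds is an assumption; the per-speaker /doc/series.dat file should be treated as the authoritative record. The function names breath_hold_of_image and images_in_hold are hypothetical.

```c
#include <stddef.h>

/* Breath-hold (1-based) during which image k (1-based) was acquired,
   ASSUMING images were interleaved across nholds breath-holds. */
static int breath_hold_of_image(int k, int nholds) {
    return ((k - 1) % nholds) + 1;
}

/* Write the numbers of the images in 1..nimages that were acquired during
   breath-hold `hold` (under the same assumption) into out[], which has
   room for max_out entries. Returns how many numbers were written. */
static size_t images_in_hold(int hold, int nholds, int nimages,
                             int *out, size_t max_out) {
    size_t count = 0;
    for (int k = hold; k <= nimages && count < max_out; k += nholds)
        out[count++] = k;
    return count;
}
```

For the coronal stacks of subject f2, nholds = 2 reproduces the odd/even split described in the following paragraph: images_in_hold(1, 2, 81, ...) selects the odd-numbered images and images_in_hold(2, 2, 81, ...) the even-numbered ones.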
This procedure required subjects to produce vowels with identical tongue positions during each successive breath-hold. Judging from the images, all subjects were able to perform this task except subject f2. The odd-numbered coronal images of subject f2 were taken during the first breath-hold, and the even-numbered coronal images during the second. There is typically a tongue position difference of 2-3mm between breath-holds, and in the worst case (vowel /aa/) the difference is about 1cm. For this reason, the coronal images of subject f2 have been divided into odd and even sets. The axial images of subject f2 are probably similarly inconsistent, but their consistency has not yet been analyzed.

Speaker m3_________________

All images of subject m3 show two large voids near the mandibular canine teeth. These voids are clearly imaging artifacts, perhaps caused by dental work containing metal or some other substance with high magnetic susceptibility. All subjects were instructed, both verbally and in writing, to withdraw from the study if they had metal dental fillings, but subjects were not required to undergo a confirmatory dental exam prior to imaging.

The coronal images of subject m3 have been segmented as well as possible. Because of the imaging artifacts, segmentation results near the front of the mouth (approximately anterior to image slice 31) are speculative and should not be considered reliable.