

Visual Perception and Eyetracking in Virtual Environments
Dr. George McConkie*, Dr. Art Kramer, Dr. David Zola (UIUC), Dr. Celestine Ntuen (NCAT),
Dr. Craig Reinhart (RSC), Dr. Jim Walrath, Mr. Robert Karsh, Mr. Bernard Corona, and
Dr. Michael Benedict (ARL)

April 1 - June 30, 1996 (FY96Q3)

Goals: Complete computer program for creating multi-resolutional displays, eliminating higher spatial frequency information in the visual periphery during picture viewing.

Progress: An initial experiment has been completed in which participants carried out three visual tasks with a dynamic display in which information in the visual periphery is degraded. The tasks included searching for objects, attempting to remember a picture, and attempting to identify objects that had changed in the picture. Data are currently being analyzed, and the next study in this series is being developed. The initial computer program has been completed.

July 1 - September 30, 1996 (FY96Q4)

Goals: Complete initial experiments on visual performance with multi-resolutional displays. Initiate experiments on visual attention in 3-D space.

Progress: We have completed two experiments examining human performance in search and image retention tasks with gaze-controlled multi-resolutional displays. These studies have examined perceptual efficiency and eye movement characteristics when only the area to which the observer's gaze is directed has high resolution, with less detail in the visual periphery. Effects of the size of the high-resolution area and of the level of detail in the periphery were examined. Data analysis is now underway.

A search of literature related to visual attention and visual discrimination is underway. We are seeking information about which characteristics of visual objects can be discriminated at different regions of the visual field, and which characteristics tend to draw attention.

We have initiated experiments on visual attention in 3-D space. Data will be collected in the next few months with a report forthcoming in the next fiscal year.

We have had a paper ("A Comparison of Sequential and Spatial Displays in a Complex Monitoring Task") accepted for publication in Human Factors. The paper reports the results of a study examining the efficacy of different display formats for the rapid presentation of visual data.

We have also had a paper accepted for publication which reports the results of five studies examining boundary conditions on the flexibility of visual spatial attention. It is entitled "Further Evidence for the Division of Attention Among Noncontiguous Locations" and will appear in Visual Cognition.

We have had a paper accepted for publication in the edited series Attention and Performance. The paper, entitled "Training for Executive Control", describes a training program for the improvement of human performance in complex multi-task environments.

October 1 - December 31, 1996 (FY97Q1)

Goals: Complete additional studies on visual performance with multi-resolutional displays. Complete literature review of visual attention and discrimination issues related to examining complex displays. Complete initial studies on visual attention in 3-D space.

Progress: We have conducted further analysis of eye movement data from initial experiments with multiresolutional displays. We also planned two additional studies, one to examine conditions under which peripheral degradation is detected, and the second to examine the effects of delays in updating the display in response to eye movements. The literature review is underway, but not complete. We have completed the initial studies on visual attention in 3-D space.

January 1 - March 31, 1997 (FY97Q2)

Goals: Complete initial studies on visual attention in 3-D space. Complete additional study on visual performance with multi-resolutional displays.

Progress: We have completed a series of six studies on attention and depth and have written and submitted three separate manuscripts on this topic.

Our research on attention in depth has yielded a number of important findings. First, we have found that human study participants focus attention on a limited region of the 3-D visual world when task-relevant information is embedded in visual clutter. In the absence of visual clutter, attention is deployed broadly over depth. Second, even when attention is well focused on a specific depth plane, task-irrelevant information can interfere with the selection and processing of task-relevant information. This occurs especially when the task-irrelevant information has physical features (e.g., color, form) similar to those of the task-relevant information. In current studies we are examining the geometric characteristics of visual objects that encourage both focused attention to particular regions in depth and the division of attention across multiple depth planes.

The study on visual performance with multi-resolutional displays is currently underway, but not yet completed. The second study, which we are calling the Occasional Window study, will present windows with degradation in the periphery on selected fixations with the task being detection of the degradation.

In the previous study we used multi-resolutional windows; the variables were the size of the high-resolution window (Figure 1) and the amount of degradation of the remainder of the display, as shown in Figure 2. During those trials, the window was on 100% of the time. A paper on this multi-resolutional display work will be presented at the American Psychological Society Meetings during May, 1997.


Figure 1: Highest Degradation (level 1), Largest Window (radius 5 degrees).


Figure 2: Intermediate Degradation (level 4), Smallest Window (radius 2 degrees)

The question arises as to how noticeable the window would be if it were not on all the time, or whether the observer would fail to notice it. This additional work will demonstrate whether the observer's attention is concentrated in the center of vision or spread over a larger portion of the field of view. We are also testing the hypothesis that a viewer attends broadly early in viewing and more narrowly later, as he/she tries to capture more detail about the picture. We hypothesize that the degraded periphery will be more noticeable early in viewing (while broadly attending) than later in viewing (while concentrating on details in the center of vision).

April 1 - June 30, 1997 (FY97Q3)

Goals: Complete additional study on visual performance with multi-resolutional displays (Reported as not complete in FY97Q2). Complete literature review of visual attention and discrimination issues related to examining complex displays.

Progress: A poster presentation titled "Viewing Pictures through a Moving Window: Effects on Search Times and Eye Movements" was made at the American Psychological Society conference in May. It reported the results of our studies of the functional visual field during searches of photographic images. Using eye-contingent control, a high spatial resolution 'window' was present wherever the subject was looking, with lower resolution in the periphery. Window size affected search time for objects; window size and peripheral degradation level affected eye movements. We anticipate completing the experiments and conducting further data analysis during the fourth quarter of FY97.

The literature review of visual attention and discrimination issues related to examining complex displays has been completed and is being written up.

At RSC, eye tracking is being investigated as an input mode modeled as a non-command interface. In this mode the eye tracker passively observes and monitors the user's interest as inferred from patterns of fixations. This information can then be used to resolve contextual ambiguities such as 'Who is the commander of that unit?' when more than one unit is being displayed to the user.

Significant progress was made in capturing and analyzing gaze data for a non-command interface. The ability to display a test image and capture the reported gaze locations from the Iscan hardware was augmented with a fixation analysis filter. The fixation analysis is currently performed as a post-processing step but is organized to permit on-the-fly derivation of fixations from the spatial and temporal patterns of individual gaze points. As seen in Figure 3, the resulting fixations (large circles), as well as input gaze points (small dots), can be overlaid on the test scene. While the Iscan system reports fairly low rate data with much lower resolution than the Purkinje tracker, it can be both head mounted and bench mounted and does not require a bite bar. This low rate, low-resolution data does however pose challenges in accurately deriving the fixation location and duration as well as dealing with head motion.


Figure 3: Clustering of gaze locations into fixations.
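For illustration, the sketch below (in Python) shows a simple dispersion-based way to group gaze samples into fixations of the general kind produced by the filter described above. The data format, dispersion threshold, and minimum duration are assumptions chosen for the sketch, not the parameters of the RSC implementation:

    # Illustrative dispersion-threshold (I-DT) fixation detection.
    # Thresholds and data format are assumptions for this sketch.

    def detect_fixations(samples, max_dispersion=1.0, min_duration=0.100):
        """samples: time-ordered list of (t_seconds, x_deg, y_deg) gaze points.
        Returns a list of fixations as (start_t, end_t, mean_x, mean_y)."""
        fixations = []
        i = 0
        n = len(samples)
        while i < n:
            # Grow a window until it spans at least min_duration.
            j = i
            while j < n and samples[j][0] - samples[i][0] < min_duration:
                j += 1
            if j >= n:
                break
            if _dispersion(samples[i:j + 1]) <= max_dispersion:
                # Extend the window while dispersion stays below threshold.
                while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                    j += 1
                window = samples[i:j + 1]
                xs = [p[1] for p in window]
                ys = [p[2] for p in window]
                fixations.append((window[0][0], window[-1][0],
                                  sum(xs) / len(xs), sum(ys) / len(ys)))
                i = j + 1
            else:
                i += 1
        return fixations

    def _dispersion(window):
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))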

Fourth quarter work will involve experimenting with different fixation filter schemes, making those filters work on-the-fly, and performing experiments to determine if idle gazing can be differentiated from task directed gazing.

July 1 - September 30, 1997 (FY97Q4)

Goals: Complete additional study on visual performance with multi-resolutional displays (incomplete in FY97Q2). Complete initial studies on use of gaze-controlled cursor for specifying objects in the visual field.

Progress: We conducted an experiment investigating the ability to detect visually degraded information in the visual periphery during selected eye fixations (Figure 4). The results are being compared with results from our earlier studies using tachistoscopic presentations. Initial analyses suggest that peripheral degradation is more detectable than originally anticipated.


Figure 4: Observers examine complex scenes with an eye-linked multiple resolution display, which produces high visual resolution only in the region to which the eyes are directed. This type of display can provide needed visual information for the user, while reducing computationally intensive rendering and bandwidth requirements.

One implication may be that the retina should not be considered simply a low-pass filter. There may be some interaction between the filtering applied to the image to produce degraded versions and the filtering that occurs in the visual periphery. There may also be an important distinction between whether a person is (or can be, if he wishes) aware of degradation in peripheral vision and whether that degradation interferes with a visual task. It appears that subjects are aware of types of degradation that have very little negative impact on their performance.

As an initial pilot study, we have developed the capability to allow subjects to select from alternative choices by directing their gaze at the selected alternative. This involves highlighting the alternative to which the gaze is directed in order to provide visual feedback to the user. This method has now been employed successfully in two experiments investigating memory for objects in scenes, and appears to work quite well. This work is giving us experience in the use of gaze direction as a pointing device.

October 1 - December 31, 1997 (FY98Q1)

Goals: Set up binocular eyetracking capability with high-resolution Dual Purkinje Eyetrackers. Complete initial report on how visual attention is deployed in 3-D space. Demonstrate use of free motion eye tracking using ISCAN equipment in a multi-modal display system. Examine an observer's ability to detect changes in dynamic informational displays measured by an observer's ability to quickly respond to those changes with eye movements and fixations.

Progress: The left-eye Dual Purkinje Eyetracker has been acquired to match the existing right-eye eyetracker. The frame that holds the eyetrackers has been modified to accommodate the additional equipment. This has also necessitated a complete reconstruction of the system for head stabilization. The equipment is now in place and software is being developed to calibrate the two eyetrackers and to collect data from the two eyes simultaneously (Figure 5).


Figure 5: Eyetracking equipment in use.

The ISCAN equipment is on order, but has not yet arrived, so the planned demonstration has not yet been accomplished. The company indicates that delivery should occur within the next two weeks. Delivery has been delayed because the company is mounting their eyetracking equipment on shutter glasses and on a head-mounted display that were acquired from other sources, and this process is taking longer than anticipated.

A series of experiments has been conducted examining subjects' responses to sudden onsets of objects in the stimulus display. These studies indicate that even when a subject is concentrating on a search task and trying to ignore these stimuli, and when the suddenly-appearing stimulus does not have the critical features of the search target, the subject's eyes are still drawn to it about half the time; thus, the response to the true target is slowed substantially. Furthermore, subjects are frequently unaware that their eyes have been drawn to the appearing object.

The data from an earlier study have been analyzed and show a contrasting result. When an object appears in, or disappears from, a complex scene, with this change occurring during the time when the eyes are moving (that is, during a saccadic eye movement), this change is often not detected. Even when the critical object is large, and the eyes are on or within one degree of it, detection of the change occurs only about 60% of the time.

Thus, discrete changes in the display can either draw the observer's attention away from a target, or be missed entirely, depending on whether the change occurs while the observer's eyes are still or moving. This indicates that some care is needed in designing visual signals so that they do not interfere with higher-priority tasks the observer may be carrying out but, at the same time, can be reliably detected when needed.

January 1 - March 31, 1998 (FY98Q2)

Goals: Demonstrate use of free motion eye tracking using ISCAN equipment in a multi-modal display system (Q198). Develop basic software for binocular eyetracking capability with high-resolution Dual Purkinje Eyetrackers. Complete a study on how quickly following a saccade the updating of multi-resolutional displays must occur to avoid disturbing visual processing. Complete a report on free motion eye tracking repeatability and accuracy with ISCAN equipment. Determine those factors that optimize an observer's ability to locate and attend to changes in dynamic informational displays.

Progress: The ISCAN eyetracking equipment is on order, but has not yet arrived, so the planned demonstration and evaluation were not accomplished. The company indicated that delivery should occur within two weeks. Delivery was delayed a second time because, in exchange for the delay, we will receive higher-quality cameras.

Data collection is currently in process for the study investigating how quickly, following an eye movement, a multiresolutional display must be updated in order to avoid disturbing visual processing. Also, we developed basic software for doing binocular eyetracking using high-resolution Dual Purkinje Eyetrackers. This is currently at the implementation stage.

We continued analyses of data from prior studies on the use of multiresolutional displays, in which high-resolution information is present only in the area at which the gaze is directed during each eye fixation. The most recent study examined the degree to which observers notice the peripheral degradation. We found that while the level of peripheral degradation in a multiresolutional display does not reliably affect the time to locate objects in the display, it does strongly affect how detectable the display is as multiresolutional. Thus, the fact that a user can detect a display as multiresolutional does not necessarily indicate that degraded information in the periphery will negatively impact his/her performance in naturalistic tasks.

We also found that when the high-resolution area extends to a radius of approximately 4° from the point of fixation, with a moderate level of degradation (using 7 of 13 sets of wavelet coefficients), detection of the multiresolutional nature of the display was considerably lower (see Figure 6). The size of the high-resolution area and the level of peripheral degradation that produced these results also produced eye movement and search behavior indistinguishable from that of a completely high-resolution display. In practical terms, this suggests that a combination of window size (4°) and degradation level (7 sets of wavelet coefficients) is sufficient to produce unimpaired performance and minimal detection while using an eye-contingent, multiresolutional display. Two articles about this research are in progress.


Figure 6: Restricted-viewing phase experiment data.
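The peripheral degradation in these displays is produced by discarding sets of wavelet coefficients. As a rough illustration only, the sketch below (in Python, using the PyWavelets package) shows one way a degraded peripheral image and a gaze-contingent frame could be constructed by zeroing the finest detail subbands; the wavelet family, the organization by level rather than by the 13 individual coefficient sets, and the circular window blending are assumptions for the sketch, not the procedure used in our experiments:

    import numpy as np
    import pywt  # PyWavelets; choice of library and wavelet is an assumption

    def degrade_periphery(image, keep_levels=2, total_levels=4, wavelet="db2"):
        """Return a low-detail version of a grayscale image (2-D array) by
        zeroing the finest wavelet detail subbands; keep_levels of the
        total_levels detail sets are retained."""
        coeffs = pywt.wavedec2(image, wavelet, level=total_levels)
        # coeffs[0] is the approximation; coeffs[1:] are detail sets from
        # coarsest to finest.  Zero the finest (total_levels - keep_levels) sets.
        for k in range(1 + keep_levels, len(coeffs)):
            coeffs[k] = tuple(np.zeros_like(band) for band in coeffs[k])
        return pywt.waverec2(coeffs, wavelet)[:image.shape[0], :image.shape[1]]

    def gaze_contingent_frame(image, degraded, gaze_xy, window_radius_px):
        """Full resolution inside a circular window centered on the gaze
        position; the degraded image everywhere else."""
        rows, cols = np.indices(image.shape)
        inside = ((cols - gaze_xy[0]) ** 2 + (rows - gaze_xy[1]) ** 2
                  <= window_radius_px ** 2)
        return np.where(inside, image, degraded)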

We conducted a series of studies to examine the impact of the appearance of task-irrelevant objects (which could become relevant to the task later but were not relevant at the time of presentation) on visual search performance and eye movements. We found a substantial disruption of search performance and eye movements (with eye movements often being directed to the irrelevant object rather than the search target) when a task-irrelevant object appeared in the display. We are currently exploring techniques that can be used to (a) utilize the attention-capturing ability of abrupt onsets to direct attention to important areas of the visual display, and (b) minimize the negative effects of this form of distraction.

April 1 - June 30, 1998 (FY98Q3)

Goals: Complete a report on basic issues concerning the use of multi-resolutional displays to reduce bandwidth requirements for video displays. Evaluate the utility and accuracy of eye movement tracking using head stabilization versus using a head-mounted tracking system to allow a free range of head movement.

Progress: UIUC has now received the ISCAN eyetracker, purchased with our NSF Infrastructure grant funds. This unit has been specially modified to work with shutter glasses in a 3D virtual reality environment and to use a higher-than-normal sampling rate (120 samples per second). The head-tracking equipment that is required for use with this eyetracker has been installed on the ImmersaDesk in the Integrated Support Laboratory (ISL). The eyetracker is head-mounted, and allows the user free movement (turning, walking, standing or sitting) within a restricted area in front of the ImmersaDesk. It will track the eyes of the user, not only when looking at the ImmersaDesk, but also when looking at other displays or defined regions. A video-taped record can be made of the user's field of view, with a marker indicating the location to which gaze is being directed at each moment, in addition to the numerical data that is employed for data analysis.

The study on the rapidity of updating multiresolutional displays is currently in progress. An analysis of data initially collected for this study revealed a problem in the software, causing the experimental manipulations to be implemented incorrectly. The software was modified and tested, and data from about half of the required 28 subjects has now been collected.

The report on free motion eyetracking repeatability has been delayed by the late delivery of the ISCAN eyetracking equipment. We will begin the testing as soon as the eyetracker has been fully installed and tested.

The report on reducing bandwidth requirements is delayed until we complete the study (listed above) on how quickly, following a saccade, the updating of multiresolutional displays must occur to avoid disturbing visual processing. A full report of this line of research, together with its implications for military displays, will be completed in the next quarter.

In addition, we have continued analysis of data from past studies of multiresolutional displays, to understand how peripheral degradation affects eye movement control. Figure 7 shows one of the sample multiresolutional displays.


Figure 7: Sample multiresolutional display.

July 1 - September 30, 1998 (FY98Q4)

Goals: Complete a study on how quickly following a saccade the updating of multiresolutional displays must occur to avoid disturbing visual processing. (Q298) Complete a report on free motion eye tracking repeatability and accuracy with ISCAN equipment. (Q298) Complete a report on basic issues concerning the use of multi-resolutional displays to reduce bandwidth requirements for video displays. (Q398) Evaluate the utility and accuracy of eye movement tracking using head stabilization versus using a head-mounted tracking system to allow a free range of head movement. (Q398) Complete a report on the effects of distracting visual information on oculomotor control. Complete study of factors that attract the eyes during viewing complex displays. Demonstrate coarse fixation analysis with ISCAN equipment as input to multi-modal display system. Determine the advantages/disadvantages of using head stabilization versus using a head mounted system in monitoring dynamic informational displays.

Progress: At UIUC, we have been investigating the effects of multi-resolutional displays on perception in natural visual tasks. This is motivated by the idea that use of multi-resolutional displays, in which the center of vision is at high resolution, and the periphery is at a lower level of resolution, may be a way of economizing on bandwidth in single-user head-mounted displays. In one of our previous studies, we investigated the effects of the size of the high-resolution center (called the "window") and the degree of degradation in the low-resolution surround. We found that a dynamic multi-resolutional display having a high-resolution window with a radius of 4.1 degrees produced nearly identical performance to a display in which the entire display was in high resolution. Performance was measured in terms of the time taken to find objects in naturalistic photographs and also in terms of the viewer's fixation duration and lengths of saccades made during both the search task and a visual memory task.

An important question that remained, however, was the extent to which the speed of updating a multi-resolutional display would affect these same performance measures. We know from a previous study that viewers can detect a low-resolution image if it is not updated to a high-resolution image within the first 5 milliseconds (msec) of a fixation. Based on that finding, our previous study had a deadline for image updating at 5 msec after the beginning of a fixation. We were able to attain these high update rates by use of an extremely high temporal resolution eyetracker (the Purkinje Generation 5) and by pre-storing all possible multi-resolutional images that we would use. Unfortunately, such speed of updating a multi-resolutional display would most likely be impossible in dynamic VR systems (e.g., BattleView). Thus, we designed a study to investigate the effects of delays in updating multi-resolutional displays on viewers' performance as measured by search times and eye movement parameters in the same tasks as used in our previous studies.
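For illustration, a skeleton (in Python) of the kind of eye-contingent update loop whose delay we manipulated is sketched below; the function names and the way a new fixation is detected are placeholders, not our experimental software:

    import time

    def run_gaze_contingent_loop(get_fixation, build_frame, render, update_delay_ms=5):
        """Skeleton of an eye-contingent update loop (illustrative only).

        get_fixation()    -- returns the current (x, y) fixation estimate, or None
        build_frame(x, y) -- returns an image with the high-resolution window
                             centered at (x, y), degraded elsewhere
        render(img)       -- draws the image on the display
        update_delay_ms   -- imposed delay between the start of a new fixation
                             and the window update (the manipulated variable)"""
        last_fixation = None
        while True:
            fixation = get_fixation()
            if fixation is not None and fixation != last_fixation:
                # A new fixation has begun: wait out the imposed delay, then
                # re-center the high-resolution window on the new fixation.
                time.sleep(update_delay_ms / 1000.0)
                render(build_frame(*fixation))
                last_fixation = fixation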

The results of our study replicated our previous findings in that the 4.1-degree radius window produced search times roughly equivalent to a full high-resolution display, but contrary to our previous findings we also found that greater degradation in the low-resolution periphery resulted in longer search times. We attribute the latter difference to the greater statistical power of our current study, which included twice as many subjects. However, we found that updating delays in the range of 5-45 msec had no reliable effect on the time taken to search for objects in naturalistic photographs.

Nevertheless, we did find effects on more subtle measures of viewers' visual processing that indicate that at least a 45-msec delay in updating a multi-resolutional display may be too long. Specifically, viewers' average fixation times were reliably longer for 45 msec delays than for either 5 or 15 msec delays. Consistent with our previous results, we also found that the 4.1-degree radius window produced average fixation duration nearly equivalent to the full high-resolution condition, but that the level of degradation in the low-resolution periphery did not make a reliable difference.

These effects on average fixation duration were made clearer when we looked at the distribution of fixation duration separately for each window radius and update delay. These showed that the distribution most different from that of the full high-resolution control is for the smallest window with the longest delay (window radius =1.6 degrees, update delay = 45 msec). Conversely, the distribution most similar to that of the full high-resolution control is from the largest window and shortest update delay (window radius = 4.1 degrees, update delay = 5 msec). These results were what we had expected. To our surprise, however, we also found that there is a fairly strong similarity between the distributions for the smallest window and shortest update delay (window radius =1.6 degrees, update delay = 5 msec) and the largest window and longest update delay (window radius = 4.1 degrees, update delay = 45 msec). One way of explaining this is to say that in the latter condition (large window, long delay), the delay in window updating makes the first 45 msec of the fixation similar to the smallest window condition (1.6 degrees). This would be the case particularly when the preceding eye movement (saccade) had moved the point of regard about 2.5 degrees. Then, on the half of the visual field in the direction of the saccade, the edge of the window would be 1.6 degrees away (4.1°-2.5°=1.6°). In fact, the average saccade length preceding the 45-msec delay in the 4.1-degree radius condition is 2.53 degrees. It is interesting to note that this should only lead to the observed similarity in fixation duration if the first 45 msec of a fixation are particularly important in determining overall fixation duration. In fact, previous research (e.g., McConkie & Dyer, in preparation) indicates that processing during the early part of a fixation (e.g., the first 50-75 msec) is indeed important in determining the overall length of the fixation.

Overall, the results of this study indicate that while a delay of as much as 45 msec in updating a multi-resolutional display may not appreciably affect gross measures of performance on visual tasks (e.g., total time taken to find an object in a photograph), it nevertheless does affect visual processing (e.g., fixation duration). However, delays on the order of 15 msec seem to affect neither type of measure. We do not know from this study at what point in the range of 15-45 msec these effects on processing begin. However, we would recommend that updating delays be kept to less than 45 msec if possible when using multi-resolutional displays in single-user head-mounted displays.

At UIUC, we recently received the ISCAN eye-tracking equipment and developed an initial procedure for calibrating it. The ISCAN eye tracker is very light-weight and adapts for head movement, so it is very unrestrictive to the user's movements. To test the ISCAN system's accuracy and reliability, two subjects (LL and DR) undertook a reliability test, using a large screen that is approximately the size of NCSA's ImmersaDesk. After calibration, each subject fixated on individual points of a 3 x 3 grid that was back-projected onto a partially transparent Plexiglas screen. These fixation points were placed at the outer edge of the projected image. The subjects repeated this procedure nine times, for a total of nine "trial sets." Some delays were introduced in an attempt to induce drift in the ISCAN system.

In terms of accuracy, the ISCAN system reported an average offset from the fixation points of 4.8 degrees of visual angle (+/- a standard error of 0.75 degrees). This amount varied by subject: LL's mean error was 5.9 degrees (+/-1.1 degrees), and DR's mean error was 3.8 degrees (+/-1.0 degree). The disparity between what the subjects focused on and what the ISCAN reported was greater horizontally (mean of 0.6 degrees) than vertically (mean of 0.04 degrees). Although these angular errors may appear small, particularly since the subjects were seated six feet away from the projected image, 4.8 degrees represents a mean error of 121.9 pixels on an 800x600 pixel, 43" by 27.5" image. This means that without improvements to the calibration procedure, each fixation is likely to be off by a fairly constant amount. This is not uncommon for eye tracking equipment. It is very likely that further improvements to our setup and calibration procedure will reduce these numbers. The most difficult part of setting up the ISCAN equipment is programming ISCAN's Line-Of-Sight tracking program for the correct plane; at present, mapping the coordinates of the displayed image into the ISCAN software is awkward. Software modifications are needed in order to facilitate this process and make the system more user-friendly for military applications.
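As a check on the pixel figure quoted above, the conversion from angular error to on-screen pixels follows directly from the viewing geometry. The short calculation below (in Python) reproduces it under the stated assumptions of a six-foot (72") viewing distance and a 43" by 27.5" image at 800x600 pixels; it yields roughly 112 pixels horizontally and 132 pixels vertically, or about 122 pixels on average, in line with the reported value:

    import math

    def degrees_to_pixels(error_deg, distance_in, screen_in, screen_px):
        """Convert an angular error to pixels on a flat screen viewed head-on."""
        offset_in = distance_in * math.tan(math.radians(error_deg))
        return offset_in * (screen_px / screen_in)

    viewing_distance = 72.0  # six feet, in inches
    horiz = degrees_to_pixels(4.8, viewing_distance, 43.0, 800)  # ~112 px
    vert = degrees_to_pixels(4.8, viewing_distance, 27.5, 600)   # ~132 px
    print(round(horiz), round(vert), round((horiz + vert) / 2))  # mean ~122 px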

The reliability of the system is arguably more important than its absolute accuracy with respect to what is displayed. If the system's output is reliable, then subsequent analysis of the eye data can be corrected against the initial calibration, used as a baseline, making the results accurate. However, this is not possible if an eye-tracking system exhibits a lot of drift; should the tracking move off calibration quickly, adjusting for inaccurate readings becomes much more difficult.

Fortunately, as shown in Figures 8 and 9, the ISCAN system does not appear to demonstrate much drift over a short time. Each plot shows the fixation points for each subject by trial set. In subject LL's plot, most of the readings are clustered very close together, with some difficulty on the right side of the image, where some points were not captured at all. In subject DR's plot, very similar results are seen. The problem of some missing points on the right-hand side is still an issue to be addressed. As mentioned above, it is likely that further experience with the system will iron out some of these problems. The fixation points were located at the edge of the projected image, where we are the most likely to encounter problems with the mapping or registration of the projected image into the ISCAN tracking software.


Figure 8: Fixation locations by trial set for subject LL using the ISCAN eye tracking system.


Figure 9: Fixation locations by trial set for subject DR using the ISCAN eye tracking system.

Overall, the results are very positive. With more data, more rigorous trials consisting of more fixation points, and more experience on the system, the ISCAN eye tracking system looks likely to be a capable component in the ISL project.

An executive summary of the basic issues concerning the use of multi-resolutional displays to reduce bandwidth is now available and will be presented at the February FedLab Conference in Maryland. The full report should be available in the near future.

As mentioned above, at UIUC, we have been working to determine the accuracy and reliability of the ISCAN head-free eyetracking system that we recently received. The above described study of the accuracy and reliability of the ISCAN was done with the head free, since this is the system's most likely use in the military, and the results show considerable promise. Our current project is to compare the accuracy and reliability of the ISCAN system when used with head stabilization (through use of a bite-bar or chin rest) and with the head free. We expect to complete this study within the very near future.

A report describing the effects of distracting visual information on oculomotor control was prepared and mailed to the Journal of Experimental Psychology: Human Perception and Performance. The authors are Jan Theeuwes, Arthur Kramer, Sowan Hahn, David Irwin and Greg Zelinsky.

Because of the emphasis put on completing our milestones related to the ISCAN and the multi-resolutional display studies, the study of factors that attract the eyes during the viewing of complex displays has not yet been conducted. We are currently in the design stage of the study and should complete it this coming quarter.

At RSC, the development of an eyetracking server is in progress. Using an ISCAN system, the intention is to allow non-platform-specific clients to request the current eye gaze position of a user from the server over a TCP socket. The ISCAN system is installed on a PC, which is connected to a second PC over a serial line. The Windows-based server software runs on the second PC and accepts the streaming eye gaze position data from the ISCAN PC. The serial communication and network interface have been implemented, and work is continuing on how best to utilize the streaming eye gaze position data from the ISCAN system to determine an accurate reading of a user's eye gaze position at a particular moment in time. Because the eye gaze position data is continually streaming and is not time-stamped, it is necessary to collect and analyze a set of data (in real time) to determine the user's actual gaze position at a particular moment. Furthermore, invalid data due to the user blinking or drifting must be accounted for.
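A minimal sketch (in Python) of this server architecture is shown below: one thread reads the streaming gaze samples from the serial line while a TCP loop answers client requests with the most recent position. The port names, baud rate, sample format, and request keyword are all assumptions for the sketch; the actual RSC server is Windows-based and its protocol may differ:

    import socket
    import threading

    import serial  # pyserial; the port settings below are assumptions

    latest_gaze = (0.0, 0.0)  # most recent (x, y) sample
    lock = threading.Lock()

    def read_serial(port="COM2", baud=115200):
        """Continuously read streaming gaze samples, assumed here to arrive
        as ASCII lines of the form 'x,y'; the real ISCAN format may differ."""
        global latest_gaze
        with serial.Serial(port, baud, timeout=1) as ser:
            while True:
                line = ser.readline().decode(errors="ignore").strip()
                try:
                    x, y = (float(v) for v in line.split(","))
                except ValueError:
                    continue  # skip malformed or blink samples
                with lock:
                    latest_gaze = (x, y)

    def serve(host="0.0.0.0", tcp_port=5005):
        """Answer each client request ('GAZE') with the latest sample."""
        threading.Thread(target=read_serial, daemon=True).start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((host, tcp_port))
            srv.listen()
            while True:
                conn, _ = srv.accept()
                with conn:
                    if conn.recv(64).strip() == b"GAZE":
                        with lock:
                            x, y = latest_gaze
                        conn.sendall(f"{x:.2f},{y:.2f}\n".encode())

    if __name__ == "__main__":
        serve()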

A precision test program has been implemented that allows the collection of data from a procedure in which a user fixates on a sequence of predetermined screen positions. This test will give insight into the ISCAN system's capabilities, which will aid in the design of eyetracking applications. The collection of data using this testbed has begun and will continue; the data will then be analyzed.

At UIUC, as mentioned above, we have been working on determining the accuracy and reliability of the ISCAN head-mounted eyetracking system. We have a fixation- and saccade-parsing program that we have used with other eye tracking systems (the Purkinje Generation 3 and 5 eyetrackers). We are working at present to modify this program in order to parse the continuous streaming data from the ISCAN into fixations and saccades. Once the on-line fixation-parsing program is completed, we will coordinate with RSC to modify their eyetracking server to either give this information to the ISL multi-modal display system, or have the eyetracking server replace the sample-by-sample information from the ISCAN with the fixation based information. We expect the integration of the ISCAN with the ISL to be complete within the following quarter.

This quarter, most of the energies of the UIUC Eyetracking lab have been spent getting the ISCAN system functioning in such a way that it can be integrated into the Integration Support Laboratory (ISL). Initially, this involved the acquisition of a new head tracker for the system (the Ascension PCBird), because the head-tracking equipment that had been installed on the ImmersaDesk in the ISL was incompatible with the ISCAN system. After this, the ISCAN system was moved to a new laboratory in order to explore its use and capabilities more intensively. After extensive consultation with ISCAN, members of the laboratory have been able to get fairly accurate and reliable eye and head position data from the system. The system can now be quickly calibrated with a number of different users, including those wearing eyeglasses (frequently a problem with other eyetrackers). In addition, by combining head and eye tracking, we are currently able to track the viewer's point of regard on a test screen that encompasses 31° x 21° of visual angle horizontally and vertically. This is much wider than the limits of the eye tracking system alone (8° of visual angle horizontally and vertically), but it is far from the limits of the system, since the head tracker can encompass head movements in any direction on a sphere. Though current tests of the system have investigated tracking on a single defined plane, further tests will eventually include multiple defined planes such as might be seen in a situation involving multiple display systems.

We have also written several programs specifically for use with the new ISCAN system. We have developed and field-tested a program that displays an accuracy and reliability test screen and collects the corresponding eye position data from the ISCAN. Other programs currently in progress include one that parses the online data stream into fixations and saccades, and another that matches names for predefined regions of space with the current eye position and passes the names on to other systems (e.g., to indicate the viewer's area of interest, or to disambiguate referents in speech analysis programs). We expect to finish both of the latter programs within the next quarter.

A study was conducted to determine the impact of combining symbols and color on eye fixation. The pilot results show that eye movements follow the path of maximum information gradient. The intensity of color stimuli induces attention focus, while familiar shapes increase fixation times. Experimental results show that when objects are familiar to the viewer, the number of fixations (NF) is a function of attention focus time (FT) with the statistical relationship NF = 0.247 + 1.48 FT (FT is in microseconds). This preliminary result should not be generalized, since more experiments and data analysis are required for validation (18 subjects were used in the study).

October 1 - December 31, 1998 (FY99Q1)

Goals: Evaluate the utility and accuracy of eye movement tracking using head stabilization versus using a head-mounted tracking system to allow a free range of head movement. (Reported as incomplete FY98Q3). Complete study of factors that attract the eyes during viewing complex displays. Demonstrate coarse fixation analysis with ISCAN equipment as input to multi-modal display system. (Reported as incomplete FY98Q4). Determine the advantages/disadvantages of using head stabilization versus using a head-mounted system in monitoring dynamic informational displays. (Reported as incomplete FY98Q4). Determine requirements and compose specification for eyetracking server based on head-mounted ISCAN system. Conduct experiments to determine the relationship between visual displays, information chunking, and eye movement response time. Examine the influence of task-irrelevant visual information on the perturbation of the eye movement system.

Progress: Regarding the evaluation of the utility and accuracy of eye movement tracking using head stabilization versus using a head-mounted tracking system to allow a free range of head movement: It is well known among users of eyetracking technology that the highest spatial resolution is usually attainable when the head of the person being tracked is in some way restrained. This is typically accomplished by having the user bite onto a 'bitebar,' a dental impression of the user molded on a bar clamped to some fixture such as a table. By fixing the position of the head, the only movements made by the eyes are those within their orbits. This greatly simplifies computation of the location of the viewer's gaze since head position is constant and therefore does not need to be compensated for. The Dual Purkinje eyetracker, which has a spatial resolution to within approximately 1/16°, is a good example of an extremely high-resolution eyetracker requiring use of a bitebar. An exception to this generalization is found in the scleral search coil technique, which also has extremely high spatial resolution but does not require the use of any head restraint. Instead, however, this technique requires that the user put a special contact lens containing a small metal coil on at least one eye, and the user must stay within a metal framework containing a magnetic field. Unfortunately, neither of these methods provides enough freedom of movement or comfort for the typical user to be of practical use in human computer interaction.

As noted above, an alternative to restraining the user's head movements is to compensate for them. There are various methods of doing this, but all require that the user's head movements be tracked simultaneously with their eye movements. An example of this is found in several eyetracking systems produced by ISCAN Inc., which are designed for use with head trackers. One such system, the ISCAN RK-726PCI, is in the form of a visor worn on the head. The eyetracker uses a dichroic mirror that reflects an infrared image of one eye to a small video camera in the visor, and the head tracker (built by Ascension Technology) uses a small receiver attached to the side of the visor, together with a transmitter placed somewhere within several feet of the user in his/her environment. The spatial resolution of this system is limited in comparison to either a dual Purkinje or scleral search coil tracker, but seems sufficient for many practical applications in human-computer interaction, some of which we are currently developing in our lab. The precise spatial resolution of the ISCAN system is a question we are currently investigating. Most importantly, because the head-mounted tracker allows the user a great deal of freedom of movement and is fairly comfortable to wear, it is practical for use in a wide range of human-computer interaction technologies. Nevertheless, a question remains as to how much spatial resolution is potentially lost by not using a bitebar with the system. Below we report our initial findings of a study at UIUC that addresses this question by looking at the issue of reliability or repeatability of the ISCAN system (described above) both with and without the use of a bitebar.

The study involved four participants. One wore glasses, one wore contacts, and two had uncorrected vision. The participants were asked to look at each point on a 5 x 5 array and press a button when they fixated on a point. The array was back-projected on a large screen and measured 12.5° square from the viewing distance of approximately 74", with points being separated by approximately 3°. This task was repeated a number of times (3-10) both with and without use of a bitebar.

Our preliminary results point to several conclusions. First, we find large differences in the patterns of raw eyetracker values due to inter-subject differences and to differences from one calibration to the next. Many of these differences, however, are systematic and may be compensated for when the raw eyetracker values are put through a non-linear translation program based on quadrants of the screen. This is a question we are currently looking into. The more important issue, however, concerns the differences in variability of the ISCAN in localizing viewers' point of gaze between when the viewers used a bitebar and when they were free to make head movements. Figure 10, from subject LL, shows a comparison between these two conditions.


Figure 10: Repeatability of head-mounted ISCAN on a 5 by 5 array (3 degrees separation between points) with 3 trials using a bitebar vs. 3 trials head-free.

As can be surmised by an examination of the scatterplot, the two viewing conditions produced very similar standard deviations, suggesting that the use of a bitebar in this case did not result in an increase in spatial resolution. However, this comparison was based on only 3 trials per condition and the ISCAN was recalibrated between conditions in order to avoid drift in the calibration. As noted above, we found that different calibrations sometimes resulted in differences in the mapping of spatial locations to the ISCAN's raw values. Therefore, we ran further tests in which we alternated bitebar and head-free trials and collected 10 trials of each without recalibrating the ISCAN. The results did not show much drift across the twenty trials, but did show more of a difference between the viewing conditions. As can be seen in Figure 11 (from subject DZ), the results were fairly similar, though the difference between the standard deviations in the bitebar and head-free conditions was relatively greater than in the previous comparison. This suggests that there was some benefit to using the bitebar to stabilize the head.


Figure 11: Repeatability of head-mounted ISCAN on a 5 by 5 array (3 degrees separation between points) over 20 trials alternating between use of bitebar vs. head-free.

Overall, our initial results are somewhat mixed. As expected, the head-free condition produced a higher degree of fixation-to-fixation variability for all four subjects than did the bitebar condition. That is, there was greater spatial uncertainty as to the location of the eyes in the head-free condition. Nevertheless, overall this difference in variability (or spatial uncertainty) seems somewhat smaller than we had expected. Further analyses will quantify this relationship and test whether the differences are statistically significant. However, we feel at this time that the ISCAN system appears to be able to compensate for users' head movements sufficiently well (or the grain of the system is sufficiently coarse) that the potential gains to be made in spatial resolution through use of a bitebar are relatively small.

Regarding the completion of a study of factors that attract the eyes during viewing complex displays: At UIUC, we have developed a theoretical framework and research plan for investigating stimulus and task influences on saccade target selection in complex environments, such as those found in military displays.

Theoretical framework:

  1. Where the eyes go on each saccade results from a competition among candidate target objects. Each candidate is considered to have a particular level of attractiveness. Winning the competition is based on the pattern of attractiveness of candidates.

  2. There are various alternatives regarding how the winning candidate emerges, including first crossing a threshold, or most attractive at the time a saccade is programmed. The correct rule is not currently known and must be studied.

  3. Objects in the display serve as candidates for saccade targets. A complex display has a hierarchical part structure: that is, an object at one level can be decomposed as a set of objects at a lower level. Candidates at different levels can be in competition; this is particularly common for candidates at different retinal eccentricities. The study tests hypotheses regarding what constitutes candidate target objects, and how this varies during and between tasks.

  4. A number of factors can influence the attractiveness level of candidates. This includes:

    1. Distance of the object from the current gaze location.
    2. Clarity (amount of resolvable detail) in the object itself.
    3. Size of the object, perhaps relative to nearby objects.
    4. Level of stimulus contrast with local, and perhaps global, context.
    5. Degree of overlap of its visual features with expected features of targets of the current search.
    6. Occurrence of motion or change (onset, offset, feature change), degree of change, and whether the change occurs during fixation or saccade.
    7. Time since it was last attended (inhibition of return).
    8. Its spatial relation to the currently fixated object, relative to a scanning pattern.
    9. The observer's interest levels in the semantic category of which it is a member.
    10. Whether or not it is a member of certain high-priority categories (face; danger signal).
    11. Location of the object with respect to the observer's head/body position.

The studies will provide data that allows modeling the degree and interaction of the influences of several of these variables on saccade selection during different tasks.
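As a schematic illustration of this framework, the sketch below (in Python) scores candidate objects on a few of the factors listed above and applies the "most attractive at the time the saccade is programmed" selection rule from point 2. The weights and functional forms are placeholders; in the planned studies they would be estimated from data, not assumed:

    import math

    # Placeholder weights for a few of the factors listed above; in the
    # planned studies these would be estimated by non-linear regression.
    WEIGHTS = {"eccentricity": 1.0, "feature_overlap": 2.0, "inhibition": 1.5}

    def attractiveness(candidate, gaze, now):
        """Score a candidate saccade target.  candidate is a dict with keys
        'x', 'y' (degrees), 'feature_overlap' (0-1 match with the search
        target), and 'last_fixated' (seconds, or None)."""
        ecc = math.hypot(candidate["x"] - gaze[0], candidate["y"] - gaze[1])
        # Inverted-U in eccentricity: objects a few degrees away are more
        # attractive than very near or very distant ones (peak at 4 deg here).
        ecc_term = math.exp(-((ecc - 4.0) ** 2) / 8.0)
        overlap_term = candidate["feature_overlap"]
        # Inhibition of return: recently fixated objects are penalized, with
        # the penalty decaying over time.
        if candidate["last_fixated"] is None:
            ior_term = 0.0
        else:
            ior_term = math.exp(-(now - candidate["last_fixated"]) / 2.0)
        return (WEIGHTS["eccentricity"] * ecc_term
                + WEIGHTS["feature_overlap"] * overlap_term
                - WEIGHTS["inhibition"] * ior_term)

    def select_saccade_target(candidates, gaze, now):
        """'Most attractive at programming time' selection rule."""
        return max(candidates, key=lambda c: attractiveness(c, gaze, now))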

Research plan:

  1. Stimuli
    1. Stimuli consist of terrain-based displays, with textures and boundary features (roads, rivers, etc) dividing them into regions. Hierarchically structured objects of different types serve as candidate objects: types include friendly, hostile and neutral (house, water tower, etc.) objects.

    2. Each category will have 2-3 subcategories of visually similar objects, thus requiring foveal examination to distinguish them.

    3. Identical configurations of object locations will occur in multiple displays, but divided differently by terrain regions, with different types of objects at these locations, in order to distinguish effects resulting from different stimulus characteristics.

    4. Objects will also vary in their overlap with search target features in the search task.

  2. Tasks:
    1. Search task: Subject counts the number of objects of a specified type. Thus, objects of this type receive priority.

    2. Learning task: Subject examines the display in preparation for answering questions about the locations of objects. All objects are of equal importance to the task.

  3. Data analysis:
    1. The data are modeled using non-linear regression methods, exploring the functions relating stimulus variables to attractiveness under different assumptions about how the winner is selected. This will begin with retinal distance (inverted U-shaped function), time since object was last fixated (positive, negatively accelerated function), same or different region, feature overlap with search object, etc.

    2. Determine whether the same model will account for data from both tasks, but with different parameter estimates.

    3. Determine how search strategy can be represented in the model.

    4. Determine conditions under which it is possible to distinguish the level of the hierarchy from which saccade target candidates are being selected.

    5. Determine whether individual differences among subjects can be represented as modifiers on parameter estimates.

Product
This line of research will allow us to determine the weight which various stimulus and task factors have in influencing where the eyes are sent in a display and, hence, which areas and objects are likely to be examined early vs. later in a viewing period. This information can guide human engineering work that attempts to optimize the acquisition and use of information from displays by military personnel.

Regarding the demonstration of coarse fixation analysis with ISCAN equipment as input to multi-modal display system: At Rockwell, progress has been made on a program that takes raw ISCAN eye location data and parses it into a sequence of fixations. A fuller description of this is given below (see text below for milestone "Determine requirements and compose specifications for eyetracking server based on head-mounted ISCAN system"). The next step is to integrate the stream of fixation data into the multi-modal display system. Staff members of Dr. Robin Bargar's lab at UIUC have established a protocol and a software architecture for integrating the eye tracker within an interactive visual and sound display environment that includes speech, hand gesture and haptic control devices. Over the next quarter they will be implementing this integration.

Regarding the determination of the advantages/disadvantages of using head stabilization versus using a head mounted system in monitoring dynamic informational displays: As noted above, eyetracking technologies that make use of head stabilization (e.g., a bitebar) generally have higher spatial resolution than when the head is free to move. However, the user buys this added spatial resolution at the cost of severe constraints on movement. Such constraints rule out most of the practical applications that might make use of eyetracking in human-computer interaction. However, a different approach to improving an eyetracker's spatial resolution, which allows greater freedom of movement, is to compensate for the user's head movements by simultaneously gathering headtracking and eyetracking data. As noted above, the preliminary results of our investigations at UIUC of the head-mounted ISCAN RK-726PCI head and eyetracking system suggest that not much spatial resolution is gained by use of head stabilization. Furthermore, use of a head-mounted system allows the user to perform a number of important types of action which are impossible with most forms of head stabilization (e.g., use of a bitebar or chinrest), while gaining the benefits in human-computer interaction available when eye movements are monitored. With a head-mounted system, the user can move his/her head, speak, make gestures, and walk around. All of these actions will be important in the multi-modal display system being developed for the Displays Federated Laboratory project. Head movements will be necessary for surveying the ImmersaDesk informational displays (e.g., monitoring troop activities). Speech will be an important form of input to natural language speech recognition systems in issuing commands. Likewise, gesture recognition programs will monitor gestures in order to issue commands for action. And freedom for the user to walk around the TOC will be necessary to interact with all the auxiliary resources available to the commander. With a head-mounted head/eyetracking system, all these actions can occur while the eyes are being tracked for purposes of facilitating human-computer interaction (e.g., to indicate points of interest about which the system can provide more information, to help speech recognition systems disambiguate unclear referents, and to manipulate the display with the eyes more quickly and easily than with a mouse). All the above considerations suggest that there is no need to consider the use of head stabilization with this system.

Regarding the determination of requirements and specifications for eyetracking server based on head-mounted ISCAN system: In order to provide non-platform-specific client applications with the capability of quickly and easily obtaining eyetracking services from the ISCAN system, an eyetracking server is being developed at Rockwell. The ISCAN system tracks the user's pupil and corneal reflection and may return either raw pupil/corneal reflection data or calibrated eye gaze position data. This calibrated position assumes no head movement relative to the scene. The ISCAN data is continually streamed over a serial line and can be received by a second computer system. A Windows-based computer has been chosen as the second computer system that will receive the streaming ISCAN data and run the eyetracking server application. In this scenario, a client application running on any platform supporting TCP/IP may connect to the eyetracking server running on the Windows-based PC. The client may then request eyetracking data from the server. It is envisioned that various types of eyetracking data may be desired, including raw pupil/corneal reflection data, calibrated eye gaze position assuming no head movement, calibrated eye gaze position compensating for head movement, as well as eye gaze fixation position data. As the ISCAN system does not provide for real-time fixation analysis, a fixation filter is being developed. Also, because this ISCAN system does not provide head motion compensation, a head motion compensator is being developed. These two efforts are described in more detail below.

Background study on eye gaze fixation filtering has continued. Based on methods already in practice elsewhere, a fixation filter has been implemented which keeps a running standard deviation of 100 ms of eye gaze position data. The filter also throws out any invalid data (appearing as [0,0]) due to the user blinking, etc. Fixation data has been collected and is being analyzed in order to determine a suitable standard deviation threshold value. In gathering the data, a user distinctly fixates on known screen positions. This allows for the "tuning" of the fixation filter standard deviation threshold value in order to determine a fixation. Based on a small selection of data, the value seems to be between 1 and 1.5 degrees.
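A minimal sketch (in Python) of this running-standard-deviation approach is shown below: samples are buffered over roughly 100 ms, invalid [0,0] samples are discarded, and a fixation is reported while the standard deviation of the buffered positions stays below threshold. The 120 Hz sample rate and the 1.25-degree threshold are assumptions for the sketch, not the tuned values:

    from collections import deque
    from statistics import pstdev

    class RunningFixationFilter:
        """Online fixation detector keeping ~100 ms of gaze samples.
        The sample rate and threshold below are assumptions."""

        def __init__(self, sample_rate_hz=120, window_ms=100, threshold_deg=1.25):
            self.window = deque(maxlen=max(1, int(sample_rate_hz * window_ms / 1000)))
            self.threshold = threshold_deg

        def update(self, x, y):
            """Feed one gaze sample (degrees).  Returns the fixation centroid
            (x, y) if the eye is currently fixating, else None."""
            if x == 0 and y == 0:  # invalid sample (blink, track loss)
                self.window.clear()
                return None
            self.window.append((x, y))
            if len(self.window) < self.window.maxlen:
                return None  # not enough data buffered yet
            xs = [p[0] for p in self.window]
            ys = [p[1] for p in self.window]
            if pstdev(xs) < self.threshold and pstdev(ys) < self.threshold:
                return (sum(xs) / len(xs), sum(ys) / len(ys))
            return None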

The head-mounted eyetracking system from ISCAN was returned to ISCAN and transferred to a visor mounting system. This allows much quicker mounting and dismounting of the headgear as well as increased flexibility of the eye imager-mounting angle, which aids in avoiding eyeglass reflection.

Additional data was collected for precision analysis of the ISCAN equipment without head tracking. However, it was determined that without head tracking, the data is too sensitive to any small change in the user's head orientation. Further data collection and analysis without head-tracking was abandoned, as it was clear that in order to use the ISCAN to determine eye gaze position, the user must either be immobilized, or a head motion compensator must be implemented.

The basic hardware and software from ISCAN require that the user remain practically still when the system is being used, including the initial stage when the system is being calibrated. To alleviate this restriction, one needs to take into account possible head motion of the user and compensate for it accordingly. The fundamental working principles of the system were studied, and a new set of calibration equations was developed to improve the operational accuracy of the eyetracking apparatus. The approach assumes that the user's head position and orientation, in addition to raw measurements of the eye positions relative to the user's head, are known. The plan is to implement this algorithm using magnetic sensors to gather head position and orientation measurements of the user. Ultimately, the fixation filter and head motion compensator algorithms will both be implemented in the eyetracking server architecture, allowing non-platform-specific clients to simply request eyetracking services.

Regarding the experiments to determine the relationship between visual displays, information chunking, and eye movement response time: NCA&T used eye movement studies to quantify the relationship between the display of a group of military symbols (information chunking) and eye movement response times. Also investigated was the cognitive correspondence problem between spatio-temporal learning and eye fixation in visual space.

Eight military symbols were used in the study (see Figure 12). The subjects (six undergraduate students at NCA&T, all with 20/20 vision) were asked to fixate on the central location of the display presented on a seventeen-inch color television monitor. The subjects were given the military symbols to learn two days before they participated in the experiment.


Figure 12: The three military symbols used in the study.

The subjects participated individually in pre-trial sessions, including a briefing on what to do. They were told that a chunk of military symbols (not more than 3 in a group) would be randomly displayed at the upper left or right hand side (at 0° or 180°) of the television. The subjects were asked to fixate on the target from the time it appeared until it disappeared. Four trials were conducted per subject. The subjects were then asked to recall a particular symbol and its position on the screen.

This experiment measured attention capture (the extent to which codes are detectable), detection time, fixations, and/or scan paths.

Items of interest during this experiment were the time lapse between the display of the target and the start of the eye movement in the direction of the target, and the eye's behavior when focused on the target. The eye movement behavior gives information on how the target is being processed. For example, when the symbols in a chunk are geometrically similar, such as the infantry and mechanized infantry symbols containing an "X", is more time spent fixating the "X", or does the eye move around following the outline of the square? If the eye spends too much time scanning the outline of the target (the square), then the square may be a distractor for the target, thus increasing the response time (this has not yet been analyzed).

The stimulus presentation had six chunks with the symbols:
C1 = (Infantry)
C2 = (Mechanized Infantry)
C3 = (Armored Cavalry)
C4 = (Infantry, Mechanized Infantry)
C5 = (Armored Cavalry, Mechanized Infantry)
C6 = (Infantry, Mechanized Infantry, Armored Cavalry)

Note: The chunk groupings C4 – C6 are based on stimulus similarity. For example, the stimulus similarity symbol in C4 = "X"; in C5 it is the "0" with a "/" inside the circle; and in C6 it is ("X", C5).

Sample Results:

A. Mean Initial Latency Times versus Chunk Presentation


Figure 13: Mean Initial Latency Times versus Chunk Presentation

B. Detection Times Versus Stimulus Similarity Within Chunks

There were observed differences in detection times when chunks with the same stimulus similarity symbols were presented (C4 and C5) as opposed to chunks with different stimulus similarity symbols (C6). These results were confirmed statistically by submitting the detection times (DTs) to an analysis of variance [F(1, 24) = 19.683 > F0.05 = 4.26].

C. Detection Accuracy

Although the analysis is not complete, the preliminary results seem to indicate that the presence of stimulus similarity within an information chunk induces higher detection error. This phenomenon can be attributed to stereotyping of the stimulus similarity symbol in memory during gaze and eye fixation. For example, subjects referred to the "Infantry" symbol as "Mechanized Infantry" on four out of twenty-three occasions (an error probability of 0.173).

Observations:
The following observations are related to the current experimental results.

  • Human subjects tend to fixate certain parts of a pictorial stimulus, presumably based on object familiarity.

  • A change in fixation (gaze) location may give rise to a completely different percept (this was not analyzed in this experiment).

  • An increase in detection time was observed as symbol groupings (chunks) were presented. This may be attributed to: (a) distribution of attention across the symbol chunks; (b) search for familiar targets; and (c) increased complexity in matching pre-learned symbol cues with the target symbols.

Regarding the examination of the influence of task irrelevant visual information on the perturbation of the eye movement system, three papers are in press on this topic. We also have a number of additional studies underway including studies on:

  • The role of a variety of display factors (e.g. color, luminance, onsets, offsets) on the capture of covert attention as well as the eyes.

  • Examinations of the eye movement search strategies in search through cluttered displays.

  • Assessment of eye movement and attention disruption with multiple irrelevant, but salient, distracters in a visual display.

In other research, over the last quarter at UIUC we have made good progress in developing software to be used with the ISCAN equipment. We have developed a calibration program, which verbally reports to the user (using synthesized speech) the difference between the previous pair of eye positions, and whether this was within acceptable limits or another eye position sample is needed. This greatly facilitates the process of calibration by giving the user on-line feedback. A proposed improvement to the system would use a pair of musical tones that come into unison as the eyes move to the correct screen position. Non-speech auditory feedback can provide a high degree of resolution in both time and the discretization of the calibration data. Because the goal is simply to eliminate the frequency difference between the two tones, the tuning process can be more intuitive, more immediate, and more accurate than linguistic feedback can support.
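The following sketch illustrates the proposed tone-based feedback: the distance between the current gaze sample and the calibration target is mapped onto the frequency difference between a fixed reference tone and a variable tone, so the audible beating disappears as the error approaches zero. The base frequency, the scaling of degrees to Hertz, and the function names are assumptions chosen only to make the idea concrete.

    // Illustrative mapping from gaze error to a feedback tone frequency.
    // The variable tone matches the 440 Hz reference exactly when the gaze
    // lands on the calibration target; all constants are assumptions.
    #include <cmath>
    #include <cstdio>

    struct Point { double x, y; };

    // Frequency (Hz) of the variable tone for a given gaze error in degrees.
    double feedbackToneHz(const Point& gaze, const Point& target,
                          double baseHz = 440.0, double hzPerDeg = 8.0) {
        double dx = gaze.x - target.x, dy = gaze.y - target.y;
        double errDeg = std::sqrt(dx * dx + dy * dy);   // gaze error magnitude
        return baseHz + hzPerDeg * errDeg;              // equals baseHz when on target
    }

    int main() {
        Point target{10.0, 5.0};
        Point gaze{12.0, 6.5};
        printf("reference tone: 440.0 Hz, feedback tone: %.1f Hz\n",
               feedbackToneHz(gaze, target));
        return 0;
    }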

We have also developed a program that matches the current eye position (as indicated by the ISCAN head/eyetracker) with the name associated with that region of space. Identification of predefined objects would be useful for numerous purposes, e.g., to indicate the viewer's area of interest or to disambiguate referents in speech analysis programs. The next step is to integrate this program into the multi-modal display system. At present it is a stand-alone program which uses a speech synthesizer to speak aloud the names of the fixated objects.
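A minimal sketch of this position-to-name matching is given below: each named object is associated with a screen region, and the current gaze position is tested against those regions. Rectangular regions, the object names, and the hard-coded coordinates are illustrative assumptions; the actual program's region representation is not described here.

    // Sketch of matching a gaze position to the name of a predefined region.
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Region {
        std::string name;
        double left, top, right, bottom;   // screen coordinates
        bool contains(double x, double y) const {
            return x >= left && x <= right && y >= top && y <= bottom;
        }
    };

    std::string nameAt(const std::vector<Region>& regions, double x, double y) {
        for (const auto& r : regions)
            if (r.contains(x, y)) return r.name;
        return "";   // gaze is not on any named object
    }

    int main() {
        std::vector<Region> regions = {
            {"tank unit A", 100, 100, 180, 160},
            {"supply depot", 400, 250, 520, 330},
        };
        // In the real system the gaze coordinates would come from the ISCAN
        // eye/head tracker; here they are hard-coded for illustration.
        std::string name = nameAt(regions, 430, 300);
        if (!name.empty()) printf("fixated object: %s\n", name.c_str());
        return 0;
    }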

January 1 - March 31, 1999 (FY99Q2)

Goals: Complete an experiment on the roles of visual features, and previously obtained information about the locations of objects on directing eye movements in complex displays. Demonstrate information retrieval and display in a multi-modal system using eye gaze as an attention indicator. Evaluate variables within the dynamic display which affect locating and detecting changes that might interfere with concurrent cognitive activity (e.g., working memory). Develop eyetracking server to allow use of eye gaze as pointing device in large display virtual reality environment. Continue the experiments for determining the relationship between visual displays, information chunking and eye movement response time.

Progress: The experiment on the roles of visual features has been completed and a manuscript has been submitted for publication (H. Pringle, D. Irwin, A. Kramer, and P. Atchley. "Relationship between attention and perceptual change detection in driving scenes", submitted to Psychonomic Bulletin and Review). Previous research has shown that changes to scenes are often surprisingly hard to detect. The purpose of the research reported here was to investigate the relationship between individual differences in attention and change detection. We did this by assessing subjects' breadth of attention in a functional field of view task (FFOV) and relating this measure of attention to the speed with which individuals detected changes in scenes. We also examined how the salience, meaningfulness and eccentricity of the scene changes affected perceptual change performance. In an effort to broaden the range of individual differences in attentional breadth, both young and old adults participated in the study. A strong negative relationship was obtained between attentional breadth and the latency with which perceptual changes were detected; subjects with broader attentional windows detected changes faster. Salience and eccentricity had large effects on perceptual change detection, but meaning aided the performance of young adults only when changes also had low salience. The results are discussed in terms of the role of attention in perceptual change detection.

Figure 14 illustrates a subset of the results from this study. First, as shown in all panels, older adults were substantially slower in detecting changes in realistic scenes than were younger adults. Second, highly salient changes were detected more quickly than less salient changes. Finally, the meaningfulness of the change played a fairly small role in change detection performance (and only for the young adults).


Figure 14: Response time vs. Salience

Regarding the demonstration of information retrieval and display, at UIUC, we have developed a program which takes gaze position data from an ISCAN eye/head tracking system, matches it to the object name associated with that position, and speaks the name using a DECTalk speech synthesizer. At present, the program does this whenever the user clicks a mouse, indicating that he/she wants to hear the name of the chosen object. This function could be valuable in military displays in cases when the user needs information about an object in a visual display (e.g., friend or foe, identity of a unit, etc.).

Another more subtle use of the program would be to aid in disambiguation of referents in speech or gestures. For example, suppose a user wants to move a tank on a display from one location to another. If the user says "Move that tank over there" and points to the intended tank and target location, eye position information linked to a specific object identity can help disambiguate the intended referent. Clearly, the linguistic expression "that tank" is ambiguous if there are multiple tanks present in the display. Likewise, the expression "over there" is not interpretable without further information. In both cases, gesture information, for example from pointing with a wand, could aid in disambiguation of the speech. However, gesture information is only useful for this purpose to the degree that the gesture tracking is accurate. Much previous research indicates that when a person points to an object, their eyes precede their hand to the object. Furthermore, the eyes' accuracy in localizing visual objects is greater than that of the hands (because pointing must be done largely on the basis of visual information from the eyes, and the positional accuracy of pointing can only be inferred from that information). Thus, eye position information can greatly add to the accuracy of object identification obtained from gesture coordinates.

Another program we have developed is one in which the user can move an object in the display using the eyes rather than a mouse. This program is still in the testing stage, however. An important but unresolved issue is how to signal that the object being looked at is one the user wants to move, and that the location the eyes move to is the location to which the user desires to move the object. At present, this is indicated by a mouse click and by dragging and dropping the object. The ideal solution to this problem would be to use verbal commands as suggested above (e.g., "Move this 'X' over here"). Accomplishing this will require integration of the on-line eye movement data with other input (speech) and programs (speech recognition programs).
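The sketch below captures the current interaction scheme, in which a mouse click while fixating an object picks it up and a second click while fixating the destination drops it there. The event handler, the object lookup, and the coordinate values are hypothetical placeholders rather than the actual program.

    // Sketch of gaze-plus-click object moving: first click grabs the fixated
    // object, second click drops it at the currently fixated location.
    #include <cstdio>

    struct Point { double x, y; };

    struct MoveByGaze {
        bool holding = false;
        int heldObject = -1;

        // objectAt() would query the display's object list; stubbed here.
        int objectAt(const Point& p) { return (p.x > 0 && p.y > 0) ? 1 : -1; }
        void moveObject(int id, const Point& p) {
            printf("object %d moved to (%.0f, %.0f)\n", id, p.x, p.y);
        }

        // Called whenever the mouse button is clicked, with the current gaze.
        void onClick(const Point& gaze) {
            if (!holding) {
                heldObject = objectAt(gaze);      // pick up the fixated object
                holding = (heldObject != -1);
            } else {
                moveObject(heldObject, gaze);     // drop it at the fixated location
                holding = false;
            }
        }
    };

    int main() {
        MoveByGaze mover;
        mover.onClick({120, 80});   // first click: grab object under gaze
        mover.onClick({400, 300});  // second click: drop at new gaze position
        return 0;
    }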

At present, both programs have been tested in a lab outside of the Integration Support Lab (ISL) at Beckman Institute. We are currently working on integrating these programs, and, more generally, eye/head tracking using the ISCAN, into a multi-modal system using speech and gesture recognition. All of the programs we have developed thus far using the ISCAN are written in assembly language to be run on PCs. Thus, we are currently working on translating the assembly code to a more platform independent language, such as C++, that can be run on the SGI machines used in the ISL, or other platforms.

An experiment has been completed in which the variables within a dynamic display that may affect cognitive activity have been determined. Previous research has shown that during visual search, young and old adults' eye movements are equivalently influenced by the appearance of task-irrelevant abrupt onsets. That is, when required to rapidly move their eyes to a uniquely colored object, both young and old adults misdirect their gaze to task-irrelevant abrupt onsets (new objects) on a large proportion of trials. This finding of age-equivalent oculomotor capture is quite surprising in light of the abundant research which suggests that older adults exhibit poorer inhibitory control than young adults on a variety of different tasks (Zacks & Hasher, 1997). In the present study we examined the hypothesis that oculomotor capture will be age-invariant when subjects' awareness of the appearance of task-irrelevant onsets is low, but that older adults will have more difficulty than young adults in inhibiting reflexive eye movements to task-irrelevant onsets when awareness of these objects is high. This hypothesis was examined in two conditions: when onsets were equiluminant with other stimuli in the display and when onsets were brighter than other stimuli. Our results were consistent with the level-of-awareness hypothesis. Young and old adults showed equivalent patterns of oculomotor capture with equiluminant onsets, while older adults misdirected their eyes more often than young adults to bright onsets. These data are discussed in terms of their implications for the nature of the inhibitory processes that underlie eye movements and visual attention.

As described in the FY99Q1 report, an eyetracking server is being developed at RSC. Currently, the server accepts requests from clients over a TCP socket. The server receives eye gaze position data over a serial connection to the ISCAN eyetracking computer and provides this data to the client over the TCP socket. A fixation filter has been developed and will be integrated into the server. Furthermore, an algorithm for head motion compensation has been developed and will be implemented. This algorithm will require 6DOF head-tracking data which may be provided by any tracking means such as magnetic, ultrasonic, or computer-vision based. These various motion trackers are being evaluated, and software is being written to interface between the motion trackers and the eyetracking server.
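The core of such a server is a relay loop that reads the streaming tracker data from the serial line and forwards it to a connected client. The sketch below shows that structure using POSIX calls for brevity (the actual server runs on a Windows PC); the device path, port number, and single-client handling are assumptions made only for illustration.

    // Sketch of a serial-to-TCP relay for gaze data: read whatever arrives
    // from the eyetracker's serial line and forward it to one TCP client.
    #include <fcntl.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        // Serial line carrying the streaming ISCAN data (path is assumed).
        int serial = open("/dev/ttyS0", O_RDONLY);
        if (serial < 0) { perror("open serial"); return 1; }

        // Listen for a single eyetracking client on an assumed port.
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(5000);
        bind(listener, (sockaddr*)&addr, sizeof(addr));
        listen(listener, 1);
        int client = accept(listener, nullptr, nullptr);

        // Relay loop: pass tracker bytes straight through to the client.
        char buf[256];
        ssize_t n;
        while ((n = read(serial, buf, sizeof(buf))) > 0)
            if (send(client, buf, n, 0) < 0) break;

        close(client); close(listener); close(serial);
        return 0;
    }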

In experiments for determining the relationship between visual displays, information chunking, and eye movement response, NCA&T used eye movement studies to quantify the relationship between information recall rate, symbol size, and information density. The hypothesis tested was: "Do information density and symbol size affect recall rate?" Information density (or, more generally, display density) is the total number of target symbols in the spatial field (e.g., a map). Information recall rate is the total number of symbols that the subject can remember per allowed time unit. Symbol size is the area, boundary, or total volume of a geometric object. Dynamic displays are target symbols that move in space and time, while static displays are symbols fixed to a spatial location and not moveable with respect to time and space.

Eight military symbols were used in the study. The subjects (24 graduate and undergraduate students at NCA&T, all with 20/20 vision) were asked to fixate on the central location of the display presented on a 17-inch color television monitor. The subjects were given the military symbols to learn two days before they participated in the experiment.

The subjects participated individually in pre-trial sessions, including a briefing on what to do. They were told that a chunk of military symbols (not more than 3 in a group) would be randomly displayed at the upper left or right hand side (at 0° or 180°) of the television. The subjects were asked to fixate on the target from the time it appeared until it disappeared. Four trials were conducted per subject. The subjects were then asked to recall a particular symbol and its position on the screen.

This experimental design consisted of the presentation of selected military symbols of different sizes in two dimensions: static or dynamic. Chunk (target) sizes were either small (0.6" x 0.3") or large (1.0" x 0.5"). Targets were presented randomly on a TV monitor and stayed for 3 seconds (cue time); at most three of the targets were programmed to move after the cue time for another 5 seconds, after which all of the symbols disappeared. The subjects were asked to recall the meaning of the targets that were moving and the direction of the movement. The experimenter manually recorded the subject's response using an audio tape and TV recorder.

The ratios of symbol (target) density to dynamic symbol movement investigated were (12,1), (12,2), (12,3), (4,1), (4,2), (4,3), (8,1), (8,2), and (8,3). The symbol recall rates (in msec) for each of the experimental scenarios are shown in Figure 15.


Figure 15: Recall rate differences for symbol size and target density to dynamic ratio.

Note: the results obtained are preliminary, and as more data are collected the interactions may change. All results were statistically significant at the 0.05 level except for the target to dynamic symbol movement ratio of 4:1, for which the mean response rates for small and large symbols were the same (Student's t = 1.06 < 2.069).

In other research, a demonstration entitled "3D Audio and Eyetracking: Increasing Differentiability and Intelligibility of Multiple Simultaneous Channels of Speech" was developed by RSC and presented at the Third Annual Fed Lab Symposium. This demonstration comprises three segments. In the first segment, four channels of speech audio are displayed in stereo. One voice is completely contained in the left stereo channel. The second voice is biased to the left, the third is biased to the right, and the fourth is completely in the right channel. While this stereo display offers improved differentiability over monaural presentation of multiple simultaneous channels of speech, it is still difficult for a listener to differentiate the multiple channels. In the second demonstration segment, the four voices are spatialized in 3D. One voice is two feet to the left of the listener, the second is 1.5 feet away at 202 degrees (zero degrees being directly behind the listener), the third is 1.5 feet away at 22 degrees, and the fourth is two feet directly to the right of the listener. The sources are also moving at 4 fps, covering a six-inch left-to-right range centered over the aforementioned positions. The 3D spatialization and associated source movement allow the listener to perceive each source as emanating from a separate physical location, aiding in differentiability and intelligibility. In the third demonstration segment, the user wears an eyetracking apparatus and, after undergoing an eyetracking calibration routine, is able to gaze at one of four visual icons corresponding to the four voices. By clicking a mouse button, the particular voice channel the user has chosen through eye gaze is displayed at the center of his/her head, while the other three voices are pushed further away from the listener while remaining at the same relative angles to the listener. This demonstration utilizes the RSC 3DA Server as well as the eyetracking server currently under development at RSC.

April 1 - June 30, 1999 (FY99Q3)

Goals: Develop algorithms to compensate for head movement (using motion-tracking sensors) within the head-mounted eyetracking system. Extend the experiment in Q1 and Q2 (determining the relationship between visual displays, information chunking and eye movement response time) to include information detection rate and refresh rate.

Research will examine how prior knowledge combines with current visual features to direct the eyes in examining complex displays. Conduct experiments to determine the relationship between army symbol displays, information chunking, eye movement response time, and information refresh rate (NCAT).

Progress: This is a continuing study on the use of eye movement studies to explore the spatial coupling of attention and the latency of saccades in processing high pay-off information targets in a display. The main hypothesis is:

Do targets that move in space and the information presentation refresh rate have an effect on saccade latency?

Dependent variable: Initial latency time was the major dependent variable. The latency of saccades is defined as the interval between the target presentation and the beginning of eye movements.

Independent variables: Two independent variables were manipulated: information presentation refresh rate and the dynamicity of the targets. Information refresh rate is the cue or lapse time required to update information on a computer screen. Three levels of refresh rate were used: 60Hz, 75Hz, and 90Hz. The higher the refresh frequency, the lower the screen update time. Dynamic targets are symbols or objects that move. The movement involves spatial and temporal changes in position. For this discussion, objects that moved are referred to as "Enemy targets". We limited the number of moving targets to 2. Targets could move in any geographic direction (North, Northeast, Northwest, etc.). However, symbols could only move toward or away from the opposing symbols ("Friendly targets"). All targets used in the experiment had the same symbol size. Symbol sizes were derived from the Military FM Handbook of Symbols (FM 105-5-1, 1985). Displays using symbol sizes of 0.6" by 0.3" are considered large, and symbol sizes of 0.5" by 0.25" are considered small.

METHOD

Subjects
Twenty-seven subjects with at least 20/30 vision between the ages of 17 and 30 participated in the experiment. The subjects consisted of graduate and undergraduate students attending North Carolina A&T State University. Students earned extra credit for participation.

Apparatus
The equipment used in this experiment consisted of:

  • An IBM compatible computer, monitor and keyboard - this was used to input the different displays through to the TV Monitor.

  • A 32-inch TV Monitor - used to display the information to the participants

  • Internet Explorer 3.0 - used to run the animated displays (the scenarios were animated gifs).

  • Video Recorder - used to record the display, which illustrated the participant's eye movements.

  • ISCAN Eye Movement Monitoring System Version 2.05 - used to record and measure eye movement data.

  • PowerPoint Presentation Software - used to display symbols to participants during the preliminary test.

Systat 8.0 statistical analysis software was used to analyze the data obtained from the experiments. Figure 16 illustrates a sample of the symbol displays used.


Figure 16. Sample screen capture of the symbol display used in the study.

The subjects participated individually in pre-trial sessions, including a briefing on what to do. They were told that a chunk of military symbols (not more than 3 in a group) would be randomly displayed at the upper left or right hand side (at 0° or 180°) of the television. The subjects were asked to fixate on the target from the time it appeared until it disappeared. Four trials were conducted per subject. The subjects were then asked to recall a particular symbol and its position on the screen.

This experimental design consisted of the presentation of selected military symbols of different sizes in two dimensions: static or dynamic. Targets were presented randomly on a 16" TV monitor and stayed for 3 seconds (cue time); at most three of the targets were programmed to move after the cue time for another 5 seconds, after which all of the symbols disappeared. Subjects were asked to recall the meaning of the targets that were moving and the direction of the movement. The experimenter manually recorded the subject's response using an audio tape and TV recorder. At each experimental trial the computer refresh parameter was reset to 60Hz, 75Hz, or 90Hz. The ISCAN was used to record the onset of eye movements, eye fixations, and scanpaths.

Results

Initial Latency Measure (msec).
Analysis of Variance (ANOVA) with repeated measures showed a significant main effect of target dynamicity, Fcalculated = 73.21 > F(2, 105, 0.05) = 3.07 (p < 0.032). Similarly, there were significant differences across refresh rates, Fcalculated = 19.885 > F(2, 105, 0.05) = 3.07 (p < 0.012). There was also a statistically significant interaction between refresh rate and dynamicity, Fcalculated = 4.527 > F(4, 105, 0.05) = 2.67 (p < 0.002). As shown in Figure 17, there was no significant mean difference in initial latency when 2 targets were in motion simultaneously at different refresh rates.

Attention allocation during dynamic target information processing
Using data on the average number of eye fixations on a target and the average fixation times, the percentage of attention distributed between static and dynamic targets was analyzed. We used dynamic ratio density as the independent variable. The dynamic ratio density (DRD) is the proportion of dynamic targets to the total number of targets in a display. For example, if one of three targets is in motion and the other two are static, then the DRD is 1/3 = 0.333. The results of this analysis are shown in Table 1. Note that when the number of dynamic targets is 0, all targets are in a static position.


Figure 17. Initial latency plots at different dynamic targets.

Table 1. Attention allocation between static and moving objects in visual space

Dynamic ratio    Dynamic objects                 Fixed objects                   Total #      % Attention allocation
density          Avg. # of     Avg. fix.         Avg. # of     Avg. fix.         fixations    Dynamic     Static
                 fixations     time (msec)       fixations     time (msec)
0                -             -                 16            7.365             16           -           100
1/3              9             7.46              5             3.76              15           60          40
2/3              12            10.316            2             1.69              14           86          14
1                15            11.38             3             2.14              18           83          17
0b               31            20.58             -             -                 31           100         -

a: all three objects in dynamic position
b: all three objects in static position

Observations:
The following are observed from the study:

  • Target dynamicity has an effect on saccade latency.

  • Information refresh rate has an effect on saccade latency.

  • As the human processes complex dynamic information, there is evidence of visual attention shifting to new dynamic targets while the saccade remains centered on objects perceptually judged to be of high saliency. Attention leads to space-based prioritization of information from the attended objects.

  • As the number of simultaneous dynamic objects in a display increases, latency times seem to be the same irrespective of the information refresh rate.

At Rockwell, we have developed a method to compensate for head movement for a head-mounted eyetracking system. The method requires that a set of calibration points be collected, in much the same way as was done previously. The main difference is that head-tracking data are simultaneously captured. We assume that a few geometric parameters are available. They include the transformation between the head-tracking receiver and the user's left eye, and that between the computer screen and the head-tracking transmitter, both of which can be measured offline. At any instant, the gaze vector (originating from the user's eye) is determined from a linear (affine) model, which we estimate using the set of calibration points initially obtained, together with the current eye position and orientation as determined by the head-tracking subsystem. The fixation point on the screen can then be estimated from the intersection between the gaze vector and the 2D plane on which the computer screen lies.
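The last step of this procedure, intersecting the gaze ray with the plane of the screen, can be sketched as follows. The vector names, the plane convention (screen at z = 0 with its normal along z), and the sample numbers are assumptions for illustration; the affine calibration model that produces the gaze direction is not shown.

    // Sketch of intersecting a gaze ray (eye position plus gaze direction,
    // both expressed in the screen's coordinate frame) with the screen plane.
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
    double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Intersect the ray (eye + t*gazeDir) with the plane through planePoint
    // with normal planeNormal; returns false if the ray never reaches it.
    bool gazeOnScreen(Vec3 eye, Vec3 gazeDir, Vec3 planePoint, Vec3 planeNormal,
                      Vec3* hit) {
        double denom = dot(gazeDir, planeNormal);
        if (denom == 0.0) return false;     // gaze parallel to the screen plane
        double t = dot(sub(planePoint, eye), planeNormal) / denom;
        if (t < 0.0) return false;          // screen is behind the viewer
        *hit = add(eye, scale(gazeDir, t));
        return true;
    }

    int main() {
        Vec3 eye{0, 0, 30};                 // eye roughly 30 inches from the screen
        Vec3 gaze{0.1, -0.05, -1.0};        // gaze direction from the calibration model
        Vec3 hit;
        if (gazeOnScreen(eye, gaze, {0, 0, 0}, {0, 0, 1}, &hit))
            printf("fixation point on screen plane: (%.2f, %.2f)\n", hit.x, hit.y);
        return 0;
    }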

For testing purposes, we captured some data as follows. The user was instructed to gaze at nine locations on a computer screen while keeping his head relatively still. It was then repeated five times at five different head orientations/positions. It was reported previously that by employing ISCAN's built-in head tracker, eyetracking accuracy was measurably improved. Our own evaluation will follow shortly, although we expect a similar level of improvement. An added advantage of our system is that we can access the same head tracking data needed for other I/O modalities such as spatialized audio.

Regarding the role of prior knowledge in examining complex displays, an important question in scene perception concerns the roles of expectations and physical features in guiding one's attention to objects within a scene. Specifically, at what point in the perception of a scene do anomalous objects begin to draw one's attention? There are two competing views on the answer to this question. One view is that anomalous objects begin to draw one's attention from the first fixation on a scene (Gordon, 1999). The other view is that such violations of expectations have no effect on guiding one's attention to anomalous objects; rather, it is the physical features of those objects that guide attention, and subsequently the eyes, to such objects. According to the latter view, if an object is anomalous in terms of its meaning within the scene yet does not have particularly salient physical features, it may take a considerable amount of time before it is noticed, if ever.

From the above it is clear that this question has important implications for military displays because, depending on one's view, the need to physically cue anomalous objects in order to bring them to the attention of a viewer will differ. For example, if the stimulus-based view of attentional allocation is correct, an unexpected incursion of enemy forces within an area supposedly controlled by friendly forces would have a greater need for attentional cueing than if the expectation-based view is correct.

Previous results on this question have been conflicting. Several earlier studies suggested that attention is indeed drawn to an anomalous object in a scene early on, perhaps as early as the first fixation. However, a more recent study has indicated that the stimulus-based view is correct, and has suggested that the previously reported findings were due to various methodological flaws in the designs of those studies.

A key question addressed in these studies has been how quickly the eyes are drawn to an anomalous object in a scene. The more recent studies, conducted at Michigan State University, have suggested that when a person is not actively searching for the anomalous object, the eyes will tend to go to it no more quickly than to a matched non-anomalous object in the same scene. Nevertheless, a series of studies recently conducted at the University of Illinois (Gordon, 1999) using the same stimuli as at Michigan State suggests that covert attention (attention without movement of the eyes) is indeed drawn to an anomalous object on the first fixation. It is noteworthy that while the results of these two sets of studies appear to conflict, both the dependent measures used and the time scales investigated were different. That is, one set of studies looked at an extended series of eye movements made to objects in a scene, whereas the other study looked at covert attention during only the first fixation on a scene (Gordon, 1999). Thus, there is a question as to whether these contradictory results point to an important difference between earlier and later attentional processes during scene viewing, or whether the contradictions are simply due to spurious results in one or the other set of studies.

In order to answer the above question, the current study proposes to combine the dependent measures and time scales used in both sets of earlier studies (Gordon, 1999). Using the same stimuli used in both sets of previous studies, both covert attention and the movements of the eyes will be measured while viewers look at realistic scenes, which either do or do not contain an anomalous object. In Gordon's studies, covert attention was measured by looking at the time taken to discriminate which of two versions of a spatial probe ("&" vs. "%") was presented at the location of a target object following the offset of the scene. This measure of covert attention has been validated in numerous previous studies, and is based on the premise that if attention is drawn to a particular object or location in a scene, it will be easier to identify a spatial probe presented at that location immediately afterwards. In this study, we will use the probe discrimination measure at the end of the first, second, or third fixation on a scene while also measuring where the eyes travel on the first and second eye movements.

Based on Gordon's results, we expect to find that the time to respond to the discrimination probe should be less when it appears at the location of an anomalous object, thus indicating that covert attention has been drawn to the anomalous object. However, based on Henderson's findings, we expect that the eyes will be no more likely to travel in the direction of the anomalous object than the matched non-anomalous object, thus indicating that overt attention, as measured by the location of a person's gaze, is not drawn by the anomalous object. Our primary question is whether both sets of predictions will be fulfilled. If so, this will require more complex models of the relationship between covert and overt attention, specifically models that relax the assumption of a tight coupling between covert attention and the movements of the eyes. If, however, one of the predictions is not fulfilled, it will help resolve the question of the relative importance of expectations and physical features in guiding visual attention in scenes.

The stimuli to be used in the experiment are the same as those used in both previous sets of studies, and thus have already been made. We are currently programming the experiment for use with the Purkinje Generation V eye tracker, which will provide us with the spatial and temporal resolution necessary to answer our research question. Data collection is expected to begin soon.

In related research at UIUC, we completed a study (Irwin et al., submitted to Vision Research) examining the role of visual characteristics, such as the appearance of new objects, uniquely colored objects, and high luminance objects, in oculomotor control. One particularly interesting result was that if an item had served as a target in one session and later became a distractor, subjects had great difficulty ignoring this object. That is, there appears to be some carryover priming for task-relevant objects that are no longer relevant to the subject's task.

In other research, at Rockwell, a Logitech ultrasonic tracker (LUT) was received. The immediate application of the LUT will be in implementing head motion compensation algorithms for eyetracking. The serial communication software necessary to communicate with the LUT as well as a sockets interface for providing the data to clients over an IP network have been written. This will allow non-platform-specific client applications to exploit the 6DOF motion-tracking services of the LUT. Also, a similar server application interfacing with the ISCAN gaze tracker has been updated. Mounting fixtures for the LUT transmitter and receiver were built to allow the 6DOF motion tracking of a user wearing the ISCAN gaze tracker visor. To aid in the continuing development and validation of algorithms to compensate for head movement within the eyetracking system, a data collection application has also been developed. This application prompts the subject to fixate on a sequence of nine points on a screen for a user-specified number of different head positions/orientations. This same application will likely be used as a calibration program once the head motion compensated system has been successfully implemented. Data is being collected using the head-tracking server, eyetracking server, and calibration application simultaneously. In addition, a fixation filter application has been developed. This application processes a set of sequential eye gaze position data and determines when the user was apparently fixating rather than going through a saccade. The fixation filter has been used to determine which of the data may be useful in validating an algorithm to compensate for head motion in eyetracking. With the combination of sample numbering, time stamping, and fixation filtering data, synchronized head tracking and eyetracking data from various head positions/orientations and fixation targets may be obtained.

July 1 - September 30, 1999 (FY99Q4)

Goals: Continue the experiments for determining the relationship between visual displays, information chunking and eye movement response time. Examine the time course of the deployment of attention (and the eyes) to object and scene-based representations of the visual field. Complete experiment examining the dynamics of visual attention when examining complex displays. Demonstrate eyetracking methods for detecting attended moving objects in a display. Demonstrate use of eye gaze as pointing device in large display virtual reality environment through use of head-tracked, head-mounted eyetracking system. Extend the experiment in Q1 and Q2 (determine relationship between visual displays, information chunking, and eye movement response time) to include information detection rate and refresh rate.

Progress: We have begun an experiment (Heather Pringle's dissertation, which is currently underway in the ISL) to examine eye movements and attention in the inspection of static and dynamic scenes. The experiment will examine the influence of object meaningfulness, salience, and eccentricity on search strategy for changes in complex real-world displays. The study will also examine the influence of individual differences in visual attention, working memory, executive control, and perceptual speed on perceptual change detection. Additionally, a companion study conducted by Heather Pringle and Soo-Jin Lee will examine the long-term memory and representation of scenes as a function of the factors described above. These results will be reported in part at this year's (March 2000) ARL Fed Lab meeting as well as at the HICS 2000 meeting in the Beckman Institute. We are pleased to now have a miniature video camera eyetracking setup to facilitate such studies.


Figure 18: The miniature video camera plus illuminators of the EyeLink eyetracker. Miniaturization of video technology is making actual use of eyetracking in real-world
environments increasingly feasible.

With regard to the experiment on the dynamics of visual attention, in earlier research we have found that if elements of a complex display (locations of some objects, object orientation, object shapes, etc.) are changed while a person is making a saccadic eye movement, the change is often not detected. During the saccade, visual resolution is so low that such changes are not detectable. Of course, if the same change occurred while the eyes were still (during an eye fixation), it would be immediately noticed. When the change occurs during the saccade, the only way a person can detect it is if information is attended after the change that conflicts with information that was acquired before the change. Thus, the lack of detection of these intra-saccadic changes indicates that people are attending to and acquiring information from fixation to fixation on a rather piecemeal basis. These studies have inspired the development of a new area of research that is referred to as 'change blindness.'

We are now employing this method to study visual attention in complex displays. Using this method, we can obtain evidence about what information is and is not being attended and stored during fixations as a person examines the display. This involves investigating the conditions under which intrasaccadic changes in complex displays are and are not detected. In our first experiment, the subjects examine random arrays of nine common objects (chosen from a set of 36 objects) in preparation for an object retention test. On a selected saccade during the 12-second viewing time, either 0, 1, 2, or 3 objects are moved, changing their spatial locations. As a secondary task, the subject is asked to report whenever he notices the movement of objects in the display. He does this by 'shooting' any object that moves: looking directly at the object and pressing a button. Thus, the data indicate when an object displacement is detected, and which object(s) were seen to move. The data indicate a detection rate of less than 30%, confirming the existence of 'change blindness' in this situation. Detection is greater when more objects move, but initial analysis suggests that this is due primarily to the fact that when more objects move, the eyes are more likely to be close to a moving object on the fixation following the change. Further analysis will determine whether there is any compounding of the effects of multiple objects changing, indicating that the locations of multiple objects are being monitored across fixations. When multiple objects move, detection of their movement is no different when they all move in the same direction (a consistent displacement) than when they move in different directions. Further analyses are currently underway and a second study is nearing completion.
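The report does not state how the critical saccade is detected, but a common way to trigger an intrasaccadic display change is to monitor gaze velocity against a threshold and swap the display as soon as a saccade begins, before the eyes land. The sketch below shows that general technique; the velocity threshold, sampling rate, and class names are assumptions, not the implementation used in this experiment.

    // Sketch of a velocity-threshold saccade trigger for gaze-contingent
    // display changes: fire once at the onset of each detected saccade.
    #include <cmath>
    #include <cstdio>

    struct Gaze { double xDeg, yDeg; };

    class SaccadeTrigger {
    public:
        SaccadeTrigger(double thresholdDegPerSec, double sampleIntervalSec)
            : thresh_(thresholdDegPerSec), dt_(sampleIntervalSec) {}

        // Returns true exactly once per detected saccade onset.
        bool update(const Gaze& g) {
            bool fired = false;
            if (havePrev_) {
                double dx = g.xDeg - prev_.xDeg, dy = g.yDeg - prev_.yDeg;
                double velocity = std::sqrt(dx * dx + dy * dy) / dt_;
                bool inSaccade = velocity > thresh_;
                fired = inSaccade && !wasInSaccade_;
                wasInSaccade_ = inSaccade;
            }
            prev_ = g;
            havePrev_ = true;
            return fired;
        }

    private:
        double thresh_, dt_;
        Gaze prev_{};
        bool havePrev_ = false, wasInSaccade_ = false;
    };

    int main() {
        SaccadeTrigger trigger(30.0, 1.0 / 250.0);  // 30 deg/s threshold, 250 Hz sampling
        Gaze samples[] = {{0, 0}, {0.02, 0}, {1.5, 0.2}, {3.0, 0.4}, {3.05, 0.4}};
        for (const Gaze& g : samples)
            if (trigger.update(g))
                printf("saccade detected: move the selected objects now\n");
        return 0;
    }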

As far as demonstrating eyetracking methods for detecting attended moving objects in a display, in the 'change blindness' studies described above, people fail to detect change (moving objects) that occurs during a saccade. Prior research has demonstrated that the distracting effect of movement in some parts of the display can also produce blindness to change in other parts. This is of concern in the present project because when new information becomes available to a commander, it is usually in the form of an update, or change, in some aspect of the display. If these changes can often go unnoticed, as the 'change blindness' studies suggest, it is important that this is better understood and remedies found. Thus, the studies described above contribute both to an understanding of how people acquire information from complex displays and how they detect (or fail to detect) critical changes in these displays. As noted above, initial analyses suggest that simply having more changes occur (given a constant distance from the eyes to the nearest change) or having changes occur that are all consistent (common movement) or not consistent (motion in different directions) is not sufficient to increase their likelihood of detection. The 'change blindness' paradigm appears to be an appropriate environment for further study of this important issue.

We have finished the implementation of our head motion compensation algorithm for a head-mounted eye tracking system. An initial evaluation experiment was performed using data we collected as follows. A user sitting in front of a 17" monitor (about 30 inches away) was instructed to gaze at nine calibration points evenly spaced on the screen. The user was then instructed to rotate his head about 10 degrees to the left, followed by three more repetitions for the up, right, and down positions, when nine gaze measurements were collected again in a similar fashion. Without head motion compensation, the average rms error was found to be about 470 pixels. The display resolution was 1280 by 1024 so the error was substantial. With head motion compensation, the error was reduced by 65%.

We noticed that the improvement was significant, although the accuracy was still limited, primarily by imprecise measurements of a few geometric parameters, including the transformation between the head-tracking receiver and the user's left eye, and that between the computer screen and the head-tracking transmitter. We are planning to improve this by modifying our calibration procedure so that these parameters can be simultaneously estimated.

Client/server architectures for both the eyetracking and headtracking services are in place and will allow these technologies to be integrated into large display virtual reality environments. Once the evaluation and refinement of the head motion compensation system has been completed, it will be integrated into the eyetracking server to allow the demonstration of eye gaze as a pointing device in such a large display environment.

The following is a continuation of the experimental framework presented in the last report (Q3). These hypotheses were further tested:

Hypothesis 1: The level of complexity of a display will affect recall accuracy of information

Result: An analysis of variance (ANOVA) was conducted using the percentage accuracy data. The main effect of complexity was tested using a one-way ANOVA. The result showed an effect of display complexity on information recall (Fcal = 12.475 > F0.05 = 1.67, p = 0.056). The result supported the hypothesis that the level of complexity of a display affects the recall accuracy of information in the display. Further analysis shows that as information complexity increases, the percentage of information recall accuracy decreases.

Hypothesis 2: The interaction of symbol size, symbol density, and display dynamicity has an effect on information recall.

Result: An ANOVA was used to analyze the data collected. Table 2 shows the result.

Table 2. ANOVA for the effects of size, density, and dynamics on recall accuracy

Source                       Sum of Squares    df     Mean Squares    F         P
Size                         951.162           1      951.162         3.885     .049
Density                      931.274           2      465.637         1.902     .150
Dynamics                     40389.582         2      20194.791       82.481    .000
size * density               1816.682          2      908.341         3.710     .025
size * dynamics              747.746           2      373.816         1.527     .218
density * dynamics           655.265           4      163.816         .669      .614
size * density * dynamics    7350.342          4      1837.585        7.505     .000
Error                        114585.257        468    244.840

As shown in Table 2, there is a significant difference in recall accuracy between displays with different military symbol sizes, and there is a significant difference in recall accuracy between displays with different numbers of dynamic symbols. Displays with different levels of symbol density showed no significant difference in the participants' ability to accurately recall information.

October 1 - December 31, 1999 (FY00Q1)

Goals: Conduct experiments to determine the relationship between visual displays, information chunking, and eye movement response time. Continue the experiments for determining the relationship between visual displays, information chunking and eye movement response time (reported as incomplete in FY99Q2). Begin preparation of a chapter for the ARL Human Factors book (to be edited by C. Wickens). Conduct a series of studies to examine the relationship between attentional flexibility and the ability to detect task-relevant changes in the environment. Begin a series of studies to examine the influence of display clutter on goal-directed and reflexive eye movements in visual search tasks. Integrate head-tracked, head-mounted eyetracking system into ISL. Research will examine user constraints in the utilization of eye movement input in human-computer interaction. Begin experiment to look at the effect of spatial uncertainty of information presentation on direction of attention in time and space.

Progress: We used eye movement studies to quantify the relationship between the display of a group of military symbols (information chunking) and eye movement response times. We also investigated the cognitive correspondence problem between spatio-temporal learning and eye fixation in visual space.

Eight military symbols were used in the study (see Figure 19). The subjects (six undergraduate students at NCA&T, all with 20/20 vision) were asked to fixate on the central location of the display presented on a seventeen-inch color television monitor. The subjects were given the military symbols to learn two days before they participated in the experiment.


Figure 19: The three military symbols used in the study.

The subjects participated individually in pre-trial sessions, including a briefing on what to do. They were told that a chunk of military symbols (not more than 3 in a group) would be randomly displayed at the upper left or right hand side (at 0° or 180°) of the television. The subjects were asked to fixate on the target from the time it appeared until it disappeared. Four trials were conducted per subject. The subjects were then asked to recall a particular symbol and its position on the screen.

This experiment measured attention capture (the extent to which codes are detectable), detection time, fixations, and/or scan paths.

Items of interest during this experiment were the time lapse between the display of the target and the start of the eye movement in the direction of the target, and the eye's behavior when focused on the target. The eye movement behavior gives information on how the target is being processed. For example, when the symbols in a chunk are geometrically similar, such as the infantry and mechanized infantry symbols containing an "X", is more time spent fixating the "X", or does the eye move around following the outline of the square? If the eye spends too much time scanning the outline of the target (the square), then the square may be a distractor for the target, thus increasing the response time (this has not yet been analyzed).

The stimulus presentation had six chunks with the symbols:

C1 = (Infantry)
C2 = (Mechanized Infantry)
C3 = (Armored Cavalry)
C4 = (Infantry, Mechanized Infantry)
C5 = (Armored Cavalry, Mechanized Infantry)
C6 = (Infantry, Mechanized Infantry, Armored Cavalry)

Note that the chunk groupings C4 – C6 are based on stimulus similarity. For example, the stimulus similarity symbol in C4 = "X"; in C5 it is the "0" with a "/" inside the circle; and in C6 it is ("X", C5).

Sample Results:

Figure 20: Mean response Times By Chunk Group

We have begun preparing materials for the chapters that will appear in the ARL Human Factors handbook.

Several studies have been conducted examining the relationship between attentional flexibility and task relevant changes. The data has been analyzed and we are currently writing a journal article describing these results. We anticipate that the journal article will be submitted early in the second quarter.

We have completed a series of studies on the influence of display clutter and a paper (Petersen et al.) has been submitted to the Journal of Experimental Psychology: Human Perception and Performance. Additional studies on this topic are underway.

Due to a delay in the availability of funding for FY00, it was not possible to complete the integration of the head-tracked, head-mounted eye tracking system into the ISL. Furthermore, in order to comply with the ISL proposal procedure and also as a result of the confirmation of an extension through FY01Q3, this milestone has been rescheduled for completion in FY00Q4.

In the process of implementing a head-mounted eyetracking system in the ImmersaDesk large-format display environment, we have examined some of the constraints that occur in the utilization of eye movement input in HCI. The nature of the constraints varies according to the type of eyetracking equipment that is being used. The tolerances on these constraints vary with the nature of the stimuli and task that the participants are to perform. Following are a few of the constraints.

The first involves the degree of accuracy and repeatability of the eyetracker relative to the retinal size of the objects among which gaze direction must be distinguished. A second constraint is the degree of 'slippage' of the equipment over time, which determines the frequency with which the calibration of the equipment must be checked. We plan to develop a method of updating the calibration continuously during use so that no separate task for calibrating the equipment is required.

Since it is necessary to track the participants' heads in addition to their eyes, the characteristics of the head-tracking equipment become an additional constraint. Magnetic head-tracking equipment limits the range over which the participants can move, since they must be within 3-4 feet of the tracking device in order to maintain reasonable accuracy. This type of device is also subject to interference from other equipment and steel structures. We are currently looking into optical head tracking as an alternative.

As people look around a large display area, they tend to make a mixture of head- and eye-movements. For comfortable use, it is necessary to be able to track the eyes when their direction deviates about 15-20 deg from directly ahead. Typically if a person's attention is drawn to an object lying more distant than that, part of the excursion will be made by a head movement, thus reducing the range over which the eyes themselves must be tracked. However, we have found that some eyetrackers have a more limited range than this, and thus were not suitable for use in the large-display environment.

Other constraints involve the conditions under which binocular eyetracking is required vs. monocular eyetracking being sufficient and the sampling speed needed in the eyetracking equipment. These depend on the type of information needed from eyetracking for the nature of the interaction taking place.

In other Rockwell research, further eye and head tracking data were collected in support of the refinement of the head motion compensation algorithm. We have made an improvement to our head motion (including translation and rotation) compensation algorithm for a head-mounted eyetracking system. Our previous implementation assumed that a few geometric parameters were given a priori. They include the transformation between the head-tracking receiver and the user's left eye, and that between the center of the computer screen and the head-tracking transmitter. The new method uses the supplied parameters only as initial estimates. They are then iteratively refined using a non-linear optimization procedure with an objective function that aims at minimizing the disparity between the user's gaze locations on the screen and those predicted by the calibration model. For head rotations of as much as 10 degrees, the revised method reduced the rms error by up to 78%, an improvement over the 65% figure we obtained previously. The current software is packaged as a class object in C++ that can be easily integrated into client applications for further evaluation. This head motion compensation component is also being integrated into the RSC Eye Tracking Server.
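The refinement idea can be sketched as follows: the measured geometric parameters serve only as a starting point, and they are adjusted to minimize the summed squared disparity between the screen locations the user actually fixated during calibration and those predicted by the model. The placeholder predict() model, the parameter layout, and the crude coordinate-descent loop below stand in for the actual nonlinear optimization procedure.

    // Sketch of refining calibration parameters by minimizing the squared
    // disparity between observed and predicted on-screen gaze points.
    #include <cstdio>
    #include <initializer_list>
    #include <vector>

    struct Point { double x, y; };

    // Hypothetical forward model: predicts the on-screen gaze point for one
    // calibration observation given the current parameter vector. The real
    // model involves the head pose and the receiver-to-eye and
    // screen-to-transmitter transformations.
    Point predict(const std::vector<double>& params, const Point& rawObservation) {
        return {params[0] * rawObservation.x + params[1],
                params[2] * rawObservation.y + params[3]};
    }

    double objective(const std::vector<double>& params,
                     const std::vector<Point>& raw, const std::vector<Point>& truth) {
        double sum = 0;
        for (size_t i = 0; i < raw.size(); ++i) {
            Point p = predict(params, raw[i]);
            double dx = p.x - truth[i].x, dy = p.y - truth[i].y;
            sum += dx * dx + dy * dy;           // squared disparity on the screen
        }
        return sum;
    }

    // Very simple coordinate descent: nudge each parameter and keep any change
    // that lowers the objective. A real implementation would use a proper
    // nonlinear least-squares routine.
    void refine(std::vector<double>& params, const std::vector<Point>& raw,
                const std::vector<Point>& truth, int iterations, double step) {
        double best = objective(params, raw, truth);
        for (int it = 0; it < iterations; ++it) {
            for (size_t k = 0; k < params.size(); ++k) {
                for (double delta : {step, -step}) {
                    params[k] += delta;
                    double val = objective(params, raw, truth);
                    if (val < best) { best = val; } else { params[k] -= delta; }
                }
            }
        }
    }

    int main() {
        std::vector<Point> raw   = {{1, 1}, {2, 1}, {1, 2}, {2, 2}};   // eye measurements
        std::vector<Point> truth = {{100, 100}, {200, 100}, {100, 200}, {200, 200}};
        std::vector<double> params = {90.0, 5.0, 90.0, 5.0};           // initial estimates
        refine(params, raw, truth, 200, 0.5);
        printf("refined objective: %.3f\n", objective(params, raw, truth));
        return 0;
    }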

In other UIUC work, data analysis and modeling of data from prior studies on perceptual processes with multiresolutional displays, with high resolution only where the gaze is directed, is continuing. Additionally, papers reporting our research have been prepared for the ARL Federated Laboratories year 2000 Symposium.

January 1 - March 31, 2000 (FY00Q2)

Goals: Continue the experiments for determining the relationship between visual displays, information chunking and eye movement response time. Extend the experiment in Q1 and Q2 (determine relationship between visual displays, information chunking, and eye movement response time) to include information detection rate and refresh rate. Integrate head-tracked, head-mounted eyetracking system into ISL. Begin experiment to look at the effect of spatial uncertainty of information presentation on direction of attention in time and space. Complete the data collection and analyses for attention and perceptual change study. Integrate eye tracking into multi-modal HCI demonstration. Implement a multi-resolutional display using eye and head tracking with a large format display. Begin the second perceptual change/scene representation study which examines the role of memory, attention, perceptual speed, and executive control processes on the ability to rapidly detect scene changes and construct robust mental representations of scenes. Complete analysis of study investigating detection of configuration changes in multi-object displays. Investigate ability to direct eye movements by display changes.

Progress: Eye movement response latency was measured as the amount of time between the initial movement of a dynamic symbol and the fixation of the eye on that symbol. We analyzed the main effect of display complexity on eye movement response latency. According to Ross and Ross (1985), the appearance of new information, as contrasted with the disappearance of old information, should delay a saccade. This suggests that latency should be affected by the displays because the displays are dynamic. A one-way analysis of variance was conducted to determine whether there was a significant difference between the response latencies for the different levels of complexity. Table 2 shows F(17, 72) = 3.164, exceeding the critical value F.05 = 1.75 (p < .001), which indicates that there is a significant difference between the response latencies at different levels of complexity.

Table 2: ANOVA for eye movement response latency at complexity levels

                      Sum of Squares    df    Mean Square      F        P
  Between Groups           3.228        17       .190        3.164    .000
  Within Groups            4.321        72     6.002E-02
  Total                    7.550        89
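
For reference, the F ratio in the table follows directly from the sums of squares and degrees of freedom (a standard one-way ANOVA computation, shown here only to make the table entries explicit):

    \[
    F(17, 72) \;=\; \frac{SS_{\mathrm{between}}/df_{\mathrm{between}}}{SS_{\mathrm{within}}/df_{\mathrm{within}}}
              \;=\; \frac{3.228/17}{4.321/72}
              \;=\; \frac{0.190}{0.0600}
              \;\approx\; 3.16
    \]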

Considered individually, none of these variables (symbol size, density, dynamics) shows a significant effect on the participants’ eye movement response latency; however, the interaction of density and dynamics yields a significant result.

Figure 21 illustrates the eye movement response latencies at the different levels of complexity.


Figure 21: Eye movement response latency based on symbol size

Figure 22 illustrates the response latencies at the different levels of object movement. This graphical analysis shows that displays with three dynamic symbols elicit longer response latencies than displays with fewer moving symbols.


Figure 22: Eye movement response latency at different levels of object movements.

Figure 23 illustrates the difference in response latencies at the different density levels. This graphical analysis shows that it takes the eyes a shorter time to reach symbols when twelve symbols are displayed than when fewer (four or eight) symbols are displayed. Both Potter (1976) and Senders (1976) have suggested that the eyes tend to remain relatively fixed at the place where information is rapidly being presented, in order to process new information. This suggests that the density of the display would have no effect on recall latency, because the eyes tend to fixate where information is being presented.

Figure 23: Difference in eye movement response latency at different density levels.

The integration of a head-tracked, head-mounted eyetracking system into the ISL has been rescheduled for FY00Q4.

The first perceptual change study has been completed and the manuscript describing the results has been accepted for publication:

Pringle, H., Irwin, D.E., Kramer, A.F. & Atchley, P. (in press). Relationship between attention and perceptual change detection in driving scenes. Psychonomic Bulletin and Review.

The integration of eyetracking into the multi-modal HCI demonstration has been completed. The eye and head tracking system includes:

  • Software client (CTS) written to integrate data streams from eye and head tracking equipment and perform coordinate transformations to translate head motion-compensated eyetracking data to point-of-regard data on an arbitrary plane.

  • Rapid region definition routine written for CTS to allow the user to designate regions of interest on a plane. For instance, three coplanar display surfaces can be designated, allowing discrimination based on gaze point (a sketch of this kind of gaze-point hit test follows this list).

  • RSC BDSocket OCX integrated into CTS, allowing communication with the Media Server and remote control of CTS by the Media Server. This combination allows the user to designate a particular media stream by gazing at its corresponding display surface. In response, the designated display is highlighted and its corresponding audio stream is brought to the auditory foreground, spatialized in relation to the display arrangement.

  • Calibration cycle sped up significantly. For an experienced user, the IScan system can be calibrated in under one minute and the software can be calibrated (including region definitions for CTS) in an average of 100 to 110 seconds. This brings total calibration time to under three minutes, rather than the 10 to 15 minutes previously seen. Additionally, when using only three media streams, it is possible to remove and re-don the equipment without appreciable loss of discrimination ability. Furthermore, in the event that a re-donned system has become unsatisfactory, an incremental calibration to return the system to working order can be accomplished in under 20 seconds.

  • Hardware upgrade for IScan system researched and purchased. IScan scene-tracking system researched and may be integrated into system at a future date.
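
As noted in the region-definition item above, discrimination among designated display surfaces reduces to a point-in-region test once the head-compensated gaze point has been projected onto the plane. The following is a minimal sketch of such a hit test; the region names, coordinates, and data structures are illustrative assumptions, not the actual CTS routine.

    // Sketch of gaze-point discrimination among user-designated planar regions.
    // Region names and layout are hypothetical; the real CTS uses its own definitions.
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Region { std::string name; double x0, y0, x1, y1; };  // rectangle on the plane

    // Return the name of the region containing the gaze point, or "" if none contains it.
    std::string hitTest(const std::vector<Region>& regions, double gx, double gy) {
        for (const Region& r : regions)
            if (gx >= r.x0 && gx <= r.x1 && gy >= r.y0 && gy <= r.y1)
                return r.name;
        return "";
    }

    int main() {
        // Three coplanar display surfaces, e.g. side-by-side media streams.
        std::vector<Region> displays = {
            {"left display",   0.00, 0.0, 0.33, 1.0},
            {"center display", 0.33, 0.0, 0.66, 1.0},
            {"right display",  0.66, 0.0, 1.00, 1.0}
        };
        double gx = 0.5, gy = 0.4;  // head-compensated gaze point in plane coordinates
        std::printf("gaze falls on: %s\n", hitTest(displays, gx, gy).c_str());
        return 0;
    }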

We have implemented an initial version of an eye movement driven multi-resolutional display system using a head-mounted ASL eyetracker for monitoring the eye movements and an Immersa-desk for the large format display. This system stores many pre-processed versions of an image in the computer's memory, each with a high-resolution area in a different location. As the observer looks around the picture, the program selects which version of the image to display at any moment based on where the eyes are directed, thus continuously placing high spatial resolution information at the location at which the person is looking. Future developments will improve the accuracy of image placement and allow image updating only at the beginning of each eye fixation.
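
A minimal sketch of the version-selection step is given below. It assumes the pre-processed image versions have their high-resolution areas centered on a regular grid and simply picks the version whose center is nearest the current gaze sample; the grid size and screen resolution are hypothetical, and the actual system's update policy (e.g., updating only at fixation onset) is not shown.

    // Sketch: choose which pre-rendered multi-resolutional image version to display,
    // given the current gaze position. Grid layout and resolution are assumptions.
    #include <cmath>
    #include <cstdio>

    const int GRID_COLS = 8, GRID_ROWS = 6;      // assumed grid of high-resolution centers
    const int SCREEN_W = 1024, SCREEN_H = 768;   // assumed display resolution in pixels

    // Return the index of the stored image version whose high-resolution
    // area is centered closest to the gaze position (gx, gy).
    int selectImageVersion(double gx, double gy) {
        int col = (int)std::floor(gx / (SCREEN_W / (double)GRID_COLS));
        int row = (int)std::floor(gy / (SCREEN_H / (double)GRID_ROWS));
        if (col < 0) col = 0;  if (col >= GRID_COLS) col = GRID_COLS - 1;
        if (row < 0) row = 0;  if (row >= GRID_ROWS) row = GRID_ROWS - 1;
        return row * GRID_COLS + col;            // index into the array of stored versions
    }

    int main() {
        double gazeX = 612.0, gazeY = 233.0;     // example gaze sample from the eyetracker
        std::printf("display image version %d\n", selectImageVersion(gazeX, gazeY));
        return 0;
    }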

The second perceptual change/scene representation study is underway in the Integrated Systems Lab (ISL) and subject running will be completed by the end of March 2000. The results will be presented by Heather Pringle (UIUC) at the HICS conference at Beckman Institute in May.

The results of the study investigating detection of configuration changes in multi-object displays were reported in detail at the Fed Lab 2000 Symposium, with a written report available in the proceedings (McConkie & Loschky, 2000).

Kramer and his students have recently completed two different studies on the ability to direct eye movements by display changes and have submitted the manuscripts to leading psychological journals. The papers are:

Kramer, A.F., Cassavaugh, N.D., Irwin, D.E., Peterson, M.S. & Hahn, S. (submitted). Influence of single and multiple onset distractors on visual search for singleton targets. Perception and Psychophysics.

Peterson, M.S., Kramer, A.F., Irwin, D.E. & Hahn, S. (submitted). Modulation of oculomotor control by abrupt onsets during attentionally demanding visual search. Journal of Experimental Psychology: Human Perception and Performance.

Additional work has also been conducted by McConkie and Loschky, and is described below:

Spatially Distributed Rapid Serial Visual Presentation: What is an Ideal Display Rate?

Imagine the following scenario. A tank commander has just turned on a display showing a map with the identities and locations of all friendly and enemy forces in the area. The commander’s immediate task is to locate all of these as quickly as possible in order to evaluate the situation. In order to do this, he/she will need to visually search the display, noting the identities and locations of each force. However, there are at least two important problems that could occur during this process. First, he/she may fail to notice a force if the display is cluttered. Second, even if successful, the search process may take more time than is available. One way of dealing with the first problem is to simply draw attention to all forces by means of various attentional capture techniques, for example having them blink. However, if all forces blink simultaneously, the situation is little better than if none blink. In this case, it is likely that a force will be missed because of attentional interference from other blinking forces. Thus, it makes more sense that the forces be highlighted serially, one after the other, such that the commander can attend to the identity and location of each force in order to build up a mental representation of the situation as completely and quickly as possible. This method of presenting information on a display can be called a spatially distributed rapid serial visual presentation (SDRSVP).

If one presents or highlights a series of spatially separated objects serially in rapid succession, a natural question that arises is, how well can viewers’ eyes keep up with the changes? Much research has shown that when a new object appears in a display, a viewer’s attention is captured by it, and this frequently results in a reflexive saccade to the object (e.g., Theeuwes, Kramer, Hahn, & Irwin, 1998). Nevertheless, common sense suggests that if serial changes occur at a rate greater than the oculomotor system is able to match, the objects will not be foveated by the viewer, and at such high presentation rates viewers should clearly have difficulty encoding the objects’ locations and identities. Conversely, if the objects are updated too slowly, the viewer’s attention and eyes may wander elsewhere. This would suggest that the display is wasting the viewer’s time; worse, if the viewer’s attention and eyes wander from the currently displayed object, he/she may fail to notice the next presented object, particularly if its onset occurs during an eye movement away from or back to the currently displayed object. Thus, if one wants to rapidly serially present information across a display, there must be some ideal presentation rate (or a range of rates) suited to the needs of the cognitive and oculomotor systems.

The current study took as its starting point a simple memory task, in which viewers had to look at an array of everyday real-world objects within a scene (e.g., tools on a workbench) and try to remember the location and identity of each object. After a viewer had looked at a certain number of objects, he/she was given a memory test for the identity of an object that had appeared at a given location. This task is simple to implement as an SDRSVP display, but rather than arbitrarily coming up with timing parameters for use in an SDRSVP version of this task, we decided to use actual eye movement parameters from real subjects performing the task under free viewing conditions. Thus, we used eye movement data from a previous study in which 6 subjects had performed this task while freely viewing a display in which all the objects were presented simultaneously. From these data, we were able to say which objects had been looked at, for how long, and in what order, for each array of objects for each subject. This information was used to create SDRSVP displays for new subjects doing the same task. In this way, we could investigate the effects of timing parameters in such an SDRSVP display on oculomotor performance in this task.

In the SDRSVP version of the memory task, subjects would be asked to look at each object as it appeared on the screen. The stimuli would be presented in the same sequences, and for the same respective durations, that previous subjects had looked at those stimuli when they were all simultaneously present. Nevertheless, there are complicating factors that must be dealt with in translating eyetracking data from a previous study into an SDRSVP stimulus presentation. These complications arise from basic constraints on the way the eyes move from one object to another. Since the eyes only take in detailed information when they are fixating, it might make sense to equate stimulus durations with gaze durations, which are the sum of all consecutive fixation durations on a given object. While this could provide a reasonable base duration for an SDRSVP stimulus, one would also need to account for the time the eyes take to get from object to object. To begin with, in order to move the eyes to a new location, the visual system must first select an object as a saccade target. A good way to ensure that the desired object becomes the next saccade target is to make it appear suddenly on the screen, since much research suggests that this grabs attention and often results in an eye movement directed to the object (e.g., Theeuwes, Kramer, Hahn, & Irwin, 1998). Of course, under normal viewing conditions, the targeting of a saccade takes place while the eyes are fixating on the current object, so a natural SDRSVP display would have the next object appear while the current object is still being displayed. Since we know that under normal conditions the least amount of time needed to prepare a saccade is about 150 ms, it makes sense to have the next object appear on the screen 150 ms before the current object is erased. Finally, we also need to allow time for the eyes to get to the next target once they have taken flight. Much eye movement research indicates that the average saccade duration is about 50 ms, and studies in our own laboratory have corroborated this. The above information forms the basis of a first attempt at creating an SDRSVP paradigm that makes use of normal eye movement processes:

  1. Stimuli would be flashed on the screen one at a time, which should grab attention and start the saccade programming process.
  2. The stimuli would be temporally overlapped, however, such that the next stimulus would appear on the screen at least 150 ms prior to the erasure of the current stimulus.
  3. An additional 50 ms would be added to the stimulus duration to give the eyes time to arrive at the object.
  4. The remainder of the stimulus duration would be based on the gaze duration of a previous subject viewing that same stimulus under the same task conditions.

In order to sample a wide range of gaze durations from the original data set, two blocks of 20 trials each from each original subject were used with the new subjects. These 12 blocks (6 subjects x 2 blocks each) were then randomly ordered and presented to the new subjects.
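
To make the timing rules above concrete, the sketch below builds a presentation schedule from a list of prior free-viewing gaze durations: each stimulus is displayed for its donor gaze duration plus 200 ms (150 ms for saccade planning plus 50 ms for saccade flight), and each new stimulus onsets 150 ms before the current one is erased. The gaze-duration values in the example are hypothetical.

    // Sketch: derive an SDRSVP presentation schedule from prior free-viewing gaze durations.
    // Each stimulus is shown for (200 ms + prior gaze duration), where 200 ms = 150 ms
    // saccade planning + 50 ms saccade flight, and the next stimulus onsets 150 ms before
    // the current one is erased.
    #include <cstdio>
    #include <vector>

    struct Presentation { int onsetMs, offsetMs; };

    std::vector<Presentation> buildSchedule(const std::vector<int>& priorGazeMs) {
        const int PLAN_MS = 150, FLIGHT_MS = 50;
        std::vector<Presentation> schedule;
        int onset = 0;
        for (int gaze : priorGazeMs) {
            int offset = onset + PLAN_MS + FLIGHT_MS + gaze;  // total duration = gaze + 200 ms
            schedule.push_back({onset, offset});
            onset = offset - PLAN_MS;       // next stimulus onsets 150 ms before this erasure
        }
        return schedule;
    }

    int main() {
        std::vector<int> gaze = {320, 450, 610, 280};  // hypothetical prior gaze durations (ms)
        for (const Presentation& p : buildSchedule(gaze))
            std::printf("on at %4d ms, off at %4d ms\n", p.onsetMs, p.offsetMs);
        return 0;
    }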

The specific research questions we addressed in this study were as follows:

  1. How well did the predicted timing parameters match the observed data?
    1. Specifically, how close were saccadic latencies to the predicted 150 ms, and how close were the saccadic durations to the predicted 50 ms?

  2. How did the stimulus durations affect the ability of the eyes to keep up with the stimuli?
    1. Specifically, what was the effect on the difference between predicted and observed gaze ending times?
    2. Also, what was the effect on the likelihood of the eyes missing a stimulus?

Method

In the current study, 6 new subjects took part in an SDRSVP version of the memory task while their eyes were tracked using a Generation 5 Dual Purkinje eyetracker. The scenes measured 756 x 486 pixels, with all objects fitting within a 100 x 100 pixel box. Thus, at the viewing distances used, the scenes subtended 18° x 12°, with all objects fitting within a 2.4° square box.

Results

Match Between Predicted and Observed Timing Parameters

In general, we found that the values chosen for the saccadic latency and duration seem to have been quite accurate. The simplest case is for the saccadic durations. Here we found that the modal saccadic duration was 51 ms, the mean was 50 ms, and the median was 49 ms. Thus, adding the predicted value of 50 ms to stimulus durations in order to allow time for the viewer’s eyes to arrive at the next stimulus seems to have been well justified.

With regard to saccadic latencies, the results are somewhat more complex. Looking only at those cases in which the eyes went directly to the onset target, we find that the modal saccadic latency was almost exactly 150 ms, the predicted value. Histograms of these saccadic latencies make it clear that this mode is far and away the most frequent time at which the eyes left for the next target. However, the mean and median saccadic latencies were somewhat longer, being 218 and 180 ms respectively, indicating that the saccadic latency distribution is not normal like that of saccadic durations. In fact, saccadic latency distributions are always positively skewed, so it is expected that we will find a mean greater than the mode. This is shown in Figure 24.


Figure 24: Percent of saccadic latencies to the onset targets.

Of interest, however, is the fact that the mean saccadic latency is lengthened by a secondary mode at 300 ms after the onset of the next target. Thus, while the primary mode occurred 150 ms after the onset of the next target, the secondary mode occurred 150 ms after the offset of the currently fixated target. One explanation for the secondary mode is that it represents a different type of saccadic programming than that evidenced by the 150 ms mode. According to this argument, the secondary mode at 300 ms post onset would represent planned eye movements, while the mode at 150 ms post onset would represent reflexive saccades. It is not clear, however, what would motivate a subject to begin programming a planned saccade from the moment the target appeared, when programming a reflexive saccade would be much easier. Thus, a simpler explanation of the secondary mode is that it represents cases in which subjects chose to ignore the onset of the next target until the currently fixated target had been erased, in order to continue processing it. Once the current target was erased, the saccade to the next target was programmed, and it was executed 150 ms later. Evidence consistent with this explanation would be found if the secondary mode were linked to short stimulus durations. If that is the case, then it may be possible to eliminate the second mode by lengthening the stimulus duration.

Effects of Stimulus Durations on Eye Movements

Stimulus durations and the likelihood of gazes ending on time. Another way of discussing saccadic latency is within the broader context of when the eyes leave the current stimulus. This takes into account the fact that viewers may make eye movements away from the current stimulus before the next stimulus appears on the screen (especially if the current stimulus duration is too long). Thus, rather than measure the latency between the onset of the next stimulus and the saccade to it, we can measure the difference between the time the eyes are observed to leave the current stimulus and the time they were predicted to leave (i.e., 150 ms after the next stimulus onset, and 0 ms after the current stimulus offset). In this analysis, then, a difference between the predicted and observed end of gaze of 0 ms is equivalent to a 150 ms saccadic latency to the next stimulus onset. Such an analysis allows us to determine whether the eyes left early, on time, or late, in a way that is meaningfully related to the duration of the current stimulus. In order to analyze the effects of the stimulus duration separately from the constant 200 ms added for saccadic latency and saccadic duration, all stimulus durations discussed hereafter have already had the extra 200 ms subtracted from them. Thus, the term "stimulus duration" as discussed hereafter actually refers to the predicted foveal stimulus duration: that part of the total stimulus duration during which viewers are predicted to gaze on the stimulus (after programming a saccade and sending their eyes to it).

In order to make the relationship between stimulus duration and gaze ending time difference clearer, we broke the gaze ending time difference data down into three categories: gazes that ended early, on time, or late. This was based on the criterion that a gaze ending on time ended within plus or minus 49 ms of the predicted ending time (i.e., when the stimulus was erased from the screen). The thinking behind this criterion was that the combined saccadic latency and duration, which we predicted to be 200 ms for a normal subject, might be as low as 150 ms in some cases, or as long as 250 ms in others, and still allow the eyes to arrive at the next object on time. We then found the percentage of cases within each category ("left early," "on time," or "left late") for each stimulus duration. Finally, we found the relative proportion of cases across these three categories at each stimulus duration, giving us the relative proportion of cases in each of the 3 categories as a function of the current stimulus duration. The results are shown in Figure 25.
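
A sketch of this classification is shown below: the observed gaze ending time is compared with the predicted ending time (the offset of the current stimulus), and the gaze is labeled early, on time, or late using the plus or minus 49 ms window. The timestamps in the example are hypothetical.

    // Sketch: classify gaze endings relative to the predicted ending time.
    // Predicted ending = offset of the current stimulus (equivalently, 150 ms after the
    // next stimulus onset). "On time" = within +/-49 ms of that prediction.
    #include <cstdio>
    #include <cstdlib>

    enum class GazeEnd { Early, OnTime, Late };

    GazeEnd classifyGazeEnd(int observedEndMs, int stimulusOffsetMs) {
        int diff = observedEndMs - stimulusOffsetMs;   // positive = left late, negative = left early
        if (std::abs(diff) <= 49) return GazeEnd::OnTime;
        return diff < 0 ? GazeEnd::Early : GazeEnd::Late;
    }

    int main() {
        // Hypothetical example: stimulus erased at t = 1200 ms, gaze ended at t = 1290 ms.
        GazeEnd g = classifyGazeEnd(1290, 1200);
        std::printf("%s\n", g == GazeEnd::OnTime ? "on time"
                          : g == GazeEnd::Early  ? "early" : "late");
        return 0;
    }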


Figure 25: Relative proportion of gazes ending earlier, on time, or later than predicted as a function of stimulus duration (-200 ms). (On-time gaze endings = +/-49 ms from stimulus offset)

Several things are apparent from this graph. First, we see that the relative proportion of cases in which the eyes left late decreases dramatically until stimulus durations of about 650 ms, after which the proportion stabilizes at about 20%. Thus, if we are chiefly concerned with reducing the number of times the eyes leave late, a stimulus duration of at least 650 ms may be necessary. Furthermore, the fact that the relative proportion of cases leaving late drops so precipitously with increasing stimulus durations also supports the claim that the secondary mode in the saccadic latency distribution may have been due to ignoring the onset stimulus in order to continue processing the current stimulus.

Secondly, we see that the relative proportion of cases in which the eyes leave the stimulus early gradually rises until it reaches a peak at a stimulus duration of about 950 ms, after which it remains steady. This suggests, conversely, that in order to avoid cases in which the eyes wander off of the stimulus before it is erased, we should avoid stimulus durations of 950 ms or greater. This, then, becomes an upper bound for stimulus durations.

In order to come up with an ideal stimulus duration at which both late and early gaze endings are minimized, we can look at the point where the relative proportions of late gaze endings and early gaze endings cross. We see that this occurs about midway between the 450 and 550 ms stimulus duration bins, i.e., at a stimulus duration of about 500 ms. We also see that the peak relative proportion of cases in which the eyes leave on time is for stimulus durations of between 250-550 ms. Thus, a stimulus duration of roughly 500 ms appears to result in the highest proportion of cases in which the eyes neither leave too late nor too early. Unfortunately, even at this ideal stimulus duration, we also see that only about 40% of cases meet this relatively strict criterion for a gaze ending "on time."

This problem is even clearer when we look at the actual percentages, across the three categories, of cases that fall within the "late," "on time," or "early" gaze endings. This is shown in Figure 26. Two important points can be seen in Figure 26. First, there is a strong effect of stimulus duration on the proportion of cases that fall in the "late," "on time," or "early" categories, and this effect is seen most clearly between the "late" and "on time" categories up to around the 450-500 ms stimulus durations. This is why we earlier found the ideal stimulus duration to be within this range. Thereafter, however, stimulus duration has very little effect on the probability of a gaze ending "late." This indicates that some factor other than the current stimulus duration is influencing the likelihood of ending a gaze late. (Recall that "stimulus duration" here refers to the predicted foveal stimulus duration.) Related to this is the second key point, namely that with this strict criterion for a gaze ending "on time," the probability of ending a gaze "late" is always higher than that of ending "on time." Since stimulus length cannot explain this discrepancy, this suggests that not enough time has been given to viewers for planning the saccade to the next target. This is not surprising since, although the modal saccadic latency of 150 ms matched the amount of time allotted for it, the mean latency was about 220 ms, an increase of 70 ms. Likewise, we find that the mean difference between predicted and actual gaze ending times is +100 ms. Thus, it may be necessary to increase the constant amount of time allotted for saccadic latency and duration from 200 ms to 270 or 300 ms.


Figure 26: Percentage of gazes ending earlier, on time, or later than predicted as a function of stimulus duration (-200 ms). (On time gaze endings=+/-49 ms from stimulus offset)

Stimulus durations and the likelihood of missing a stimulus. A different criterion for selecting an ideal stimulus duration is that it minimizes the likelihood that the eyes will never arrive at the stimulus. In order to determine whether viewers missed a stimulus, we set a criterion that the eyes must arrive at some point during the stimulus presentation within a 100 x 100 pixel box that defined the furthest extent of any object. This criterion was therefore that the eyes should land within 1.2° vertically or horizontally from the center of an object. While it is possible that some cases labeled "missed" were simply slightly outside of this box, inspection of the data indicated many cases in which the eyes never moved in the direction of an object at all. And, indeed, this proved to be a problem in the task, as the average rate of missing targets was about 30% across all subjects. This is despite the fact that subjects were explicitly told that it was important that they try to fixate every object that appeared on the screen.

In this analysis, we looked at the probability that viewers’ eyes would completely miss the stimulus as a function of stimulus duration. As in previous analyses of the effect of stimulus duration, all stimulus durations reported hereafter have already had subtracted from them the extra 200 ms that was added for planning and executing a saccade.

In considering the current task, there are at least two simple reasons why the duration of a stimulus could lead to the viewer’s eyes never arriving at it or the next stimulus. The first is that the duration of the target stimulus is too short, and thus the viewer does not have enough time to send their eyes to it before it is erased from the screen (even after giving the viewer 200 ms to plan and execute a saccade to the target). By plotting the proportion of objects missed as a function of the duration of the target stimulus, we found this was indeed the case.

As shown in Figure 27, for stimuli whose duration was within the shortest range, the bin centered on 50 ms, we found that the overall probability of missing the stimulus was roughly 55%. That is, given that a viewer had 200 ms to plan and execute a saccade to a target, if that target was then erased 50 ms later, there was a slightly better than 50/50 chance that the viewer’s eyes would miss the stimulus. This probability of missing the stimulus steadily dropped as the stimulus duration increased, reaching an asymptote of about 10% misses for stimulus durations of 850 ms. (In fact, the probability of missing the stimulus dropped to 0 at stimulus durations of around 1400-1500 ms, but the numbers of observations at these stimulus durations were small and thus may be unreliable.) If we take the asymptote at 10% as our cutoff, however, we see that by stimulus durations of 650 ms, the probability of missing the stimulus is about 13%, which is quite close. At a stimulus duration of 450 ms, however, the probability of missing the target is still as high as 24-25%, which seems too high.


Figure 27: Proportion of stimuli completely missed as a function of the target stimulus duration (-200ms)

A second way in which a stimulus duration could plausibly affect the probability of missing a target is if the currently fixated stimulus (not the next target) is too short. We know from the earlier analyses of the effect of stimulus duration on the difference between expected and observed gaze ending times that if a stimulus duration is too short, it is very likely that the eyes will be late in leaving it. Thus, it is reasonable to expect that the likelihood of missing the target stimulus will vary as a function of the duration of the preceding stimulus.

In order to look at this effect, we first selected cases in which the viewer had arrived at the preceding stimulus neither too early nor too late (in this case, within –45 ms and +50 ms of the expected arrival time). By only including cases in which the viewer’s eyes had arrived at the preceding stimulus on time, we could then get a clearer picture of the effect of the preceding stimulus duration on missing the next stimulus.

As shown in Figure 28, the results we found were rather similar to those in our preceding analysis. That is, for the shortest stimulus durations (the bin centered on 50 ms), we found that the probability of missing the following stimulus was about 32%. The probability of missing the following stimulus dropped steadily as a function of the preceding stimulus duration until it reached asymptote at 10% for stimulus durations of 850 ms. (As with the previous analysis, we found the probability of missing the following stimulus continued to drop further at even longer preceding stimulus durations, but the number of cases also became increasingly small, and thus unreliable.) For preceding stimulus durations of 650 ms, the probability of missing the following stimulus was about 15%, and the probability was roughly 20% for preceding stimuli of 450 ms duration.


Figure 28: Proportion of stimuli completely missed as a function of the preceding stimulus duration (-200ms) contingent on having arrived on time at the preceding stimulus.

In summary, the analyses of the likelihood of missing a target stimulus as a function of the duration of either the target itself, or the preceding stimulus, showed that only at rather long durations, between 650-850 ms, is the likelihood of missing a stimulus minimized. As expected, this is the range of stimulus durations that also minimized the likelihood of ending a gaze late, though these two measures were based on entirely different sets of cases (i.e., cases that arrived at the stimulus, versus those that did not). As shown earlier, however, at such long stimulus durations, the likelihood of the viewer’s eyes wandering from the stimulus was increased.

Conclusions

Altogether, the above analyses suggest the following conclusions for the timing parameters used for stimulus durations in SDRSVP displays. First, we find that the predicted amount of time needed by viewers to plan a saccade and execute it, 150 ms for the former, and 50 ms for the latter, are good estimates, but may be a bit short. Rather, the combined constant saccadic planning and execution time may need to be increased from 200 ms to 270 or 300 ms. This should increase the proportion of cases in which the eyes arrive at the next stimulus on time.

Second, the portion of the total stimulus time which is allotted for the viewers to gaze on the stimulus, the predicted foveal stimulus duration, should be no lower than 450 ms, and no higher than 850 ms. Below the lower bound of 450 ms, the likelihood of the eyes arriving late and/or completely missing the stimulus is greatly increased. Above the upper bound, the likelihood of being late or missing the stimulus is virtually unchanged, but the likelihood of the eyes wandering off of the stimulus is greatly increased.

This study has left several unanswered questions, however. One variable that seems very likely to affect performance in this task is variability in stimulus duration. In the current study, stimulus durations varied from one stimulus to the next based upon the actual gaze durations of previous subjects doing the same task under free viewing conditions. The benefit of using stimulus durations that varied in this way is that it gave us a wide range of durations over which to examine their effects, and this range was presumably a natural one for the given task. On the other hand, it is likely that this variability in stimulus durations created uncertainty in the subjects as to an appropriate processing strategy to adopt with the stimuli, and this may have led to greater variability in their saccadic latencies and increased the probability of missing stimuli. Thus, an important question is whether holding stimulus durations constant will reduce the variability in viewers’ eye movement performance. We are currently planning another study in which we will compare both oculomotor and memory performance in the SDRSVP memory task at different constant stimulus durations (e.g., 250, 450, 650, and 850 ms, plus 200-300 ms for saccade planning and execution). It is hoped that this further investigation will provide a clearer picture of the ideal stimulus duration for such tasks. Given that such an ideal stimulus duration can be found, it may serve as the basis for tests with military personnel using SDRSVP as a method of rapidly presenting information on the locations of important objects in battlefield displays.

References:

  1. Gezek, S., Fischer, B., & Timmer, J. (1997). Saccadic reaction times: A statistical analysis of multimodal distributions. Vision Research, 37(15), 2119-2131.
  2. Hallet, P. E. (1986). Eye movements. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human performance (Vol. I, pp. 10-1-10-112). New York: Wiley & Sons.
  3. Irwin, D. E., Colcombe, A. M., Kramer, A. F., & Hahn, S. (1999). Attentional and oculomotor capture by onset, luminance, and color singletons. Manuscript submitted for publication, University of Illinois at Urbana-Champaign.
  4. Kramer, A. F., Hahn, S., Irwin, D. E., & Theeuwes, J. (1999). Age differences in the control of looking behavior: Do you know where your eyes have been? Manuscript submitted for publication, University of Illinois at Urbana-Champaign.
  5. Theeuwes, J., Kramer, A. F., Hahn, S., & Irwin, D. E. (1998). Our eyes do not always go where we want them to go: Capture of the eyes by new objects. Psychological Science, 9(5), 379-385.
  6. Theeuwes, J., Kramer, A. F., Hahn, S., Irwin, D. E., & Zelinsky, G. J. (in press). Influence of attentional capture on oculomotor control. Journal of Experimental Psychology: Human Perception and Performance.

April 1 - June 30, 2000 (FY00Q3)

Goals: Integrate head-tracked, head-mounted eyetracking system into ISL. (Rockwell) (Rescheduled for FY00Q4.) Begin experiment to look at the effect of spatial uncertainty of information presentation on direction of attention in time and space. (NCA&T) (Reported as incomplete FY00Q2.) Use the data collected in the experiments conducted in Q1 and Q2 to determine scanpath latency and risks associated with information location uncertainty. (NCA&T) Refine eye tracking server communications protocol. (Rockwell) Continue to collect data for the second perceptual change study. (UIUC) Complete revision of the journal article which describes the results of the first perceptual change study. (UIUC) Conduct a study that focuses on the relationship between perceptual and memory aspects of scene representation. (UIUC) Conduct a study that investigates the short-term retention of object and location information attended while viewing a multi-object display. (UIUC)

Progress: The data collected in the experiments conducted in Q1 and Q2 to determine scanpath latency and risks associated with information location uncertainty are being analyzed. The data need extensive re-scaling before any useful results can be reported. We hope to finish this in the next quarter. The results will help determine the level of visual information processing performance under uncertain and dynamic information conditions.

The RSC Eye Tracking Server was updated to be compatible with the BDS4 client/server networking protocol. Furthermore, the Eye Tracking Server's application protocol has been expanded to serve head motion compensated eye tracking data, non head motion compensated eye tracking data, and head tracking data.


Data collection has been completed on 140 subjects for the perceptual change study. A preliminary draft of a paper describing these results has been written. We are currently working on a journal article.

The paper describing the results of the first perceptual change study is completed and is in press in the journal Psychonomic Bulletin and Review.

The study on the relationship between perceptual and memory aspects of scene representation is underway and should be completed in FY00Q4.

The purpose of the short-term retention of object/location information experiment was to determine whether viewers' memory for objects and their locations would differ between a situation in which the objects are freely viewed while all are simultaneously present, and one in which those objects are viewed within a Spatially Distributed Rapid Serial Visual Presentation (SDRSVP) display. There were 6 subjects. On each trial, subjects saw a rapid serial presentation of everyday objects at different locations in a scene, and were then tested by asking them which object had been at a particular location. Previous research in our lab had shown that when subjects freely look at an array of simultaneously presented objects, their memory performance on the test is largely determined by how many different objects they looked at after the memory target: memory for the target declined rapidly as a function of the number of intervening objects fixated and then reached an asymptote well above chance. The question here was whether the same function would apply in a situation in which the objects were presented in an SDRSVP display with stimulus durations being the same as previous viewers' gaze durations in the free viewing situation. Other questions were related to differences in eye movement parameters as a function of stimulus durations in the SDRSVP display. Specifically, since previous results (see report in FY00Q2) had indicated that the stimulus duration in an SDRSVP display affected both the likelihood of fixating an object and the likelihood of the eyes arriving at the object late or leaving the object early, we wondered whether stimulus durations or the affected eye movement variables would have an impact on memory performance.

The first key result was that the number of objects presented after the target had a very large effect on memory performance. This effect was very similar to what we found previously in the context of free viewing of objects presented simultaneously in scenes, and is clear evidence of a visual recency effect in memory. On the other hand, the stimulus durations of objects in scenes had little if any impact on subjects' memory performance. Other variables that failed to affect memory performance included subjects' gaze durations on the target or the degree to which their eyes arrived on the target late or left the target early. Thus, in general, within the range of stimulus durations used (between roughly 70-3000 ms) and the resultant exposure durations derived from the eye movement measures, there was no effect on memory performance.

Nevertheless, there was one eye movement measure that had a clear impact on memory performance. That was whether the subjects actually fixated the target or not, though this effect was not as large as one might expect. Figure 29 shows subjects' memory accuracy as a function of the number of items presented after the target, and whether the target was fixated. As can be seen in the figure, having fixated the target resulted in a consistent memory advantage, and this was roughly the same no matter how many other objects were presented after the target. The surprising aspect of the graph is how well subjects did when they did not fixate the target. Since all objects that appeared on the screen were sudden onsets, we can generally assume that they involuntarily captured the viewer's attention. Thus, given that all the objects should have been attended, the only difference attributable to having fixated them is that they could then be seen with better acuity. This acuity difference, then, may account for the nearly constant but rather small advantage for fixated over unfixated targets.


Figure 29: Memory accuracy as a function of number of items presented after the target and whether the target was fixated.

In sum, the present investigation showed that there is a large recency effect on memory for the identities of objects at locations in a scene, such that the most recent objects are remembered best. Furthermore, this effect is virtually the same whether the viewer is presented with all the objects simultaneously and freely fixates them in any order he likes, or whether he simply looks at the scene in which those objects appear singly in a predetermined serial order (an SDRSVP display). When objects are presented in an SDRSVP display, while the duration of those objects makes a difference in whether and when the viewer actually fixates each object, stimulus duration within a wide range does not appear to affect memory. Nevertheless, given that the objects are attended, there is a small difference in memory attributable to whether they are fixated or not. Presumably, this difference is due to the superior acuity enjoyed by objects that are fixated versus being viewed only in peripheral vision.

In other research at RSC, it was found that the Logitech ultrasonic tracker by default exhibits a significant amount of jitter, which contributes significantly to the variance of the predicted gaze position on the screen. It was experimentally verified that the variance was reduced substantially by artificially fixing the head transformation used in the algorithm (i.e., eliminating its contribution to the variance) while the user stayed relatively still with a fixed gaze. Although the Logitech tracker's jitter can be reduced, there is a tradeoff of increased latency; the Logitech tracker is also limited to a 50 Hz update rate. In an effort to obtain more suitable head tracking data, work has begun to support the use of other head trackers, including the Ascension Flock of Birds (100 Hz update rate and reduced jitter) and the InterSense IS600 Mark II Plus (500 Hz orientation and 150 Hz position update rates and greatly reduced jitter).
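
One standard way to trade jitter against latency is a recursive low-pass (exponential smoothing) filter applied to the tracker samples; smaller smoothing constants reduce jitter but add lag. The sketch below is illustrative only and is not taken from the RSC implementation, which may instead rely on the trackers' own filtering.

    // Sketch: exponential smoothing of head-tracker position samples.
    // Smaller alpha = less jitter but more latency; alpha = 1 passes data through unfiltered.
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    class TrackerFilter {
    public:
        explicit TrackerFilter(double alpha) : alpha_(alpha), initialized_(false), state_{0, 0, 0} {}
        Vec3 update(const Vec3& raw) {
            if (!initialized_) { state_ = raw; initialized_ = true; return state_; }
            state_.x += alpha_ * (raw.x - state_.x);
            state_.y += alpha_ * (raw.y - state_.y);
            state_.z += alpha_ * (raw.z - state_.z);
            return state_;
        }
    private:
        double alpha_;
        bool initialized_;
        Vec3 state_;
    };

    int main() {
        TrackerFilter filter(0.2);                                         // illustrative smoothing constant
        Vec3 samples[] = { {0.010, 0.0, 0.0}, {-0.020, 0.010, 0.0}, {0.015, -0.010, 0.0} };  // jittery samples
        for (const Vec3& s : samples) {
            Vec3 f = filter.update(s);
            std::printf("filtered: %.4f %.4f %.4f\n", f.x, f.y, f.z);
        }
        return 0;
    }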

Additional PC hardware was acquired by RSC which will facilitate the use of a new set of eye tracking boards from ISCAN. These boards will provide increased resolution (4k x 4k as opposed to 512 x 256) as well as improved robustness of the eye tracking.

July 1 - September 30, 2000 (FY00Q4)

Goals: Integrate head-tracked, head-mounted eyetracking system into ISL. (Rockwell) (Rescheduled for FY00Q4) Begin experiment to look at the effect of spatial uncertainty of information presentation on direction of attention in time and space. (NCA&T) (Reported as incomplete FY00Q2 - FY00Q3) Use the data collected in the experiments conducted in Q1 and Q2 to determine scanpath latency and risks associated with information location uncertainty. (NCA&T) (Reported as incomplete FY00Q3) Begin chapters for the ARL Human Factors book. (UIUC) Prepare written reports for the ARL Symposium on the perceptual change studies. (UIUC) Complete integration of head-tracked eye tracking systems into ISL with a client application. (Rockwell) Complete data collection from Q3 and report the findings for publication. (NCA&T) Begin studies of multi-resolutional display using eye and head tracking with large format display. (UIUC) Begin writing a chapter for the Fed Lab computer science handbook. (Rockwell)

Progress: The first milestone is addressed under the discussion of the head-tracked eye tracking system, below.

This is a continuation of the analysis of data collected during the period of research for FY99Q1-Q3. The experiment collected data on the distribution of gaze points for fixating on targets appearing at nine locations, as shown in the figure below:

Since there are 8 positions, each has a probability of target appearance of 0.125.

All targets, as previously reported, were randomly selected from a database of military symbology. Each subject sat in a chair with the head stabilized by a chin rest. Subjects were randomly assigned to eight experimental conditions defined by time-location pairs and two levels of cueing (no cue and with cue). The time-space combinations are:

  Time/Location        Warning (cued target)       No warning (uncued target)
  Time                 Time window fixed (T1)      Random time target appearance (T2)
  Location (space)     Location Fixed (L1)         Random location of target selected from the probability wheel of 8 positions (L2)
Each subject was exposed to 8 experimental sets: (T1,L1), (T1,L2), (T2,L1), and (T2,L2), with each trial block run under either cued or uncued stimuli.

The two analyses completed do not report on the effect of cued signals. Data are summarized as mean latency times with respect to target dynamic ratio, previously defined as the ratio of the number of dynamic targets to the number of fixed targets. Excluding the condition with no fixed targets, a dynamic ratio of 1 means equal numbers of fixed and dynamic targets. We experimented with dynamic ratios of 3/1, 3/2, and 3/3.

Sample result:


The figure below shows mean latency times with respect to the experimental block conditions (time-space combinations). As indicated by the graph, a high dynamic ratio, with targets appearing at random times and at random positions, increases the latency to detect the primary target of interest.

In the next figure, the mean latency times are plotted for different probability location values.

Result Implication:

The results will help to determine the level of visual information processing performance under uncertain and dynamic information conditions.

UIUC chapters for the ARL Human Factors book are underway.

We are beginning to prepare the abstracts for this year's ARL symposium, since full papers are not due until FY01Q1.

The RSC eye tracking system has been integrated into the ISL-West as a part of the Rockwell Integrated Displays Testbed version 2 (IDTv2). The Eye Tracking Server utilizes a head-mounted eye gaze tracking system from ISCAN, Inc. and an ultrasonic head tracker from Logitech. RSC's head motion compensation algorithm is integrated into the server to provide 6 degree-of-freedom (6DOF) movement to the user. In the IDTv2, a Coordinate Space Transform (CST) Server has an integrated eye tracking client which receives head motion compensated eye gaze data from the Eye Tracking Server. Details are outlined in the FY00Q2 report.

We have collected some preliminary data for a series of studies that we are collaborating on with Drs. Jian Yang and Michael Miller of Kodak. Drs. Yang and Miller have developed an image filtering algorithm that produces multiresolutional images with continuous/smooth degradation from the center of interest to the periphery. Our previous studies investigating gaze contingent multiresolutional (GCMR) displays have used bi-resolutional images, in which there are only two levels of resolution: a high resolution inset, and a lower resolution surround. Multiresolutional images that are smoothly degraded have the added benefits of potentially being indistinguishable from full high resolution images, while increasing bandwidth savings due to maximal image degradation. The Kodak image filtering algorithm was engineered based on the results of numerous psychophysical studies of human contrast sensitivity as a function of spatial frequency and retinal eccentricity. Thus, their algorithm selectively cuts out spatial frequency information from an image depending on the retinal eccentricity of each area of the image. The studies we are running are extensions of previous studies we have done for ARL that investigated various parameters of GCMR displays.
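
Although the details of the Kodak model are not reproduced in this report, eccentricity-dependent filtering of this kind is commonly expressed as a cutoff spatial frequency that falls off with retinal eccentricity; one generic form (with symbols and constants that are illustrative, not Kodak-specific) is

    \[
    f_c(e) \;=\; f_0 \, \frac{e_2}{e + e_2}
    \]

where f_c(e) is the highest spatial frequency retained at eccentricity e, f_0 is the cutoff at the point of gaze, and e_2 is the eccentricity at which the cutoff falls to half its foveal value; spatial frequencies above f_c(e) are removed from the corresponding region of the image.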

In our first study, we are investigating the effects of varying image degradation at differing retinal eccentricities on both physiological (eye movement-based) and image quality judgment measures. In this study, we can make precise predictions about the effects of our levels of image degradation based on prior psychophysical studies that used grating stimuli in contrast detection and resolution tasks. Figure 30 shows six versions of an image that have been filtered with the Kodak algorithm.

October 1 - December 31, 2000 (FY01Q1)

Goals: Prepare written reports for the ARL Symposium on the perceptual change studies. (UIUC) (Reported as incomplete FY00Q4) Complete data collection from Q3 and report the findings for publication. (NCA&T) (Reported as incomplete FY00Q4) Begin studies of multi-resolutional display using eye and head tracking with large format display. (UIUC-McConkie) (Reported as incomplete FY00Q4) Complete chapters for the Human Factors Handbook. (UIUC/NCA&T) Begin to analyze data from the second and third perceptual change experiments. (UIUC – Kramer) Begin to prepare papers for the ARL Federated lab symposium. (UIUC – Kramer and McConkie) Complete analyses of studies of multi-resolutional displays using eye and head tracking with large format display. (UIUC – McConkie) Conduct additional studies on the short-term retention of object and location information attended while viewing a multi-object display. (UIUC – McConkie) Create optimized release version of eye tracking server. (Rockwell) Complete writing chapter for Federated Laboratory Computer Science Handbook. (RSC) Incorporate changes to support an integrated Fed Lab Symposium demonstration. (UIUC) Write report for the Fed Lab Human Factors Handbook. (NCA&T)

Progress: One paper, noted below, was written for inclusion in the ARL 2001 Proceedings and presentation at the conference.

All data collection is now complete from the FY00Q2-FY00Q3 NCA&T research. We are summarizing the report for publication.

We are currently planning our first large format multi-resolutional display study using a monitor in the ISL. In it, we will exploit the larger monitor size and higher pixel resolution in order to display images with much larger fields of view in high resolution. The ultimate goal of this study will be to come up with an image-coding scheme within which wavelet decomposed gaze-contingent multi-resolutional images can be used to maximize image coding bandwidth savings. In order to achieve this goal, it will first be necessary to determine, for each level of wavelet coding, the retinal eccentricity at which its image degradation becomes imperceptible. In the coding scheme we envision, an image would be encoded with multiple nested levels of wavelet resolution, with the highest resolution at the point of gaze, and decreasing levels of resolution with further distance from that point. Within the wavelet-coding framework we are using, each level of resolution is determined by the number of sets of wavelet coefficients included in the image reconstruction, with a maximum number of 13 being full high-resolution, and a minimum of 1 being extremely degraded.

In three previous studies we have investigated the detectability of peripheral image degradation in bi-resolutional images as a function of (1) the number of sets of wavelet coefficients included in the degraded image periphery and (2) the distance from the degraded periphery to the center of vision. These studies showed that when the degraded periphery was coded with as many as 7 sets of wavelet coefficients, viewers could not detect the degradation if it was 5 degrees of visual angle from the center of vision. However, when the degraded periphery contained only 4 sets of wavelet coefficients, the degradation was easily detectable at 5 degrees eccentricity. In those studies we could not present the degraded periphery further than 5 degrees from the center of vision due to constraints in our instrumentation (specifically, the area over which we could track the eyes with our Dual Purkinje eyetracker). However, using a larger monitor with higher resolution and an eye and head-tracking system capable of tracking much larger movements, we will be able to display images in which the degraded area is 15 degrees from the center of vision. Using these larger images, we want to determine the eccentricity required to make image degradation imperceptible when even 1 set of wavelet coefficients is used. With this information, it should be possible to construct multi-resolutional images having nested bands of resolution, from 13 sets of wavelet coefficients at the center of vision to 1 set in the farthest periphery, that are not perceptibly different from a full high-resolution image. Such an image-coding scheme could save a great deal of bandwidth while causing no perceptual difficulties.
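
A sketch of the kind of nested coding scheme envisioned here is shown below: retinal eccentricity is mapped to the number of wavelet coefficient sets used for that band of the image. Only the 5-degree / 7-set point is grounded in our earlier detection data; the remaining thresholds are placeholders that the planned study is meant to determine.

    // Sketch: map retinal eccentricity (deg from the gaze point) to the number of wavelet
    // coefficient sets used for that image band (13 = full resolution, 1 = most degraded).
    // Threshold values are placeholders except that 7 sets were found imperceptible at 5 deg.
    #include <cstdio>

    int waveletSetsForEccentricity(double eccDeg) {
        struct Band { double maxEccDeg; int sets; };
        static const Band bands[] = {
            { 2.0, 13},   // full resolution at and near the point of gaze (placeholder)
            { 5.0,  7},   // 7 sets were not detectably degraded at 5 deg (prior studies)
            {10.0,  4},   // placeholder
            {15.0,  2},   // placeholder
        };
        for (const Band& b : bands)
            if (eccDeg <= b.maxEccDeg) return b.sets;
        return 1;         // farthest periphery: maximal degradation
    }

    int main() {
        for (double e = 0; e <= 20; e += 5)
            std::printf("%4.1f deg -> %d coefficient sets\n", e, waveletSetsForEccentricity(e));
        return 0;
    }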

The first study will not use eyetracking capabilities but rather will present static images very briefly at the viewer's center of vision (at the center of the monitor) and ask viewers to press a button if they detect peripheral degradation. Once the results of this study are known, it should then be possible to do studies using the eye and head-tracking capabilities developed for use with that monitor in the ISL.

As part of this project, we are developing a mathematical model of detection of wavelet image degradation. This work is in concert with colleagues at Eastman Kodak, and makes use of existing models of visual sensitivity as a function of the spatial frequency content of images and retinal eccentricity. We are testing this model against our previous detection data and will be using it to make predictions as to the detectability of degradation in our larger format images. We will then test these predictions in our planned study.

Chris Wickens (UIUC) worked with 2.2.3.3 researchers to include their research in the Human Factors Handbook.

Data analysis is underway from the second and third perceptual change experiments. We expect to complete a journal article in early January.

Three papers were completed and submitted for the 2001 ARL Federated Lab symposium: "Perceptual effects of a gaze-contingent multi-resolutional display based on a model of visual sensitivity" was written by Lester Loschky et al; "Guidance of the eyes by contextual information and abrupt onsets", by Matt Peterson and Art Kramer; and "Object-based control of overt attention shifts", by Jason McCarley et al.

We have not yet begun the large format study due to our work on the experiments described in the above-mentioned report, "Perceptual effects of a gaze-contingent multi-resolutional display based on a model of visual sensitivity". Those experiments are described in more detail below in the section "Other Research Progress".

We have not yet conducted the short-term retention studies due to our work on the experiments described in the above-mentioned report, "Perceptual effects of a gaze-contingent multi-resolutional display based on a model of visual sensitivity".

The optimized release version of the RSC Eye Tracking Server is currently in development and will be completed in FY01Q2.

"Applying Eye Gaze as a Human-Computer Interaction Pointing Device" was completed by RSC for inclusion in the Federated Laboratory Computer Science Handbook.

As plans for the Fed Lab Symposium integrated demonstration developed, it became apparent that no additional input from this module would be necessary.

As noted above, Chris Wickens (UIUC) worked to include NCA&T's research in the Human Factors Handbook.

Other Research Progress

Last quarter we reported on the beginnings of a study investigating the perceptual effects of a smoothly degraded gaze-contingent multi-resolutional display. In such a display, the image is high resolution at the center of vision but becomes gradually blurrier as one moves further from the center of gaze. Furthermore, using a high precision eyetracker, the center of high resolution moves in concert with the eyes whenever they move. This work used images that were processed by an algorithm based on a model of visual sensitivity developed by the Eastman Kodak Company. The experiment involved subjects looking at monochrome photographic images while they performed a picture recognition memory task and their eyes were being tracked. There were 6 image filtering conditions: a full high-resolution control condition, 1 condition in which the level of image filtering was predicted to be roughly at the viewers' perceptual threshold, 1 condition predicted to be below threshold, and 3 conditions predicted to be above threshold. After viewing each image, viewers were asked to rate its image quality. This gave us a very explicit measure of the perceptibility of the image filtering. We also used two eye movement parameters (fixation duration and saccade length) as implicit measures of the perceptibility of the filtering.

We first carried out a pilot study with four subjects. Based on our pilot results, we revised the design of our study, adding an extensive practice session and a more highly degraded filtering condition. We then carried out an experiment with six new subjects. The data we collected very clearly supported our hypotheses as to which levels of filtering were perceptible, and which were not. Figures 31, 32, and 33 below show subjects' mean image quality judgments, fixation durations, and saccade lengths as a function of level of filtering. Recall that according to the Kodak model of visual sensitivity, filtering level 2 was predicted to be just at perceptual threshold.

Figure 31 shows that as the level of filtering increased, the rated quality of the images greatly decreased. Quality ratings for filtering levels 1 and 2, predicted to be below or just at perceptual threshold, were not statistically different from the full high-resolution control condition.

Figure 31: Mean quality judgment as a function of filtering level. Error bars represent
± 1 standard error.

Figure 32 shows a similar trend for fixation durations. As level of filtering increased, so too did fixation durations. However, fixation durations in filtering levels 1 and 2 did not differ reliably from fixation durations in the full high resolution control condition.

Figure 32: Mean fixation duration as a function of filtering level. Error bars represent
± 1 standard error.

Figure 33 shows a similar trend for saccade lengths: as level of filtering increased, saccade lengths became shorter. However, saccade lengths in filtering levels 1 and 2, predicted to be below or at perceptual threshold, were statistically no different from those of the full high-resolution control condition.

Figure 33: Mean saccade length as a function of filtering level. Error bars represent
± 1 standard error.

In sum, as level of filtering increased, viewers rated the pictures as being of lower image quality, the average time spent per fixation increased, and the average length of an eye movement decreased. However, the filtering conditions predicted to be just at or below perceptual threshold did not differ from the control condition.

Thus, our results show that it is possible to create gaze-contingent multi-resolutional images in which peripheral degradation matches the sensitivity of the human visual system so well that they cannot be perceptually discriminated, either explicitly or implicitly, from a full high-resolution image.

The RSC Head Motion Compensated Eye Tracking system was delivered to CECOM as part of the RSC Integrated Displays Testbed Version 2 (IDTv2). Eye tracking was extended to cover the large display which was partitioned into a 3 x 3 grid.

The RSC Coordinate Space Transform (CST) Server transforms pointing data into logical regions defined by the user. These transformed coordinates are then served to a client application, allowing the client to receive coordinates in its native space. The CST Server has been significantly enhanced in quarter one.

This software can now accept two simultaneous clients. An additional data format, GenericXY, has been added, supporting a wider range of pointing devices. Native support has been added for the Wacom tablet as a pointing device, including response to tablet events and pass-through of tablet messages. Functionality specific to the tablet (as well as eye-tracking-specific functionality) may now be hidden from the user via the improved configuration file. Latency to response of pointing events has been decreased.

The software is now designed to support multiple types of regions. Two types are currently defined: Linear and Subdivided. Linear regions maintain the previous behavior of the CST Server; the new Subdivided regions are also linearly scaled but may be parametrically subdivided into smaller rectangles. To support this a new server message, CSTSubregionRect has been added, conveying the transformed coordinates of the subregion at which the user is currently gazing/pointing. This functionality is highly useful for quantizing a given area, such as a large display screen, and the quantization may be changed while the program is running without disrupting the data stream.
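
The quantization a Subdivided region performs can be illustrated with the following Python sketch, which maps a raw pointing or gaze coordinate into a logical region and then into one of its subrectangles. The class and function names are hypothetical; this is not the CST Server implementation, only an illustration of the idea.

    from dataclasses import dataclass

    @dataclass
    class Region:
        # A user-defined logical region in raw device coordinates (hypothetical layout).
        x: float
        y: float
        width: float
        height: float
        cols: int = 1   # subdivision parameters; (1, 1) behaves like a Linear region
        rows: int = 1

    def to_subregion(region, px, py):
        """Return the (col, row) subregion indices and the subregion rectangle
        for a raw coordinate, or None if the point lies outside the region."""
        if not (region.x <= px < region.x + region.width and
                region.y <= py < region.y + region.height):
            return None
        cell_w = region.width / region.cols
        cell_h = region.height / region.rows
        col = int((px - region.x) / cell_w)
        row = int((py - region.y) / cell_h)
        rect = (region.x + col * cell_w, region.y + row * cell_h, cell_w, cell_h)
        return (col, row), rect

    # Example: a large display treated as a 3 x 3 grid, as in the IDTv2 setup.
    display = Region(x=0, y=0, width=3840, height=2160, cols=3, rows=3)
    print(to_subregion(display, 2000, 300))   # -> ((1, 0), (1280.0, 0.0, 1280.0, 720.0))

Because the subdivision parameters live in the region definition rather than the client, the grid can be changed at run time without disturbing the data stream, which is the behavior described above.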

Future improvements will include the support of an arbitrary number of simultaneous clients, as well as the addition of new region types. Possible additional region types include logarithmically (or otherwise nonlinearly) scaled regions and non-rectangular regions.

The head-motion compensation module has been slightly revised to include more comprehensive error discovery and reporting mechanisms. The user will be alerted of potential calibration errors as a result of partially corrupted head- and eye-tracking data collected during the calibration process.

January 1 - March 31, 2001 (FY01Q2)

Goals: Complete data collection from Q3 and report the findings for publication (NCA&T) (Reported as incomplete FY00Q4-FY01Q1). Complete analyses of studies of multi-resolutional displays using eye and head tracking with large format display (UIUC - McConkie) (Reported as incomplete FY01Q1). Conduct additional studies on the short-term retention of object and location information attended while viewing a multi-object display (UIUC - McConkie) (Reported as incomplete FY01Q1). Create optimized release version of eye tracking server (Rockwell) (Reported as incomplete FY01Q1). Finish data analysis for the perceptual change papers (UIUC - Kramer). Begin to write journal articles and book chapters describing the perceptual change studies and multi-resolutional studies (UIUC - Kramer and McConkie). Begin full documentation (including purpose, functionality, and operating instructions) of eye tracking system (Rockwell). Document results of experiments on virtual cognition problem (NCA&T). Complete papers for presentation at the Fed Lab Symposium (UIUC/NCA&T/RSC). Test for and participate in an integrated Fed Lab Symposium demonstration (UIUC).

Progress: All previously incomplete data collection is now complete (FY00Q2-FY00Q3). The summarized data have been accepted for publication:

Ntuen, C. A. & Rogers, L. D. (2001). Effects of information presentation refresh rate on moving objects on saccade latency. International Journal of Cognitive Ergonomics, 5(1), 47-58.

Our priorities have shifted since we began doing collaborative research with scientists from Eastman Kodak. As noted below, we are now working with their filtering procedures, which produce images that match the spatial-frequency characteristics of the human retina. We feel that pursuing this new dimension of the work is more critical to military needs than examining large-format implementations. It requires using an eyetracker with the highest spatial and temporal resolution, allowing us to minimize eyetracking error and temporal delays, and this eyetracker cannot be used within the large-format display environment.

Due to our new opportunities to advance work on gaze-contingent multi-resolutional displays, there will not be time to complete this milestone. We believe that this method has greater potential for reducing bandwidth and computation requirements in displays than does a method in which the eyes are directed by the computer.

Optimized release versions of the RSC Six Degree-Of-Freedom Server and Eye Tracking Server have been created. The servers support a much-simplified calibration procedure whereby a user interacts with a graphical user interface in order to specify the environment configuration and data collection configuration. Configurations as well as individual users' data may be saved and reused. This adds user and environment flexibility to the head motion compensated eye tracking system. In addition, the Eye Tracking Server was enhanced with an eye data smoothing procedure, resulting in reduced jitter of the head motion compensated eye tracking data.
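
As a rough illustration of the kind of smoothing that reduces jitter in gaze data (the RSC server's actual procedure is not specified here), the following sketch applies simple exponential smoothing to (x, y) gaze samples; the function name and the smoothing constant are assumptions for illustration only.

    def smooth_gaze(samples, alpha=0.3):
        """Illustrative exponential-smoothing sketch (not the RSC server's actual
        procedure): damp sample-to-sample jitter in (x, y) gaze coordinates."""
        if not samples:
            return []
        sx, sy = samples[0]
        out = [(sx, sy)]
        for x, y in samples[1:]:
            sx = alpha * x + (1 - alpha) * sx   # blend new sample with running estimate
            sy = alpha * y + (1 - alpha) * sy
            out.append((sx, sy))
        return out

    # Example: small fluctuations around (102, 100) are damped toward a stable estimate.
    print(smooth_gaze([(100, 100), (104, 98), (101, 101), (103, 99)]))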

Data analysis has been completed on the perceptual change study and we have written a technical report to describe our results. (Heather L. Pringle (2000). "The roles of scene characteristics, memory and attentional breadth on the representation of complex real-world scenes." UIUC, doctoral dissertation)

As noted above, a technical report has also been completed on the perceptual change studies. During this quarter we also have been writing an in-depth review of the existing literature on gaze-contingent multi-resolutional displays (GCMRDs). No such review currently exists. We submitted it to the journal, Human Factors, and have subsequently revised it.

Our review shows that GCMRDs are extremely useful in saving both processing resources and time, as well as transmission bandwidth, and indicates the wide range and rapidly developing interest in this method. The review includes studies from a number of different application areas including flight simulators, virtual reality, video teleconferencing, teleoperation, and telemedicine. It integrates both applied and theoretical work done by electrical engineers and perceptual psychologists. Writing the review has helped us determine the state of the art in research and development of GCMRDs, and further identify key unanswered questions. It has also allowed us to create a framework within which to analyze and evaluate GCMRDs.

Our framework is based on the physiological underpinnings of GCMRDs. First, the human visual system is most sensitive at the center of vision (the fovea) with sensitivity rapidly decreasing with retinal eccentricity. This provides the rationale for making multi-resolutional images, which have high resolution in the area of interest (AOI), and lower resolution elsewhere. Second, the human visual system uses eye and head movements to compensate for poor peripheral resolution by moving the fovea to regions of interest. This provides the rationale for making gaze-contingent displays, in which the AOI is dynamically updated so that it always matches the center of gaze. We therefore categorized the research and development of GCMRDs in terms of 1) multi-resolutional imagery, and 2) gaze-contingent displays. In our review we have analyzed the process of making multi-resolutional images, and the stages in making displays gaze-contingent.

We have found that there are three major approaches to making multi-resolutional images: 1) generating images from computer models with multiple levels of detail, 2) using image processing algorithms to remove unnecessary detail from constant resolution images, and 3) using multi-resolutional sensors and cameras. Each of these approaches is best suited to a different kind of source material: computer generated images, constant resolution images, or live feed video images. We have found that many individual approaches to producing multi-resolutional images are based on neurophysiological or psychophysical research, but very few of these approaches have actually gone through rigorous human factors testing.
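
As an illustration of approach 2, the sketch below degrades a constant-resolution monochrome image by selecting from a stack of progressively blurred copies according to each pixel's eccentricity from the gaze point. The blur schedule and all parameter values are assumptions for illustration; they are not the retinally-motivated Kodak filters used in the experiments described here.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def multires_image(img, gaze_xy, px_per_deg=30.0, e2=2.3):
        """Illustrative multi-resolutional filtering of a 2-D (monochrome) image:
        pixels farther from the gaze point receive progressively heavier blur."""
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w]
        ecc_deg = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) / px_per_deg

        # Precompute a small stack of progressively blurred copies ...
        sigmas = [0.0, 1.0, 2.0, 4.0, 8.0]
        stack = [img if s == 0 else gaussian_filter(img, s) for s in sigmas]

        # ... and assign each pixel a blur level from its eccentricity band.
        level = np.clip((ecc_deg / e2).astype(int), 0, len(sigmas) - 1)
        out = np.zeros_like(img, dtype=float)
        for i, blurred in enumerate(stack):
            out[level == i] = blurred[level == i]
        return out

    # Usage (hypothetical): multires_image(gray_image_array, gaze_xy=(640, 360))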

Regarding gaze-contingent displays, we have argued that it is important to consider how the AOI location is dynamically updated, and how quickly this happens. The AOI movement can be done gaze-, head-, or hand-contingently, or predictively. We have noted that moving the AOI on a gaze-contingent basis is the most natural method, and should also cause the least perceptual problems, but it is also the most technically difficult to implement. Furthermore, there is a trade-off between user comfort and accuracy in gaze tracking systems. Related to this is a major trade-off between AOI placement accuracy/resolution and perceptual quality. Finally, we note that the speed of AOI position updating has important perceptual consequences. This is related, then, to a trade-off between speed of AOI updating and perceptual quality.

In general we argue that the most basic trade-off in GCMRDs is between image resolution and retinal eccentricity. Thus, to maintain perceptual quality, as one reduces peripheral resolution one must put that lower resolution further from the fovea. This trade-off can be used to compensate for other GCMRD trade-offs involving perceptual quality. Thus, we have argued that as one decreases either the accuracy of AOI placement or the speed of AOI updating, one can compensate for this by increasing the area of high resolution. Nevertheless, we note that, so far, only a small amount of research has been done to support these conclusions.
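
The compensation idea can be made concrete with a back-of-the-envelope calculation. This is an illustrative assumption, not a validated model: pad the radius of the high-resolution AOI by the expected gaze-tracking error plus the distance gaze can move during one display-update delay.

    def compensated_aoi_radius(base_radius_deg, tracker_error_deg=0.0,
                               update_delay_s=0.0, saccade_speed_deg_s=300.0):
        """Back-of-the-envelope sketch (assumed, not validated): enlarge the AOI
        radius to cover gaze-tracking error and the gaze travel possible during
        one display-update delay."""
        return base_radius_deg + tracker_error_deg + update_delay_s * saccade_speed_deg_s

    # e.g. a nominal 5-deg AOI, 1 deg of tracker error, a 20 ms update delay:
    print(compensated_aoi_radius(5.0, 1.0, 0.020))   # -> 12.0 degrees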

Finally, we have found that most research and development of GCMRDs has been put into developing multi-resolutional images, and much less has been done to develop gaze-contingent displays. Thus, we suggest that issues related to moving the AOI, such as the accuracy and speed of AOI updating, are in need of human factors research that can guide the making of more effective GCMRDs.

User documentation for the RSC Head Motion Compensated Eye Tracking system has been created. The documentation covers the system components, functionality, and operating instructions for the eye tracker, head tracker, Six Degree-of-Freedom Server, and Eye Tracking Server.

Further data analysis: Search and detection of targets in a virtual environment

EXPERIMENT

The purpose of this experiment was to analyze the effects of saccade latency on search and detection of targets in a virtual environment. Two levels of search difficulty were explored: single task search and detection with one object, and double task search and detection with two objects. The objects were military symbols as shown in Figure 34.

Figure 34

The objects are military symbols with the usual color coding of red for enemy and blue for friendly forces. Targets could move in any geographic direction (north, northeast, northwest, etc.). All targets used in the experiment had the same symbol size. Symbol sizes were derived from the Military FM Handbook of Symbols (FM 105-5-1, 1985).

The participants were asked to search for the most dangerous enemy target relative to a designated friendly target by geographical location (perceived distance). The friendly target was indicated by a single flash lasting 5-8 ms, after which it returned to its normal location without further cueing. First, only single search tasks were presented (5 trials for 30 participants = 150 trials). In the second experiment, two friendly targets were designated (5 trials for 15 participants). Participants were randomly assigned to experimental conditions based on a sign-up sheet, and no participant served in both tasks, to avoid learning effects between tasks.

PARTICIPANTS

Forty-five subjects between the ages of 17 and 30 participated in the experiment. The subjects were graduate and undergraduate students attending North Carolina A&T State University, who earned extra credit for participation. At the time of the study the participants were screened for color blindness using Ishihara's Test for Color Blindness. They were then given two preliminary tests to measure how well they had learned the symbols. The first was a written test in which they were shown 8 symbols and had to write the name of each. The second preliminary test used PowerPoint presentation software: each symbol was randomly presented at least 55 times superimposed on a digitized land map, and the participant had to name the symbol aloud and state whether it was friendly or enemy (as indicated by the red or blue color). Participants had to score at least 90% across both tests combined; those who missed more than 5 symbols in total failed the preliminary test and were not allowed to continue. Once a participant passed the preliminary test, they were given verbal instructions and the ISCAN eye tracking equipment was calibrated for their eyes.

APPARATUS

The equipment used in this experiment consisted of:

  • An IBM compatible computer, monitor and keyboard – used to present the different displays on the TV monitor.
  • A 32-inch TV Monitor – used to display the information to the participants.
  • Internet Explorer 3.0 – used to run the animated displays (the scenarios were animated gifs).
  • Video Recorder – used to record the display, which illustrated the participant's eye movements.
  • ISCAN Eye Movement Monitoring System Version 2.05 – used to record and measure eye movement data.
  • PowerPoint Presentation Software – used to display symbols to participants during the preliminary test.
  • Systat 8.0 statistical analysis software – used to analyze the data obtained from the experiments.

Major Hypothesis and Variables Measured

The main research question was:

Does saccade latency affect detection performance?

Dependent variables. Initial fixation latency and percentage of missed detections were the major dependent variables.

Independent variable. The independent variable was the type of search (single versus parallel).

RESULTS

Number of Saccades

The mean number of saccades in the single search task (N = 1.8) was reliably smaller than in the parallel search task (N = 2.63), Student's t = 2.05, p < 0.037.

Probability of Missing Object (Detection performance)

The mean proportion of missed objects in the single search task (f = 0.09) was reliably lower (i.e., better) than in the parallel search task (f = 0.1215), Student's t = 7.83, p < 0.015.
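
For readers unfamiliar with the statistic reported here, the comparison between single and parallel search conditions is of the kind computed below with an independent-samples t-test; the numbers in the sketch are made up for illustration and are not the experimental data.

    from scipy import stats

    # Hypothetical per-trial values of a dependent measure (e.g., saccade count)
    # in the single and parallel search conditions; illustrative data only.
    single   = [1, 2, 2, 1, 2, 3, 1, 2]
    parallel = [2, 3, 3, 2, 4, 3, 2, 3]

    result = stats.ttest_ind(single, parallel)   # independent-samples t-test
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")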

The relationship between saccade latency time and miss probability is shown in Figure 35 below.

Figure 35 The relationship between saccade latency time and miss probability for both single and double tasks.

Numerous papers were written for the Federated Laboratory Symposium, as listed under Publication, above.

UIUC was unable to complete the necessary eyetracking research in time to contribute to the symposium demonstration. Although the research will be completed, the involvement in the symposium will not.

Other Research

This quarter we have carried out a follow-up experiment to the one reported at the most recent Fed Lab Symposium entitled, "Perceptual effects of a gaze-contingent multi-resolution display based on a model of visual sensitivity". In that previous study we looked at the perceptual effects in gaze-contingent multi-resolutional displays (GCMRDs) of varying the drop-off of spatial frequencies as a function of retinal eccentricity, using Eastman Kodak retinally-motivated filters. We used 5 levels of spatial frequency drop-off: 1 level assumed to be below perceptual threshold, 1 level assumed to be just at perceptual threshold, and 3 levels assumed to be above perceptual threshold. These assumptions were based on previous studies that used sinusoidal grating patches as visual stimuli and in which the subjects' task was to discriminate the orientation (vertical vs. horizontal) of the gratings. Our chief question was whether the results from such studies would scale up to much more complex and natural stimuli (photographic images), viewing conditions (free viewing), tasks (scanning images for memory), and measures (including eye movement parameters). The measures we used were both explicit (image quality judgments) and implicit (eye fixation durations and saccade lengths). The results nicely fit our predictions. Those levels of spatial frequency filtering that were assumed to be at or below threshold produced results that were statistically no different from our control condition (i.e., constant high-resolution images). Those levels of filtering assumed to be above perceptual threshold produced results significantly different from the control condition.

The question we addressed in our latest study was this: How detectable is the image filtering in each of the above conditions? This question is not strictly answerable using the results from the above study. This is because, in that study, the gaze-contingent multi-resolutional images were present continuously throughout a trial. Thus, a level of filtering that might not be detectable at a given moment in time, could potentially be noticed later, perhaps based on a misalignment of the area of interest (the center of high-resolution) with the center of gaze. While such conditions would certainly be valid in terms of judging the overall perceptual quality of the GCMRD, it would not provide a pure measure of the detectability of a given level of filtering. Thus, we used an experimental procedure we had developed earlier (and reported earlier to ARL) to investigate the detectability of peripheral image filtering in a GCMRD.

The experimental procedure we used was as follows. During a trial, for most of the time, the viewer would be presented with a constant high-resolution image. However, on selected eye fixations (on randomly selected 9th, 10th, or 11th fixations) we presented the viewer with a multi-resolutional image whose area of interest (AOI) was centered where the viewer was looking. The viewer's primary task was the same as in the preceding experiment—to scan the image in order to prepare for a difficult picture recognition test. However, their secondary task was to press a button as soon as they detected peripheral image degradation. This experimental procedure allows a purer measure of detection of the peripheral image filtering because the viewer does not know when to expect the filtering to occur. Furthermore, in analyzing the data, we can delete any cases in which the AOI was not correctly aligned with the viewer's center of gaze.

We collected data from 12 subjects. After deleting cases in which the AOI was placed 2° or further from the center of gaze, we graphed the proportion of cases detected by viewers as a function of the level of image filtering. As shown in Figure 36 below, the results very strongly confirm our predictions based on our previous study, and the results of earlier experiments using grating patch stimuli and orientation discrimination tasks. The most striking aspect of the results is the extremely low detection level for filtering levels 1 & 2 assumed to be below, or at threshold, and that these two filtering conditions did not differ from each other perceptually. Thus, although the level of filtering differs quite a bit between these two conditions, they are perceptually equivalent in terms of detectability. Likewise, the detection levels for the two filtering levels assumed to be well above threshold, levels 4 & 5, did not differ from each other, but, instead, are both nearly at maximum. In contrast, the level of detection for the first filtering level assumed to be above threshold, level 3, lies almost perfectly between the lower and upper bounds. Thus, we have perceptual scaling of the levels of filtering, but this scaling has a much narrower range than that of the physical stimuli. This study begins to identify the boundary conditions for appropriate filtering levels.

Figure 36: Proportion Detection of Peripheral Filtering as a Function of Filtering Level
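
The analysis described above (exclude trials with AOI placement error of 2° or more, then compute the proportion detected at each filtering level) can be sketched as follows; the field names and example values are hypothetical and are not our actual data.

    from collections import defaultdict

    def detection_rates(trials, max_aoi_error_deg=2.0):
        """Sketch of the analysis described above (field names are hypothetical):
        drop trials where the AOI landed 2 deg or more from the gaze position,
        then compute the proportion of detections per filtering level."""
        hits = defaultdict(int)
        counts = defaultdict(int)
        for t in trials:   # each trial: {"level": int, "aoi_error_deg": float, "detected": bool}
            if t["aoi_error_deg"] >= max_aoi_error_deg:
                continue
            counts[t["level"]] += 1
            hits[t["level"]] += int(t["detected"])
        return {lvl: hits[lvl] / counts[lvl] for lvl in sorted(counts)}

    # Example with made-up trials; the third case is excluded for AOI misplacement.
    trials = [{"level": 1, "aoi_error_deg": 0.4, "detected": False},
              {"level": 4, "aoi_error_deg": 0.9, "detected": True},
              {"level": 4, "aoi_error_deg": 2.5, "detected": True}]
    print(detection_rates(trials))   # -> {1: 0.0, 4: 1.0}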

It is also worth considering the fact that our detection results showed a much sharper divide between the levels of filtering than the image quality judgments from our previous experiment, which are repeated as Figure 37, below. We would suggest that this difference is due to occasional misplacements of the AOI due to gaze-tracking failure. Because we eliminated such AOI placement failures from our analysis in the current study, there was no difference between filtering levels 1 & 2, which were assumed to be below or at threshold, nor between filtering levels 4 & 5, which were both assumed to be well above threshold. In contrast, in our earlier study, viewers rated the perceptual quality of the entire preceding trial, and this may have included various cases of misalignment of the AOI and the viewer's center of gaze. This may explain why quality judgments for filtering level 1 (below threshold) were somewhat better than for filtering level 2 (at threshold), and level 5 was somewhat better than level 4. This would suggest that the detectability of a given level of peripheral image filtering will be influenced by the degree of AOI placement accuracy. As the level of AOI placement accuracy drops, the perceived image quality will drop for a given level of image filtering.

Figure 37: Mean Quality Judgment as a Function of Image Filtering Level

The most important conclusions to be drawn from this study for the ARL are as follows. First, we have confirmed that the level of image filtering predicted to be at threshold is indeed nearly undetectable. This level of filtering should be well-suited for use in GCMRDs in which one wants peripheral degradation to be undetectable. Second, we have preliminary evidence consistent with a trade-off between AOI placement accuracy and image quality for a given level of image filtering. This in turn suggests that, if the accuracy (or spatial resolution) of one's gaze tracking system is limited, one may need to use a lower level of image filtering to maintain perceptual quality in a GCMRD.

April 1 - June 30, 2001 (FY01Q3)

Goals: Complete chapters and journal articles describing the perceptual change studies. (UIUC - McConkie and Kramer) Complete journal articles describing multi-resolutional eye movement studies. (UIUC - McConkie) Complete revisions on previously submitted journal articles for the eye tracking and perceptual change studies. (UIUC - Kramer and McConkie) Complete full documentation (including purpose, functionality, and operating instructions) of eye tracking system. (Rockwell) Package eye tracking system software into release form. (Rockwell) Continued from Q2, document results of experiments on virtual cognition problem. (NCA&T)

Progress: We have completed a series of manuscripts which report research supported under the current module. These manuscripts include:

  • Peterson, M.S., McCarley, J., Kramer, A.F., Irwin, D.E. & Wang, F.R. (in press). Visual search has memory. Psychological Science.
  • Peterson, M.S. & Kramer, A.F. (in press). Attentional guidance of the eyes by contextual information and abrupt onsets. Visual Cognition.
  • Peterson, M.S., Kramer, A.F., Irwin, D.E. & Hahn, S. (in press). Modulation of oculomotor control by abrupt onsets during attentionally demanding visual search. Visual Cognition.
  • Peterson, M.S. & Kramer, A.F. (in press). Contextual cueing reduces interference from task irrelevant onset distractors. Perception and Psychophysics.
  • Pringle, H., Irwin, D.E., Kramer, A.F. & Atchley, P. (in press). Relationship between attention and perceptual change detection in driving scenes. Psychonomic Bulletin and Review.
  • Kramer, A.F., Cassavaugh, N.D., Irwin, D.E., Peterson, M.S. & Hahn, S. (in press). Influence of single and multiple onset distractors on visual search for singleton targets. Perception and Psychophysics.
  • Tang, H., Beebe, D. & Kramer, A.F. (in press). A Multi-State Input Mechanism with Multimodal Feedback. International Journal of Human-Computer Studies.
  • McCarley, J.S., Kramer, A.F. & Peterson, M.S. (submitted). Overt and covert object-based attention.
  • Kramer, A.F., Scialfa, C.T., Petterson, M.T. & Irwin, D.E. (in press) Attentional capture, attentional control and aging. In C. Folk & B. Gibson (Eds.), Attentional Capture. Amsterdam: Elsevier Science.
  • Kramer, A.F. (in press). Cognitive psychophysiology in human factors and ergonomics. In W. Karwowski (Ed.), International encyclopedia of ergonomics and human factors. New York, NY: Taylor and Francis.

  • McConkie, G.W & Loschky, L.C. (in press). Change blindness. In the Encyclopedia of Cognitive Science. MacMillan/Nature Publishing Group. (10 pages).

The above paper reviews the work done regarding perceptual change. It is a concise yet comprehensive summary of what is currently known about perceptual change (and so-called 'change blindness') for an educated but general audience. It has been accepted for publication.

  • Loschky, L.C. (in press). Some things pictures are good for: An information processing perspective. Visible Language. (16 pages).

The above paper describes some of the perceptual and cognitive functions of pictures that help viewers to deal with their inherent and extreme limitations in visual attention and short term memory. This includes discussion of much work done in the ARL grant on perceptual change, visual short term memory, and multi-resolutional displays. It is written for an educated but general audience and has been accepted for publication.

  • Reingold, E.M., Loschky, L.C., Stampe, D.M., & Shen, J. (in press). An assessment of a live-video gaze-contingent multi-resolutional display. In Proceedings of the Ninth International Conference on Human-Computer Interaction (HCII 2001), (5 pages).

The above paper describes a study done in collaboration with researchers in the laboratory of Eyal Reingold, at the University of Toronto, on gaze-contingent multi-resolutional displays. It has been accepted for publication.

  • Reingold, E.M., Loschky, L.C., McConkie, G.W., & Stampe, D.M. (2001). Gaze-contingent multi-resolutional displays: An integrative review. Manuscript revised and resubmitted for publication. University of Toronto. (67 pages).

The above paper was done in collaboration with researchers at the University of Toronto. The contents of the paper were described in the last quarterly report. During this quarter, we finished revising the paper according to the reviewers' comments and have resubmitted it to the journal Human Factors.

Full documentation of the RSC eye tracking system, including purpose, functionality, and operating instructions, was completed. The documentation was also delivered to CECOM and published online.

The RSC eye tracking system software was packaged for release in the form of setup kits for the RSC Six Degree-Of-Freedom Server and RSC EyeTracking Server. The setup kits were delivered to CECOM.

NCA&T Progress - Document results of experiments on virtual cognition problem.

All previously incomplete data collection is now complete. The summarized data have been published as a working paper (under review), Experiments on Virtual Cognition.

All experiments used the sample display shown in Exhibit 1.

Exhibit 1. Sample screen capture of the symbol display used in the study.

Project Summary

Visual cognition is the analysis of information in a visual space for the purpose of decision-making. An example of a visual cognition problem is digitized maps with military symbols used to portray troop locations, troop sizes, and troop types. Potentially, a soldier with a portable computer or a head-mounted display (HMD) can view a digitized map with the military symbols, and use the information in a tactical decision-making situation.

Visual cognition presents two interesting psychological situations (Rogers, 1999) for study. One is the impact of eye movement while tracking relevant objects of interest, when displayed on a mobile (wearable) computer or on a Head Mounted Display (HMD). The other is the impact of visual display layout on cognitive processes such as perception and recall of information.

Eye movements are characterized by temporal periods of fixations on targets of attentional spotlights or regions of interest (Duncan and Humphreys, 1989; Just and Carpenter, 1976; Norton and Stark, 1971a). The process by which the eyes look at a single point in visual space is known as fixating, and the location at which the eyes are aimed is known as the fixation point. Fixation points are selected as a result of new visual information. The observer selects a fixation point, and the eyes are moved to this point. If the observer does not detect any potential targets, either foveally or in the periphery of his or her visual field, a new fixation point is immediately selected. This process continues until at least one potential target is detected (Underwood, Clews, and Wilkinson, 1989).

In contrast to fixations are saccadic eye movements. Saccades are small, quick, jerky eye movements that occur in discrete phases (Norton and Stark, 1971b). Saccadic eye movements allow human observers to scan and localize targets. During saccade periods, there is some evidence that one information scene is substituted on the retina for another within a very short interval of time (Checkaluk and Llewelly, 1990).

Another important derivative of eye movement is the scan path. Scan paths consist of a sequence of alternating saccades and fixations that repeat themselves when a subject is viewing a picture, scene or object. Early experiments by Norton and Stark (1971a) showed that an internal cognitive model drives eye movements in the repetitive scan path sequence. In addition, the internal cognitive model controls the active looking perceptual processes. Each subject seemed to have a typical scan path for each object (Norton and Stark, 1971b).
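
For concreteness, a minimal velocity-threshold classifier of the kind often used to separate fixations from saccades in sampled gaze data is sketched below; it is an illustrative assumption, not the algorithm used in the studies reviewed here, and the threshold value is arbitrary.

    import numpy as np

    def classify_samples(x_deg, y_deg, sample_rate_hz, vel_threshold_deg_s=30.0):
        """Minimal velocity-threshold sketch: samples whose point-to-point velocity
        exceeds the threshold are labeled saccade, the rest fixation."""
        x = np.asarray(x_deg, dtype=float)
        y = np.asarray(y_deg, dtype=float)
        dt = 1.0 / sample_rate_hz
        vel = np.hypot(np.diff(x), np.diff(y)) / dt        # deg/s between samples
        labels = np.where(vel > vel_threshold_deg_s, "saccade", "fixation")
        return np.concatenate([labels[:1], labels])        # pad to original length

    # Example at 250 Hz: the jump between the 3rd and 4th samples is a saccade.
    print(classify_samples([0, 0.01, 0.02, 3.0, 3.01], [0, 0, 0, 0, 0], 250))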

Treisman (1988) has shown some distinctions between the features of an object (e.g. color, size, and orientation) and the focus of attention on the object. Treisman and Sato (1990) argued that the degree of similarity between the target and the distracters is a factor influencing visual search time (because the person will have to distinguish between the two targets). Duncan and Humphreys (1989) have shown experimentally that the time taken to detect a target in a visual display depends on two major notions:

  1. Search times will be slower when the similarity between the target and non-target is increased.
  2. Search times will be slower when there is reduced similarity among non-targets. Thus, the slowest search times are obtained when non-targets are dissimilar to each other but similar to the target.

Eye Movement Studies and Visual Cognition

Early experiments by Norton and Stark (1971a, 1971b) suggested that an internal cognitive model drives the eye movements in the scan path sequence. A more recent postulation derived from the internal cognitive model hypothesis is known as cognitive guidance theory, proposed by O'Regan and Levy-Schoen (1987). According to the cognitive guidance theory, fixations are guided by the visual system so that they tend to fall on targets that are maximally informative.

Intraub's (1992) perceptual schema has been used to provide an alternative explanation to the cognitive guidance theory. Specifically, Intraub (1992) showed that variations in the recall rate of information result from attempts to integrate information from the perceptual and cognitive schemas during memory filtering tasks. Intraub offered the explanation that perceptual schema content is dynamic with respect to the external information display and to eye movement adaptations, i.e., variations in scan paths and fixation durations.

Recently, there is a new cognitive postulate relating to eye movement information processing, known as the eye-mind assumption (Underwood & Everatt, 1992). The eye-mind assumption holds that a fixation will continue until all of the cognitive processes activated by the fixated object have been completed. Mind in this case refers to all current cognitive processes (such as search, detection, and recognition) regardless of the direction of attention. Under this assumption cognition is determined by fixation. A single fixation duration gives a fine-grained estimate of moment-to-moment processing. It should be cautioned that it is not necessarily the case that what is fixated is what is being processed (Underwood & Everatt, 1992).

Eye Movement and Perception

Perception is the processing and translation of sensory stimuli into percepts or familiar objects. Studies in eye movement and spatial reasoning show that eye fixations, gaze, and scan paths are influenced in part by perceptual or similarity grouping of objects in the visual space. Other studies suggest that perceptual span (effective field of view) is affected by the size of the object being scanned (Rayner, Well, & Pollatsek, 1980). The total perceptual span consists of the total area from which useful information is extracted during a fixation. In eye movement studies, percept duration has been attributed to fixation duration (Sakano, 1963). Percept duration is defined as the amount of time that a given organization of the object lasts before it is replaced by a competing organization. An experimental variable related to percept duration is the overall time during which a given interpretation of the figure occurs in the fixation period.

A phenomenon known as perceptual schema is important to object identification, detection, and recall in information display space (Intraub, 1992). The perceptual schema hypothesis asserts that perceptual expectancies contained in the schema may become incorporated into the subject's recollection of the object in a scene.

Implications to the Present Study

The studies reviewed in the previous sections raise important issues about the roles of eye movements in cognitive processes. Of particular interest to the present study are:

  1. The goal of symbolic displays is to enhance the visual correspondence between the physical reality of the universe and abstract knowledge: a concrete-concept problem space. The relationships between these two types of knowledge are rarely investigated in eye movement studies. In the military domain, symbols are used to depict troop size, configuration, location, and composition. Rasmussen (1983) notes that "the distinction between perception of information as signals, signs, or symbols is generally not dependent on the form in which the information is presented, but rather on the context in which it is perceived, that is upon the intentions and expectations of the perceiver". In the concrete-concept paradox, military symbols thus serve a physical function, mapping the external world to the abstract.
  2. Understanding symbolic chunking or perceptual grouping has the potential to improve the effectiveness of information display for the soldier-on-the-move. Some of these potential improvements are:

    1. Enhancing spatial awareness in orientation of troop locations;
    2. Contributing to the individual soldier's ability to estimate enemy troop directions or friendly troop locations in relation to his or her position;
    3. Providing continuous integration of a model of the physical world and the abstraction of the mental world; the paradox of the eye-mind information integration (Underwood & Everatt, 1992).

In summary, lessons learned from previous studies show that eye movement recordings provide data relevant to information processing and to cognitive decision processes such as object search times, estimation of perceptual distance, information latency, saliency, and memory activities such as recall accuracy. These studies, however, concentrate more on issues of perception and attention, with little effort expended on cognitive correspondence problems between spatio-temporal events of the information display.

Previous studies help to expose the need to study visual cognition in a virtual environment. In this study, we investigated the following problems.

  1. Experiment 1: Investigate whether information presentation refresh rate and dynamicity of objects in a visual space have effects on saccade latency. This problem is important to information processing domains that require digitally coded symbols and signs in a visual space. Similar problems using eye movement studies in the domain of battlefield visualization have been investigated by Yeh and Wickens (1997); Wickens, Kramer, Barnett, Carswell, Fracker, Goettl, and Harwood (1985); and Walrath, Gurney, and Yoss (1986).
  2. Result of the study:

    The results of the experiment show that information refresh rate has an effect on saccade latency.

    As the human processes complex dynamic information, there is evidence of visual attention shifts to new dynamic targets while the saccade is centered around objects perceptually judged to be of high saliency. Attention leads to space-based information prioritization of the attended objects. In this case, there was no apparent difference between saccade latency times in the real versus the virtual environment.

    As the number of simultaneous dynamic objects increases in a display, latency times seem to be the same irrespective of the information refresh rate.

    Similar to the findings of Bridgeman (1983), saccadic eye movements generate cognitive bias, such as attentional bias, which depends on the spatio-temporal aspects of the saccade but is independent of the type of environment. This result is shown in Table 1.

    Table 1. Attention allocation between static and moving objects in visual space for 60 Hz.

                       Dynamic objects           Fixed objects
    Dynamic object    Avg. # of   Avg. fix.     Avg. # of   Avg. fix.     Total #      % Attention allocation
    ratio             fixations   time (msc)    fixations   time (msc)    fixations    Dynamic     Static
    (0:3)                 -           -            16         7.365          16           -          100
    (1:3)                 9         7.46            5         3.76           15          60           40
    (2:3)                12        10.316           2         1.69           14          86           14
    (3:3)                15        11.38            3         2.14           18          83           17
    (3:0)                31        20.58            -           -            31         100            -
    (3:1)                23        16.45            4         3.41           27          85           15
    (3:2)                21        18.33            7         6.36           28          75           25

    Our experimental findings reveal that both object dynamicity (moving targets) and information presentation refresh rate have an effect on initial latency times. In addition, there were statistically reliable interactions in initial latency times when two or more targets were in motion simultaneously at different refresh rates.

    Several factors may contribute to the increase in initial latency times as the number of dynamic targets increases. For example: (a) rates of visual motion have been found to increase with increasing distance from fixation (Honda, 1995); (b) spatial attention shifts may result from object size, color, and other geometric properties, which were under experimental control (Zelinsky & Sheinberg, 1995); and (c) the control and recording of the stimulus onset asynchrony may have an effect on latency time (Flowers, 1993). The onset asynchrony is the delay between the cue (in this case, the refresh rate) and target onset (in this case, target movement). In general, we observed that as the number of dynamic targets increased, subjects took longer to make their initial saccades.

  3. Experiment 2: Investigate whether there is any relationship between visual displays, information chunking, and eye movement response time.
  4. The stimulus presentation used the following chunk groupings of symbols:

    C1 = (Infantry)

    C2 = (Mechanized Infantry)

    C3 = (Armored Calvary)

    C4 = (Infantry, Mechanized Infantry)

    C5 = (Armored Calvary, Mechanized Infantry)

    C6 = (Infantry, Mechanized Infantry, Armored Calvary)

    Note that the chunk groupings C4 – C6 are based on stimulus similarity. For example, the stimulus similarity symbol in C4 = "X"; C5 = "0" with "/" inside the circle; and C6 = ("X", C5).

    Results Obtained:

    Figure 38 shows the mean response time by chunk grouping.

    Figure 38 Mean response Times By Chunk Group

    This suggests that latency should be affected by the displays because the displays are dynamic. A one-way analysis of variance was done to determine whether there was a significant difference between the response latencies for the different levels of complexity. Table 2 shows Fcal = 3.164 > F0.05 = 1.75 (p < .01), which indicates that there is a significant difference between the response latencies at different levels of complexity.

    Table 2. ANOVA for eye movement response latency at complexity levels

                        Sum of Squares    df    Mean Square       F        P
    Between Groups           3.228        17       .190          3.164    .009
    Within Groups            4.321        72      6.002E-02
    Total                    7.550        89

    None of these variables showed a significant effect on the participants' eye movement response latency, although an interaction effect of density and dynamics yielded a significant result. Figure 39 illustrates the eye movement response latencies at the different levels of complexity.

    Figure 39. Eye movement response latency based on symbol size

    Figure 40 illustrates the response latencies at the different levels of object movement. This graphical analysis shows that displays with three dynamic symbols elicited longer response latencies than displays with fewer moving symbols.

    This graphical analysis shows that it takes the eyes a shorter time to reach symbols when there are twelve symbols displayed than when there are fewer (four or eight) symbols displayed. Both Potter (1976) and Senders (1976) have suggested that the eyes tend to remain relatively fixed at the place where information is rapidly being presented, in order to process new information. This suggests that the density of the display would have no effect on recall latency, because the eyes tend to fixate where information is being presented.

    Figure 41: Relationship between latency times and object density


    These results show that, of the factors chosen to comprise display complexity (size, density, and dynamics), none elicits a significant effect on eye movement response latency. This suggests that, when designing displays, if one is concerned with the amount of time it takes the eyes to reach an object, these factors will not significantly affect the results.

  5. Experiment 3: The level of complexity of a display will affect recall accuracy of information in real and virtual environments.
  6. Result of study:

    Analysis of variance (ANOVA) was conducted on the percentage accuracy data. The main effect of complexity was tested using a two-way ANOVA. The result showed that information recall due to display complexity in the virtual environment differed from information recall in the real environment (Fcal = 19.56 > F0.05 = 1.67, p = 0.031). The result supports the notion that people tend to "believe" the real environment more than the virtual environment. The average recall time for the real environment was 355 ms (s = 27.4 ms), versus an average recall time of 429 ms (s = 62.58 ms) for the virtual environment.

  7. Experiment 4: The interaction of symbol size, symbol density, and display dynamicity has an effect on information recall.
  8. Result Obtained:

    An ANOVA was used to analyze the data collected. Table 3 shows the result.

    Table 3. ANOVA for the effects of size, density, and dynamics on recall accuracy

    Source                       Sum of Squares    df    Mean Squares       F        P
    Size                               951.162      1        951.162      3.885    .049
    Density                            931.274      2        465.637      1.902    .150
    Dynamics                         40389.582      2      20194.791     82.481    .000
    Size * Density                    1816.682      2        908.341      3.710    .025
    Size * Dynamics                    747.746      2        373.816      1.527    .218
    Density * Dynamics                 655.265      4        163.816       .669    .614
    Size * Density * Dynamics         7350.342      4       1837.585      7.505    .000
    Error                           114585.257    468        244.840

    As shown in Table 3, there is a significant difference in recall accuracy between displays with different military symbol sizes, and a significant difference in recall accuracy between displays with different numbers of dynamic symbols. Displays with different levels of symbol density showed no significant difference in the participants' ability to accurately recall information.






Cognitive Modeling of Virtual Reality Environments
Dr. George McConkie, Dr. Arthur Kramer, Dr. Chris Wickens, Dr. David Zola (UIUC),
Dr. Celestine Ntuen (NCAT), Dr. William Marshak (Sytronics), Dr. Grayson Cuqlock-Knopp,
Dr. Laurel Allender, Dr. Paul Rose, Dr. Michael Benedict, Ms. Carolyn Dunmire, and
Ms. Brenda Thein (ARL)

April 1 - June 30, 1996 (FY96Q3)

Goals: Obtain and set up equipment.

Progress: Data collection is continuing from subjects on navigation tasks; data from previous subjects are being analyzed. Equipment is on order.

July 1 - September 30, 1996 (FY96Q4)

Goals: Complete design of virtual spaces to be used in the research.

Progress: The design of a number of virtual spaces has been completed and data is being collected to examine the issue of spatial navigation in virtual reality environments. Computers having 3-D display capability were set up for use in research on cognitive representation of virtual spaces. We also obtained and set up binocular eyetracking equipment for examining the depth plane at which the observer is attending.

Sytronics has acquired the SOAR software from the University of Michigan to support exploratory efforts in cognitive modeling under a contract with the Air Force's Electronics Systems Division. This software runs in conjunction with Loral's MODSAF distributed interactive simulation (DIS) system software, so modeling efforts will be compatible with the DIS application later.

October 1 - December 31, 1996 (FY97Q1)

Goals: Acquire SOAR system for Silicon Graphics computer and experiment with its architecture.

Progress: The SOAR software was acquired and installed, although we are still in the process of getting Air Force funding in place. Sytronics has begun preliminary software design to implement the F-16 ground attack task specified by the Air Force. Dr. John Laird of the University of Michigan, a designer of SOAR and a Department of Defense funded researcher, has agreed to consult on the methods of introducing human cognitive performance characteristics (including environment driven degrades) into SOAR functionality. The same adaptations will be crucial to modeling soldier interaction with user interfaces in battlefield environments.

Due to funding cuts, FY97 activities will be limited to that which is funded by the Air Force, with soldier modeling deferred to FY98.

January 1 - March 31, 1997 (FY97Q2)

Goals: Acquire SOAR system for Silicon Graphics computer and experiment with its architecture.

Progress: The SOAR architecture is currently up and running on a Silicon Graphics computer. Existing SOAR code is being analyzed, and some theoretical work on how to represent more detailed cognitive functions of the soldier has begun.

In addition, we are working on getting a personal computer version of SOAR operating that we received from the University of Michigan. We intend to use the personal computer version for all future SOAR modeling efforts. The Air Force has expressed interest in this project and has provided additional funding to augment the ARL resources applied to the project to accelerate the basic cognitive representation work planned.

April 1 - June 30, 1997 (FY97Q3)

Goals: Complete initial study of the effect of field of view on development of cognitive representations of virtual space.

Progress: Data collection for the study of the effects of field of view on cognitive representations of virtual space is underway. Subjects view 6' by 4' projected images through goggles that restrict their view. The subjects are tested on a variety of measures to assess the quality of the cognitive representation gained from viewing the virtual space depicted in the images. The four field of view restrictions range from a view approximately the size of a computer monitor, through the size provided by most head-mounted displays, to an unrestricted view.

Analysis of the data collected thus far indicates that virtual spaces viewed under less restricted conditions produce better object recall, smaller errors in estimating object locations, less time to make object location estimations, higher confidence ratings in estimated object locations, and large decreases in search time. Once more data has been gathered, data analyses will be conducted to estimate the average direction of estimated location errors and average recall performance by object category or type.

Data collection for the study on effects of field of view will conclude soon, with studies examining distance estimation, rotation, and orientation abilities to follow.

July 1 - September 30, 1997 (FY97Q4)

Goals: Complete initial study of the effect of field of view on development of cognitive representations in virtual space (Incomplete in FY97Q3). Report on existing SOAR applications and how the SOAR architecture could be used for cognitive modeling of virtual reality interfaces. Complete integration of eyetracking system with virtual space situation.

Progress: We have nearly completed the initial study on the effect of field of view on the development of cognitive representations in virtual space. In the experiment, subjects are given 30 seconds to examine a set of objects lying at different positions in a terrain in a large (4' by 6') display. Performance on several tests (memory for objects, memory for locations of objects, and search time) is being compared when the display is examined through head-mounted viewports of different sizes. This manipulation simulates examining a large space using computer displays of different sizes. Size of the viewport makes a large difference in performance on these tasks: smaller viewports lead to longer search times and poorer object locating performance. Further studies will investigate the degree to which less complete mental representations are formed with smaller viewports or limited peripheral vision.

We have set up the headmounted EyeLink eyetracker that was acquired for this project, and are using it in our research. With this instrument, we can now present color images and other displays, and collect eye movement data. Data transformation and reduction processes are also being improved.

The SOAR applications results have far exceeded the milestone, with funding provided by the Air Force to build a SOAR-based cognitive model. A model of an F-16 pilot performing an air-to-ground strike mission, using two different cockpit configurations, was completed this quarter. The task analysis for the baseline cockpit is shown in Figure 1.


Figure 1: Functional diagram of F-16 performance while using the baseline cockpit
configuration. This analysis served as a basis for the SOAR coding.

Cockpit enhancements included overhead imagery of the target area from the Real-Time Information in the Cockpit (RTIC) system and an improved forward looking infrared system. Configurations were previously studied using human-in-the-loop simulation, Extended Air Defense Simulation Model (EADSIM), and the Micro Saint modeling tool. The ARL-funded effort was kept "in the blind" about the previous efforts, other than which dependent measures were employed.

The SOAR model was programmed and an enhanced visual processing model was introduced. Images of target KC-10 and cluttering B-52 aircraft were shown to subjects, who made signal detection judgments as to whether the aircraft was a KC-10 target. The visual model was derived from having observers look at pixel images of the targets at various simulated ranges, as shown in Figure 2.


Figure 2: Different resolution pixel photographs of a KC-10 model used to determine
forward looking infra-red target identification ranges of the SOAR cognitive model.

A total of 160 simulated strikes were "flown" using the SOAR mission model. SOAR predicted significant performance improvements using the cockpit enhancements, as compared to the baseline system (Figure 3). Target acquisition distances were somewhat overestimated because the visual model was based on observers who were not burdened by other mission aspects.


Figure 3: Plot of comparative performance between baseline and RTIC enhanced
cockpit configurations attacking clusters of three and five aircraft, based on pilot-in-the-loop (PIL), SOAR, traditional task analysis (TTA), and
cognitive task analysis (CTA) models of the systems.

The results of the SOAR model predicted the superior performance of the enhanced cockpit, and compared favorably with the human-in-the-loop simulation findings. The lessons learned during the Air Force effort will serve as the basis for next year's effort to start cognitive modeling soldier interaction with digitization systems.

October 1 - December 31, 1997 (FY98Q1)

Goals: Complete a study to separate the effect of field of view size on formation of a cognitive representation from its effect on the ability to search for objects within that space (Incomplete in FY97Q3).

Progress: Further guidance from ARL has changed the focus of the Army modeling effort from the individual soldier to a command and control emphasis. The effort is being shifted to modeling command post staff, who have decidedly more cognitive-oriented activities. The chosen focus is on the maneuver officer (S3), who operates in a highly dynamic and cognitively challenging environment. The planned model will not try to comprehensively model the S3 function, but will choose specific subtasks to be done and try to implement anthropomorphic changes to the SOAR code to better represent human performance. SOAR now operates in a human fashion, but does not impose human limits, such as the limited capability of short-term memory, and it does not degrade under stress or fatigue.

Creating SOAR-ArmySAF software seems feasible. There are still questions as to how much programming would be needed to get SOAR to direct the Army MODSAF entities, much like AIRSAF directs aircraft. The merged software (SOAR and ArmySAF) would not only give Fed Lab a new product, but the resulting S3 model would also automatically be Distributed Interactive Simulation (DIS) compatible.

In this quarter, further data analyses have been completed on our study in which subjects examined a large terrain display with fields of view of different sizes (a 13" monitor, a 21" monitor, a head-mounted display, and a full view of the terrain, as seen in Figure 4). The size of the field of view had no effect on the number of objects that the observer could recall after examining the display, but it had a sizable effect on the time required to find a specified object, and on the accuracy with which the observer could specify exactly where the object was in the display. Thus, field of view size appears to primarily affect the observer's sense of the locations of objects in the display, rather than memory for the objects themselves. Part of this difference is probably due to the amount of information available from peripheral vision during the test itself. Additionally, this difference may also be attributed to the formation of a less accurate mental representation of the space, caused by a restricted field of view in which less information is available about the positions of objects relative to one another. Thus, the study currently being run attempts to separate the effects that occur during learning vs. testing. Data are currently being collected for this study, which is not yet complete.


Figure 4: Various monitor sizes were found to have some effect on the
subjects' ability to locate specific objects.

January 1 - March 31, 1998 (FY98Q2)

Goals: Complete a study to separate the effect of field of view size on formation of a cognitive representation from its effect on the ability to search for objects within that space (Q397). Complete initial report on effect of field of view size on observer's ability to form a cognitive representation of a virtual space. Explore design representation issues associated with modeling human cognitive activities in a virtual space.

Progress: In the new, digitized Army, personnel often need to view a large terrain through a small computer monitor. Only part of the area can be seen at any one time. We are conducting a series of studies to investigate the effect that the size of the viewport (the computer monitor) through which a person examines the terrain has on his/her ability to construct a mental representation of the information in the region. Last quarter, we reported a study showing people's performance on several tasks carried out with viewports of different sizes. In that study, however, the viewport size was the same during the examination and test. Therefore, we conducted a follow-up study that overcomes this limitation.

As in the prior study, subjects examined a large terrain display with fields of view of different sizes, simulating the view one would have with a 13" monitor and with a head-mounted display, in comparison to a full view of the terrain. This time, however, subjects either wore or removed the view-restricting goggles before or after their initial exposure to the terrain, depending on condition. This allowed us to tease apart the effects of a restricted field of view on learning a display versus later performing evaluation tasks with the display.

Initial results indicated that the most restrictive view during test, simulating the 13" monitor, greatly impairs performance in indicating the prior locations of objects and in searching for the objects. This impairment appears to be equally strong whether or not the terrain was learned using the highly restrictive view. A larger display, simulating the field of view of a head-mounted display, produced mean times, accuracies, and confidence ratings that were only slightly worse than the full-view control condition. In terms of learning a terrain and interacting with it, the head-mounted display may therefore prove to be a reasonable compromise between the benefits of computerized displays and the visual constraints they impose. Additional analyses are under way to evaluate this possibility.

In general, having the field of view restricted at the time one is trying to carry out a task using previously-gained knowledge appears to be more detrimental than having the field of view restricted during the learning period. This indicates, first, that peripheral vision plays an important role in these tasks; and second, that people either are adept at developing a coherent mental representation of a large space when only part is visible at any one time, or, under unconstrained viewing conditions, do not rely much on their mental representations of the terrain.


Figure 5: Detection of Multiresolutional Displays During Selected Fixations as a Function of Size of the High-Resolution Area and Level of Peripheral Degradation (separate lines represent high-resolution area in degrees radius from center of fixation).

In the final task, we developed levels of design abstraction to map human cognitive states with decision-making performance based on information displayed in a virtual space. The levels of abstraction provide a theory for design of information display that captures the attention of the decision-maker. The draft report on this task is in progress.

April 1 - June 30, 1998 (FY98Q3)

Goals: Complete development of software and procedures for tracking eye movements with Head-mounted display.

Progress: We are still awaiting delivery of the ISCAN eyetracker mounted in a head-mounted display (HMD), which we have funded from NSF Infrastructure grant funds. We have prepared for its arrival by installing the magnetic head tracking equipment that will be used with the eyetracker. This is only the second time that ISCAN has mounted their equipment in an HMD (the first is currently being used by the University of Rochester), and they are employing a modified design in ours. The company says that it should be finished shortly, and we can then install it and develop the necessary software for the planned research.

Our previous work has documented the effects of restricting the field of view (similar to using computer monitors of different sizes) when examining a large, virtual terrain space. We have now completed a study designed to determine the extent to which performance decrements that occur with restricted field of view are the result of limitations in forming a mental representation of the space, or of limitations in carrying out tasks such as visual search and specifying the location of objects. The results are very clear in indicating that the limitation is primarily in the latter processes. That is, when users have no access to information from peripheral vision, it appears that they are severely restricted in their ability to use information from their mental representation alone to compensate in carrying out visual tasks. Results of our first two studies were reported at the Midwestern Psychological Association Meetings.

We are now conducting a third study in this series that is examining the role of the mental representation in quickly finding objects and indicating the locations where objects previously resided in the display (an indication of the quality of their mental representation of a previously viewed domain).


Figure 6: Mean Distance Error in Pixels by Size and Phase of Viewing Restriction

July 1 - September 30, 1998 (FY98Q4)

Goals: Explore design representation issues associated with modeling human cognitive activities in a virtual space. (Q298) Complete development of software and procedures for tracking eye movements with head-mounted display. (Q398) Demonstrate a simple, SOAR-based, individual soldier model. Conduct statistical analysis on experimental data obtained in the FY98Q2 experiment. Multidimensional discriminant analysis models are used to quantify the design representation issues.

Progress: We are still awaiting delivery of the ISCAN eyetracker mounted in a head-mounted display (HMD), which we have funded from NSF Infrastructure grant funds. We will be unable to complete software development and procedures until it arrives. Despite not having received the HMD ISCAN eyetracker, we have made progress with the non-HMD ISCAN eyetracker: we have developed standardized procedures for its use and are currently testing the reliability and precision of which ISCAN eyetrackers are capable. The results of the reliability testing will be included in a paper titled "Use of Eye Movements in Human-Computer Interaction," an abstract of which has been submitted for inclusion in the FedLab 1999 Symposium Proceedings. This paper, which is being written, will incorporate a historical perspective on the use of eye movements in human-computer interaction, the capabilities of the ISCAN system, and a discussion of how the ISCAN eyetracker can be used to improve on prior attempts to incorporate eye movements in the dialog between humans and computers.

The activity on demonstrating a simple, SOAR-based, individual soldier model was severely curtailed by the funding profile problems. Some planning work was done for the proposed model, now focusing on the S-3 (Maneuver Officer) rather than the individual soldier, but no code was developed during the quarter. This work will be shifted into the first quarter of next FY.

Because of the complexity of the statistical analysis, final multidimensional discriminant analysis models have not been fully constructed. However, the initial finding reported last quarter - that a restricted view primarily interferes with the task of finding objects rather than with remembering their precise location - has remained unchanged. As shown in Figure 7, the time required to search for an object was greatly affected by a smaller view, despite an unrestricted learning exposure.


Figure 7: Change in Mean Search Time by Viewing Condition

The third study of the series examining the effect of restricted field of view on forming a mental representation of a large-scale space is ongoing. Data are being collected now.

October 1 - December 31, 1998 (FY99Q1)

Goals: Complete development of software and procedures for tracking eye movements with head-mounted display. (Reported as incomplete FY98Q3). Demonstrate a simple, SOAR-based, individual soldier model. (Reported as incomplete FY98Q4). Conduct statistical analysis on experimental data obtained in the FY98Q2 experiment. Multidimensional discriminant analysis models are used to quantify the design representation issues. (Reported as incomplete FY98Q4). Further develop the S3 model and integrate it with MODSAF simulation software. Complete study using high-resolution Purkinje Eyetracker that examines the accuracy with which binocular eye movement data indicate the depth at which a person is attending in a 3D display.

Progress: The milestone "Explore design representation issues associated with modeling human cognitive activities in a virtual space", which was incomplete in the FY98Q2 report, has been removed since it was to be done by NCA&T, who no longer has funding under this module.

Tracking eye movements with an HMD has not yet been completed because we have not yet received the head-mounted display from ISCAN; rapid progress will be made as soon as the equipment arrives.

Sytronics has not yet been able to complete the demonstration of a simple, SOAR-based, individual soldier model. They have chosen to concentrate on the Symposium papers and demonstrations to ensure their success.

The results of the statistical analysis of the experimental data are currently being pulled together in a paper summarizing the modeling research.

The further development of the S-3 model and its integration with MODSAF simulation software have also not been completed. As noted, Sytronics has chosen to concentrate on the Symposium. After completion of the Symposium, Sytronics will review the state of the research effort and the funding situation, and will consult with ARL to determine a "get well" schedule or to abandon this line of research.

The computer programs to collect binocular data from the two Purkinje eye trackers are under development at UIUC. Some progress has been made, but no data have been collected yet.

January 1 - March 31, 1999 (FY99Q2)

Goals: Complete study using high-resolution Purkinje Eyetracker that examines the accuracy with which binocular eye movement data indicate the depth at which a person is attending in a 3D display (Reported as incomplete in FY99Q1). Develop operational scenario based on Staff Group Training (SGT) examples in MODSAF to evaluate the S-3 model. Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control.

Progress: The apparatus for the study using the Purkinje Eyetracker is currently being built. The milestone is not yet complete.

SYTRONICS has found it impossible to execute the cognitive model of the S-3 as originally planned. The principal reason for this has been the higher-than-expected cost of doing the Usability and Validation work, especially the hardware necessary to collect those data. Supplemental funding from an Air Force source originally assisted this work. It continued at a very low level of activity over the next year, with some small progress. When the cognitive modeling was not selected for supplemental funding under the cognitive augment, no additional funds were forthcoming. In addition, a much larger cognitive modeling effort was discovered elsewhere in the Army. That well-funded project overlapped with ours and made our Fed Lab effort relatively insignificant. For these reasons, SYTRONICS has decided to end the effort and will write a termination report during this quarter. Funding allocated for cognitive modeling will be applied to the Usability and Validation effort, which is of much greater importance to the overall Fed Lab effort.

The equipment required to perform the study in milestone three has not arrived yet. This milestone is incomplete.

April 1 - June 30, 1999 (FY99Q3)

Goals: Complete study using high-resolution Purkinje Eyetracker that examines the accuracy with which binocular eye movement data indicate the depth at which a person is attending in a 3D display (reported as incomplete in FY99Q1). Develop operational scenario based on Staff Group Training (SGT) examples in MODSAF to evaluate the S-3 model (reported as incomplete in FY99Q2). Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. Run the evolving S-3 model inside the MODSAF simulation to test model performance. Continue FY99Q2 #3 task with addition of display objects to arouse anxiety and emotion.

Progress: The apparatus for the study using the Purkinje Eyetracker is currently being built. The milestone is not yet complete.

Sytronics does not have the funding to complete an operational scenario and will not be doing it.

The equipment required to perform the study in milestone three has not arrived yet. This milestone is incomplete.

Sytronics does not have the funding to complete a study comparing display methods and will not be doing it.

The equipment required to perform the study in milestone five has not arrived yet. This milestone is incomplete. It may be possible to run this series of studies in the Integrated Support Lab once an eyetracker has been integrated with the ImmersaDesk. Work on integrating an eyetracker into the ImmersaDesk should be completed this summer.

July 1 - September 30, 1999 (FY99Q4)

Goals: Complete study using high-resolution Purkinje Eyetracker that examines the accuracy with which binocular eye movement data indicate the depth at which a person is attending in a 3D display. (Reported as incomplete in FY99Q1.) Develop operational scenario based on Staff Group Training (SGT) examples in MODSAF to evaluate the S-3 model. (Reported as incomplete in FY99Q2.) Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. (Reported as incomplete in FY99Q2.) Run the evolving S-3 model inside the MODSAF simulation to test model performance. (Reported as incomplete in FY99Q3.) Continue FY99Q2 #3 task with addition of display objects to arouse anxiety and emotion. (Reported as incomplete in FY99Q3.) Conduct human vs. S-3 interactive MODSAF simulation and perform a simple Turing test of S-3 model effectiveness.

Progress: Eyetracking is very useful in a virtual environment in indicating the observer's direction of gaze, and thus the objects in the environment to which he is attending at any given moment. However, in a virtual 3D environment it is quite possible to have different objects that lie roughly in the same direction from the observer, but at different depths. Data from normal monocular eyetracking can only indicate that visual attention is being directed to some member of this cluster of objects, but cannot indicate whether the selected object is one that is near or far.

One method for providing information concerning the depth plane being attended is to monitor the position of both eyes. Since the visual system adjusts eye position to place the region of interest at corresponding positions on the two retinas, the angle of gaze of the two eyes changes systematically with changes in the depth plane being attended. When attending to a distant object, the lines of gaze of the two eyes are nearly parallel; when attending to a very close object, the eyes rotate toward each other. Thus, a signal based on the difference between the horizontal components of the eye position signals of the two eyes can indicate the relative positions of the two eyes, which is related to the depth plane being attended.

Since the eyetracker's signal is linearly related to the angular position of the eyes, it is possible to obtain a difference signal that indicates the relative direction of gaze of the two eyes, without having to calculate the angular position itself. A mathematical model with three free parameters has been developed to represent this relationship. Once a few samples of eye position have been taken as the observer attends to objects at known distances, these parameters can be estimated with nonlinear regression methods, thus fitting the model both to the person and to the current characteristics of the eyetracking signal. Data have been collected using our high-resolution binocular Purkinje Eyetracking system, and the model fits them well.
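
As an illustration of this calibration step (the report does not give the model's functional form, so the arctangent form, parameter names, and sample values below are assumptions), the three parameters could be estimated from a handful of fixations at known distances using standard nonlinear regression:

    # Hedged sketch: fit a three-parameter model relating the binocular
    # difference signal to attended distance. Vergence geometry suggests the
    # difference varies roughly as arctan(k/d); offset and gain absorb the
    # eyetracker's signal scaling. All names and sample values are assumptions.
    import numpy as np
    from scipy.optimize import curve_fit

    def diff_signal(d, offset, gain, k):
        # predicted left-minus-right horizontal signal for an object at distance d
        return offset + gain * np.arctan(k / d)

    known_d = np.array([0.5, 1.0, 2.0, 4.0, 8.0])        # calibration distances (m), placeholder
    measured = np.array([2.10, 1.15, 0.62, 0.33, 0.18])  # measured differences, placeholder

    params, _ = curve_fit(diff_signal, known_d, measured, p0=[0.0, 1.0, 1.0])
    offset, gain, k = params

    def attended_depth(diff):
        # invert the fitted model to estimate depth from a new binocular sample
        return k / np.tan((diff - offset) / gain)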

The next step is to develop a method of estimating the reliability of a given eyetracker's binocular difference signal, and with that to be able to indicate the degree of resolution in depth that can be expected from the eyetracking system.
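
One simple way to express that depth resolution, sketched below under the same assumed model, is to propagate the residual noise through the local slope of the fitted curve: the smallest distinguishable depth step at distance d is roughly the residual standard deviation divided by the magnitude of the model's derivative at d.

    # Sketch (assumes the arctangent model above): depth resolution at distance d,
    # given the standard deviation of the fit residuals in signal units. The
    # slope shrinks roughly as 1/d^2, so resolution degrades at far distances.
    def depth_resolution(d, gain, k, residual_sd):
        slope = abs(gain * k / (d**2 + k**2))   # |d/dd of gain*arctan(k/d)|
        return residual_sd / slope

Because the slope falls off roughly with the square of the distance, an estimate of this kind predicts good depth discrimination near the observer and progressively coarser discrimination farther away.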

We have just recently received (after a wait of over a year) our head-mounted display having eyetracking capability, so that this study can now be conducted.

As noted above, the HMD with eyetracking capability just arrived, therefore the original study must be completed before display objects can be added.

October 1 - December 31, 1999 (FY00Q1)

Goals: Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. (Reported as incomplete in FY99Q2.) Implement monocular eyetracking within the BattleView environment and explore the potential benefits of using binocular eyetracking.

Progress: The development of an operational scenario based on Staff Group Training will not be completed by Sytronics, as the research has been cancelled.

The study comparing the ability to develop a cognitive model of virtual space in a limited view HMD using head movements for display change vs. using a joystick for display change has been delayed. This delay was because our head-mounted display was at the factory having eyetracking capability added. The equipment has now been returned to us and we can conduct the study. Currently we anticipate completion in FY00Q3.

The S-3 model will not be run in the MODSAF simulation since Sytronics has cancelled this research with permission of ARL.

NCAT was not funded in FY99 to finish the research outlined in FY99Q2, task 3, therefore the milestone should be removed.

Human versus S-3 interactive MODSAF simulation will not be done because Sytronics and ARL have cancelled this research.

Monocular eyetracking has been implemented within the ImmersaDesk environment, though not specifically with BattleView. It is unclear at this time if effort should be put into integration with BattleView. A study has been completed of the ability of a high-accuracy binocular eyetracker (Dual-Purkinje eyetracker) to determine the depth plane of attention (the distance an attended object lies from the subject) from the angular position of the two eyes. There is a strong relationship between the difference in angular position of the two eyes and the depth plane of attention, as seen in Figure 8. A mathematical function has been fit to the relationship, and the standard deviation of the residuals of our observations is being calculated. This makes it possible to estimate the accuracy with which different depth planes can be distinguished, as a basis for determining, for a given application, whether binocular data are likely to be of value. A paper has been written to be presented at the ARL Federated Laboratories' Symposium.

Figure 8: Relationship between the difference in angular position
of the two eyes and the depth plane of attention.

In other research, a demonstration for the ARL FedLab 2000 Symposium has been planned and development is underway. The plan is for the demo to include the use of eyetracking in connection with language and gesture. This demonstration is being developed in such a way that it can be used as a research environment to study the effectiveness of different modes of human-computer interaction, including eye movement recording. At this time it is unclear if we will be able to present this demonstration, due to limited funds.

January 1 - March 31, 2000 (FY00Q2)

Goals: Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. (Reported as incomplete in FY99Q2.) Implement monocular eyetracking within the BattleView environment and explore the potential benefits of using binocular eyetracking. Conduct experiment that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore.

Progress: The study comparing the ability to develop a cognitive model of a virtual space is still in progress. A 3-D Head-mounted display system has been set up, and a basic program written that allows exploration of a large space. Software is being written to allow definition of the displays to be used in the research, to allow various modes of navigation in that space, and to collect data as the subject is exploring the space.

A study was reported at the ARL Federated Laboratory Symposium, March, 2000, that examined the ability of an accurate binocular eyetracking system to discriminate the distance at which a viewer is attending. Discrimination is quite good at close distances and becomes less accurate at greater distances. For objects with virtual distances of about 6 feet or less, as is often found with large-format displays such as the ImmersaDesk, use of binocular eyetracking appears to have considerable potential. For objects at greater distances, discrimination power is less, so that binocular eyetracking will be useful only when distinguishing between widely-separated depth planes. Monocular eyetracking has been implemented within a large-format (ImmersaDesk) environment, though not specifically with BattleView.

The experiment to examine the development over time of a mental representation has been delayed.

In other research, a review of literature related to the development of memory representations of large spaces over time is being conducted.

April 1 - June 30, 2000 (FY00Q3)

Goals: Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. (Reported as incomplete in FY99Q2-FY00Q2.) Conduct experiment that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) (Reported as incomplete FY00Q2.) Complete analysis of study of user's exploration of virtual space in head-mounted displays by various control mechanisms (joystick, head-tracking, eye-tracking, and combinations of the above). (UIUC) Write report regarding accuracy of identifying the depth plane attended from binocular eye position data. (UIUC)

Progress: Research in this module has been slowed due to multiple factors: the RA assigned to this project switched to another project, the head of our lab was on sabbatical in China, and it was necessary to find a programmer with virtual environment experience.

In general, thus far we have accomplished the following:

  1. developed the hardware and software necessary to carry out experiments on the development of cognitive models of virtual space;
  2. carried out a review of the literature on the development of cognitive models of virtual space;
  3. designed an initial study to look at the effects on the development of cognitive models of a virtual space examined with a limited field-of-view head-mounted display, with display change in response to head movements vs. change in response to joystick control;
  4. performed a pilot study of the above experiment in order to test out the software and determine the appropriateness of the virtual environments we have developed for testing our hypotheses.

With regard to the study comparing the ability to develop a cognitive model of a virtual space, we have made headway with the development of a unique HMD system. We are using a VR4 head-mounted display (HMD) with an ISCAN eyetracker mounted inside. It is connected to three PCs running Linux: one main controlling PC, and two client PCs that convey different images to each eye of the HMD in order to form stereoscopic images. The system is described in more detail below.

In order to carry out experiments exploring the development of spatial representations of virtual environments, we needed a dedicated machine capable of running a stereo HMD with high graphics performance. Since this requires two views of the scene to be rendered, one for the left eye and one for the right eye, we could not rely on a combination of commodity hardware and software and still meet our budgetary constraints. While it is possible to put two graphics cards in a single PC, each card would need to be fed by its own processor to maintain performance. Since we also need to collect data, a third processor would be required. Hence, a single PC configuration would require a quad CPU motherboard, an expensive proposition coming in at a minimum $6000. Furthermore, there would have been nontrivial software to write to make this work. Another off-the-shelf option would have been to use an SGI Onyx2. In this case we could have used existing software infrastructure, like the CAVE libraries, and gotten good performance, but the cost, at $200,000, was completely out of the question. In fact, $6000 for a quad CPU PC was already too much.

Hence, cost constraints forced us to build a unique system. As opposed to using a single computer, we use a small cluster, networked via ethernet. There are 3 computers, one for data collection and coordination, one for displaying the left eye image, and one for displaying the right eye image. We have used off-the-shelf hardware and produced a system for a total cost of $3000. The challenge was then producing software to make this system work. A pleasant side-effect of building this type of system is scalability. By adding more display computers, we could reuse the software infrastructure to, for instance, view graphics on a display wall at extremely high resolutions.

The software is composed of several distinct components. There is a piece that takes care of processing and describing 3D scenes, in effect providing a kind of graphics language suitable for broadcasting over network connections. There is a server program that acts as a transmitter for graphics information, and there is a client program that acts as a receiver for that information, displaying the results. Synchronization components, operating over the network, ensure that the graphics computers coordinate their displayed images. In order to display images for the current VR experiments, the experimental program sends drawing commands, in the graphics language, to the geometry server, which relays them to the connected display clients. The system is robust: individual components can fail and be restarted, in any order, without crashing the whole system. It is also dynamically reconfigurable, allowing displays to be added or removed even while a scene is being displayed.
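
A minimal sketch of this relay pattern appears below; the JSON-per-line wire format, port numbers, and function names are illustrative assumptions and are not the project's actual graphics language.

    # Minimal sketch of the relay pattern described above (not the project's
    # actual code): display clients connect on one port and receive every
    # drawing command; the experiment program connects on another port and
    # sends one JSON command per line, which the server broadcasts.
    import asyncio, json

    clients = set()   # stream writers for connected display clients

    async def handle_display(reader, writer):
        clients.add(writer)
        try:
            await reader.read()            # returns only when the client disconnects
        finally:
            clients.discard(writer)

    async def handle_experiment(reader, writer):
        while True:
            line = await reader.readline()
            if not line:
                break
            command = json.loads(line)     # e.g. {"op": "placeObject", "yaw": 40}
            frame = (json.dumps(command) + "\n").encode()
            for c in list(clients):        # relay the command to every display
                c.write(frame)
            await asyncio.gather(*(c.drain() for c in clients))

    async def main():
        display_server = await asyncio.start_server(handle_display, "0.0.0.0", 9001)
        command_server = await asyncio.start_server(handle_experiment, "0.0.0.0", 9000)
        async with display_server, command_server:
            await asyncio.gather(display_server.serve_forever(),
                                 command_server.serve_forever())

    if __name__ == "__main__":
        asyncio.run(main())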

We are currently carrying out a pilot study, and so far have collected data from 3 subjects. In this study, subjects wear an HMD and are asked to look around four virtual rooms. At any given time, the view is 40 deg horizontally and 35 deg vertically. Thus, to see the complete room, it is necessary to combine information across views. This requires a mental representation of the room to be held in memory. There are 4 conditions derived from a 2x2 design in the experiment: head movement vs. no head movement, and joystick relative mode vs. absolute mode. We assume that the joystick absolute mode will facilitate the development of an egocentric spatial representation whereas the relative mode may inhibit it, at least until the viewer becomes accustomed to it. Similarly, we expect that the head movement condition should facilitate developing an egocentric spatial representation, whereas the no-head movement condition should slightly inhibit it. In the experiment, there are two dependent measures. The primary measure is the time taken to find various objects in the rooms (search reaction time). There are more than 10 objects in each room and the room layouts have been carefully arranged. We predict that the conditions that facilitate the development of an egocentric spatial representation (head movement and joystick absolute mode) should lead to shorter search times than those conditions that inhibit development of such a spatial representation. The second dependent measure is the subjects' navigation route through each room, which is recorded at a rate of 50 Hz. It is believed that the routes taken may differ between conditions of the experiment, though our analysis of this variable is of a more exploratory nature, and we do not have any a priori predictions as to these differences.

At present the study is still in an exploratory phase in which we are testing how robust the VR software is and determining how the layout of the virtual rooms affects subject performance on the search task and the routes taken in exploring the rooms. In order to analyze the route-taking data, we have developed an algorithm for replaying the route taken by a subject through a room. Once we have refined the design of the rooms, we will collect more data and begin the formal study in which we can test our hypotheses.
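
The replay step itself is simple; a hypothetical sketch (the actual log format is not described here) just walks through the time-stamped viewpoint samples and redraws the room at the recorded pace.

    # Hypothetical sketch of route replay: 'samples' is a time-ordered list of
    # (t_seconds, yaw_deg, pitch_deg) records logged during exploration;
    # render_view is a stand-in for the actual drawing call.
    import time

    def replay_route(samples, render_view, speed=1.0):
        start = time.monotonic()
        t0 = samples[0][0]
        for t, yaw, pitch in samples:
            delay = (t - t0) / speed - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)        # wait until this sample's moment
            render_view(yaw, pitch)      # redraw the room from this viewpoint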

Work toward the experiment examining mental representation in a large, virtual environment is also progressing. We have been building an HMD software package including several programs necessary to run our experiments. One program is a "RoomEditor" that allows the experimenter to construct experimental environments to his or her specifications, including building different virtual rooms with different numbers of walls, placing 3D objects in various locations, manipulating wall height and width, etc. A Viewer program is then able to display the environment. A second program runs the experiments, allowing one to load an environment, run through it, and collect data. While the subject is engaged in a particular task, the computer logs his position in the virtual environment at a rate of 50 Hz. The data are then available for off-line analysis.

Additionally, our new RA has been working on a literature review on navigation in very large virtual environments. It is near completion and should be finished by the end of this month. Based on the literature review, we have developed a design for the first experiment. Our new RA has also been working together with our programmer to develop a VR environment in which to carry out our experiments as detailed above.

The study of users' exploration of virtual space using various control mechanisms cannot be completed until the first study discussed above is done. We are at present adding headtracking and wand input devices to our system. Both input devices should be integrated with the system within the next two weeks. Once we have fully integrated the headtracker and wand within the system and can analyze the data from these devices, we will integrate the eyetracker.

A report on the accuracy of identifying the depth of plane attended from binocular eye position data was presented at the Advanced and Interactive Displays Federated Laboratory meeting in the form of a poster entitled "How Well Can Binocular Eyetracking Indicate the Depth Plane on which Attention is Focused?". Authors were George McConkie, Lester Loschky, and Gary Wolverton of UIUC.

July 1 - September 30, 2000 (FY00Q4)

Goals: Complete a study comparing the ability to develop a cognitive model of a virtual space as examined with a limited field-of-view, head-mounted display with display change in response to head movements vs. change in response to joystick control. (Reported as incomplete in FY99Q2-FY00Q3) Conduct experiment that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) (Reported as incomplete FY00Q2 - FY00Q3) Complete analysis of study of user's exploration of virtual space in head-mounted displays by various control mechanisms (joystick, head-tracking, eye-tracking, and combinations of the above) (UIUC) (Reported as incomplete in FY00Q3) Begin writing a chapter for the Federated Laboratory Human Factors Handbook. (UIUC) Conduct study investigating people's use of their mental representation and peripheral vision in seeking information in a complex display (UIUC)

Progress: A more complete study of the ability to develop a cognitive model of a virtual space has been conducted; it is described under the head-mounted display study below.

The study on the development over time of a mental representation for a large, virtual environment has not yet been conducted.

A study has been conducted to investigate the effects of different control methods on people's ability to form a mental representation of a large virtual space when viewed through a restricted field of view. In all conditions, the subject wore a binocular, 3D, head-mounted display (VR-4), which has a field of view of about 60 deg horizontally and 40 deg vertically (Figure 9). The virtual space consisted of a room with 4, 6 or 8 walls. All the walls in a given room were of equal width, and formed equal angles with adjacent walls. In each room there were 10 pictures of common objects hung on the walls at random locations. The subjects viewed the room from a fixed position at the center of the room. By using their control device, the subjects were able to look around the room, moving the center of the viewport over a 180 deg field horizontally; because the viewport itself spans about 60 deg, this actually allowed them to see about 240 deg of the room. Movement in the vertical direction was about 90 deg. It was necessary to move the viewport both horizontally and vertically to see the available area of the room.

Figure 9: Sam Xiang runs an experiment on mental models and navigation in virtual environments.

Three control devices were used to control this movement: joystick, wand, and head-tracking. The wand is a hand-held pointing device that is tracked magnetically. Each control device was used in two different modes: absolute and relative. In absolute mode, each position of the pointing device was linked to a single viewport direction. Thus, looking in a certain area required that the control device be placed in a certain position. For example, in order to see the left part of the room, subjects using head-tracking had to turn their heads leftward; subjects using a joystick had to tilt the stick to the left. In relative mode, positions of the control device simply caused the room to rotate in the directions indicated. No matter what the current viewpoint direction was, tilting the joystick or moving the head to the left caused the room to rotate to the right so that more of the leftward area of the room came into view. Thus, with three devices and two control modes for each, there were 6 control conditions. Twenty-four college students participated in the study, twelve using absolute control methods, twelve using relative control methods, and each examining two rooms using each control device. Room complexity (number of walls) was counterbalanced across these conditions.
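
The distinction between the two modes can be summarized in a small sketch; the function names, gain, and update rate below are illustrative assumptions rather than the experiment's code.

    # Absolute mode: each device position maps to exactly one viewport direction
    # (the gain maps the device's range onto the 180 deg horizontal field).
    # Relative mode: device deflection sets a rotation rate, so the viewport
    # direction accumulates over time regardless of the device's absolute position.
    def absolute_viewport(device_yaw, device_pitch, gain=1.0):
        return gain * device_yaw, gain * device_pitch

    def relative_viewport(current_yaw, current_pitch, deflect_x, deflect_y,
                          rate_deg_per_s=60.0, dt=0.02):
        return (current_yaw + rate_deg_per_s * deflect_x * dt,
                current_pitch + rate_deg_per_s * deflect_y * dt)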

Subjects were given 60 seconds to view each room while attempting to remember the locations of the various pictured objects. Immediately after viewing the room two tests were given. First, subjects were shown four objects, one at a time, and were asked to point to the location of that object, assuming that they were sitting in the middle of the room. Data were scored as deviations, measured in degrees, between the center of the location of the indicated object, and the position to which the wand was pointed. Mean deviation for the subjects was 53 degrees. Second, the subjects were shown the room but with each object replaced by a number on the wall. They were then shown four different objects, one at a time, and for each were asked to indicate the number of the position of that object. These data were simply scored as correct or incorrect. Mean accuracy for the subjects was 76%, where chance performance would be 10%. On both tests, subjects also indicated their confidence in the accuracy of their response, by verbally indicating a number between 1 and 5 (1=low confidence; 5=high confidence). No feedback was given concerning their accuracy in either test. The first test was designed to test for an ego-centric representation; the degree to which the person could remember where objects were located in relation to themselves. The second test could be accomplished using an external representation, in which the room was an externally-viewed object.
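
The deviation score for the pointing test can be computed as the angle between two direction vectors, one toward the object's true location and one along the pointed wand direction; a hypothetical scoring sketch follows.

    # Hypothetical scoring sketch: angular deviation in degrees between the
    # pointed direction and the object's true direction, both given as 3-D
    # vectors from the subject's (assumed central) position.
    import math

    def angular_error_deg(pointed, actual):
        dot = sum(p * a for p, a in zip(pointed, actual))
        norm = math.sqrt(sum(p * p for p in pointed)) * math.sqrt(sum(a * a for a in actual))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))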

It was predicted that representations of virtual space would be most accurate when using control devices in absolute mode. This mode gives subjects a way of mapping directly from their own position (head position or hand position) to room locations. The results showed this difference on both tests, but neither was statistically significant when tested with ANOVA: the first test showed a mean pointing error of 50 degrees for the absolute condition and 56 degrees for the relative condition (F(1,22)=1.50, p=.23), and the second test showed 78% accuracy for the absolute condition and 74% for the relative condition (F(1,22)=0.50).

It was predicted that with absolute position control methods, performance on the first test would be more accurate with head and wand control than with joystick control, since head and wand require the subject to reference the absolute spatial location of the region to be viewed, either through pointing the wand to it or turning the head to it, while the mapping between joystick position and actual spatial location is much less direct. Thus, we suspected that with the joystick, subjects might compress the space, leading to greater pointing error on the first test, while still being quite accurate on the second. This prediction was not supported; there was no significant difference in pointing accuracy among the three control devices in absolute mode.

Finally, it was predicted that accuracy in both tests in absolute mode would be greatest for head-control, simply because it mimics our normal experience, turning our head to look at objects in the world. It was also predicted that accuracy in both tests in relative mode would be greatest for joystick control, since people have more experience using joystick and mouse with relative control than using pointing or head-tracking devices. In fact, people typically have no experience using head movements in this manner. However, these predictions were also not supported by the data.

The results seem quite surprising. While we need to conduct further data analyses to examine the power of the tests (to check to see whether we should collect data from more subjects), and to make sure that there are not differences that we have overlooked, it appears that the control device and mode may have much less impact on the user's ability to form a spatial representation of a virtual space than we had expected.

The handbook is being written, and we have submitted our input.

The study investigating people's use of their mental representation and peripheral vision in seeking information is currently being planned.

In other research, we have developed a flexible 3-D, head-mounted display environment that can be used for conducting research on factors that affect how people form mental representations of virtual spaces. Various input devices can be used for navigation, and a record is kept of the user's viewpoint and direction, and actions taken (button-presses).

October 1 - December 31, 2000 (FY01Q1)

Goals: Conduct experiment that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) (Reported as incomplete FY00Q2 - FY00Q4) Conduct study investigating people's use of their mental representation and peripheral vision in seeking information in a complex display (UIUC) (Reported as incomplete in FY00Q4) Begin to prepare papers for the ARL Federated lab symposium. (UIUC) Analyze data from study that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) Finish writing a chapter for the Federated Laboratory Human Factors Handbook. (UIUC)

Progress: The mental representation study has been delayed because of the need to conduct follow-up research related to the study of the effect of control devices and modes on the development of a mental representation (see discussion below). Until we understand those surprising results better, we will not be in a position to extend the research to this issue.

Due to the delay in conducting the study listed in Milestone 1, it is not likely that we will be able to carry out the mental representation and peripheral vision study before the end of the project.

Papers have been completed for the Federated Laboratory Symposium, and presentations are now being prepared.

The analysis of data from the mental representation study is delayed because the study has not yet been conducted (see explanation above).

The Human Factors chapter has been completed in conjunction with Chris Wickens.

Other Research Progress

In other research, we have conducted pilot studies and continued analysis of data related to our study of the effect of navigation control devices and control modes (absolute vs. relative) in building a mental representation of a large virtual environment when only a small part can be seen at once (as in a computer monitor). That study yielded a set of results that we find surprising: while we obtained the predicted result that absolute control methods are more effective than relative control methods, we found no effect for the type of control device used: joystick, wand or head-tracking. We had predicted that these would differ, with head-tracking producing the most accurate mental representation and joystick the least. This is a very important result if it holds up under further investigation. First, it attests to the human's ability to construct mental spatial representations from piece-meal input and raises questions about how this is accomplished. Second, it suggests that head-mounted displays and head tracking may not be needed in some situations where they might be thought to be critical.

We are pursuing two directions to better understand the basis for our finding. First, a second study is underway in which the design and testing methods are changed, in order to make sure that the earlier results are not an artifact resulting from characteristics of the earlier study (subjects serving in different conditions, using the wand for all trials for one of the tests). Second, we are carrying out a theoretical analysis of the types of information that different navigation control methods make available to the user (visual, proprioceptive), and the manner in which a mental spatial representation is developed over time. We also want to understand the conditions under which different navigation control devices and modes would be expected to affect the quality of the mental spatial representation formed.

In addition, we are continuing to develop the HMD environment to facilitate the type of research being conducted. Most recently we have been working on a scheme for representing the data in a way that makes analysis more straightforward. The current data structure is very complex, requiring coordination from multiple files (viewport location, command device activity, location of objects in the environment, etc.). We are trying to create a data structure that will make the analysis of data from psychological studies easier and more straightforward. This is particularly needed in advance of our implementing eyetracking capability within the HMD environment.

We are finding the HMD environment to be a very interesting one for studying the development of a mental spatial representation.

January 1 - March 31, 2001 (FY01Q2)

Goals: Conduct experiment that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore (UIUC) (Reported as incomplete FY00Q2 - FY01Q1). Analyze data from study that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore (UIUC) (Reported as incomplete FY01Q1). Finish analyzing data from study that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore (UIUC). Finish writing paper(s) for the Federated Laboratory Symposium (UIUC).

Progress: It has been necessary to do preparatory work for the study on mental representations:

1. Software modification

Several improvements have been made to our HMD VR software system in preparation for a follow-up study dealing with the effect of navigation device on forming a mental representation of a virtual space. A "shooting" task function was added as a means of giving information about the quality of the observer's mental representation from a non-ego-centric perspective. After observers look around a virtual room, which has images of a number of objects on the walls, they are presented with an identical room but without the object images. Their task is then to use their current navigational device to move the viewport to the location where an indicated object had been located, placing a cursor at the object's location, and to press a button (or 'shoot' that location). This test does not require an ego-centric representation of the virtual space. It is now added to the search test, in which observers see frames containing numbers at the locations previously occupied by the objects and indicate the frame number at which a specified object had been located, and the pointing task, in which observers imagine themselves sitting in the center of the room and point to locations around them where the indicated object would have been located (a pure ego-centric test).

The software has also been modified to keep the image in the viewport vertical with respect to the viewport itself, even if the navigation device (head, wand) is rotated. In the first study the rotation of the image was distracting to the subjects. This also makes all navigation devices more equivalent, since the joystick, by its nature, is not able to rotate the images in a manner that was possible with head or wand control.
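
In effect, the roll component reported by the head or wand tracker is discarded before the view is drawn; a trivial sketch of the idea (not the actual code):

    # Only yaw and pitch from the head or wand tracker drive the view direction;
    # roll is zeroed so the image stays upright in the viewport, which also makes
    # the joystick (which cannot roll) behave like the other devices.
    def upright_view_direction(tracker_yaw, tracker_pitch, tracker_roll):
        return tracker_yaw, tracker_pitch, 0.0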

2. Test wand calibration and begin second navigation study: As indicated in our last report, the results of our first navigation device study did not show some of the differences we expected from using the devices tested. One possible explanation is that we used the ego-centric pointing test following the viewing of each room, which may be encouraging all subjects, regardless of navigation device used, to form ego-centric representations of the virtual environments used. A second study is now underway in which subjects are tested with the pointing test only on their final two rooms. In addition, adding the shooting test gives a finer-grained non-ego-centric recall measure to supplement the recognition measure of the search test. This study will also have more subjects in order to provide greater stability in the data, and is using a more effective pretest of spatial cognition ability and a questionnaire about individual differences in computer and computer game playing experience.

Finally, we have conducted further testing of the calibration of the wand, as used as a pointing device in the pointing test. This is intended to make the pointing data more accurate.

3. Data matrix format

Since it is clear that the line of work we have initiated, involving the effects of navigation devices, viewport sizes, and other variables on the perception of head-mounted displays of virtual environments, will continue, we have worked out a more convenient data structure for this research. This involves four concepts: the viewer location in the virtual space, the virtual space as projected onto a sphere centered at the viewer's eyes, a labeling of the locations of objects of interest on this sphere, and a viewport as a rectangular region of the sphere that can be seen at any given time. For simplicity in our current studies we are assuming that the viewer's location is constant at the center of the room, and that the projected locations of objects in the room are predefined and stored in a table. The navigation device controls the viewport direction in the virtual space, indicating which part of the projected virtual environment image can be seen within the viewport. Knowing the projected locations of objects and the viewport location makes it possible to readily determine, at any moment, which objects are within the viewport region and can be seen (as well as the proportion of the object and the part that can be seen). Finally, since the head position is simultaneously recorded, this indicates where the viewport is located relative to the viewer, since in an HMD, moving the head changes the ego-centric location of the viewport, and, hence, changes where in the ego-centric space the seen objects are located.

The information necessary to produce this data structure is recorded at some sample rate (for example, 20 times a second).
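
A sketch of how such a structure might look is given below; the field names, the 40 x 35 deg viewport size (taken from the pilot study described earlier), and the visibility test are assumptions for illustration only.

    # Illustrative sketch of the data structure: object locations are stored as
    # directions on a viewer-centered sphere, each sample records the viewport
    # and head directions, and a simple test determines which objects fall
    # inside the rectangular viewport at that moment.
    from dataclasses import dataclass

    @dataclass
    class Direction:
        yaw_deg: float     # azimuth on the viewing sphere
        pitch_deg: float   # elevation on the viewing sphere

    @dataclass
    class Sample:
        t: float             # time of the sample (logged at a fixed rate)
        viewport: Direction  # where the navigation device has pointed the viewport
        head: Direction      # head direction, giving the ego-centric frame

    def in_viewport(obj: Direction, vp: Direction,
                    width_deg: float = 40.0, height_deg: float = 35.0) -> bool:
        dyaw = (obj.yaw_deg - vp.yaw_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        dpitch = obj.pitch_deg - vp.pitch_deg
        return abs(dyaw) <= width_deg / 2 and abs(dpitch) <= height_deg / 2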

This new data structure accomplishes several needed purposes.

  1. It allows us to determine what objects are within the viewport at any given time, indicating, over time, what objects have and have not actually appeared visually to the viewer. In addition, it gives us the basis to identify which objects have been within the viewport simultaneously, giving the possibility of visualizing their spatial relation.

  2. It allows us to determine where in the viewer's ego-centric space the seen objects appeared. This will allow us to examine whether people make head movements in mapping from non-ego-centric space to ego-centric space.

  3. It provides a basis for adding eye movement recording within the HMD environment, with the possibility of indicating toward which object(s) the gaze is being directed.

  4. In general, it provides a data structure that makes it much easier to carry out necessary statistical analyses within the 3-D virtual environment. We are not aware of a similar data structure for this type of research.

We believe that it will be a significant contribution to research in this area.

The data analysis cannot be completed until the previous milestone is complete.

A paper was presented at the Fed Lab Symposium; the citation is listed under Publications.

April 1 - June 30, 2001 (FY01Q3)

Goals: Analyze data from study that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) (Reported as incomplete FY01Q1-FY01Q2) Finish analyzing data from study that examines the development over time of a mental representation for information in a large, virtual environment requiring observer movement to explore. (UIUC) (Reported as incomplete FY01Q2) Finish analyzing study investigating people's use of mental representation and peripheral vision in seeking information in a complex display. (UIUC) Finish writing articles on studies conducted. (UIUC)

Progress: Initial analysis complete: Results confirm the advantage of control devices (joystick, wand or head-position) in absolute mode, as compared to relative mode, in observers' development of accurate mental representations of the locations of objects in a 3D space that must be explored by observer movement. Use of these devices in absolute mode also leads to higher observer confidence in their indications of the locations of objects during testing.

Analysis of critical data in the pointing task is still underway. Methods for handling this type of data have been developed and required software modifications completed.

Article being written.