Successful VR systems depend on the continuous generation of detailed 3D models, which is a very labor-intensive process. The appeal of a virtual environment (VE) as an entertainment medium is limited primarily by the quality and complexity of its models and by the rate at which new ones are created. Automatic extraction of 3D models from video will provide a cost-effective means to continually populate VEs with new content. Video analysis technology also enables the use of MPEG-4 FBA (Face and Body Animation) coding for very low bitrate visual communication within VEs, and the VR industry will benefit from these MPEG-4 communication tools. In particular, virtual human models will be generated automatically from video and driven by real speakers in "3D chat rooms." This new form of compelling entertainment content will be viewable by anyone with an MPEG-4-enabled PC at rates below 28.8 kilobits/second.
Initially, commercial enterprises (networked game vendors, call centers, arcades) will acquire computer vision systems for MPEG-4 encoding of talking heads. Eventually, entire environments will be generated automatically on consumer premises for use in shared VEs. Some applications will require high-fidelity modeling of humans and environments, while many will require only high entertainment appeal. Concerns about privacy, anonymity, and personal security may dictate fictitious human representations, especially when children are involved.
MPEG-4 FBA coding has been demonstrated for automatic remote control of synthetic face models from video at very low bitrates (~2 kilobits/second). Automatic and accurate modeling of a human face or body has yet to be demonstrated and is an active area of research. MPEG-4 provides a comprehensive set of object and animation descriptors and compression tools that will eventually enable communication of shared high-fidelity virtual environments.
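To put the quoted bitrates in perspective, the following rough sketch computes how many bits are available per animation frame at about 2 kilobits/second, and how much of a 28.8 kilobit/second link one such stream would occupy. Only the two bitrates come from the text; the frame rate of 25 frames/second and the function names are assumptions chosen for illustration.

```python
# Back-of-the-envelope bit budget for a low bitrate animation stream.
# Only the two bitrates below come from the text; the frame rate is an
# assumption made purely for illustration.

FBA_BITRATE_BPS = 2_000      # ~2 kilobits/second, as quoted for FBA coding
LINK_BITRATE_BPS = 28_800    # 28.8 kilobits/second link
ASSUMED_FRAME_RATE = 25      # frames/second (hypothetical)


def bits_per_frame(bitrate_bps: int, frame_rate: int) -> float:
    """Average number of bits available to code one animation frame."""
    return bitrate_bps / frame_rate


def link_share(stream_bps: int, link_bps: int) -> float:
    """Fraction of the link capacity consumed by a single stream."""
    return stream_bps / link_bps


if __name__ == "__main__":
    budget = bits_per_frame(FBA_BITRATE_BPS, ASSUMED_FRAME_RATE)
    share = link_share(FBA_BITRATE_BPS, LINK_BITRATE_BPS)
    print(f"Bits per frame at {ASSUMED_FRAME_RATE} frames/s: {budget:.0f}")
    print(f"Share of the 28.8 kilobit/s link: {share:.1%}")
```

Under these assumptions the budget is on the order of tens of bits per frame, which suggests why such rates are practical only when a small set of animation parameters is sent rather than pixel data.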