Phone: (914) 784-6066 (O)
Fax: (914) 784-7455
RESEARCH INTERESTS:
1. Context-Sensitive Visual Analytics
2. Multimodal, Intelligent Human-Computer Interaction
3. Machine Learning and Pattern Recognition
4. Image/Video Processing and Multimedia Systems
5. Computer Graphics
HONORS AND AWARDS:
1. IBM Research Division Award, 2005
2. Best Paper Award at IUI 2005
3. Fourth place, V. Dale Cozad Business Plan Competition, for a business plan on face animation, 2000-2001
4. Samsung scholarship for the best undergraduate student, Tsinghua University, 1998
5. Outstanding undergraduate scholarships, Tsinghua University, 1994 – 1998
EDUCATION:
· Ph.D. in Computer Science, University of Illinois at Urbana-Champaign, USA, 2000 – 2004. Thesis advisor: Professor Thomas Huang. GPA 3.97/4.0.
· M.S. in Computer Science, University of Illinois at Urbana-Champaign, USA, August 1998 – January 2000.
· B.S. in Computer Science, Tsinghua University, Beijing, P.R. China, September 1993 – July 1998. GPA 92/100.
PROFESSIONAL EXPERIENCE:
· IBM T. J. Watson Research Center, Hawthorne, NY
2004 ~ present: Research Staff Member in Intelligent Multimedia Interaction Group, Manager: Dr. Michelle Zhou.
Working on a context-sensitive multimodal information seeking project. This project addresses information seeking that exploits contextual information (e.g., task and user information) and presents related information in an integrated, multimodal form. Partial results were presented at IUI 2005 (Best Paper Award) and InfoVis 2005.
· University of Illinois at Urbana-Champaign, Urbana, IL
Summer 1999 – 2004: Research Assistant
Image Formation and Processing Group, Advisor: Prof. Thomas Huang
1. Face Motion Modeling, Analysis and Synthesis:
My dissertation research investigates face motion analysis and synthesis, which is important for intelligent human-computer interaction (HCI) and video analysis. Of the two main existing approaches, geometric-based methods are robust to variations in lighting and subjects, but cannot capture motion details such as wrinkles, which are important perceptual visual cues. In contrast, appearance-based methods can handle such details but are not robust to changes in lighting and subjects. This thesis proposes a novel framework that integrates the two types of approaches so that they complement each other: the appearance-based motion model augments the geometric-based motion model to improve effectiveness, while novel illumination-effects modeling and unsupervised adaptation make the appearance model more robust. The efficacy of this framework for face motion analysis is demonstrated in face tracking and expression recognition. Face synthesis can be used to enhance visual information in avatar-based HCI, audio-visual intelligent hearing aids, and psychiatric treatment. Partial results have been presented at ICCV'03 and CVPR'03.
The goal of this project is to contribute to the development of an HCI environment in which the computer monitors the user's emotional, motivational, cognitive and task states, and initiates communication based on this knowledge. It is a collaboration among researchers in vision, speech, machine learning, psychology and education. The test-bed is teaching children scientific principles via LEGO games. I work on improving the robustness and accuracy of 3D face tracking in order to estimate users' states. In addition, I use an animated avatar as the interface to interact with users. Initial findings show that the avatar helps engage children in a Wizard-of-Oz instruction setting. Tracking results were presented at ICCV 2003 and demonstrated to NSF.
3. Synthetic Talking Face for Psychological and Psychiatric Studies:
The goal of this project is to use face animation as visual stimuli in psychological and medical applications. The advantage is that synthetic stimuli can be created and manipulated more easily than natural stimuli. Moreover, feedback from these applications provides guidance for further improving the face animation. We collaborate with researchers in Speech and Hearing Science to augment speech-only hearing aids with visual information, to better help hard-of-hearing people in noisy environments. In collaboration with researchers in psychiatry, we use the face animation for treating autistic children. Partial results have been presented at ICME'02, ICIP'02 and CVPR'03.
4. Synthetic Talking Face for Low Bit-rate Communication:
In noisy, dynamic, low-bandwidth environments such as battlefields, visual information helps humans better understand speech-only communication. In this project, we developed a comprehensive iFACE system to build and animate a 3D face avatar for any given person. The system can be used to augment communication with visual facial information in a variety of conditions, e.g., text-only, one-way speech-only, and two-way speech-only communication. For real-time two-way communication, a novel neural-network-based algorithm is proposed to animate the face from speech with a delay of less than 100 ms. Results were presented in ICME'01, ACM Multimedia'01, IEEE Transactions on Neural Networks and the International Journal of Image and Graphics. The system was demonstrated at the Army Research Lab FedLab Symposia in 2000 and 2001.
· IBM T. J. Watson Research Center, Hawthorne, NY
5/2003 ~ 8/2003: Research Intern in Composite Media Group, Manager: Dr. Michelle Kim.
Worked with Dr. Michelle Kim and other team members in the Multimedia Systems department on the Knowledge-based Runtime Framework for Adaptive Rich Media Composition project. This project addressed adaptive rich media provisioning for diverse environments. We proposed a formalized knowledge-based framework to design and deploy an MPEG-4 based rich media adaptation engine. At runtime, an efficient engine based on the IBM policy toolkit maps environmental conditions to an appropriate media adaptation plan. To design and deploy the adaptation policies used by the engine, we use an ontology to organize the domain knowledge; for a given new media application, applicable knowledge can be retrieved from the ontology to compose the policies. The framework was developed on Eclipse 2.0. Results were presented in the IBM summer student poster session.
· Microsoft Research,
5/2002 ~ 8/2002: Research Intern in Communication, Collaboration and Signal Processing Group, Mentor: Dr. Zicheng Liu.
Worked with Dr. Zicheng Liu and Dr. Michael Cohen in the Communication, Collaboration and Signal Processing Group on the Low Bit-rate Face Video Streaming for Face-to-face Teleconference project. This project proposes a novel way to incorporate prior knowledge about faces to improve the efficiency and quality of face video streaming. At low bit rates, standard video codecs can blur facial motion details that are important perceptual cues. Based on face detection, we propose a spatially varying error criterion that preserves more detail in perceptually important facial areas. To further reduce the bit rate, exploiting the fact that facial motion is highly structured, we designed an efficient multi-reference-frame face coding technique: the reference frames are selectively updated so that most face appearances can be reconstructed within bounded memory and CPU usage. Compared with the state-of-the-art coding standard H.26L, our system achieves similar PSNR but higher visual quality and faster processing speed. A US patent has been filed by Microsoft for this technique.
· Microsoft Research,
11/2001 ~ 12/2001: Visiting Student
I worked on learning facial motion models from motion capture data for analysis and synthesis. To enable flexible local motion analysis and synthesis (such as mouth tracking and animation), an algorithm was designed to learn a parts-based facial motion model. This work was done in collaboration with Dr. Harry Shum of Microsoft Research Asia. Results were presented as a chapter of the book "3D Modeling and Animation: Synthesis and Analysis Techniques for the Human Body", published by Idea Group Inc.
· Microsoft Research,
Worked with Dr. Zicheng Liu and Dr. Zhengyou Zhang in the Collaboration and Multimedia Systems Group on Fast Face Avatar Modeling, in which a personalized 3D face model is constructed from a single video stream. Using image-based methods, we proposed novel, efficient techniques to model motion details for facial expressions as well as the effects of lighting changes. An image editing system was also developed to interactively edit the lighting effects in a single input face image (or texture). Results were presented at CVPR'03.
· Rockwell Science Center (now Rockwell Scientific Company), Thousand Oaks, CA
5/2000 ~ 8/2000: Summer Intern in Human Computer Interaction Group; Mentor: Dr. Michael Chan (now at GE Global Research).
Worked with Dr. Michael Chan in the Human Computer Interaction Group on Lip Tracking and Animation for Audio-Visual Speech Recognition. I ported a real-time contour-based lip tracking and audio-visual speech recognition system from SGI IRIX to Windows, using OpenGL and Visual C++. The system was extended to use appearance-based features along with contour-based features to improve recognition. To enhance speech-only communication with visual information, I designed and implemented a dynamic programming algorithm to drive a synthetic talking face from the lip tracking results. Results were presented at the Picture Coding Symposium'01 and the Army Research Lab FedLab Symposium'01.
· University of Illinois at Urbana-Champaign: Urbana, IL
Spring 2002, 2003: Guest lecturer
For graduate/undergraduate course "Multimedia Signal Processing" (ECE 371TSH) by Prof. Tom Huang.
Fall 2001, 2002, 2003: Guest lecturer
For graduate course "Image Processing" (ECE 447) by Prof. Tom Huang.
PATENTS:
Pending: User Behavior-driven Visual Recommendation, 2008
Pending: Methods and Apparatus for Dynamic Data Transformation for Visualization, 2007
Issued: System and method for peer-to-peer multi-party voice-over-IP services, 2006
Issued: A Context-Aware, Adaptive Approach to Information Selection for Interactive Information Analysis, 2006
Pending: Optimization-Based Framework for Visual Context Management, IBM, 2005
Pending: Optimization-Based Media Allocation, IBM, 2004
Issued: A System and Method for Low Bandwidth Video Streaming for Face-To-Face Communication, Microsoft, 2003
BOOKS:
[B1] Zhen Wen, Thomas Huang, "3D Face Processing", Springer, 2004.
REFEREED JOURNAL/BOOK CHAPTER ARTICLES:
[J7] Zhen Wen, Michelle Zhou, "Evaluating the Use of Data Transformation for Information Visualization", in IEEE Transactions on Visualization and Computer Graphics, volume 14, number 6, page 1309-1316, 2008.
[J6] Yang Wang, Lei Zhang, Zicheng Liu, Gang Hua, Zhen Wen, Zhengyou Zhang, Dimitris Samaras. Face Re-Lighting from a Single Image under Arbitrary Unknown Lighting Conditions. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear.
[J5] Zhen Wen,
[J4] Pengyu Hong, Zhen Wen, and Thomas S. Huang, "Speech driven face animation", in MPEG-4 Facial Animation - The standard, implementations, and applications, John Wiley & Sons, 2002.
[J3] Pengyu Hong, Zhen Wen, Yang Li, Thomas S. Huang, "Avatars", Computer Science Handbook of the Army Research Lab ADID FedLab, 2000.
[J2] Pengyu Hong, Zhen Wen, and Thomas S. Huang, "Real-time speech driven expressive synthetic talking faces using neural networks", in IEEE Transactions on Neural Networks, vol. 13, no. 4, page 916-927, April 2002.
[J1] Pengyu Hong, Zhen Wen, and Thomas S. Huang, “iFACE: A 3D Synthetic Talking Face", International Journal of Image and Graphics, vol. 1, no. 1, page 1-8, 2001.
REFEREED CONFERENCE ARTICLES:
[C16] Zhen Wen, Michelle Zhou, “An Optimization-based Approach to Dynamic Data Transformation for Smart Visualization”, in Proc. of ACM International Conference on Intelligent User Interfaces (IUI), page 70-79, 2008. (Acceptance rate: 15%)
[C15] Yang Wang, Zicheng Liu, Gang Hua, Zhen Wen, Zhengyou Zhang, Dimitris Samaras, “Face Re-Lighting from a Single Image under Harsh Lighting Conditions”, in Proc. of CVPR 2007. (Acceptance rate: 28%)
[C14] Zhen Wen, Michelle Zhou and Vikram Aggarwal, “Context-Aware, Adaptive Information Retrieval for Investigative Tasks”, in Proc. of International Conference on Intelligent User Interfaces (IUI), 2007.
[C13] Xiaohui Gu, Zhen Wen, Ching-Yung Lin, and Philip S. Yu, “ViCo: an adaptive distributed video correlation system", in Proc. of ACM Multimedia, 2006. (Acceptance rate: 17%)
[C12] Zhen Wen, Michelle Zhou and Vikram Aggarwal, "An Optimization-based Approach to Dynamic Visual Context Management", in Proc. of IEEE Symposium on Information Visualization (InfoVis), Minneapolis, USA, 2005. (Acceptance rate: 27%)
[C11] Michelle Zhou, Zhen Wen and Vikram Aggarwal, "A Graph-Matching Approach to Dynamic Media Allocation in Intelligent Multimedia Interfaces", in Proc. of International Conference on Intelligent User Interfaces (IUI), page 114-121, San Diego, USA, 2005. (Best Paper Award) (Acceptance rate: 28%)
[C10] Zhen Wen and Thomas Huang, "Capturing Subtle Facial Motions in 3D Face Tracking", in Proc. of International Conference on Computer Vision (ICCV), page 1343-1350, Nice, France, 2003.
[C9] Zhen Wen, Zicheng Liu and Thomas Huang, "Face Relighting with Radiance Environment Maps", in Proc. of International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, page 158-165, Madison, WI, USA, 2003. (Acceptance rate: 24%)
[C8] Jilin Tu, Zhen Wen, Hai Tao, and Thomas Huang, “Coding Face at Very Low Bit Rate via Visual Face Tracking”, in Proc. of Picture Coding Symposium'03, page 301-304, Saint Malo, France, 2003.
[C7] Zhen Wen, Thomas Huang and Zicheng Liu, "On Recovering Detailed Face Deformation under General Lighting Using Height from Shading", in Proc. of International Conference on Multimedia and Expo (ICME), vol. 1, page 465-468, Lausanne, Switzerland, 2002.
[C6] Zhen Wen, Thomas Huang and Zicheng Liu, "Model-based Face Image Coding Using Spherical Harmonics", in Proc. of International Conference on Image Processing (ICIP), vol.1, page 185-188, Rochester, NY, USA, 2002.
[C5] Pengyu Hong, Zhen Wen, Thomas Huang, “An Integrated Framework for Face Modeling, Facial Motion Analysis and Synthesis”, in Proc. of ACM Multimedia, Ottawa, Canada, 2001.
[C4] Zhen Wen, Pengyu Hong, Thomas Huang, "Real Time Speech Driven Facial Animation Using Formant Analysis", in Proc. of International Conference on Multimedia and Expo (ICME), 2001.
[C3] Pengyu Hong, Zhen Wen, Thomas Huang, "Real-time Speech Driven Avatar with Constant Short Time Delay", in Proc. of International Conference on Augmented, Virtual Environments and 3D Imaging, Greece, 2001.
[C2] Zhen Wen, Michael Chan, Thomas Huang, "Face Animation Driven by Contour-Based Visual Tracking", in Proc. of 22nd Picture Coding Symposium, page 263-266, Korea, April 2001.
[C1] Pengyu Hong, Zhen Wen, Thomas Huang, Michael T. Chan, "Speech Driven Avatars", Army Research Lab Symposium, College Park, MD, USA, 2001.
INVITED TALKS:
[T6] “Context-sensitive Information Seeking", University of Sydney, Australia, December 2006.
[T5] “Illumination Effects for Face Analysis and Synthesis", Microsoft Research.
[T4] Invited lectures on “Neural Networks” and “Face Analysis and Synthesis” (with Prof. Tom Huang).
[T3] “Combining Shape and Texture for Facial Motion Analysis and Synthesis”, NEC Labs.
[T2] “3D Face Modeling, Analysis and Synthesis”, Microsoft Research.
[T1] “3D Face Modeling, Analysis and Synthesis”.
PROFESSIONAL SERVICE:
· TPC member, ACM IUI 2009.
· TPC member, ACM Multimedia 2007.
· TPC member, short paper track, ACM Multimedia 2005.
· Organizing committee, IEEE Workshop on Multimedia Signal Processing 2005.
· Reviewer for ACM Multimedia, ACM CHI, ACM IUI, IEEE InfoVis, IEEE CVPR, IEEE ICCV