Research


My research interests include multimedia analysis and fusion for indexing and retrieval in video databases; statistical pattern recognition with applications to speech and video data; and probabilistic graphical models for recognizing feature-level and semantic patterns. I am actively involved in research on the semantic indexing of video using multiple modalities in a statistical framework. Here at Beckman, we are trying to bridge the gap between low-level physical features and high-level semantics.

The focus of my research is semantic video indexing, which is also the topic of my fellowship in the Computational Sciences and Engineering Department at the University of Illinois. We have proposed a novel probabilistic graphical framework for semantic video indexing based on probabilistic multimedia objects (multijects) and a network of such multijects (a multinet). Probabilistic graphical models such as Bayesian networks and factor graphs offer an excellent architecture for capturing both the relationship between semantics and low-level features and the uncertainty inherent in that representation. To make this framework comprehensive, we are investigating the following directions.

Overview

1. Video content representation for efficient access.

2. Supervised pattern recognition techniques applied to multimedia data (mainly video and audio, but also text) for modeling the low-level feature-space representations of high-level semantics.

3. Probabilistic graphical networks for fusing multiple heterogeneous features. The features may belong to different media, or they may live in feature spaces of different types (low-level features, visual templates, other high-level semantic features). The idea is to capture the static and dynamic interactions between semantic concepts using these networks. We have been actively using Bayesian networks and factor graphs, together with iterative probability propagation algorithms, for training and inference.

4. Unsupervised techniques to alleviate the laborious task of labeling large amounts of training data.

5. Video filtering, search and summarization: interface issues.

We have demonstrated the feasibility of the probabilistic architecture for semantic video indexing, and have developed models for several semantic objects, sites and events in audio and video. Prominent examples include Explosion, Waterfall, Sky, Water-body, Forest, Rocky terrain, Helicopter, Gunshots, Human-speech, Music and Outdoor.
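As a rough illustration of how a network of concept detectors can fuse evidence, the sketch below runs sum-product message passing on a toy factor graph with two binary concepts. It is not the actual multiject or multinet model: the function name `fuse_two_concepts`, the choice of Sky and Outdoor as the two concepts, and every number are hypothetical, chosen only to show a confident detector sharpening an undecided one through a co-occurrence factor.

```python
def fuse_two_concepts(local_a, local_b, pairwise):
    """Sum-product on a factor graph with two binary variables a and b.

    local_a, local_b: [absent, present] evidence from each concept's own
        detector (hypothetical stand-ins for per-concept detector outputs).
    pairwise: 2x2 table, pairwise[i][j] = compatibility of a=i with b=j.
    Returns the posterior marginals (p_a, p_b), each summing to 1.
    """
    states = range(2)
    # Message from b to a: sum b out of pairwise * local_b.
    msg_b_to_a = [sum(pairwise[i][j] * local_b[j] for j in states) for i in states]
    # Message from a to b: sum a out of pairwise * local_a.
    msg_a_to_b = [sum(pairwise[i][j] * local_a[i] for i in states) for j in states]

    # Belief = local evidence times the incoming message, then normalize.
    belief_a = [local_a[i] * msg_b_to_a[i] for i in states]
    belief_b = [local_b[j] * msg_a_to_b[j] for j in states]
    z_a, z_b = sum(belief_a), sum(belief_b)
    return [v / z_a for v in belief_a], [v / z_b for v in belief_b]


# Hypothetical inputs: the Sky detector is undecided, the Outdoor detector
# is confident, and the two concepts tend to co-occur.
sky_evidence = [0.5, 0.5]
outdoor_evidence = [0.1, 0.9]
co_occurrence = [[0.6, 0.4],   # Sky absent:  Outdoor absent / present
                 [0.1, 0.9]]   # Sky present: Outdoor absent / present

p_sky, p_outdoor = fuse_two_concepts(sky_evidence, outdoor_evidence, co_occurrence)
print("P(Sky present)     =", round(p_sky[1], 3))
print("P(Outdoor present) =", round(p_outdoor[1], 3))
```

With these made-up inputs, the posterior for Sky moves above its undecided 0.5 because the Outdoor evidence is strong. On a graph this small, a single round of messages gives exact marginals; larger networks with loops would need the iterative propagation mentioned in direction 3.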

Publications, Patents and Talks

Interesting Links related to this research

 

Recent Demos

 

Contact