Research
My research interests include multimedia analysis and fusion for indexing and retrieval in
video databases, statistical pattern recognition with applications to speech
and video data, and graphical probabilistic models for recognition of feature-level
and semantic patterns. I am actively involved in various research issues
related to the semantic indexing of video using multiple modalities in a
statistical framework. Here at Beckman, we are trying to bridge the gap
between low-level physical features and high-level semantics. The focus of
my research is Semantic Video Indexing. This is also the
topic of my fellowship in the Computational Sciences and Engineering Department
at the University of Illinois. We have proposed a novel probabilistic graphical
framework for semantic video indexing using probabilistic multimedia objects (multijects) and a network of such
multijects (multinet). Probabilistic graphical models like Bayesian networks
and factor graphs offer an excellent architecture to capture the relationship
between the semantics and low-level features and the uncertainty that comes
along with this representation. We are therefore interested in investigating the
following directions in order to make this framework comprehensive.
1. Video Content Representation for Efficient Access
2. Supervised Pattern Recognition techniques applied to multimedia data (mainly video and
audio but not excluding text) for modeling low-level feature-space
representation of high-level semantics.
3. Probabilistic Graphical Networks for fusing multiple heterogeneous features. The features
may belong to different media, or they may live in feature-spaces of different
types (low-level features, visual templates, other high-level semantic
features). The idea is to capture the static and dynamic interaction between
semantic concepts using these networks. We have been actively engaged in the
use of Bayesian networks and factor graphs with some iterative probability
propagation algorithms for training and inference.
4. Unsupervised techniques to alleviate the laborious task of labeling data in large numbers for training
5. Video filtering, search and summarization: Interface issues
We have demonstrated the feasibility of the probabilistic architecture
for semantic video indexing. We have developed models for several semantic
objects, sites and events in audio and video. Prominent examples include Explosion,
Waterfall, Sky, Water-body, Forest, Rocky terrain, Helicopter, Gunshots,
Human-speech, Music and Outdoor.
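To make the multinet idea more concrete, here is a minimal Python sketch of how the outputs of two multijects might be fused through a single pairwise factor. Everything specific in it is an assumption made for illustration: the function name `fuse`, the choice of the Sky and Outdoor concepts as the two nodes, and all of the numbers are hypothetical and are not taken from our trained models. The sketch also computes marginals by exact enumeration rather than by the iterative probability propagation mentioned above; on a tree-structured graph this small the two should agree, but enumeration would not scale to a realistic network.

```python
from itertools import product


def fuse(local_evidence, pairwise):
    """Posterior P(concept present | all evidence) by exact enumeration.

    local_evidence: name -> (score if absent, score if present), e.g. the
        output of a per-concept detector (a "multiject" in the text).
    pairwise: (name_a, name_b) -> 2x2 compatibility table indexed as
        table[state_a][state_b], standing in for the "multinet" links.
    """
    names = sorted(local_evidence)
    present_mass = {n: 0.0 for n in names}
    total_mass = 0.0
    # Visit every joint assignment of absent (0) / present (1).
    for states in product((0, 1), repeat=len(names)):
        assign = dict(zip(names, states))
        weight = 1.0
        for n in names:
            weight *= local_evidence[n][assign[n]]
        for (a, b), table in pairwise.items():
            weight *= table[assign[a]][assign[b]]
        total_mass += weight
        for n in names:
            if assign[n] == 1:
                present_mass[n] += weight
    return {n: present_mass[n] / total_mass for n in names}


# Hypothetical numbers: the Sky detector is undecided, while the
# Outdoor detector is fairly confident the shot is outdoors.
evidence = {"Sky": (0.5, 0.5), "Outdoor": (0.1, 0.9)}
# Hypothetical compatibility: Sky present with Outdoor absent is unlikely.
compat = {("Sky", "Outdoor"): [[0.6, 0.4], [0.1, 0.9]]}

posterior = fuse(evidence, compat)
print(posterior)
```

With these made-up inputs the fused belief in Sky moves above the undecided 0.5 that its own detector reported, because the confident Outdoor evidence is passed along the link between the two concepts. That is the kind of interaction between semantic concepts the network is meant to capture.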
Publications, Patents and Talks
Interesting Links related to this research