**Research Interests**

Integrating multiple sources of information
(heterogeneous and homogeneous) plays a key role in solving many real-world
problems. In particular, the fusion of audio and visual information is central to
multimedia analysis. Recent results have shown that improved performance on
multimedia tasks can be achieved by combining information from different
sources (e.g., audio and video) rather than relying on a single modality. Although
the use of multiple modalities is promising, it poses the challenging problem of
learning in high-dimensional spaces. Results from learning theory can be used to
analyze learning problems in high-dimensional spaces; however, they fall short
of giving practical guarantees. The focus of my research is on solving
multimedia applications by fusing information from multiple modalities, and on
the theoretical analysis of the machine learning algorithms used therein.

Over the past four years, I have worked on a number of
applications in the field of multimedia information processing. Both generative
models (probabilistic networks) and discriminative classifiers (support vector
machines, Winnow, Perceptron)
were used to solve these and other related problems. In all cases, good
performance was obtained, even though at times the theoretical support for the
algorithms used was weak. I have worked on
extending some of the theoretical results to explain the observed performance,
and at the same time developed improved learning algorithms.

I have looked into a number of interesting problems in
the broad areas of multimedia analysis and bioinformatics. In solving these
tasks, the key difficulty lies in combining information from heterogeneous
sources. The information may be asynchronous, may require analysis at
different levels of temporal abstraction, and may have long-term temporal
dependencies that need to be captured.

As part of my research, I have concentrated mainly on
the use of probabilistic models for solving these tasks. Learning probabilistic
models involves choosing an architecture that can capture the
interesting aspects of the task at hand; once the architecture has been chosen,
the remaining task is to learn the parameters that describe the network. Depending
on the problem at hand, some network structures may be better suited than
others. I have introduced duration-dependent input-output Markov models for
modeling events in videos, layered hidden Markov models for activity detection
in office environments, and a variant of factorial hidden Markov models for
audio-visual speech recognition. These architectures are general, and their
application is not limited to multimedia: I employed the input-output Markov
model architecture (originally introduced for event detection) to combine
the predictions of experts for the task of gene annotation. The results
obtained clearly demonstrate the wide applicability and generalization
capability of these models.
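The hidden Markov model variants above all build on the same basic inference machinery. As a minimal sketch, the forward recursion computes the likelihood of an observation sequence under a discrete HMM; all the probability values below are invented for illustration and do not come from any real model:

```python
# Minimal discrete hidden Markov model: forward-algorithm likelihood.
# All probabilities here are illustrative, not taken from a trained model.

def forward_likelihood(init, trans, emit, observations):
    """P(observations) under a discrete HMM via the forward recursion.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(obs = o | state = i)
    """
    n = len(init)
    # Base case: alpha_0(i) = P(state_0 = i) * P(obs_0 | state_0 = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
    return sum(alpha)

# Two hidden states, two observation symbols (hypothetical numbers).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(init, trans, emit, [0, 1, 0]))
```

The architectures mentioned above (duration-dependent, layered, factorial) extend this recursion with extra input, duration, or factored-state structure, but the underlying sum-product computation is the same.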

On the theoretical front, I have investigated the
discrepancy between the results typically predicted by theoretical analysis
and those observed in practice (in many applications, observed performance is much
better than theory predicts). When using probabilistic networks, a number
of conditional independence assumptions are made that may not hold. I have
investigated probabilistic networks in detail; in particular, I have analyzed
how classification performance depends on the properties of the
underlying distribution and on the assumptions made during learning. The results
obtained are then used to derive improved learning algorithms. I have proposed
a new learning algorithm for hidden Markov models based on maximizing the
mutual information between the hidden and the observed variables, and the
results obtained clearly demonstrate the superiority of this algorithm.
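To make the objective concrete, the quantity being maximized — the mutual information between two discrete variables — can be computed directly from their joint distribution. The joint table below is purely hypothetical:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, computed from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]                       # marginal of X
    py = [sum(joint[x][y] for x in range(len(joint)))      # marginal of Y
          for y in range(len(joint[0]))]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Hypothetical joint over a binary hidden state and a binary observation.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
print(mutual_information(joint))
```

Higher values mean the hidden state carries more information about the observations; a training criterion that pushes this quantity up encourages the hidden variables to be informative, in contrast to plain likelihood maximization.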

From the perspective of discriminative classifiers,
results on applications related to face recognition and object detection
again highlighted the gap between theoretical results and observed
performance. In many practical cases, when learning is done in high dimension,
the margin with which classification is done is still small; however, the
distribution of points is such that most of the points are far from the
classifier. I made use of this observation and developed a classification
version of the random projection lemma to obtain data-dependent bounds based on
the margin distribution. These bounds also have an algorithmic aspect, and I have
developed a new algorithm that directly optimizes the margin distribution. Empirically,
it performs much better than state-of-the-art
algorithms on the same tasks. I have also extended the theory of coherent
concepts initially introduced by Roth et al. I argue that in a cognitive learning
scenario, learning is not done in isolation and a number of concepts have to agree
with one another (similar to the assumptions made in the co-training framework).
The theoretical results show that under such scenarios, the burden on sample
complexity is reduced.
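The margin-distribution observation can be illustrated with a toy linear separator: the minimum margin may be tiny even when most points lie far from the boundary, which is exactly the regime where bounds based on the whole margin distribution are tighter than worst-case (minimum-margin) bounds. The data and weights below are invented for illustration:

```python
def margins(w, b, points, labels):
    """Signed, normalized distances y * (w.x + b) / ||w|| for labeled points."""
    norm = sum(wi * wi for wi in w) ** 0.5
    return [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]

# Hypothetical 2-D data: one point hugs the boundary, the rest are far away.
points = [(0.1, 0.0), (3.0, 1.0), (4.0, -1.0),
          (-3.0, 0.5), (-4.0, -0.5), (-5.0, 2.0)]
labels = [+1, +1, +1, -1, -1, -1]
w, b = (1.0, 0.0), 0.0

m = sorted(margins(w, b, points, labels))
print("min margin:", m[0])              # small: worst-case bound is weak
print("median margin:", m[len(m) // 2])  # large: margin distribution is favorable
```

A bound or algorithm driven by the full sorted margin profile (not just `m[0]`) can exploit the fact that all but one of these points are classified with a large margin.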

In the future, I plan to continue along these lines:
working on interesting applications, studying the underlying theoretical issues,
and developing improved learning algorithms. In particular, I am very
interested in working on some of the fundamental issues that I have encountered
while working on multimedia applications. In problems related to speech
recognition, the transcription is invariably noisy. Similarly, when labeling
data, there are inherent errors that cannot be avoided. Although some
learning algorithms are robust enough to learn in the presence of noisy data,
they cannot make explicit use of knowledge about the noise, and thus their
performance degrades. I plan to develop algorithms that explicitly use
knowledge of the noise to improve performance. Another related problem,
which has attracted the attention of many researchers, is learning with
labeled and unlabeled data. This problem is especially relevant when developing
a system to detect rare or abnormal events, since by definition the data
corresponding to rare events may not be available. I am looking into applying the
methods I have developed for event detection and activity recognition to
detecting abnormal activities in airports and other public places. I also plan to
extend these results to developing gene sequencing and alignment tools.

To summarize, I am currently interested in working on
the following problems:

- Characterizing and detecting rare and abnormal events.
- Developing algorithms based on probabilistic networks for gene alignment and sequencing.
- Developing a theoretical framework for learning in the presence of noisy data, which would allow one to explicitly take into account the noise that may be present in the labels.
- Theoretical analysis of learning with unlabeled data: PAC-type bounds when unlabeled data is used along with labeled data.
- Learning probabilistic and discriminative classifiers in an online fashion. I am looking at the problem of online learning and have some interesting preliminary results on learning probabilistic classifiers.