Integration of multiple sources of information (heterogeneous and homogeneous) plays a key role in solving many real-world problems, and the fusion of audio and visual information in particular plays a critical role in multimedia analysis. Recent results have shown that improved performance in multimedia tasks can be achieved by combining information from different sources (e.g., audio and video) rather than relying on a single modality. Although the use of multiple modalities is promising, it poses the challenging problem of learning in high-dimensional spaces. Results in learning theory can be used to analyze learning problems in high-dimensional spaces; however, they fall short of giving practical guarantees. The focus of my research is on solving multimedia applications by fusing information from multiple modalities and on the theoretical analysis of the machine learning algorithms used therein.
Over the past four years, I have worked on a number of applications in the field of multimedia information processing. Both generative models (probabilistic networks) and discriminative classifiers (support vector machines, Winnow, Perceptron) are used to solve these and related problems. In all cases, good performance was obtained even though, at times, theoretical support for the algorithms used was weak. I have worked on extending some of the theoretical results to explain the observed performance and, at the same time, have developed improved learning algorithms.
I have looked into a number of interesting problems in the broad areas of multimedia analysis and bioinformatics. In solving these tasks, the key difficulty lies in combining information from heterogeneous sources: the information may be asynchronous, may require analysis at different levels of temporal abstraction, and may exhibit long-term temporal dependencies that need to be captured.
As part of my research, I have mainly concentrated on the use of probabilistic models for solving these tasks. Learning a probabilistic model involves choosing an architecture that can capture the interesting aspects of the task at hand and then learning the parameters that describe the network. Depending on the problem, different network structures may be better suited than others. I have introduced duration dependent input output Markov models for modeling events in videos, layered hidden Markov models for activity detection in office environments, and a variant of factorial hidden Markov models for audio-visual speech recognition. These architectures are general, and their application is not limited to multimedia: I employed the input output Markov model architecture (originally introduced for event detection) to combine the predictions of experts for the task of gene annotation. The results obtained clearly demonstrate the wide applicability and generalization capability of these models.
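To make the two-step recipe above concrete (choose an architecture, then fit its parameters), the following is a minimal sketch of the building block shared by all the Markov-model variants mentioned: the forward algorithm, which computes the likelihood of an observation sequence under a fixed hidden Markov model. The toy numbers and the two-state model are purely illustrative, not taken from any of the systems described in the text.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(obs) under an HMM with start probabilities pi,
    transition matrix A, and emission matrix B (states x symbols)."""
    alpha = pi * B[:, obs[0]]                  # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]     # recursive forward update
    return float(alpha.sum())

# Illustrative 2-state, 2-symbol HMM (hypothetical parameters).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 0]
print(forward_likelihood(pi, A, B, obs))
```

Parameter learning (e.g., Baum-Welch) then adjusts `pi`, `A`, and `B` to raise this likelihood over the training sequences; the architectural variants in the text differ in how the hidden state and inputs are structured, not in this basic computation.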
On the theoretical front, I have investigated the discrepancy between the performance typically predicted by theoretical analysis and the performance actually observed; in many applications, the observed performance is much better than theory predicts. When using probabilistic networks, a number of conditional independence assumptions are made that may not hold. I have investigated probabilistic networks in detail, and in particular have analyzed how classification performance depends on the properties of the underlying distribution and on the assumptions made during learning. The results obtained are then used to derive improved learning algorithms. I have proposed a new learning algorithm for hidden Markov models based on maximizing the mutual information between the hidden and the observed variables, and the results obtained clearly demonstrate the superiority of this algorithm.
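The objective in that last algorithm is the mutual information between hidden and observed variables. A minimal sketch of how that quantity is computed from a joint probability table follows; the joint table here is hypothetical, and an actual mutual-information-based learner would adjust the HMM parameters so as to increase this score rather than evaluate it on fixed numbers.

```python
import numpy as np

def mutual_information(joint):
    """I(H; O) in nats, from a joint probability table P(h, o)."""
    joint = np.asarray(joint, dtype=float)
    ph = joint.sum(axis=1, keepdims=True)    # marginal P(h)
    po = joint.sum(axis=0, keepdims=True)    # marginal P(o)
    mask = joint > 0                         # skip zero cells (0 log 0 = 0)
    return float((joint[mask] * np.log(joint[mask] / (ph @ po)[mask])).sum())

# Hypothetical joint over one hidden state H and one observation O:
# H and O are strongly correlated, so I(H; O) is well above zero.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))
```

When hidden and observed variables are independent (the joint factors into the product of its marginals), this score is zero; the learning algorithm described in the text pushes in the opposite direction, toward hidden states that are maximally informative about the observations.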
From the perspective of discriminative classifiers, results on applications related to face recognition and object detection again highlighted the gap between theoretical results and observed performance. In many practical cases, when learning is done in high dimensions, the margin with which classification is achieved is small; the distribution of points, however, is such that most of the points are far from the classifier. I made use of this observation and developed a classification version of the random projection lemma to obtain data-dependent bounds based on the margin distribution. These bounds also have an algorithmic aspect, and I have developed a new algorithm that directly optimizes the margin distribution. Empirically, it performs much better than state-of-the-art algorithms on the same tasks. I have also extended the theory of coherent concepts initially introduced by Roth et al. I argue that in a cognitive learning scenario, learning is not done in isolation and a number of concepts have to agree with one another (similar to the assumptions made in the co-training framework). The theoretical results show that under such scenarios the sample complexity burden is reduced.
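The observation underlying those bounds is the difference between the minimum margin and the whole margin distribution. The sketch below computes the signed margins of a linear classifier on labeled points; the data, weights, and function name are illustrative, not the algorithm from the text, and serve only to show the quantity the bounds are stated in terms of.

```python
import numpy as np

def margin_distribution(w, b, X, y):
    """Signed margins y_i * (w . x_i + b) / ||w|| for each labeled point.
    A positive margin means correct classification; the bounds described
    in the text depend on the whole sorted distribution of these values,
    not only on the minimum."""
    margins = y * (X @ w + b) / np.linalg.norm(w)
    return np.sort(margins)

# Hypothetical separable 2-D data and a fixed linear classifier.
X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0
m = margin_distribution(w, b, X, y)
print("min margin:", m[0], "median margin:", np.median(m))
```

Even when the minimum margin `m[0]` is small, most entries of `m` can be large; an algorithm that optimizes the margin distribution trades a slightly worse worst case for many points being pushed far from the separator.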
In the future, I plan to continue along these lines: working on interesting applications, studying the underlying theoretical issues, and developing improved learning algorithms. In particular, I am very interested in some of the fundamental issues that I have encountered while working on multimedia applications. In problems related to speech recognition, the transcription is invariably noisy; similarly, when labeling data, there are inherent errors that cannot be avoided. Although some learning algorithms are robust enough to learn in the presence of noisy data, they cannot make explicit use of knowledge about the noise, and thus their performance degrades. I plan to develop algorithms that explicitly use knowledge of the noise to improve performance. Another related problem that has attracted the attention of many researchers is learning with labeled and unlabeled data. This problem is especially relevant when developing a system to detect rare or abnormal events, as by definition the data corresponding to rare events may not be available. I am investigating how the methods that I have developed for event detection and activity recognition can be used to detect abnormal activities in airports and other public places. I also plan to extend these results to develop gene sequencing and alignment tools.
To summarize, I am currently interested in working on the following problems:
· Characterizing and detecting rare and abnormal events.
· Developing algorithms using probabilistic networks for gene alignment and sequencing.
· Developing a theoretical framework for learning in the presence of noisy data. This would allow one to explicitly take into account the noise that may be present in the labels.
· Theoretical analysis of learning with unlabeled data, including PAC-type bounds when unlabeled data is used along with labeled data.
· Learning probabilistic and discriminative classifiers in an online fashion. I am studying the online learning problem and have some interesting preliminary results on learning probabilistic classifiers.
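For the last item, a minimal sketch of the online discriminative setting is the classic Perceptron, which processes one example at a time and updates only on mistakes; the toy data below is hypothetical and the sketch stands in for the online setting generally, not for the preliminary results mentioned above.

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Online Perceptron: one example at a time, updating the weights
    only when the current hypothesis makes a mistake (labels are +1/-1)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # mistake: additive update
                w += yi * xi
                b += yi
    return w, b

# Linearly separable toy data (illustrative).
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))   # agrees with y on this separable data
```

The mistake-bound style of analysis available for such updates is one reason the online setting is attractive; extending comparable guarantees to probabilistic classifiers is the open direction the bullet refers to.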