RESEARCH

 


·        Robotic Intelligent Behavior Learning

·        Visual Concept Learning and Object Recognition

·        Robust Object Tracking

·        Context-aware Key Material Detection in Natural Images

·        Closed Captioned Text-based Video News Categorization and Grouping

·        Shape Deformation and Medical Image Segmentation


Robotic Intelligent Behavior Learning (with S. Levinson)

As opposed to hardwiring the robot with skills, the approach taken by most robotics researchers, my research aims to enable the agent to develop intelligent behaviors, such as goal-oriented navigation and hand-eye coordination, autonomously through interaction with the world. The fundamental methodology for robotic action learning is Reinforcement Learning (RL). An essential challenge of this methodology is the convergence of learning in continuous state spaces. To address this issue, I have proposed two methods: a hybrid state-partitioning method aimed at robot navigation learning, and PQ-learning, which is intended for general-purpose action learning.

 

·        Hybrid State-Partitioning for Robot Navigation Learning

Traditional RL methods require a discrete learning state space, which is usually obtained by partitioning the continuous feature space. High partitioning resolutions may lead to a large state space and a long learning time, while low resolutions tend to introduce aliasing among learning states, i.e., one learning state is mapped to a portion of the feature space whose regions, in reality, require different optimal action policies. In this project, we proposed a novel hybrid state-partitioning method, which combines the merits of both static and dynamic state-assignment strategies, to solve the state-partitioning challenge in the navigation-learning task. Specifically, the continuous feature space is first statically partitioned into a small learning state space, in which aliasing may exist. Ambiguities among the aliased states are then eliminated during learning via a recursive state-splitting process. The proposed method has been applied to both a simulated and a real robot learning system, with satisfactory results.
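
The two-stage idea (a coarse static partition whose cells can later be split) can be illustrated with a minimal sketch. It assumes a one-dimensional feature and leaves the decision of when a state is aliased to the caller, since the criterion is not described here; the names Cell and HybridPartition are hypothetical and not taken from the publication.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so a cell can key a Q-table
class Cell:
    lo: float
    hi: float
    children: list = field(default_factory=list)  # empty list => leaf state

    def leaf_for(self, x):
        """Descend to the leaf cell whose interval contains x."""
        node = self
        while node.children:
            left, right = node.children
            node = left if x < left.hi else right
        return node

    def split(self):
        """Dynamic step: replace this leaf with two half-width leaves."""
        mid = (self.lo + self.hi) / 2.0
        self.children = [Cell(self.lo, mid), Cell(mid, self.hi)]


class HybridPartition:
    """Static coarse grid over [lo, hi); each grid cell may be split recursively."""

    def __init__(self, lo, hi, n_static):
        self.lo = lo
        self.width = (hi - lo) / n_static
        self.roots = [
            Cell(lo + i * self.width, lo + (i + 1) * self.width)
            for i in range(n_static)
        ]

    def state(self, x):
        """Map a continuous feature value to its current learning state."""
        i = int((x - self.lo) / self.width)
        i = max(0, min(i, len(self.roots) - 1))  # clamp out-of-range inputs
        return self.roots[i].leaf_for(x)

    def num_states(self):
        def count(cell):
            if not cell.children:
                return 1
            return sum(count(child) for child in cell.children)
        return sum(count(root) for root in self.roots)
```

In this sketch a learner would call state(x) on every observation and call split() on a state only once it has evidence that the state is aliased, so the state space grows only where finer resolution is needed.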

Publication: [IJCNN2001]        Demo: Illy learns to get the can
 

·        PQ-Learning Algorithm

PQ-learning is a general and efficient reinforcement learning method for robot intelligent behavior acquisition. This method uses an action-value propagation technique, including spatial propagation and temporal propagation, to achieve fast learning convergence in large state spaces. Compared with the approaches in the literature, the proposed method offers three benefits for robot learning. First, it is a general method, which should be applicable to most reinforcement learning tasks. Second, the learning is guaranteed to converge to the optimum, and does so much faster than the traditional Q-learning and Q(λ)-learning methods. Third, it supports both self-directed and teacher-directed learning, where the teacher helps by directing the robot's exploration rather than by explicitly offering labels or ground truth as in the supervised-learning regime. The proposed method has been tested on a simulated robot navigation-learning problem. The results show that this method significantly outperforms the Q(λ)-learning algorithm in terms of learning convergence speed in both the self-directed and teacher-directed learning regimes.
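
For reference, the baseline that PQ-learning is compared against can be sketched as follows. This is the standard one-step tabular Q-learning update, not PQ-learning itself, whose propagation rules are not given here; the function name and the default values of alpha and gamma are illustrative assumptions.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (baseline, not PQ-learning).

    Q maps (state, action) pairs to values; unseen pairs default to 0.
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Example table: Q = defaultdict(float)
```

Each call changes only the single visited state-action pair, which is one reason plain Q-learning can be slow in large state spaces.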

 

Publication: [IAS-7]      Preprint of the improved algorithm is available upon request.     
 



Visual Concept Learning and Object Recognition (with S. Levinson)

·        Semantic Scene Concept Learning

Existing visual learning systems, such as object recognizers, work in a “black box” manner: each object instance is assigned a unique label in training, and learning consists of finding a classifier that discriminates between objects with different labels according to the extracted feature vectors. This learning paradigm has two drawbacks. First, it can only deal with one fixed kind of concept, such as shape, color, or individual objects of interest. To handle multiple kinds of concepts, the nature of each assigned label must be specified explicitly; for instance, the label “red” is a color concept while “round” is a shape concept. Second, this learning paradigm ignores the intrinsic semantic meaning of individual visual information components, since object recognition algorithms usually treat all visual cues as a whole when building the classifier. Consequently, the agent may be able to distinguish between objects A and B, but it is unlikely to be able to tell where and how they differ. The visual concept learning system we have proposed is able to learn multiple visual concepts from different concept domains, such as color, shape, texture, and individual objects of interest, without the nature of each concept being specified explicitly. For example, the agent may receive the label “blue”, “cylinder”, or “Pepsi” when seeing a Pepsi can, and “red”, “cylinder”, or “Coke” when seeing a Coke can. Through learning, in addition to distinguishing between “Pepsi” and “Coke” and between “red” and “blue”, the agent is also able to tell that “blue” is a color concept while “Pepsi” is an object concept, although this information is not provided in advance. The capability of semantic-level visual concept learning is important for an agent to understand the semantic meaning of the world, and it is particularly significant for automatic language acquisition.

Publication: Preprint is available upon request

·        Multi-view Object Recognition

Appearance-based object recognition is challenging, and an essential issue is how to extract features. In this project, we proposed an edge orientation-based algorithm for object recognition. This algorithm uses the distribution of edge point orientations, combined with the normalized second moments, to represent and index object shapes. Given a test object sample, a set of likelihood weights for potential candidate objects is computed according to the distances in the feature space. A convincing coefficient is introduced to evaluate the confidence of the best match; based on it, the robot may automatically take another view if the best match is insufficiently convincing. In the experiments, our method achieved an average recognition accuracy of 91.5% under the 5-view scheme on 320 test images taken of eight natural objects.
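
The matching pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published algorithm: the normalized second moments are omitted, the likelihood weights are taken to be a normalized exp(-distance), and the convincing coefficient is approximated by the relative gap between the two best weights. All function names are hypothetical.

```python
import math


def orientation_histogram(angles, bins=8):
    """Normalized histogram of edge orientations (radians, taken modulo pi)."""
    hist = [0.0] * bins
    for theta in angles:
        idx = int((theta % math.pi) / math.pi * bins) % bins
        hist[idx] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for empty input
    return [h / total for h in hist]


def likelihood_weights(query, models):
    """Assumed weighting: exp(-Euclidean distance), normalized to sum to 1."""
    scores = {
        name: math.exp(-math.dist(query, feat)) for name, feat in models.items()
    }
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}


def best_match(weights):
    """Return the top candidate and an assumed confidence in [0, 1]."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    name, w1 = ranked[0]
    w2 = ranked[1][1] if len(ranked) > 1 else 0.0
    return name, (w1 - w2) / w1
```

A multi-view loop would call best_match after each view and request another view while the confidence stays below a chosen threshold.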

 

Publication: [ICPR2000]




Robust Object Tracking (with S. Wang, R-S. Lin and S. Levinson)

An essential challenge in feature matching-based object tracking is the detection of incorrect feature correspondences. In this project, we proposed an efficient algorithm that detects and removes incorrect feature matches between two images, even if they account for over 50% of the data. The proposed algorithm has been used successfully in our rigid object tracking system. There, the object-tracking problem is modeled as finding the affine transforms of object images between consecutive frames from the detected feature correspondences, and false feature correspondences (outliers) are detected and removed iteratively using the proposed algorithm. The tracking system we developed is particularly suitable for tracking objects at low video frame rates, which is the case in our robot system. In addition to affine transforms, this algorithm can also be used for general-purpose epipolar constraint estimation. Our preliminary results show that, when the cost function is linear, this algorithm outperforms the well-known Least Median of Squares (LMedS) method in terms of both computational cost and outlier detection accuracy.
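
For context, the affine tracking model and the LMedS baseline mentioned above can be written as follows. The notation is an assumption introduced here for illustration: x_i and x'_i denote a matched feature pair in two consecutive frames, A the 2x2 linear part of the transform, and t the translation.

```latex
\begin{aligned}
\mathbf{x}'_i &\approx A\,\mathbf{x}_i + \mathbf{t},
  \qquad A \in \mathbb{R}^{2\times 2},\ \mathbf{t} \in \mathbb{R}^{2} \\
r_i(A,\mathbf{t}) &= \lVert \mathbf{x}'_i - A\,\mathbf{x}_i - \mathbf{t} \rVert \\
(\hat{A},\hat{\mathbf{t}})_{\mathrm{LMedS}} &=
  \arg\min_{A,\mathbf{t}} \; \operatorname{med}_i \; r_i(A,\mathbf{t})^2
\end{aligned}
```

Because LMedS minimizes the median squared residual, it cannot be expected to tolerate more than about half of the correspondences being outliers, which is the regime the proposed algorithm is reported to handle.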

Publication: [CVPR2001]



Context-aware Key Material Detection in Images (with A. Singhal and J. Luo)

Material detection refers to the problem of identifying key semantic material types, such as sky, grass, foliage, water, and snow, in images. In previous work, Kodak research developed several material detectors, each aimed at classifying a specific type of material in an image. A major problem with using these individual material detectors is the large number of false positives that occur due to the similarities among various material types. This prevents the use of these material detectors in applications where precision is of high importance. To deal with this challenge, we need to incorporate scene context information to regularize the material detection problem. To this end, we proposed a context-aware approach that combines the output of individual material detectors with spatial context constraint models to reduce the number of false positives. Preliminary results showed that the spatial context-aware models improved the accuracy of material detection by 10-12% over using the individual material detectors alone.
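
The general idea of tempering detector outputs with spatial context can be shown with a deliberately simplified sketch. It assumes a hypothetical prior over coarse image positions for each material, which is much cruder than the spatial context constraint models referred to above; the function name, the position labels, and the prior values are all illustrative.

```python
def rescore(detector_scores, position, context_prior):
    """Multiply each detector belief by an assumed position prior, then normalize.

    detector_scores: material -> belief from the individual detector
    context_prior:   material -> {position -> prior plausibility}
    """
    combined = {
        material: score * context_prior[material][position]
        for material, score in detector_scores.items()
    }
    z = sum(combined.values())
    if z == 0:
        return dict(detector_scores)  # no usable context: keep raw beliefs
    return {material: value / z for material, value in combined.items()}
```

For a region near the bottom of an image, a slightly higher raw "sky" score can be overturned in favor of "water" once the prior is applied, which is the kind of false positive the context models are meant to suppress.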

 



Multimedia Video News Processing (with C. Toklu and S-P. Liou)

·        Closed-Captioned Text-based Automatic Video News Categorization

With the rapid growth of Internet access and developments in image/video processing technology, digital news video is becoming a convenient medium for news acquisition. Video requires watching and listening, which are more time-consuming than reading. There is therefore strong interest in video summarization, especially automatic video segmentation and categorization, in the video/image processing and computer vision communities. In this project, we proposed a novel statistical approach, called the weighted voting method, for automatic news story categorization based on the closed-captioned text. News video is first segmented into stories using the demarcations in the closed-captioned text; a set of keywords is then extracted to create feature vectors for further processing. Categorization is achieved by computing a likelihood score for each category, and the knowledge base is updated incrementally in linear time. We used the proposed method to categorize 425 news stories from CNN and compared its categorization performance with that of the SNoW and Bayesian inference methods. Across the training-set sizes tested, our approach achieved the highest categorization accuracy among the three approaches.
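
A keyword-voting categorizer with an incremental knowledge base might look like the sketch below. The voting weight used here (the fraction of a keyword's past occurrences that fell in each category) is an assumption made for illustration and may differ from the published weighted voting method; the class name WeightedVoter is hypothetical.

```python
from collections import defaultdict


class WeightedVoter:
    def __init__(self):
        # keyword -> category -> number of training stories containing it
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, keywords, category):
        """Incremental update; cost is linear in the number of keywords."""
        for kw in keywords:
            self.counts[kw][category] += 1

    def scores(self, keywords):
        """Each known keyword casts votes weighted by its category fractions."""
        votes = defaultdict(float)
        for kw in keywords:
            per_category = self.counts.get(kw)
            if not per_category:
                continue  # unseen keyword casts no vote
            total = sum(per_category.values())
            for category, n in per_category.items():
                votes[category] += n / total
        return dict(votes)

    def classify(self, keywords):
        s = self.scores(keywords)
        return max(s, key=s.get) if s else None
```

Because update only increments counters, newly labeled stories can be folded in without retraining from scratch.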

Publication: [ICME2001]

·        Keyword Concurrency-based Relevant Story Grouping

The goal of this project is to automatically group relevant stories in digital video news. We proposed a keyword concurrency-based method for relevant story grouping. The method provides users with suggested group indices for each new story based on the existing collection of stories in the database; the user can then interact with the system through the interface we developed to confirm or modify the suggested labels. The algorithm performed satisfactorily on the 425 test news stories from CNN.
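
One simple way to suggest groups from shared keywords is sketched below. It assumes Jaccard overlap between keyword sets as the relevance measure and an arbitrary threshold of 0.2; neither is taken from the project, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two keyword collections, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def suggest_groups(story_keywords, groups, threshold=0.2):
    """Return group ids ranked by keyword overlap with the new story.

    groups: group id -> set of keywords already associated with that group
    """
    ranked = [(jaccard(story_keywords, kws), gid) for gid, kws in groups.items()]
    return [gid for score, gid in sorted(ranked, reverse=True) if score >= threshold]
```

The returned list plays the role of the suggested group indices, which the user would then confirm or modify.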

 



Shape Deformation and Medical Image Segmentation (with S. Wang and Z-P. Liang)

In this project, we proposed a novel landmark-based shape deformation method for non-rigid object tracking and medical image segmentation. This method effectively solves two problems inherent in landmark-based shape deformation: (a) identification of landmark points from a given input image, and (b) regularized deformation of the shape of an object defined in a template. The second problem is solved using a new constrained support vector machine regression technique, in which a thin-plate kernel is utilized to provide non-rigid shape deformations. This method offers several advantages over the existing landmark-based methods. First, it has a unique capability to automatically select the best landmark point among multiple candidate points in an input image to improve landmark detection. Second, it can handle the case of missing landmarks, which often arises in dealing with occluded images. The proposed method has been applied to extract the scalp contours from brain cryosection images with very encouraging results.
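
As a small illustration of the kernel ingredient, the standard two-dimensional thin-plate radial basis U(r) = r^2 log r can be evaluated between two landmark points as below. Whether the project used exactly this form is an assumption, and the constrained support vector regression itself is not reproduced; the function name is hypothetical.

```python
import math


def thin_plate_kernel(p, q):
    """Standard 2-D thin-plate radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    r = math.dist(p, q)
    if r == 0.0:
        return 0.0  # limit of r^2 * log(r) as r -> 0
    return r * r * math.log(r)
```

The value is zero at r = 0 and r = 1 and negative in between, and the basis is associated with minimizing bending energy in thin-plate splines, which is why it is a common choice for smooth non-rigid deformations.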

Publication: [ICCV2001]          Demo: Brain image contour tracking
