· Robotic Intelligent Behavior Learning
· Visual Concept Learning and Object Recognition
· Context-aware Key Material Detection in Natural Images
· Closed Captioned Text-based Video News Categorization and Grouping
· Shape Deformation and Medical Image Segmentation
Robotic Intelligent Behavior Learning (with S. Levinson)
As opposed to hardwiring the robot with skills, the approach taken by most
robotics researchers, my research interest is to enable the agent to develop
intelligent behaviors, such as goal-oriented navigation and hand-eye
coordination, autonomously by interacting with the world. The fundamental
methodology for robotic action learning is Reinforcement Learning (RL). An
essential challenge of this methodology is the convergence of learning in
continuous state spaces. To address this issue, I have proposed two methods: a
hybrid state-partitioning method aimed at robot navigation learning, and
PQ-learning, which is for general-purpose action learning.
· Hybrid State-Partitioning for Robot Navigation Learning
Traditional RL methods require a discrete learning state space, which is
usually obtained by partitioning the continuous feature space. A high
partitioning resolution may lead to a large conceptual state space and a long
learning time, while a low resolution tends to introduce aliasing among
learning states; that is, one learning state is mapped to a portion of the
feature space whose regions, in reality, require different optimal action
policies. In this project, we proposed a novel hybrid state-partitioning
method, which combines the merits of static and dynamic state-assignment
strategies, to solve the state-partitioning challenge in the
navigation-learning task. Specifically, the continuous feature space is first
statically partitioned into a small learning state space, in which aliasing
may exist. Then, ambiguities among the aliased states are eliminated during
learning via a recursive state-splitting process. The proposed method has been
applied to both a simulated and a real robot learning system, with
satisfactory results.
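The two-stage idea described above (a coarse static partition, then recursive splitting of aliased states) can be sketched roughly as follows. This is a minimal illustration, not the published algorithm: the one-dimensional feature in [0, 1), the names `Cell`, `static_partition` and `observe`, and in particular the splitting test (the two halves of a state see clearly different average returns) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """One learning state covering the feature interval [lo, hi)."""
    lo: float
    hi: float
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (feature, return)
    children: Optional[List["Cell"]] = None

    def leaf_for(self, x: float) -> "Cell":
        """Descend to the leaf state that contains feature value x."""
        if self.children is None:
            return self
        mid = (self.lo + self.hi) / 2
        return self.children[0 if x < mid else 1].leaf_for(x)

    def maybe_split(self, threshold: float, min_samples: int) -> bool:
        """Split in half when the halves disagree on the average return.

        The disagreement test is a hypothetical stand-in for an aliasing check.
        """
        mid = (self.lo + self.hi) / 2
        left = [r for x, r in self.samples if x < mid]
        right = [r for x, r in self.samples if x >= mid]
        if min(len(left), len(right)) < min_samples:
            return False
        if abs(sum(left) / len(left) - sum(right) / len(right)) <= threshold:
            return False
        self.children = [Cell(self.lo, mid), Cell(mid, self.hi)]
        for x, r in self.samples:  # hand the history down to the new states
            self.children[0 if x < mid else 1].samples.append((x, r))
        self.samples = []
        return True


def static_partition(n_cells: int) -> List[Cell]:
    """Stage 1: a small, fixed-resolution partition of [0, 1)."""
    width = 1.0 / n_cells
    return [Cell(i * width, (i + 1) * width) for i in range(n_cells)]


def observe(cells: List[Cell], x: float, ret: float,
            threshold: float = 0.5, min_samples: int = 5) -> bool:
    """Stage 2: record a sample and split the containing leaf if it looks aliased."""
    idx = min(int(x * len(cells)), len(cells) - 1)
    leaf = cells[idx].leaf_for(x)
    leaf.samples.append((x, ret))
    return leaf.maybe_split(threshold, min_samples)
```

Because `observe` always descends to the current leaf, a state created by a split can itself be split later, which gives the recursive refinement only where the coarse partition turns out to be ambiguous.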
Publication: [IJCNN2001]
Demo: Illy learns to get the can
· PQ-Learning Algorithm
PQ-learning is a general, efficient reinforcement learning method for robot
intelligent behavior acquisition. The method uses an action value propagation
technique, including spatial propagation and temporal propagation, to achieve
fast learning convergence in large state spaces. Compared with the approaches
in the literature, the proposed method offers three benefits for robot
learning. First, it is a general method, which should be applicable to most
reinforcement learning tasks. Second, the learning is guaranteed to converge to
the optimum, and does so much faster than the traditional Q-learning and
Q(λ)-learning methods. Third, it supports both self-directed and
teacher-directed learning, where the teacher's help consists of directing the
robot to explore rather than explicitly offering labels or ground truths as in
the supervised-learning regime. The proposed method has been tested on a
simulated robot navigation-learning problem. The results show that it
significantly outperforms the Q(λ)-learning algorithm in learning convergence
speed in both the self-directed and teacher-directed learning regimes.
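For reference, the traditional Q-learning baseline mentioned above uses the standard one-step update shown below, where the learning rate α and discount factor γ are the usual textbook symbols rather than notation taken from this page. The spatial and temporal propagation rules of PQ-learning are not specified here, so they are not reproduced.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

Each update touches a single state-action pair, which is why convergence becomes slow as the state space grows.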
Publication: [IAS-7] Preprint of the
improved algorithm is available upon request.
Visual Concept Learning and Object Recognition (with S. Levinson)
· Semantic Scene Concept Learning
Existing visual learning systems, such as object recognition systems, work in
a “black box” manner: each object instance is assigned a unique label in
training, and learning consists of finding a classifier that discriminates
between objects with different labels according to the extracted feature
vectors. This learning paradigm has two drawbacks. First, it can only deal with
one fixed concept, such as shape, color, or individual objects of interest. For
multiple concepts, the nature of each assigned label must be specified
explicitly. For instance, the label “red” is a color concept while “round” is a
shape concept. Second, this learning paradigm ignores the intrinsic semantic
meaning of the individual components of the visual information, since object
recognition algorithms usually treat all visual cues as a whole when building
the classifier. Consequently, the agent may be able to distinguish between
objects A and B, but it is unlikely to be able to tell where and how they
differ. The visual concept learning system we have proposed is able to learn
multiple visual concepts from different concept domains, such as color, shape,
texture, and individual objects of interest, without the nature of each concept
being specified explicitly. For example, the agent may receive the label
“blue”, “cylinder”, or “Pepsi” when seeing a Pepsi can, and “red”, “cylinder”,
or “Coke” when seeing a Coke can. Through learning, in addition to
distinguishing between “Pepsi” and “Coke” and between “red” and “blue”, the
agent is also able to tell that “blue” is a color concept while “Pepsi” is an
object concept, although this information is not provided in advance. The
capability of semantic-level visual concept learning is important for an agent
to understand the semantic meaning of the world. It is particularly significant
for automatic language acquisition.
Publication: Preprint is available upon request
· Multi-view Object Recognition
Appearance-based object recognition is challenging, and an essential issue is
how to extract features. In this project, we proposed an edge
orientation-based algorithm for object recognition. The algorithm uses the
distribution of edge point orientations, combined with the normalized second
moments, to represent and index object shapes. Given a test object sample, a
set of likelihood weights for the candidate objects is computed according to
the distances in the feature space. A confidence coefficient is introduced to
evaluate how convincing the best match is; if the best match is insufficiently
convincing, the robot may automatically take another view. In the experiments,
our method achieved an average recognition accuracy of 91.5% under the 5-view
scheme on 320 test images taken from eight natural objects.
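The matching pipeline described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the published method: edges are taken as the strongest finite-difference gradients, the normalized second moments are omitted, the likelihood weights are assumed to be a normalized `exp(-distance)`, and the confidence is assumed to be the gap between the two largest weights. The function names and parameters are hypothetical.

```python
import numpy as np


def edge_orientation_histogram(image, n_bins=16, edge_fraction=0.1):
    """Normalized histogram of gradient orientations at strong edge points."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    mask = magnitude > edge_fraction * magnitude.max()
    # Fold directions into [0, pi) so opposite gradients count as one orientation.
    theta = np.mod(np.arctan2(gy[mask], gx[mask]), np.pi)
    hist, _ = np.histogram(theta, bins=n_bins, range=(0.0, np.pi))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)


def match(query, models):
    """Return (best name, likelihood weights, confidence) for a query feature."""
    names = list(models)
    dists = np.array([np.linalg.norm(query - models[n]) for n in names])
    weights = np.exp(-dists)            # assumed form of the likelihood weights
    weights /= weights.sum()
    order = np.argsort(weights)[::-1]
    # Assumed confidence: margin between the best and second-best candidates.
    confidence = weights[order[0]] - weights[order[1]] if len(names) > 1 else 1.0
    return names[order[0]], dict(zip(names, weights)), float(confidence)
```

In a multi-view loop, the caller would compare `confidence` against a threshold and request another view of the object when the margin is too small.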
Publication: [ICPR2000]
Robust Object Tracking (with S. Wang, R-S. Lin and S. Levinson)
An essential challenge in feature matching-based object tracking is the
detection of incorrect feature correspondences. In this project, we proposed
an efficient algorithm to detect and remove incorrect feature matches between
two images, even if they account for over 50% of the data. The proposed
algorithm has been used successfully in our rigid object tracking system. In
that system, the object-tracking problem is modeled as discovering the affine
transforms of object images in consecutive frames from the detected feature
correspondences, with false feature correspondences (outliers) detected and
removed iteratively by the proposed algorithm. The tracking system we developed
is particularly suitable for tracking objects at low video frame rates, which
is the case in our robot system. In addition to affine transforms, the
algorithm can also be used for general-purpose epipolar constraint estimation.
Our preliminary results show that, when the cost function is linear, the
algorithm is superior to the well-known Least Median of Squares (LMedS) method
in both computational cost and outlier detection accuracy.
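To make the comparison concrete, the sketch below implements the Least Median of Squares baseline for affine estimation, not the algorithm proposed in this project (whose details are not given here). It fits affine transforms to random 3-point samples, keeps the one with the smallest median squared residual, and refits on the resulting inliers. The function names, the trial count, and the conventional 2.5-sigma inlier cut with the `1.4826 * (1 + 5 / (n - 3))` robust scale factor are choices made for this example.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2-D affine map; returns a 3x2 matrix M with [x, y, 1] @ M = dst."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M


def squared_residuals(M, src, dst):
    """Squared Euclidean error of each correspondence under M."""
    A = np.hstack([src, np.ones((len(src), 1))])
    return np.sum((A @ M - dst) ** 2, axis=1)


def lmeds_affine(src, dst, n_trials=200, seed=0):
    """Baseline LMedS estimate; returns (refitted transform, boolean inlier mask)."""
    rng = np.random.default_rng(seed)
    best_M, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        M = fit_affine(src[idx], dst[idx])
        med = np.median(squared_residuals(M, src, dst))
        if med < best_med:
            best_M, best_med = M, med
    # Robust scale from the best median, floored to avoid a zero threshold.
    sigma = max(1.4826 * (1 + 5 / (len(src) - 3)) * np.sqrt(best_med), 1e-9)
    inliers = squared_residuals(best_M, src, dst) <= (2.5 * sigma) ** 2
    return fit_affine(src[inliers], dst[inliers]), inliers
```

Because it relies on the median residual, this baseline breaks down once outliers exceed half of the correspondences, which is the regime the text above says the proposed algorithm is designed to handle.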
Publication: [CVPR2001]
Context-aware Key Material Detection in Images (with A. Singhal and J. Luo)
Material detection refers to the problem of identifying key semantic material
types, such as sky, grass, foliage, water, and snow, in images. In previous
work, Kodak research developed several material detectors, each aimed at
classifying a specific type of material in an image. A major problem with using
these individual material detectors is the large number of false positives that
occur due to the similarities among various material types. This prevents the
use of these material detectors in applications where precision is of high
importance. To deal with this challenge, scene context information needs to be
incorporated to regularize the material detection problem. Toward this end, we
proposed a context-aware approach that combines the output of the individual
material detectors with spatial context constraint models to reduce the number
of false positives. Preliminary results showed that the spatial context-aware
models improved the accuracy of material detection by 10-12% over using the
individual material detectors alone.
Multimedia Video News Processing (with C. Toklu and S-P. Liou)
· Closed-Captioned Text-based Automatic Video News Categorization
With the rapid growth of Internet access and developments in image/video
processing technology, digital news video is becoming a convenient medium for
news acquisition. Video requires watching and listening, which are more time
consuming than reading. Therefore, there is strong interest in video
summarization, especially automatic video segmentation and categorization, in
the video/image processing and computer vision communities. In this project, we
proposed a novel statistical approach, called the weighted voting method, for
automatic news story categorization based on the closed-captioned text.
News video is first segmented into stories using the demarcations in the
closed-captioned text; then a set of keywords is extracted to create feature
vectors for further processing. Categorization is achieved by computing a
likelihood score for each category, and the knowledge base is updated
incrementally in linear time. We used the proposed method to categorize 425
news stories from CNN and compared its performance with the SNoW and Bayes
inference methods. Across varying numbers of training examples, our approach
achieved the highest categorization accuracy of the three.
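The general shape of a keyword-voting categorizer with an incremental, linear-time update can be sketched as below. The actual weighting scheme of the weighted voting method is not given on this page, so the vote weight used here (the fraction of a keyword's past occurrences that fell in each category) is an assumption, as are the class and method names.

```python
from collections import defaultdict


class WeightedVotingClassifier:
    """Toy keyword-voting categorizer; the vote weights are an assumed scheme."""

    def __init__(self):
        # keyword -> category -> number of training stories containing it
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, keywords, category):
        """Incremental update: one pass over the story's keywords."""
        for kw in keywords:
            self.counts[kw][category] += 1

    def scores(self, keywords):
        """Each known keyword splits one vote among the categories it was seen in."""
        votes = defaultdict(float)
        for kw in keywords:
            per_category = self.counts.get(kw)
            if not per_category:
                continue  # unseen keyword casts no vote
            total = sum(per_category.values())
            for category, n in per_category.items():
                votes[category] += n / total
        return dict(votes)

    def classify(self, keywords):
        """Return the highest-scoring category, or None if no keyword is known."""
        s = self.scores(keywords)
        return max(s, key=s.get) if s else None
```

A short usage example: after `clf.update(["election", "senate", "vote"], "politics")` and `clf.update(["game", "score", "season"], "sports")`, calling `clf.classify(["senate", "vote", "game"])` favors "politics" because two of the three keywords vote for it.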
Publication: [ICME2001]
· Keyword Concurrency-based Relevant Story Grouping
The goal of this project is to automatically group relevant stories in digital
video news. We proposed a keyword concurrency-based relevant story grouping
method. For each new story, the method suggests group indices based on the
existing collection of stories in the database; the user can then confirm or
modify the suggested labels through the interface we developed. The algorithm
performed satisfactorily on the 425 test news stories from CNN.
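A suggestion step of this kind might look like the following sketch, which ranks existing groups by how many keywords they share with the new story. The Jaccard overlap score, the function name `suggest_groups`, and the dictionary layout of the story database are assumptions for illustration; the scoring actually used by the method is not described here.

```python
def suggest_groups(new_keywords, groups, top_k=3):
    """Rank existing groups by keyword overlap (Jaccard) with a new story.

    groups maps a group id to a list of keyword collections, one per story.
    """
    new = set(new_keywords)
    ranked = []
    for group_id, stories in groups.items():
        vocabulary = set().union(*stories) if stories else set()
        union = new | vocabulary
        score = len(new & vocabulary) / len(union) if union else 0.0
        ranked.append((score, group_id))
    ranked.sort(key=lambda pair: -pair[0])  # highest overlap first
    # Only groups that share at least one keyword are suggested.
    return [(group_id, score) for score, group_id in ranked[:top_k] if score > 0]
```

The returned list is only a suggestion; in an interactive setting the user would accept one of the ranked groups or assign a different label, after which the story's keywords are added to the chosen group.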
Shape Deformation and Medical Image Segmentation (with S. Wang and Z-P. Liang)
In this project, we
proposed a novel landmark-based shape deformation method for non-rigid object
tracking and medical image segmentation. This method effectively solves two
problems inherent in landmark-based shape deformation: (a) identification of
landmark points from a given input image, and (b) regularized deformation of
the shape of an object defined in a template. The second problem is solved
using a new constrained support vector machine regression technique, in which a
thin-plate kernel is utilized to provide non-rigid shape deformations. This
method offers several advantages over the existing landmark-based methods.
First, it has a unique capability to automatically select the best landmark
point among multiple candidate points in an input image to improve landmark
detection. Second, it can handle the case of missing landmarks, which often
arises in dealing with occluded images. The proposed method has been applied to
extract the scalp contours from brain cryosection images with very encouraging
results.
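As a point of reference for the thin-plate kernel mentioned above, the sketch below fits a classical thin-plate spline warp to landmark pairs using the radial basis U(r) = r^2 log r. It swaps in plain (optionally ridge-regularized) spline interpolation for the constrained support vector regression used in the project, so it illustrates the kernel only, not the proposed technique. The function names and the `lam` parameter are hypothetical.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate radial basis U(r) = r^2 log r, with U(0) = 0."""
    out = np.zeros_like(r, dtype=float)
    nonzero = r > 0
    out[nonzero] = r[nonzero] ** 2 * np.log(r[nonzero])
    return out


def fit_tps(src, dst, lam=0.0):
    """Solve for a 2-D thin-plate spline mapping landmarks src onto dst.

    Returns an (n + 3) x 2 array: n kernel weights followed by the affine part.
    lam > 0 relaxes exact interpolation (a simple ridge term, assumed here).
    """
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K + lam * np.eye(n)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(L, rhs)


def warp(points, src, params):
    """Apply the fitted spline to arbitrary points, e.g. a template contour."""
    n = len(src)
    K = tps_kernel(np.linalg.norm(points[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((len(points), 1)), points])
    return K @ params[:n] + P @ params[n:]
```

With `lam=0` the warp passes exactly through every landmark pair. A landmark that is missing in the input image can simply be left out of `src` and `dst`, and the remaining landmarks still define a smooth deformation of the template.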
Publication: [ICCV2001]
Demo: Brain image contour tracking