- What is relevance feedback?
- A list of references on relevance feedback.
- MARS relevance feedback illustration
- Our latest review paper:
Xiang Sean Zhou, Thomas S. Huang,
“Exploring the Nature and Variants of Relevance Feedback”, in Proc.
IEEE CVPR’01 Workshop on Content-Based Access of Image and Video
Libraries,
- What is BiasMap?
“Happy families are all alike, every unhappy family is unhappy in its own way.” (Leo Tolstoy, Anna Karenina)
- Demo by our group-mate Munehiro Nakazato, using BiasMap as the core algorithm.
- Publication on BiasMap:
Xiang Sean Zhou,
T. S. Huang, “BiasMap for Small Sample Learning
during Multimedia Retrieval”, in Proc. IEEE CVPR,

The decision surfaces of kernel-based BiasMap (KBDA), KDA, and SVM for highly nonlinear configurations. Open circles are positive examples and crosses are negative examples. The gray level indicates distance to the positive centroid in the nonlinearly transformed space: the brighter, the closer. Notice the spill-over effect of KDA and SVM.
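The "distance to the positive centroid" in a kernel-induced space can be computed without ever forming the transformed vectors. The following is a minimal sketch of that kernel trick only: it uses the plain RBF feature space and omits BiasMap's discriminating transform. The function names and the RBF parameterization are assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sq_dist_to_positive_centroid(x, positives, sigma):
    """Squared feature-space distance from phi(x) to the mean of phi(positives).

    Expands ||phi(x) - m||^2 into kernel evaluations only:
    k(x, x) - (2/n) * sum_i k(x, p_i) + (1/n^2) * sum_ij k(p_i, p_j).
    """
    n = len(positives)
    k_xx = rbf_kernel(x, x, sigma)  # equals 1.0 for the RBF kernel
    k_xp = sum(rbf_kernel(x, p, sigma) for p in positives) / n
    k_pp = sum(rbf_kernel(p, q, sigma)
               for p in positives for q in positives) / (n * n)
    return k_xx - 2.0 * k_xp + k_pp

# Toy data (made up): three positive examples in 2-D.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = sq_dist_to_positive_centroid((0.3, 0.3), positives, sigma=1.0)
far = sq_dist_to_positive_centroid((5.0, 5.0), positives, sigma=1.0)
```

In a plot like the one described, a smaller distance would be rendered as a brighter gray level.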
Examples of faces and non-faces
Face and non-face classification (1000 faces + 1000 non-faces): precision in the top 1000 returns. The number of positive training examples is 100, and the horizontal axis shows the number of negative examples, varying from 1 to 300. RBF kernel with s = 100; SVM returns the points with larger margins first. BiasMap (or BDA) performs better than KDA and SVM when the negative examples are few (and thus unrepresentative).
- What are SVMs and kernel machines? Try an online demo from Lucent Bell Labs
- Publications on one-class SVM and kernel discriminant analysis for relevance feedback:
Y. Chen, Xiang Sean Zhou, T. S.
Huang, “One-class SVM for Learning in Image Retrieval”, IEEE
Int’l Conf. on Image Proc. (ICIP'2001), Thessaloniki,
Greece, October 7-10, 2001 [PDF] [PS]
Xiang Sean Zhou, T. S. Huang,
“Comparing Discriminating Transformations and SVM for Learning during
Multimedia Retrieval,” ACM Multimedia’2001, Sept. 30-
[Back to Research Project List]
We propose structural features for content-based image retrieval (CBIR), in particular edge/structure features extracted from edge maps. The feature vector is computed by applying a “Water-Filling Algorithm” to the edge map of the original image. The purpose of this algorithm is to efficiently extract the information embedded in the edges. The new features are more generally applicable than texture or shape features. Experiments show that the new features capture salient edge/structure information and improve retrieval performance.
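To make the idea of "filling" an edge map concrete, here is a rough sketch, not the published algorithm: it treats each 8-connected group of edge pixels as a channel, pours water in at the first pixel found in raster order, and records two illustrative quantities per group. The names `water_fill`, "amount" (pixels filled) and "fill time" (breadth-first steps needed) are assumptions chosen for this example.

```python
from collections import deque

def water_fill(edge_map):
    """Return (water_amount, fill_time) for each 8-connected edge component.

    edge_map is a list of rows holding 0 (background) or 1 (edge pixel).
    Water spreads one pixel per time step from the first pixel of each
    component encountered in raster order.
    """
    rows, cols = len(edge_map), len(edge_map[0])
    seen = [[False] * cols for _ in range(rows)]
    results = []
    for r in range(rows):
        for c in range(cols):
            if not edge_map[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c, 1)])
            amount, fill_time = 0, 0
            while queue:
                y, x, t = queue.popleft()
                amount += 1
                fill_time = max(fill_time, t)
                # Visit the 8 neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edge_map[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx, t + 1))
            results.append((amount, fill_time))
    return results

# Toy edge map (made up): one L-shaped edge and one short vertical edge.
edge_map = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
]
features = water_fill(edge_map)
```

Statistics over such per-component values (for example their maxima or histograms) could then form a feature vector; which statistics to use is left open here.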
- What is the Water-filling algorithm?
- Publications:
Zhou, X. S., T. S. Huang,
"Edge/Structural Features for Content Based Image Retrieval," Pattern
Recognition Letters, Vol 22/5, Apr. 2001. pp 457-468 [PDF: 0.6MB]
Zhou, X. S., Y. Rui, and T. S. Huang, "Water-filling: a novel way for
image structural feature extraction", Proc. IEEE Inter. Conf. on Image
Processing,
In this project we propose a novel image modeling scheme for object detection and localization. Object appearance is modeled by the joint distribution of k-tuple salient point feature vectors, which are factorized component-wise after an independent component analysis (ICA).
Diagram for the object detection and localization task
implemented in this paper
Synthetic test images and a
detection example.
(a) The synthetic test image of 20 objects from COIL;
(b) The rotated and occluded version of (a);
(c) The likelihood map for detecting “piggy
bank” in (b). The white dots are the interest points.

Detecting leopards and tigers
The likelihood maps are multiplied by the corresponding
original images to reveal the detected (high likelihood) local structure.
- Publications:
Xiang Sean Zhou, B. Moghaddam, T. S. Huang, “ICA-based
probabilistic local appearance models”, ICIP’2001,
Under low-bit-rate channel constraints, we stream nonlinearly sampled video frames (i.e., a key-frame slideshow) synchronized with the audio stream. Given the channel and buffer limits, we seek a set of sampled frames that is both feasible (no frame is dropped) and optimal in the sense of maximal information flow, assuming the semantic information content of each frame can be quantified, either automatically or manually. Several application scenarios are modeled in a principled way, and for each we propose an algorithm that finds the globally optimal solution in time linear in the total number of frames. The contributions of this work are a novel modeling scheme for channel and buffer limits in the video temporal sampling problem; efficient algorithms for finding the globally optimal solution; and the extension and analysis of these algorithms for practical application scenarios. The proposed algorithms make possible the automated production of this new form of video streaming over low-bit-rate channels for devices with limited memory and storage.
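The flavor of the optimization can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not the formulation in the paper: every sampled frame has the same size and occupies the channel for `gap` frame slots, and there is no buffer or preroll. Feasibility then means that selected frames are at least `gap` slots apart, and a single dynamic-programming pass over the frames finds the selection with maximal total saliency.

```python
def select_keyframes(saliency, gap):
    """Pick frames at least `gap` slots apart with maximal total saliency.

    best[i] holds the best total achievable using only frames 0..i-1.
    Runs in time linear in the number of frames.
    """
    n = len(saliency)
    best = [0.0] * (n + 1)
    take = [False] * (n + 1)
    for i in range(1, n + 1):
        skip_value = best[i - 1]
        # Taking frame i-1 forbids the previous gap-1 frames.
        take_value = saliency[i - 1] + best[max(0, i - gap)]
        if take_value > skip_value:
            best[i], take[i] = take_value, True
        else:
            best[i] = skip_value
    # Walk backwards to recover which frames were chosen.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(i - 1)
            i = max(0, i - gap)
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen

# Made-up saliency scores for eight frames; each frame needs 3 slots to send.
total, frames = select_keyframes([5, 1, 1, 6, 1, 1, 1, 4], gap=3)
```

Here the selection keeps frames 0, 3 and 7, skipping the low-saliency frames that uniform sampling would have spent bandwidth on.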
Nonlinear vs. uniform sampling of a news video segment of 900 frames. The segment reports on the “in-your-face” advertising strategy and how far it has gone: ads above the urinal in the men’s room, on the pool table, on walkways and steps, and on the bar at the parking entrance. Uniform sampling wastes bandwidth on similar frames and misses the critical shot changes and key-frames. Nonlinear sampling captures much more information by selecting key-frames and by sampling meaningful panning/tilting sequences.
A CNN news segment of 5 minutes (125MB, SIF, MPEG good quality) is automatically and optimally sampled for 3-level streaming for different channel capabilities and preroll settings. You need RealPlayer to play back the following segments.
1. To satisfy a low-bit-rate channel limit, one can compress the video further by sacrificing quality and/or resolution. The result: Compressed continuous video in .rm format (1.4 MB)
2. Or one can use our optimal sampling algorithm to nonlinearly sample the frames, simultaneously maximizing the channel usage and information flow. The saliency of each frame is measured in terms of color histogram differences and motion estimation. The following are three choices of sampling results, based on varying bandwidths:
- SMIL streaming video: level 1, 216 out of 9902 frames sampled (4.5 MB)
- SMIL streaming video: level 2, 67 out of 9902 frames sampled (1.4MB)
- SMIL streaming video: level 3, 41 out of 9902 frames sampled (0.87MB)
3. One can certainly argue that 1 and 2 can be combined side by side to present the user with both a “coarse” overall presentation and some “highlights” or “details” of the original video.
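As an illustration of a histogram-difference saliency score, the sketch below compares normalized intensity histograms of consecutive frames. It is a simplification under stated assumptions: frames are flat lists of 8-bit gray values rather than color images, motion estimation is left out, and the function names and bin count are hypothetical.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    total = len(pixels)
    return [count / total for count in counts]

def histogram_difference(h1, h2):
    """L1 distance between two normalized histograms (range 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def frame_saliency(frames, bins=4):
    """Score each frame by how much its histogram differs from the previous one."""
    hists = [histogram(frame, bins) for frame in frames]
    return [0.0] + [histogram_difference(hists[i - 1], hists[i])
                    for i in range(1, len(hists))]

# Three tiny made-up frames: the third one is a shot change.
frames = [
    [10, 10, 200, 200],
    [12, 11, 210, 205],
    [250, 250, 250, 250],
]
scores = frame_saliency(frames)
```

A high score marks a frame whose content differs sharply from its predecessor, which makes it a candidate key-frame.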
- Publications:
Xiang Sean Zhou, S-P Liou, “Optimal Nonlinear Sampling for Video Streaming
at Low Bit Rates,” Accepted as a Transaction Paper for IEEE Trans. on
Circuits and Systems for Video Technology, Special Issue on Wireless Video to
be published in 2002 (in final revision stage). [PDF]
[PS]
The performance of a content-based image retrieval (CBIR) system is inherently constrained by its low-level features, and in many cases it cannot give satisfactory retrieval results, especially when the high-level concepts in the user’s mind are not easily expressed by those features. In this work we explore the unification of keywords and visual content for image retrieval. We propose a seamless joint querying and relevance feedback scheme based on both keywords and low-level visual content, incorporating keyword similarities. We propose a pseudo-classification algorithm for learning the word similarity matrix during user interaction. The learned similarity matrix, specific to the dataset as well as the users, can facilitate keyword semantic grouping, thesaurus construction, and soft query expansion during intelligent image retrieval with the user in the loop.
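Soft query expansion with a word similarity matrix can be sketched as follows. This shows only how an already available matrix might be used; the learning step (the pseudo-classification algorithm) is not reproduced. The vocabulary, the similarity values, and the max-based combination rule are all assumptions for illustration.

```python
def expand_query(query_weights, similarity):
    """Spread each query keyword's weight to related keywords.

    Each vocabulary word receives the largest value of
    (query weight of q) * (similarity between q and the word).
    """
    vocab = list(similarity)
    expanded = {}
    for word in vocab:
        expanded[word] = max(
            query_weights.get(q, 0.0) * similarity[q][word] for q in vocab
        )
    return expanded

# Made-up similarity matrix for a three-word vocabulary.
similarity = {
    "tiger":   {"tiger": 1.0, "leopard": 0.7, "car": 0.0},
    "leopard": {"tiger": 0.7, "leopard": 1.0, "car": 0.1},
    "car":     {"tiger": 0.0, "leopard": 0.1, "car": 1.0},
}
expanded = expand_query({"tiger": 1.0}, similarity)
```

With these values a query for "tiger" also gives partial credit to images annotated "leopard" and none to images annotated "car".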
Concept similarity matrix for 30 words in the vocabulary, 5000 images in the database, and up to 3 keywords per image.
Concept similarity matrix after 20 rounds of training. This reveals the effectiveness of the WARF formula.
An intelligent retrieval system that can learn from user interactions and understand the semantics of words and content.
- Publications:
Xiang Sean Zhou, T. S.
Huang, “Unifying Keywords and Visual Contents in Image Retrieval,”
IEEE Multimedia magazine, April-June Issue, 2002 [PDF:0.7MB] [PS:
5.8MB]
Xiang Sean Zhou, T. S.
Huang, “Unifying Keywords and Contents for Image Retrieval,”
International Workshop on Content-Based Multimedia Indexing,
The 4DI system, a real-time three-dimensional (3-D) imager, is a laser range data sensing system for making continuous geometric measurements of 3-D surfaces. This paper concentrates on improving the resolution and accuracy of this system using a camera-turntable arrangement. It provides a fast and inexpensive way to capture dense range data for 360° viewing of an object. After the system calibration procedure and parameter estimation operations, a method to improve the data resolution by object rotation is derived. By oversampling the object surface and then applying a 3-D smoothing and resampling operation, the accuracy of the data can be improved while maintaining a given spatial resolution. A method for removing low-credibility data points is also discussed.
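Combining range scans taken at different turntable positions amounts to rotating each scan back into a common object frame. The sketch below assumes an idealized setup in which the turntable axis coincides with the sensor's y-axis and passes through the origin; in a real system the axis would come from calibration. The function names are hypothetical.

```python
import math

def rotate_about_vertical_axis(points, angle_deg):
    """Rotate (x, y, z) points about the y-axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def merge_scans(scans):
    """Undo each scan's turntable rotation and pool all points.

    scans is a list of (turntable_angle_deg, points) pairs.
    """
    merged = []
    for angle_deg, points in scans:
        merged.extend(rotate_about_vertical_axis(points, -angle_deg))
    return merged

# One surface point observed at two turntable positions (made-up data).
surface_point = (1.0, 0.5, 0.0)
scans = [
    (0.0, [surface_point]),
    (90.0, rotate_about_vertical_axis([surface_point], 90.0)),
]
cloud = merge_scans(scans)
```

After merging, both observations land on the same location, so repeated views oversample the surface and can then be smoothed and resampled.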
3-D point cloud for Vase
3-D point cloud for Engine Blade (joint project for GE Aircraft Engine)
- Publications:
Xiang Sean Zhou, Ruihua
Yin, Wenjian Wang, Xiaogu
Wu, William G. Wee, "Calibration, parameter estimation, and accuracy
enhancement of a 4DI camera turntable system", Optical Engineering, Vol.
39(01), Jan. 2000, 170-174
Several adaptive order statistic filters (OSFs) are developed and compared for channel characterization and noise suppression in images and 3-D CT data. The emphasis is on the situation in which no noise-free reference image is available, but two noisy versions of the same image (or 3-D data slice) are. One of the noisy images is used as the reference in the OSF. The adaptive updating formula for the filter coefficients is derived. It is shown theoretically that if the noises are uncorrelated, the expected values of the derived filter coefficients equal those derived using a noise-free reference. Experiments using the noisy reference image yield results comparable to those of methods using a noise-free reference image, and better results than median, Gaussian, averaging, and Wiener filters.
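An adaptive order statistic filter can be sketched in one dimension as follows. This uses the standard LMS update on the sorted window samples, which is an assumption of the example and not necessarily the updating formula derived in the paper; the function name, window size and step size are likewise hypothetical.

```python
def adaptive_osf(noisy, reference, window=3, step=0.01):
    """Filter `noisy` with an order statistic filter adapted by LMS.

    The output at each position is a weighted sum of the sorted samples
    in the window; the weights move to reduce the error against
    `reference`, which may itself be a second noisy observation.
    """
    half = window // 2
    weights = [1.0 / window] * window
    output = list(noisy)  # border samples are passed through unchanged
    for i in range(half, len(noisy) - half):
        ordered = sorted(noisy[i - half:i + half + 1])
        y = sum(w * x for w, x in zip(weights, ordered))
        error = reference[i] - y
        weights = [w + step * error * x for w, x in zip(weights, ordered)]
        output[i] = y
    return output, weights

# Made-up signal with one impulse; the reference is flat.
filtered, weights = adaptive_osf([1.0, 1.0, 9.0, 1.0, 1.0],
                                 [1.0, 1.0, 1.0, 1.0, 1.0])
```

On this toy input the weight attached to the largest sample in the window shrinks, so the filter learns to discount the impulse.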
- Publications:
Xiang Sean
Zhou, William G. Wee, "Adaptive Order Statistic Filters for Noise
Characterization and Suppression without A Noise-free Reference",
ICC’98, SPIE’98; in review for IEEE Trans. on Image Processing.
Speech processing
Figure panels: pitch detection; linear delta modulation; LPC reconstruction; a zoom-in view.