Research Projects      Sean Zhou’s home


Learning and Relevance Feedback in Multimedia Retrieval:

 

  • Review and Analysis

- What is relevance feedback?

- A list of references on relevance feedback.

- MARS relevance feedback illustration,

- Our latest review paper:

Xiang Sean Zhou, Thomas S. Huang, “Exploring the Nature and Variants of Relevance Feedback”, in Proc. IEEE CVPR’01 Workshop on Content-Based Access of Image and Video Libraries, Hawaii, Dec. 2001 [PDF: 0.3MB] [PS: 1.3MB]

 

 

  • BiasMap and its Application in Relevance Feedback

- What is BiasMap?

- Intuition behind BiasMap:

“Happy families are all alike; every unhappy family is unhappy in its own way.”

—Leo Tolstoy, Anna Karenina

- Demo by our group-mate Munehiro Nakazato, using BiasMap as the core algorithm.

- Publication on BiasMap:

Xiang Sean Zhou, T. S. Huang, “BiasMap for Small Sample Learning during Multimedia Retrieval”, in Proc. IEEE CVPR, Hawaii, Dec. 2001 [PDF: 0.7MB] [PS: 3.8MB]

 

Decision surfaces of BiasMap using kernels (KBDA), KDA, and SVM for highly non-linear configurations. Open circles are positive examples and crosses are negative examples. The gray level indicates distance to the positive centroid in the non-linearly transformed space: the brighter, the closer. Notice the spill-over effect of KDA and SVM.
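The "distance to the positive centroid in the non-linearly transformed space" can be computed from kernel evaluations alone. The following is a minimal sketch under stated assumptions, not the BiasMap implementation: it omits the discriminant transform entirely and only measures the plain feature-space distance to the mean of the positive examples. The function names and the 1/(2σ²) scaling of the RBF kernel are illustrative choices.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel; the 1/(2*sigma^2) scaling is an assumed convention."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def sq_dist_to_centroid(z, positives, sigma):
    """Squared feature-space distance from phi(z) to the mean of phi(positives).

    Uses ||phi(z) - m||^2 = k(z, z) - (2/n) sum_i k(z, x_i) + (1/n^2) sum_ij k(x_i, x_j).
    """
    n = len(positives)
    k_zz = rbf_kernel(z, z, sigma)  # equals 1.0 for an RBF kernel
    k_zx = sum(rbf_kernel(z, x, sigma) for x in positives) / n
    k_xx = sum(rbf_kernel(a, b, sigma) for a in positives for b in positives) / (n * n)
    return k_zz - 2.0 * k_zx + k_xx

# Hypothetical 2-D positive examples and two query points.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = sq_dist_to_centroid((0.3, 0.3), positives, sigma=1.0)
far = sq_dist_to_centroid((5.0, 5.0), positives, sigma=1.0)
print(near < far)
```

A point lying among the positive examples gets a smaller distance than a remote one, which corresponds to the brighter regions in the figure.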

 

 

Examples of Face

 

Non-faces

Face and non-face classification (1000 faces + 1000 non-faces): precision in the top 1000 returns. The number of positive training examples is 100, and the horizontal axis shows the number of negative examples, varied from 1 to 300. RBF kernel with σ = 100. SVM returns the points with larger margins first. BiasMap (or BDA) performs better than KDA and SVM with a small (and thus unrepresentative) number of negative examples.
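The evaluation measure used in this plot, precision in the top returns, can be sketched as follows. This is an illustrative helper, not code from the experiments; the function name and the toy scores are assumptions.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring items whose label is positive (1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

# Toy ranking: four items, two of them relevant.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(precision_at_k(scores, labels, 2))
```

In the face experiment above, k would be 1000 and the labels would mark which of the 2000 test images are faces.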

 

 

  • SVM and Other Kernel Machines for Relevance Feedback

 

- What are SVMs and kernel machines? Try an online demo from Lucent Bell Labs

- Publications on one-class SVM and kernel discriminant analysis for relevance feedback:

   

Y. Chen, Xiang Sean Zhou, T. S. Huang, “One-class SVM for Learning in Image Retrieval”, IEEE Int’l Conf. on Image Proc. (ICIP'2001), Thessaloniki, Greece, October 7-10, 2001 [PDF] [PS]

Xiang Sean Zhou, T. S. Huang, “Comparing Discriminating Transformations and SVM for Learning during Multimedia Retrieval,” ACM Multimedia’2001, Sept. 30-Oct. 5, 2001, Ottawa, Ontario, Canada [PDF: 4.2MB]

 

 

 

 

[Back to Research Project List]


Image Structural Analysis and Representation:

 

  • Summary:        

 

We propose structural features for content-based image retrieval (CBIR), especially edge/structure features extracted from edge maps. The feature vector is computed by applying a “Water-Filling Algorithm” to the edge map of the original image. The purpose of this algorithm is to efficiently extract the information embedded in the edges. The new features are more generally applicable than texture or shape features. Experiments show that the new features capture salient edge/structure information and improve retrieval performance.
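As a rough illustration of the idea of letting "water" spread along an edge map, the sketch below floods each connected edge structure with a breadth-first traversal. It is a loose sketch under stated assumptions, not the published algorithm: the 4-connectivity, the raster-order choice of starting pixel, and the two quantities reported per structure (called `filling_time` and `water_amount` here) are illustrative choices.

```python
from collections import deque

def water_fill(edge_map):
    """Flood each connected edge structure of a binary edge map.

    Returns one (filling_time, water_amount) pair per structure, where
    filling_time is the number of steps until the structure is full and
    water_amount is the number of edge pixels filled.
    """
    rows, cols = len(edge_map), len(edge_map[0])
    seen = [[False] * cols for _ in range(rows)]
    results = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not edge_map[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0, 0)])  # (row, col, arrival step)
            filling_time, water_amount = 0, 0
            while queue:
                r, c, t = queue.popleft()
                water_amount += 1
                filling_time = max(filling_time, t)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and edge_map[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc, t + 1))
            results.append((filling_time, water_amount))
    return results

# Hypothetical edge map: an L-shaped edge and a short vertical edge.
edge_map = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
]
print(water_fill(edge_map))
```

Each pixel is visited once, so the cost is linear in the size of the edge map. Statistics of such per-structure quantities could then be collected into a feature vector.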

 

  • Water-filling algorithm for edge-based structural feature extraction

- What is the Water-filling algorithm?

 

 

- Publications:

Zhou, X. S., T. S. Huang, "Edge/Structural Features for Content Based Image Retrieval," Pattern Recognition Letters, Vol. 22(5), Apr. 2001, pp. 457-468 [PDF: 0.6MB]

Zhou, X. S., Y. Rui, and T. S. Huang, "Water-filling: a novel way for image structural feature extraction", Proc. IEEE Inter. Conf. on Image Processing, Kobe, Japan. Oct. 25-29, 1999

 

 

 

 


 

 


ICA-based Probabilistic Appearance and Structure Models (PASM):

  • Summary:        

 

In this project we propose a novel image modeling scheme for object detection and localization. Object appearance is modeled by the joint distribution of k-tuple salient point feature vectors, which is factorized component-wise after an independent component analysis (ICA). We also propose a distance-sensitive histogramming technique for capturing spatial dependencies. The advantages over existing techniques include the ability to model non-rigid objects (at the expense of modeling accuracy) and flexibility in modeling spatial relationships. Experiments show that ICA improves modeling accuracy and detection performance, and experiments on object detection in cluttered scenes have demonstrated promising results.
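Once the components are treated as independent, the joint density becomes a product of one-dimensional densities, so a log-likelihood is a sum of per-component terms. The sketch below illustrates only that factorization step, assuming the ICA projection has already been applied; the histogram binning, add-one smoothing, value range, and function names are illustrative assumptions rather than details of the PASM model.

```python
import math

def fit_histogram(values, lo, hi, bins):
    """Return smoothed bin probabilities for one independent component."""
    counts = [1.0] * bins  # add-one smoothing avoids log(0) later
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def log_likelihood(sample, histograms, lo, hi):
    """Sum per-component log-probabilities: the log of the factorized joint."""
    bins = len(histograms[0])
    width = (hi - lo) / bins
    ll = 0.0
    for s, hist in zip(sample, histograms):
        idx = min(bins - 1, max(0, int((s - lo) / width)))
        ll += math.log(hist[idx])
    return ll

# Hypothetical training values for two components, already in ICA coordinates.
components = [[0.1, 0.2, 0.15, 0.9], [0.8, 0.85, 0.9, 0.1]]
hists = [fit_histogram(c, 0.0, 1.0, 4) for c in components]
typical = log_likelihood((0.15, 0.85), hists, 0.0, 1.0)
atypical = log_likelihood((0.6, 0.4), hists, 0.0, 1.0)
print(typical > atypical)
```

Evaluating such a score at each interest point is one way a likelihood map like those in the figures could be formed.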

 

  • PASM for Images

 

Diagram for the object detection and localization task implemented in this paper

 

 

Synthetic test images and a detection example.

(a) The synthetic test image of 20 objects from COIL;

(b) The rotated and occluded version of (a); 

(c) The likelihood map for detecting “piggy bank” in (b). The white dots are the interest points.

 

 

Detecting Leopards and Tigers

The likelihood maps are multiplied by the corresponding original images to reveal the detected (high likelihood) local structure.

 

- Publications:

Xiang Sean Zhou, B. Moghaddam, T. S. Huang, “ICA-based probabilistic local appearance models”, ICIP’2001, Greece, 2001 [PDF] [PS]

 

 

 

 


 

 


Video Streaming for Wireless Devices and Low Bit Rate Channels:

 

  • Project Summary:

 

Under low bit rate channel constraints, we stream nonlinearly sampled video frames (i.e., a key-frame slideshow) synchronized with the audio stream. Given the channel and buffer limits, we wish to obtain a set of sampled frames that is not only feasible (i.e., no frame is dropped) but also optimal in the sense of maximal information flow (given that the semantic information content of each frame can be quantified, either automatically or manually). Different application scenarios are considered and modeled in a principled way, and for each we propose computationally efficient algorithms that find the globally optimal solution. The running time of the algorithms is linear in the total number of frames. The contributions of this work include a novel modeling scheme for channel and buffer limits in the video temporal sampling problem; the development of the corresponding efficient algorithms for finding the globally optimal solution; and the extension and analysis of these algorithms for practical application scenarios. The proposed algorithms make possible the automated production of this new form of video streaming over low bit rate channels for devices with limited memory and storage capability.
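To make the notion of a feasible frame set concrete, the sketch below checks whether every selected frame can arrive before its display time. It is a deliberately simplified model and not the optimization algorithm of this project: it assumes a constant-rate channel, back-to-back transmission starting one preroll interval before playback, and a buffer constraint reduced to "each frame must fit in the buffer". All names and numbers are hypothetical.

```python
def is_feasible(frames, rate_bps, preroll_s, buffer_bits):
    """Check a candidate frame selection against channel and buffer limits.

    frames: list of (display_time_s, size_bits), sorted by display time.
    """
    sent_bits = 0
    for display_time, size_bits in frames:
        if size_bits > buffer_bits:
            return False  # a single frame must fit in the client buffer
        sent_bits += size_bits
        # Bits deliverable by this frame's deadline, counting the preroll head start.
        if sent_bits > rate_bps * (display_time + preroll_s):
            return False  # the frame would arrive late, i.e. be dropped
    return True

# Hypothetical selection of three key-frames on a 20 kbps channel.
frames = [(0.0, 40000), (5.0, 60000), (6.0, 60000)]
print(is_feasible(frames, rate_bps=20000, preroll_s=2.0, buffer_bits=100000))
print(is_feasible(frames, rate_bps=20000, preroll_s=1.0, buffer_bits=100000))
```

The check itself is a single pass over the selected frames. Choosing which frames to select so that total information is maximal is the harder problem addressed by the algorithms described above.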

 

  • Optimal non-linear video sampling for key-frame slideshow streaming

 

 

 

 

Nonlinear vs. uniform sampling of a news video segment of 900 frames. The segment reports on the “in-your-face” advertising strategy and how far it has gone: ads above the urinal in the men’s room, on the pool table, on walkways and steps, and on the bar at the parking entrance. Uniform sampling wastes bandwidth on similar frames and misses the critical shot changes and key-frames. Nonlinear sampling captures much more information by selecting key-frames and by sampling meaningful panning/tilting sequences.

 

 

 

  • Demo video segments:

 

A CNN news segment of 5 minutes (125MB, SIF, MPEG good quality) is automatically and optimally sampled for 3-level streaming for different channel capabilities and preroll settings.

You need RealPlayer to play back the following segments.

 

1. To satisfy a low bit rate channel limit, one can compress the video further by sacrificing quality and/or resolution. The following is the result:

Compressed continuous video in .rm format (1.4 MB)

 

2. Or one can use our optimal sampling algorithm to nonlinearly sample the frames, simultaneously maximizing the channel usage and information flow. The saliency of each frame is measured in terms of color histogram differences and motion estimation. The following are three choices of sampling results, based on varying bandwidths:

SMIL streaming video: level 1, 216 out of 9902 frames sampled (4.5 MB)

SMIL streaming video: level 2, 67 out of 9902 frames sampled (1.4 MB)

SMIL streaming video: level 3, 41 out of 9902 frames sampled (0.87 MB)

 

3. One can certainly argue that 1 and 2 can be combined side-by-side to present the user with both a “coarse” overall presentation and some “highlights” or “details” of the original video.
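Item 2 above measures frame saliency partly through color histogram differences. A minimal sketch of such a difference measure is given below. For brevity it uses a single gray-level channel instead of color, an L1 distance, and four bins; these are illustrative assumptions, and motion estimation is not covered.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def histogram_difference(frame_a, frame_b, bins=4):
    """L1 distance between normalized histograms (0 = identical, 2 = disjoint)."""
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(ha, hb))

# Hypothetical tiny frames: one dark, one bright.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
print(histogram_difference(dark, dark))
print(histogram_difference(dark, bright))
```

A large difference between consecutive frames suggests a shot change, which is where a nonlinear sampler would want to place a key-frame.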

 

 

 

- Publications:

Xiang Sean Zhou, S-P Liou, “Optimal Nonlinear Sampling for Video Streaming at Low Bit Rates,” Accepted as a Transaction Paper for IEEE Trans. on Circuits and Systems for Video Technology, Special Issue on Wireless Video to be published in 2002 (in final revision stage). [PDF] [PS]

 

 

 

 


 

 


 

Unifying Keywords and Visual Contents in Image Database Retrieval:

 

  • Project Summary:

 

The performance of a content-based image retrieval (CBIR) system is inherently constrained by its use of low-level features, and it cannot give satisfactory retrieval results in many cases, especially when the high-level concepts in the user’s mind are not easily expressible in terms of low-level features. In this work we explore the unification of keywords and visual contents for image retrieval. We propose a seamless joint querying and relevance feedback scheme based on both keywords and low-level visual contents, incorporating keyword similarities. We propose a pseudo-classification algorithm for learning the word similarity matrix during user interaction. This learned similarity matrix, specific to the dataset as well as to the users, can facilitate keyword semantic grouping, thesaurus construction, and soft query expansion during intelligent image retrieval with the user in the loop.
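One way a learned word similarity matrix could drive soft query expansion is sketched below. This is a hypothetical illustration and not the WARF formula or the pseudo-classification algorithm: the similarity values, the threshold, and the rule of weighting an added word by its similarity are all assumptions.

```python
def expand_query(query_words, similarity, threshold=0.5):
    """Return {word: weight}, adding words similar enough to a query word.

    similarity: nested dict, similarity[a][b] in [0, 1].
    Query words keep weight 1.0; an added word gets its best similarity score.
    """
    weights = {w: 1.0 for w in query_words}
    for w in query_words:
        for other, sim in similarity.get(w, {}).items():
            if other not in query_words and sim >= threshold:
                weights[other] = max(weights.get(other, 0.0), sim)
    return weights

# Hypothetical learned similarities for a tiny vocabulary.
similarity = {
    "tiger": {"cat": 0.8, "car": 0.1},
    "cat": {"tiger": 0.8},
}
print(expand_query(["tiger"], similarity))
```

The expanded weights can then be used to score images by their keywords, so that an image annotated only with "cat" still receives partial credit for the query "tiger".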

 

  • “WARF”: Word Association via Relevance Feedback

 

Concept similarity matrix for 30 words in vocabulary, 5000 images in database, up to 3 keywords per image. Concept similarity matrix after 20 rounds of training. This reveals the effectiveness of the WARF formula.

 

An intelligent retrieval system that can learn from the user interactions and understand the semantics of words and contents.

 

 

 

 

- Publications:

Xiang Sean Zhou, T. S. Huang, “Unifying Keywords and Visual Contents in Image Retrieval,” IEEE Multimedia magazine, April-June Issue, 2002 [PDF:0.7MB] [PS: 5.8MB]

Xiang Sean Zhou, T. S. Huang, “Unifying Keywords and Contents for Image Retrieval,” International Workshop on Content-Based Multimedia Indexing, Italy, September 19-21, 2001 [PDF: 0.9MB] [PS: 5.4MB]

 

 

 

 


 

 


Accurate 3-D Reconstruction Using a 4DI-Turntable System:

  • Project Summary:

 

The 4DI system, a real-time three-dimensional (3-D) imager, is a laser range data sensing system for making continuous geometric measurements of 3-D surfaces. This paper concentrates on improving the resolution and accuracy of this system using a camera-turntable arrangement. It provides a fast and inexpensive way to capture dense range data for 360° viewing of an object. After the system calibration and parameter estimation procedures, a method to improve the data resolution by object rotation is derived. By oversampling the object surface and then applying a 3-D smoothing and resampling operation, the accuracy of the data can be improved while maintaining a given spatial resolution. A method for removing low-credibility data points is also discussed.
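Combining range scans taken at different turntable angles requires mapping each scan back into a common object frame. The sketch below shows that step only, assuming calibration has already expressed the points in a coordinate system whose z axis coincides with the turntable axis. The function names are illustrative, and the smoothing and resampling stages are not shown.

```python
import math

def rotate_about_z(points, angle_deg):
    """Rotate (x, y, z) points about the z axis (assumed turntable axis)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a, z) for x, y, z in points]

def merge_scans(scans):
    """Pool scans into one object-frame point cloud.

    scans: list of (turntable_angle_deg, points). Each scan is rotated back
    by its turntable angle so that all points share the same frame.
    """
    merged = []
    for angle_deg, points in scans:
        merged.extend(rotate_about_z(points, -angle_deg))
    return merged

# The same surface point seen at 0 degrees and after a 90 degree rotation.
scans = [(0.0, [(1.0, 0.0, 0.5)]), (90.0, [(0.0, 1.0, 0.5)])]
cloud = merge_scans(scans)
print(cloud)
```

Both measurements land at (approximately) the same object-frame location, which is what allows many rotated views to oversample the surface.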

 

  • 3-D reconstruction from structured laser lights, accuracy enhanced using a turntable system

 

 

3-D point cloud for Vase

 

3-D point cloud for Engine Blade (joint project for GE Aircraft Engine)

 

 

- Publications:

Xiang Sean Zhou, Ruihua Yin, Wenjian Wang, Xiaogu Wu, William G. Wee, "Calibration, parameter estimation, and accuracy enhancement of a 4DI camera turntable system", Optical Engineering, Vol. 39(01), Jan. 2000, 170-174

 

 

 


 

 


Adaptive Order Statistic Filtering for 2-D Image and 3-D Medical Data:

 

  • Project Summary:

 

Several adaptive order statistic filters (OSF) are developed and compared for channel characterization and noise suppression in images and 3-D CT data. Emphasis is put on the situation in which a noise-free reference image is not available, but two noisy versions of the same image (or 3-D data slice) are. One of the noisy images is used as the reference in the OSF. The adaptive updating formula for the filter coefficients is derived. It is shown theoretically that if the noises are uncorrelated, the expected values of the derived filter coefficients equal those of the coefficients derived using a noise-free reference. Experiments using the noisy reference image yield results comparable to those of methods using a noise-free reference image, and better results than those of median, Gaussian, averaging, and Wiener filters.
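An order statistic filter outputs a weighted sum of the sorted samples in a window, and its weights can be adapted toward a reference signal. The sketch below uses a standard least-mean-squares (LMS) step as the update; this is an assumed stand-in, since the updating formula derived in this work is not reproduced here. In the setting described above, the reference sample would come from the second noisy image.

```python
def osf_output(window, coeffs):
    """Weighted sum of the sorted window samples (an order statistic filter)."""
    return sum(c * v for c, v in zip(coeffs, sorted(window)))

def lms_update(window, coeffs, reference, step):
    """One LMS step: move the coefficients to reduce (reference - output)^2."""
    ordered = sorted(window)
    error = reference - sum(c * v for c, v in zip(coeffs, ordered))
    return [c + step * error * v for c, v in zip(coeffs, ordered)]

# Hypothetical 3-sample window; coefficients (0, 1, 0) give the median.
window = [5.0, 1.0, 3.0]
print(osf_output(window, [0.0, 1.0, 0.0]))

# Start from an averaging filter and adapt toward a reference value.
coeffs = [1.0 / 3.0] * 3
reference = 4.0
before = osf_output(window, coeffs)
coeffs = lms_update(window, coeffs, reference, step=0.01)
after = osf_output(window, coeffs)
print(abs(reference - after) < abs(reference - before))
```

Median and averaging filters are both special cases of this form, so an adapted filter can settle anywhere between them depending on the noise.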

 

  • Adaptive Order Statistic Filters for Noise Characterization and Suppression without a Noise-Free Reference

 

 

 

    

- Publications:

Xiang Sean Zhou, William G. Wee, "Adaptive Order Statistic Filters for Noise Characterization and Suppression without a Noise-Free Reference", ICC’98, SPIE’98, in review for IEEE Trans. on Image Processing.

 

 

 

 


 


Speech processing (click here to LISTEN to the results!...)

Pitch detection                                     

Linear delta modulation

 

LPC reconstruction                                   

 

A zoom-in view                                     

 

 

 

 

 


 


Demo : Image Retrieval and Relevance Feedback


 

Construction in progress… Please check back soon!

 

 
