Multimedia Analysis and Retrieval Systems

[Problem] [Demo] [Approaches]

MARS is an integrated relevance feedback architecture for Content-Based Image Retrieval (CBIR). This interactive image retrieval system applies relevance feedback at all levels, from feature representation to similarity measure: the user's query is automatically refined by the system based on the user's relevance feedback.
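To illustrate how feedback can refine a query, here is a minimal Rocchio-style sketch in Python. The function name and the alpha/beta/gamma weights are illustrative assumptions, not the actual MARS implementation: the refined query moves toward the mean of images marked relevant and away from those marked non-relevant.

```python
# Minimal Rocchio-style query refinement sketch (illustrative; the
# weights alpha/beta/gamma and the function name are assumptions, not
# the actual MARS implementation).

def refine_query(query, relevant, nonrelevant,
                 alpha=1.0, beta=0.75, gamma=0.25):
    """All arguments are equal-length feature vectors (lists of floats);
    the refined query moves toward relevant images and away from
    non-relevant ones."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors)
                for i in range(dim)]

    r_mean = mean(relevant)
    n_mean = mean(nonrelevant)
    return [alpha * query[i] + beta * r_mean[i] - gamma * n_mean[i]
            for i in range(dim)]

# A 2-D toy example: the query drifts toward the relevant cluster.
new_q = refine_query([0.5, 0.5],
                     relevant=[[1.0, 0.0], [0.8, 0.2]],
                     nonrelevant=[[0.0, 1.0]])
```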

A list of general references in CBIR can be found here.

Problem Statement

Two main challenges exist in Content-Based Image Retrieval (CBIR): (1) the gap between high-level concepts and low-level features; (2) the subjectivity of human perception of visual content. Relevance feedback, an interactive retrieval approach, was proposed to take these two characteristics into account. Our work in MARS mainly addresses these two problems.



Demo

MARS is implemented in C/C++, and the user interface is written in Microsoft Visual C++. The following are a few screen shots from the MARS user interface.

Screen shot 1: Random Browsing

Screen shot 2: System Setting

Screen shot 3: Top 20 retrieved images given the top-left image as the initial query.
    The retrieval is based on global image features from the Corel image database.

Screen shot 4: Top 20 retrieved images given the top-left image as the initial query.
    The retrieval is based on local image features identified by wavelet-based salient points. The salient points are superimposed on the original images.

Screen shot 5(a): Salient points superimposed on the original images
Screen shot 5(b): Salient points only

Screen shot 6(a): Top 20 retrieved images for tiger using salient points (the top-left one is the query)
Screen shot 6(b): Top 20 retrieved images for tiger using global features (the top-left one is the query)

Screen shot 7: Feature evaluation under scale variations.
    An example of retrieval results using HSV color space, color moments and Tamura texture features.

Screen shot 8: System Setting
    Choose from different color spaces (HSV, LUV, LAB), different color features (color moments, color histogram, etc.), different texture features (Tamura, wavelet, MSAR), and different image scales.



[User Modeling] [Learning-DEM] [Learning-SVM]
[Multi-resolution] [Spatial] [Dimension Reduction]

Our approaches to multimedia retrieval span low-level feature representation and evaluation, mid-level on-line learning during relevance feedback, and hybrid visual/semantic models that bridge the gap between high-level concepts and low-level visual features.

On-line learning during relevance feedback

Multi-resolution and spatial/local analysis

  • Image retrieval using wavelet-based salient points. (Details)
  • Combining tile-based spatial layout and user-defined region-of-interest. (Details)
Feature representation and evaluation

  • Feature evaluations of image scale variations: in theory and in the context of CBIR.
  • Feature subset selection and dimension reduction using principal feature analysis (PFA) vs. principal component analysis (PCA). (Details)

  • Discriminant Analysis with EM Algorithm in Image Retrieval
      Although relevance feedback incrementally supplies more information for fine retrieval, two challenges remain: (1) the labeled images obtained from relevance feedback are still very limited compared with the large number of unlabeled images in the database; (2) relevance feedback does not offer a specific technique to automatically weight the low-level features. In this approach, image retrieval is formulated as a transductive learning problem that incorporates unlabeled data into supervised learning to achieve better classification. Experimental results show that the proposed approach performs satisfactorily for image retrieval applications.
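      A toy sketch of the underlying idea of combining labeled and unlabeled data in EM, using one-dimensional, unit-variance Gaussians. This is purely illustrative and not the DEM algorithm itself; all names and data are made up.

```python
import math

# Toy semi-supervised EM sketch: 1-D data, two Gaussian classes with
# unit variance.  Labeled points keep hard assignments; unlabeled points
# get soft posteriors re-estimated each iteration.  Purely illustrative,
# not the DEM algorithm from the paper.

def gauss(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2)

def semi_supervised_em(labeled, unlabeled, iters=30):
    """labeled: list of (x, class_id) pairs with class_id in {0, 1}."""
    mu = []
    for c in (0, 1):  # initialize each class mean from its labeled data
        xs = [x for x, y in labeled if y == c]
        mu.append(sum(xs) / len(xs))
    for _ in range(iters):
        # E-step: posterior P(class 1 | x) for each unlabeled point
        resp = [gauss(x, mu[1]) / (gauss(x, mu[0]) + gauss(x, mu[1]))
                for x in unlabeled]
        # M-step: update means from labeled (hard) + unlabeled (soft) data
        for c in (0, 1):
            num = sum(x for x, y in labeled if y == c)
            den = sum(1 for _, y in labeled if y == c)
            for x, r in zip(unlabeled, resp):
                w = r if c == 1 else 1.0 - r
                num += w * x
                den += w
            mu[c] = num / den
    return mu

# Two labeled images; four unlabeled ones sharpen the class means.
means = semi_supervised_em(labeled=[(0.0, 0), (4.0, 1)],
                           unlabeled=[0.2, -0.1, 3.8, 4.3])
```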



  • Update Relevant Image Weight Using Support Vector Machines (SVM)
      In Content-Based Image Retrieval (CBIR) with relevance feedback, the user interacts with the system by selecting the most relevant images and assigning weights that denote their preference. By dynamically updating the low-level feature weights based on this feedback, the system tries to capture the high-level query concepts the user has in mind as well as the user's perception subjectivity.

      Most current approaches use only positive feedback (relevant images). However, not all high-level concepts can be fully expressed through positive feedback alone. Moreover, it is sometimes hard for users to relate the preference weights to the concepts they are looking for. In this work, we propose a novel approach that lets users provide both positive and negative feedback. The positive and negative examples are learned via Support Vector Machines, and the learning results help the system update the preference weights for relevant images. Experimental results show that the proposed approach yields a reasonable improvement over relevance feedback alone.

      Incremental learning will be implemented in our system. With incremental learning, instead of (a) performing learning on all of the positive and negative examples selected over the entire history every time, (b) the support vectors computed in the previous feedback round are combined with the positive and negative examples selected in the current round to determine a new optimal hyperplane.
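      A minimal sketch of learning a separating hyperplane from positive and negative feedback: a linear SVM trained by stochastic sub-gradient descent on the hinge loss (Pegasos-style). This is an illustrative stand-in, not MARS's actual code; it omits the bias term since the toy data is centered, and uses the signed distance to the hyperplane as a stand-in for an updated preference weight.

```python
import random

# Linear SVM sketch trained with stochastic sub-gradient descent on the
# regularized hinge loss (Pegasos-style).  Illustrative only: no bias
# term (the toy data is centered around the origin), and all feature
# vectors and labels are made up.

def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    w = [0.0] * dim
    t = 0
    idx = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = labels[i] * sum(w[d] * points[i][d]
                                     for d in range(dim))
            w = [(1.0 - eta * lam) * wd for wd in w]  # regularization step
            if margin < 1:                            # hinge-loss step
                w = [w[d] + eta * labels[i] * points[i][d]
                     for d in range(dim)]
    return w

def decision(w, x):
    """Signed distance (up to ||w||) from the learned hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x))

pos = [[2.0, 2.0], [2.5, 1.8]]      # positive-feedback feature vectors
neg = [[-2.0, -2.0], [-1.8, -2.4]]  # negative-feedback feature vectors
w = train_linear_svm(pos + neg, [1, 1, -1, -1])
```

      Images scoring high under `decision` would be the ones whose preference weights the system increases.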



  • Wavelet-based Salient Points Approach



  • Tile-based spatial layout and user-defined region-of-interest



  • Feature Subset Selection and Dimension Reduction

      Dimensionality reduction of a feature set is a common preprocessing step in pattern recognition and classification applications, and has also been employed in compression schemes. Principal component analysis (PCA) is one of the most popular methods, and can be shown to be optimal in some sense. It has the disadvantage, however, that measurements of all of the original features are needed for the analysis. We propose a novel method for dimensionality reduction that chooses a subset of the original features containing most of the essential information, using the same criteria as PCA. We call this method Principal Feature Analysis (PFA). The method is successfully applied to two different applications, choosing the principal features in face tracking and in content-based image retrieval (CBIR); the satisfactory results show the potential of the proposed method.
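      A compact sketch of the PFA idea, assuming one common formulation: project each original feature onto the top-q principal-component loadings, cluster those loading rows, and keep one representative feature per cluster. The function names and the deterministic farthest-point k-means initialization are our own simplifications, not the paper's exact procedure.

```python
import numpy as np

# Principal Feature Analysis sketch (one common formulation; the
# deterministic initialization is our simplification).

def pfa(X, n_select, q):
    """X: (samples, features) array; returns indices of n_select
    representative original features."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    A = eigvecs[:, np.argsort(eigvals)[::-1][:q]]   # top-q loading rows
    # farthest-point initialization, then plain k-means on the rows of A
    centers = [A[0]]
    while len(centers) < n_select:
        d = np.min([((A - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(A[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(50):
        dist = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(n_select):
            if np.any(assign == k):
                centers[k] = A[assign == k].mean(axis=0)
    # the feature whose loading row is closest to each center survives
    picks = {int(((A - c) ** 2).sum(axis=1).argmin()) for c in centers}
    return sorted(picks)

# Feature 2 duplicates feature 0, so PFA keeps only one of the pair.
rng = np.random.default_rng(1)
f0, f1 = rng.standard_normal(200), rng.standard_normal(200)
sel = pfa(np.column_stack([f0, f1, f0]), n_select=2, q=2)
```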

      Figure 6 shows an example of selecting the important points to track on a human face in order to account for non-rigid motion. This is a classic feature-selection problem, since it can be very expensive, and perhaps impossible, to track many points on the face reliably. Figure 6 (b) shows the PFA selection results.

      Figure 6 (a) Example of facial expressions

      Figure 6 (b) Principal Motion Features chosen for the facial points. The arrows show the motion direction chosen (Horizontal or Vertical)




    Copyright © 1997-2001
    Last Updated: January 2001