IEEE International Conference on Multimedia & Expo 2000
New York, New York
Electronic Porceedings
©l 2000 IEEE


Update Relevant Image Weights for Content-Based Image Retrieval Using Support Vector Machines

Qi Tian, Pengyu Hong, Thomas S. Huang

IFP Group, Beckman Institute
University of Illinois at Urbana-Champaign
Urbana, IL 61801
e-mail: {qitian, hong, huang}@ifp.uiuc.edu


Abstract

Relevance feedback [1] has been a powerful tool for interactive Content-Based Image Retrieval (CBIR). During the retrieval process, the user selects the most relevant images and provides a weight of preference for each relevant image. User’s high level query and perception subjectivity can be captured to some extent by dynamically updated low-level feature weights based on the user’s feedback. However, in MARS [2] only the positive feedbacks, i.e., relevant images are considered. In this paper, a novel approach is proposed by providing both positive and negative feedbacks for Support Vector Machines (SVM) learning. The SVM learning results are used to update the weights of preference for relevant images. Priorities are given to the positive feedbacks that have larger distances to the hyperplane determined by the support vectors. This approach releases the user from manually providing preference weight for each positive example, i.e., relevant image as before. Experimental results show that the proposed approach has reasonable improvement over relevance feedback with possible examples only.


Table of Contents


1. Introduction

Content-Based Image Retrieval (CBIR) has become a very active research area since the 1990’s [3, 4]. Most of the early research effort focused on finding the “best” image features or representations. Retrieval was performed by summing the distance of individual feature representation with fixed weights. Although this isolated approach establishes the basis of CBIR, the usefulness of such systems was limited. A human-computer interactive approach to CBIR based on relevance feedback was proposed in [1].

 Unlike the isolated approach, where the user has to precisely decompose the information need into precise weights, the interactive approach allows the user to submit a coarse initial query and continuously refine his information need via relevance feedback. This approach greatly reduces the user’s effort of composing a query and captures the user’s information need more precisely.

 However, only using the positive examples as feedback [1, 2] may lose the information implied by the negative examples.  It is our expectation that by utilizing both positive and negative feedbacks, the user’s perception subjectivity can be captured more accurately than relevance feedback with positive examples only. Negative examples are also used in FourEyes [5], where the user chooses the both positive and negative examples. FourEyes looks at all of the models and determines which model or combination of models best describe the positive examples chosen by the user, while satisfying the constraints of the negative examples. In MARS [2], manually providing preference weights of the relevant image may not be accurate and is sometimes a burden for the user. The Support Vector Machine approach will be applied in this paper to release the user from the above burden and automatically generate preference weights of the relevant images.

 The rest of paper is organized as follows. Relevance feedback and similarity measure in MARS [2] is discussed in Section 2. Section 3 describes Support Vector Machines (SVM) with its application to update the weights of the relevant images in CBIR. Experimental results and conclusions are given in Section 4 and 5, respectively.

<-- Back to Table of Contents>


2. Relevance Feedback in CBIR
2.1 Relevance Feedback

Relevance feedback is a process of automatically adjusting an existing query using information fed-back from the user about the relevance of previous retrieved document [6]. In MARS [2], the user tells the system about which images of the previous retrieved results are relevant to what he or she wants and provides a preference weight for each relevant image.  Query weights for each low-level feature, i.e., color and texture, etc., are dynamically updated based on the user’s feedback. This release the user from the burden. The user is no longer required to specify a precise weight for each low-level feature at the query formulation stage. Based on user’s feedback, the high level concepts implied by the query weights are automatically refined.

2.2 Similarity Measurement

 The overall similarity  for the jth image in the database is obtained by linearly combining individual feature, ’s similarity distance, :

                                  (3)

where N is the total number of the images in the database. Mahalanobis distance is used as the similarity measurement:

       (4)

where is the ith feature vector of the jth image in the database,  is the feature vector of the query, and  is the covariance matrix of the ith feature components of the relevant images.  The element of is defined as:

         (5)

where V(k) is the preference weight for the kth relevant image (positive feedback), andare the mth and nth component values of the ith feature of the kth relevant image, respectively. and  are the mth and nth component values of the ith feature of the query, respectively. NR is the total number of positive feedbacks and . is an identity matrix if .

 Low-level feature weight  in Eq. (3) is updated by Eq. (6) and (7).

                                            (6)

                                                             (7)

 Higher weight is given to the feature that has smaller average distance , based on the relevant images.  This is because the relevant images are more similar (have smaller distance) in this feature than in other feature.

 In this approach, the user manually provides his or her preference weight, i.e.,in Eq. (5) for the relevant images. The preference weights denote the degree of how much the user likes the images.  A query is then the weighted average, i.e., centroid of the relevant images in the feature space. However, in some cases, there exist examples that are not desired by the user but closer to the query than some of the relevant images in the feature space. These examples are considered as negative examples in this work. Usually, the problem to separate the negative examples from the positive examples turns out to be finding a nonlinear classifier. Classification can’t be done from the positive feedbacks alone. Therefore, the negative examples will still be mixed with the positive examples after the relevance feedback. Moreover, sometimes it is hard for the user to tell the system about his or her subjectivity by providing preference weight for each relevant image. The query may be harmed by improper assignments of preference weights. Therefore, a weak classification of preference, e.g., like or dislike, instead of giving specific preference weights would be more appropriate in this sense.

 In this work, the user provides not only the positive feedbacks, but also the negative feedbacks. The new preference weights for the relevant image are obtained from the SVM learning on both kinds of feedbacks.  

<-- Back to Table of Contents>


3. Support Vector Machines

Support Vector Machines (SVM) [7] is an approximate implementation of the structural risk minimization (SRM) principle. It creates a classifier with minimized Vapnik-Chervonenkis (VC) dimension.  SVM minimizes an upper bound on the generalization error rate. The error rate is bounded by the sum of the training-error rate and a term that depends on the VC dimension.  The SVM can provide a good generalization performance on pattern classification problems without incorporating problem domain knowledge.

 Consider the problem of separating the set of training vectors belonging to two classes, e.g., image retrieval problem, +1 denotes positive example, -1 denotes the negative example.

                       (8)

whereis an input pattern, e.g., feature vector in image retrieval, for the ith example and yi is the label. If the two classes are linearly separable, the hyperplane that does the separation is:

                                              (9)

where  is an input vector, w is a weight vector, and b is a bias. The goal of a support vector machine is to find the parameter wo and bo for a optimal hyperplane to maximize the distance between the hyperplane and the closest data point:

                     (10)

For a given wo and bo, the distance of a pointfrom the optimal hyperplane defined in (10) is

                                 (11)

 

A linear separable example in 2D is illustrated in Figure 1. If the two classes are non-linearly separable, the input vectors should be nonlinearly mapped to a high-dimensional feature space by an inner-product kernel function. Table 1 shows three typical kernel functions [8]. An optimal hyperplane is constructed for separating the data in the high-dimensional feature space. This hyperplane is optimal in the sense of being a maximal margin classifier with respect to the training data.

Table 1. Types of kernel functions

 

The distance from the hyperplane determined by the support vectors can be used to measure how much an example belonging to one class is different from the other class. This motivates us to use SVM for automatically generating preference weights for relevant images. In this paper, we use positive and negative examples for image retrieval. The positive examples are preferred images, while the negative examples are undesirable images. Intuitively, the farther the positive examples from the hyperplane, the more distinguishable they are from the negative examples. Thus, when we decide their preference weights, they are assigned with larger weights. In numerical calculation, the distances of the positive examples to the hyperplane are used to decide the weights of the relevant images, i.e., V(k) in Eq. (5). In our experiments, the distance is normalized to the range 10-100. Those negative examples will be removed from the retrieved image list by simply setting their distance similarity value (section 2.2) to a very large number.

<-- Back to Table of Contents>


4. Results

To validate the effectiveness of the SVM approach for CBIR, we have performed experiments over COREL dataset. COREL dataset contains more than 17,000 images. In our current retrieval system, the visual features used include color (color moments), texture (wavelet moments) and structure.

Figure 2. Retrieval results using positive feedbacks only (rank from left to right and from top to bottom, the top 4 are positive feedbacks)

Figure 3. Retrieval results using SVM learning (rank from left to right and from top to bottom, the top 3 and the 6th image are the positive feedbacks, four negative examples are used and not shown in the retrieval results, polynomial kernel function with  is used)

Figure 2 shows an example of the flower retrieval results using positive feedbacks only. Figure 3 shows the flower retrieval results using SVM approach. Same four positive feedbacks (top 4 in Fig. 2 and 1st, 2nd, 3rd and 6th image in Fig. 3) are used for this experiment. Other four negative examples are selected for negative feedback. Through SVM learning, two positive examples, i.e., the 3rd and 6th image (Fig.3) and two negative examples are selected as supported vectors. Table 2 shows the relevant image weights updated by SVM learning.

 

    Table 2 Weights for relevant images through SVM learning

Image ID

1st

2nd

3rd

6th

Weights

100

30.5

10

10

 

Since the 3rd and 6th image in Figure 2 are selected as support vectors, less weights are assigned. The 1st image has the largest distance to the hyperplane determined by the support vectors and therefore is most distinguishable from the negative examples. The largest weight 100 is assigned to it. Compared to Fig.2, clearly, more flowers are returned in Fig. 3 in the top 20 returned images.

 Figure 4 shows the number of hits in top 20 returned images for SVM approach and relevance feedback for different image classes, e.g., car, flower, airplane, etc. Five positive feedbacks are used for relevance feedback and five negative feedbacks are used for SVM approach. Polynomial kernel function with is use for SVM.  Clearly, in most of cases, SVM approach has improvement over using positive feedback alone.

Figure 4. Compare SVM approach and relevance feedback with positive examples

<-- Back to Table of Contents>


5. Conclusions

In this paper, a novel approach for CBIR to automatically update relevant image weights through SVM learning based on the both positive and negative feedback is proposed.  This not only releases the users from providing accurate preference weight for each relevant image but also utilizes the negative information.   Reasonable better results are obtained compared to that of positive feedbacks only.
However, in order for this approach to be effective, a certain number (at least 4 for each) of positive and negative feedbacks is needed for SVM learning. This is the major limit of this approach.  Therefore for small number of query images, i.e., less than 4, we still use the positive feedbacks only in the first run. After more relevant images are returned, the proposed approach can be performed to get more refined results. There are two possible improvements of this approach that will involve incremental learning: (1) Every time the learning is performed on all of the positive and negative examples that are selected in the history; or  (2) The support vectors calculated in the last feedback is combined with the positive and negative examples selected in the current feedback to decide a new optimal hyperplane.  Optimal selection of SVM kernel functions still needs to be investigated in the future.

<-- Back to Table of Contents>


6. Acknowledgements

 This work was supported in part by National Science Foundation Grant CDA 96-24396, and in part by National Science Foundation Grant EIA-99-75019

<-- Back to Table of Contents>


References

[1]     Y. Rui, T. S. Huang, et al., “Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval”, IEEE Trans. on Circuits and Video Tech., Special Issue on Segmentation Description, and Retrieval of Video Content, 8(5), Sep. 1998

[2]     Y. Rui, T. S. Huang, S. Mehrotra, “Content-Based Image Retrieval With Relevance Feedback in MARS”, Proc. Of IEEE Intl. Conf. on ICIP’97, Santa Barbara, CA, 1997

[3]     M. Flickner, H. Sawhney, et al., “Query by Image and Video Content: The QBIC System”, IEEE Computer, 1995

[4]     A. Pentland, R. Picard, and S. Sclaroff, “Photobook: Content-Based Manipulation of Image Databases”, Intl. Journal of Computer Vision, 1996

[5]     R. Picard, “A Society of Models for Video and Image Libraries”, TR No.360, Media Lab, MIT, 1996.

[6]     C. Buckley and G. Salton, “Optimization of relevance feedback weights”, in Proc. Of SIGIR’95

[7]    V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995

[8]    M. O. Stitson, J. A. E. Weston, et al., “ Theory of Support Vector Machine”, Technical report, CSD-TR-96-17, Univ. of London

<-- Back to Table of Contents>