IEEE International Conference on Multimedia & Expo 2000
New York, New York
Electronic Porceedings
©l 2000 IEEE
Qi Tian, Pengyu Hong, Thomas S. Huang
Abstract
Relevance feedback [1] has been a powerful tool for interactive Content-Based Image Retrieval (CBIR). During the retrieval process, the user selects the most relevant images and provides a weight of preference for each relevant image. User’s high level query and perception subjectivity can be captured to some extent by dynamically updated low-level feature weights based on the user’s feedback. However, in MARS [2] only the positive feedbacks, i.e., relevant images are considered. In this paper, a novel approach is proposed by providing both positive and negative feedbacks for Support Vector Machines (SVM) learning. The SVM learning results are used to update the weights of preference for relevant images. Priorities are given to the positive feedbacks that have larger distances to the hyperplane determined by the support vectors. This approach releases the user from manually providing preference weight for each positive example, i.e., relevant image as before. Experimental results show that the proposed approach has reasonable improvement over relevance feedback with possible examples only.
Content-Based Image Retrieval (CBIR) has become a very active research area since the 1990’s [3, 4]. Most of the early research effort focused on finding the “best” image features or representations. Retrieval was performed by summing the distance of individual feature representation with fixed weights. Although this isolated approach establishes the basis of CBIR, the usefulness of such systems was limited. A human-computer interactive approach to CBIR based on relevance feedback was proposed in [1].
Unlike the isolated approach, where the user has to precisely decompose the information need into precise weights, the interactive approach allows the user to submit a coarse initial query and continuously refine his information need via relevance feedback. This approach greatly reduces the user’s effort of composing a query and captures the user’s information need more precisely.
However, only using the positive examples as feedback [1, 2] may lose the information implied by the negative examples. It is our expectation that by utilizing both positive and negative feedbacks, the user’s perception subjectivity can be captured more accurately than relevance feedback with positive examples only. Negative examples are also used in FourEyes [5], where the user chooses the both positive and negative examples. FourEyes looks at all of the models and determines which model or combination of models best describe the positive examples chosen by the user, while satisfying the constraints of the negative examples. In MARS [2], manually providing preference weights of the relevant image may not be accurate and is sometimes a burden for the user. The Support Vector Machine approach will be applied in this paper to release the user from the above burden and automatically generate preference weights of the relevant images.
The rest of
paper is organized as follows. Relevance feedback and similarity measure in
MARS [2] is discussed in Section 2. Section 3 describes
Support Vector Machines
(SVM) with its application to update the weights of the relevant images in
CBIR. Experimental results and conclusions are given in Section 4 and 5,
respectively.
<-- Back to Table of Contents>
2. Relevance Feedback in
CBIR
2.1 Relevance Feedback
The overall similarity
for the jth
image in the database is obtained by linearly combining individual feature,
’s similarity distance,
:
(3)
where N is the total number of the images in the database. Mahalanobis distance is used as the similarity measurement:
![]()
(4)
where
is the ith feature vector of the jth image in
the database,
is the feature vector
of the query, and
is the covariance
matrix of the ith feature components of the relevant images. The element of
is defined as:
(5)
where V(k) is the
preference weight for the kth relevant image (positive feedback),
and
are the mth and nth component values of the ith
feature of the kth relevant image, respectively.
and
are the mth
and nth component values of the ith feature of the query,
respectively. NR is the total number of positive feedbacks and
.
is an identity matrix if
.
Low-level feature weight
in Eq. (3) is updated
by Eq. (6) and (7).
(6)
(7)
Higher weight is given to the feature that
has smaller average distance
, based on the relevant images. This is because the relevant images are more similar (have
smaller distance) in this feature than in other feature.
In this approach, the user manually provides
his or her preference weight, i.e.,
in Eq. (5) for the relevant images. The preference weights
denote the degree of how much the user likes the images. A query is then the weighted average, i.e.,
centroid of the relevant images in the feature space. However, in some cases,
there exist examples that are not desired by the user but closer to the query
than some of the relevant images in the feature space. These examples are
considered as negative examples in this work. Usually, the problem to separate
the negative examples from the positive examples turns out to be finding a
nonlinear classifier. Classification can’t be done from the positive feedbacks
alone. Therefore, the negative examples will still be mixed with the positive
examples after the relevance feedback. Moreover, sometimes it is hard for the
user to tell the system about his or her subjectivity by providing preference
weight for each relevant image. The query may be harmed by improper assignments
of preference weights. Therefore, a weak classification of preference, e.g.,
like or dislike, instead of giving specific preference weights would be more
appropriate in this sense.
In this work, the user provides not only the positive feedbacks, but also the negative feedbacks. The new preference weights for the relevant image are obtained from the SVM learning on both kinds of feedbacks.
<-- Back to Table of Contents>
Consider the problem of separating the set of training vectors belonging
to two classes, e.g., image retrieval problem, +1 denotes positive example, -1
denotes the negative example.
(8)
where
is an input pattern, e.g., feature vector in image retrieval,
for the ith example and yi is the label. If the two
classes are linearly separable, the hyperplane that does the separation is:
(9)
where
is an input vector, w is a weight
vector, and b is a bias. The goal of a support vector machine is to find
the parameter wo and bo for a optimal
hyperplane to maximize the distance between the hyperplane and the closest data
point:
(10)
For
a given wo and bo, the distance of a point
from the optimal hyperplane defined in (10) is
(11)
A linear separable example in 2D is illustrated in
Figure 1. If the two classes are non-linearly separable, the input vectors
should be nonlinearly mapped to a high-dimensional feature space by an
inner-product kernel function
. Table 1 shows three typical kernel functions [8]. An
optimal hyperplane is constructed for separating the data in the
high-dimensional feature space. This hyperplane is optimal in the sense of
being a maximal margin classifier with respect to the training data.
Table 1. Types of
kernel functions

The distance from the hyperplane determined by the
support vectors can be used to measure how much an example belonging to one
class is different from the other class. This motivates us to use SVM for
automatically generating preference weights for relevant images. In this paper,
we use positive and negative examples for image retrieval. The positive
examples are preferred images, while the negative examples are undesirable
images. Intuitively, the farther the positive examples from the hyperplane, the
more distinguishable they are from the negative examples. Thus, when we decide
their preference weights, they are assigned with larger weights. In numerical
calculation, the distances of the positive examples to the hyperplane are used
to decide the weights of the relevant images, i.e., V(k) in Eq. (5). In
our experiments, the distance is normalized to the range 10-100. Those negative
examples will be removed from the retrieved image list by simply setting their
distance similarity value (section 2.2) to a very large number.
<-- Back to Table of Contents>

Figure 2. Retrieval results using positive feedbacks only (rank from left to right and from top to bottom, the top 4 are positive feedbacks)

Figure 3.
Retrieval results using SVM learning (rank from left to right and from top
to bottom, the top 3 and the 6th image are the positive feedbacks, four
negative examples are used
and not shown in the retrieval results, polynomial kernel function with
is used)
Figure 2 shows an example of the flower retrieval results using positive feedbacks only. Figure 3 shows the flower retrieval results using SVM approach. Same four positive feedbacks (top 4 in Fig. 2 and 1st, 2nd, 3rd and 6th image in Fig. 3) are used for this experiment. Other four negative examples are selected for negative feedback. Through SVM learning, two positive examples, i.e., the 3rd and 6th image (Fig.3) and two negative examples are selected as supported vectors. Table 2 shows the relevant image weights updated by SVM learning.
Table 2 Weights for relevant images
through SVM learning
|
Image ID |
1st |
2nd |
3rd |
6th |
|
Weights |
100 |
30.5 |
10 |
10 |
Since the 3rd and 6th image in Figure 2 are selected
as support vectors, less weights are assigned. The 1st image has the largest
distance to the hyperplane determined by the support vectors and therefore is
most distinguishable from the negative examples. The largest weight 100 is
assigned to it. Compared to Fig.2, clearly, more flowers are returned in Fig. 3
in the top 20 returned images.
Figure 4 shows
the number of hits in top 20 returned images for SVM approach and relevance
feedback for different image classes, e.g., car, flower, airplane, etc. Five
positive feedbacks are used for relevance feedback and five negative feedbacks
are used for SVM approach. Polynomial kernel function with
is use for SVM.
Clearly, in most of cases, SVM approach has improvement over using
positive feedback alone.

Figure 4. Compare
SVM approach and relevance feedback with positive examples
<-- Back to Table of Contents>
<-- Back to Table of Contents>
This work was supported in part by National Science Foundation Grant CDA 96-24396, and in part by National Science Foundation Grant EIA-99-75019
<-- Back to Table of Contents>
[1]
Y.
Rui, T. S. Huang, et al., “Relevance Feedback: A Power Tool for Interactive
Content-Based Image Retrieval”, IEEE Trans. on Circuits and Video Tech.,
Special Issue on Segmentation Description, and Retrieval of Video Content,
8(5), Sep. 1998
[2]
Y.
Rui, T. S. Huang, S. Mehrotra, “Content-Based Image Retrieval With Relevance
Feedback in MARS”, Proc. Of IEEE Intl. Conf. on ICIP’97, Santa Barbara,
CA, 1997
[3]
M.
Flickner, H. Sawhney, et al., “Query by Image and Video Content: The QBIC
System”, IEEE Computer, 1995
[4]
A.
Pentland, R. Picard, and S. Sclaroff, “Photobook: Content-Based Manipulation of
Image Databases”, Intl. Journal of Computer Vision, 1996
[5]
R.
Picard, “A Society of Models for Video and Image Libraries”, TR No.360, Media
Lab, MIT, 1996.
[6] C. Buckley and G. Salton,
“Optimization of relevance feedback weights”, in Proc. Of SIGIR’95
[7]
V.
Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New
York, 1995
[8]
M.
O. Stitson, J. A. E. Weston, et al., “ Theory of Support Vector Machine”, Technical
report, CSD-TR-96-17, Univ. of London
<-- Back to Table of Contents>