IEEE
International Conference on Multimedia & Expo 2001
Tokyo, Japan
Electronic Proceedings
© 2001
IEEE
In traditional content-based image retrieval (CBIR), the retrieved images are displayed in order of decreasing similarities from the query and can be considered as a 1-D display. In this paper, a novel optimized technique is proposed to visualize the retrieved images not only in order of their decreasing similarities but also according to their mutual similarities visualized on a 2-D screen. Principle Component Analysis (PCA) is first performed on the retrieved images to project the images from the original high dimensional feature space to 2-D screen. The result of PCA analysis is denoted as a PCA Splat. To minimize the overlap between images, a constrained nonlinear optimization approach is used. The experimental results show a more perceptually intuitive and informative visualization of the retrieval results. The proposed technique not only provides a better understanding of the query results but also aids the user in forming a new query.
With the advances in technology to generate, transmit and store large amounts of digital images and video, research in content-based image retrieval (CBIR) has gained more attention recently. In CBIR, images are indexed by their visual contents such as color, texture, etc. Many research efforts have been made to extract these low level features [1, 2,3], evaluate distance metrics [4, 5] and look for efficient searching schemes [6, 7].
Traditional image retrieval systems display the returned images as a list, sorted by decreasing similarity to the query. The traditional display has one major drawback. The images are ranked by similarity to the query, and relevant images can appear at separate places in the list. Often the user would like to have a global view of the returned images in a way that reflects the relations among the images in the returned set. Only having an idea of surroundings can offer an indication of where to go next. The wider the horizon, the more efficient the new query will be formed. In [8], Rubner proposed a 2-D display technique based on multi-dimensional scaling (MDS) [9]. A global view of the images is achieved that reflects the relations among the images in the retrieved images.
MDS is a nonlinear transformation that minimizes the stress between high dimensional feature space and low dimensional display space. However, MDS is rotation invariant, non-repeatable, and especially slow to implement. These drawbacks make MDS unattractive for real time browsing or visualization of images.
In this paper, we propose an alternative 2-D display scheme based on Principle Component Analysis (PCA) [10]. Moreover, a novel optimized technique is proposed based on the PCA result, which provides a more perceptually intuitive and informative visualization of the retrieved images.
The rest of paper is organized as follows. Our content-based image retrieval system will be discussed in Section 2 and the optimized content-based visualization technique will be presented in Section 3. Results will be shown in Section 4. Summary will be given in Section 5.
Multimedia Analysis and Retrieval System (MARS) is an integrated relevance-feedback architecture for content-based image retrieval [11, 12].
There are three visual features used in our system: color moments [1], wavelet-based texture [2], and
water-filling edge feature [3]. The color space we use is HSV because of its de-correlated coordinates and its perceptual uniformity
[1]. We extract the first three moments (mean, standard deviation and skewness) from the three-color channels and therefore have a
color feature vector of length
.
For wavelet-based texture, the original image is fed into a wavelet filter bank and is decomposed into 10 de-correlated sub-bands. Each sub-band captures the characteristics of a certain scale and orientation of the original image. For each sub-band, we extract the standard deviation of the wavelet coefficients and therefore have a texture feature vector of length 10.
For water-filling edge feature vector, we first pass the original images through an edge detector to generate their corresponding edge map. We extract eighteen (18) elements from the edge maps, including max fill time, max fork count, etc. For a complete description of this edge feature vector, interested readers are referred to [3].
Figure 1 shows an example of the retrieved images in MARS system before relevance feedback. The database is a collection of 534 images. The 1st image (building) is the query. The other 9 relevant images are ranked in 2nd, 3rd, 4th, 5th, 9th, 10th, 17th, 19th and 20th places, respectively.
![[IMG]](index_files/figure1_mars.jpg)
Figure 1.
The retrieved images in MARS system (ranked from left to right, from top to bottom, the top left 1 is the query)
The purpose of our proposed content-based visualization of the retrieved images is augmenting a user's perception so as to grasp a large information space that cannot be perceived by traditional sequential display in rank order of their visual similarities. Traditional sequential display is a simple 1-D visualization. In this section, we propose an alternative technique to MDS in [8] that displays mutual similarities on a 2-D screen based on visual features extracted from images. The retrieved images are displayed not only in ranked order of similarity from the query but also according to their mutual similarities, so that similar images are grouped together rather than being scattered along the entire returned 1-D list of images.
To create such a display, Principle Component Analysis (PCA) [10] is first performed on the retrieved images to project the images from the high dimensional feature space to 2-D screen. Image thumbnails are placed on the screen so that the screen distances reflect as closely as possible the similarities between the images. If the computed similarities from the high dimensional feature space agree with our perception, and if the resulting feature dimension reduction preserves these similarities reasonably well, then the resulting display should be informative and intuitive.
In our experiments, the 37 visual features (9 color moments, 10 wavelet moments and 18 water-filling features) are pre-extracted from the image database and stored off-line. The 37 visual features can be formed into a single feature vector and projected to the 2-D screen based on Principle Component Analysis (PCA). PCA is a very fast linear transformation that achieves the maximum distance preservation from the original high dimensional feature space to 2-D screen among all the linear transformations [10]. This layout is denoted as a PCA Splat.
Figure 2(a) shows an example of PCA Splats for the top 20 retrieved images shown in Figure 1. Clearly the relevant images are close to each other in
the PCA Splat while these images are not consecutive to each other in the 1-D display in order of the decreasing similarities in Figure 1. PCA Splats
also convey mutual distance information about all
similarities between images while ranked 1-D display in Fig. 1 does
not provide this perceptually informative visualization to the user; neither does the 2-D random display in Figure 2(b).
![[IMG]](index_files/figure2a_pca.jpg)
(a) PCA Splat
![[IMG]](index_files/figure2b_random.jpg)
(b) 2-D random display
Figure 2.Different display schemes
However, one drawback of PCA Splat is that some images are partially or totally overlapped which makes it difficult to view all the images at the same time. The overlap will be even worse when the number of retrieved images becomes larger. To solve the overlapping problem between the retrieved images, a novel optimized technique is proposed in this section.
Given the sets of the retrieved images and their corresponding sizes and positions, our optimizer tries to find a solution that places the images at
the appropriate positions while deviating as little as possible from their initial PCA Splat positions. Assume the number of images is N. The image
positions are represented by their center coordinates
,
, and the initial image
positions are denoted as
,
. The minimum and maximum coordinates of
the 2-D screen are
. The image size is represented by its radius
for simplicity,
and the maximum and minimum image size is
and
in radius, respectively. The initial image size is
,
.
To minimize the overlap, one can move the images away from each other to decrease the overlap between images, but this will increase the deviation of
the images from their initial positions. Large deviation is certainly undesirable because the initial positions provide important information about
mutual similarities between images. So there is a trade-off problem between minimizing overlap and minimizing deviation. Without increasing the overall
deviation, an alternative way to minimize the overlap is to simply shrink the image size as needed, down to a minimum size limit. The image size will
not be increased in the optimization process because this will always increase the overlap. For this reason, the initial image size is assumed to be
.
The total cost function is designed as a linear combination of the individual cost functions taking into account two factors. The first factor is to keep the overall overlap between the images on the screen as small as possible. The second factor is to keep the overall deviation from the initial position as small as possible.
(1)where
is the cost function of the overall overlap and
is the cost function
of the overall deviation from the initial image positions, S is a scaling factor which brings the range of
to
the same range of
and S is chosen to be
.
is a weight and
. When
is zero, the deviation of
images is not considered in overlapping minimization. When
is less than one, minimizing overall overlap is more
important than minimizing overall deviation, and vice versa for
is greater than one.
The cost function of overall overlap is designed as
(2)
(3)
where
, u is a measure of overlapping. When
, there is no overlap
between the ith image and the jth image, thus the cost is 0. When
, there is partial overlap between
the ith image and the jth image. When
, the ith image and the jth image are totally
overlapped.
is a curvature-controlling factor.
Figure 3 shows the plot of
. With the increasing value of u
, the cost
of overlap is also increasing.
![[IMG]](index_files/figure3_overlap.jpg)
Figure 3.Cost function of overlap function f(p)
From Fig. 3,
in Eq. (3) is calculated by setting T=0.95 when
.
(4)
The cost function of overall deviation is designed as
(5)
(6)
where
, v is the measure of deviation of the ith images from its initial position.
is a curvature-controlling factor.
and
are the
optimized and initial center coordinates of the ith image, respectively,
.
Figure 4 shows the plot of g(p). With the increasing value of v, the cost of deviation is also increasing.
![[IMG]](index_files/figure4_deviation.jpg)
Figure 4.Cost function of function g(p)
From Fig. 4,
in Eq. (6) is calculated by setting T=0.95 when
. In our
work, maxsep is set to be
.
(7)
The optimization process is to minimize the total cost J by finding a (locally) optimal set of size and image positions. The nonlinear optimization method was implemented by an iterative gradient descent method (with line search). Once converged, the images are redisplayed based on their optimized sizes and positions.
Figure 5 (a) shows the optimized PCA Splat for Fig. 2(a). Clearly, the overlapping is minimized while the relevant images are still close to each other to allow a global view. With such a display, the user can see the relations between the images, better understand how the query performed, and subsequently express future queries more naturally.
![[IMG]](index_files/figure5a_pca_opt.jpg)
(a) Optimized PCA Splat
![[IMG]](index_files/figure5b_random_opt.jpg)
(b) Optimized random display
Figure 5.Optimized displays
An improved visualization can also be achieved when this technique is applied to the random display of Fig. 2(b) simply because the overlapping is minimized as shown in Figure 5(b).
In this paper, we proposed an optimized content-based visualization technique to generate a 2-D display of the retrieved images for content-based image retrieval. The results show a more perceptually intuitive and informative visualization of the retrieval results is achieved by the proposed technique, which not only provides a better understanding for the query results but also aids user forming a new query.
The proposed content-based visualization method can be easily applied to project the images from high-dimensional feature space to a 3-D space for more advanced visualization and navigation. Features can be multi-modal, expressing individual visual features, e.g., color alone, audio features and semantic features, e.g., keywords, or any combination of the above. The proposed layout optimization technique is also a quite general approach and can be applied to avoid overlapping of any type of images, windows, frames and boxes.
For the future work, relevance feedback can be achieved based on the visualization of the optimized PCA Splat. By grouping the relevant images together, a new user-modeling technique will be proposed.
This work was partially supported by Mitsubishi Electric Research Laboratory (MERL) and NSF Grant CDA 96-24396, EIA-99-75019. The authors would like to thank Xiang Zhou for helpful discussions regarding PCA Splats.