Kaushik Chakrabarti

E-mail: sharad@ics.uci.edu
Supporting Spatial Index Structures as Access Methods in a Database System

Abstract not available.





IDFL
Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim

E-mail: sharad@ics.uci.edu
Approximate Query Answering Using Wavelets

Abstract
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's Decision Support Systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This, of course, guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques (1) provide approximate answers of better quality than either sampling or histograms, (2) offer query execution-time speedups of more than two orders of magnitude, and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.




IDFL
Kaushik Chakrabarti
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Phantom Protection in Multidimensional Access Methods

Abstract
The access methods supported by existing database management systems lag far behind the requirements posed by emerging database applications. Once of the primary reasons is the lack of efficient techniques to provide transactional access to data using multidimensional access methods. Concurrent access to data through a multidimensional data structure introduces the problem of protecting ranges specified in the retrieval from phantom insertion and deletions (phantom problem). Existing approaches for phantom protection in B-trees (namely, key-range locking) cannot be easily generalized to multidimensional data structures since they rely on a total order over the key space on which the B-tree is designed. This paper presents four approaches to phantom protection in R-trees that illustrate a tradeoff between algorithmic complexity, degree of concurrency and the locking overhead. Although the approaches are presented in the context of R-trees, they are applicable to other tree-based multidimensional access methods as well.




IDFL
Kaushik Chakrabarti
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Dynamic Granular Locking Approach to Phantom Protection in R-trees

Abstract
Over the last decade, the R-tree has emerged as one of the most robust multidimensional access methods. However, before the R-tree can be integrated as an access method to a commercial strength data management system, efficient techniques to provide transactional access to data via R-trees need to be developed. Concurrent access to data through a multidimensional data structure introduces the problem of protecting ranges specified in the retrieval from phantom insertion and deletions (phantom problem). Existing approaches to phantom protection in B-trees (namely, key-range locking) cannot be applied to multidimensional data structures since they rely on a total order over the key space on which the B-tree is designed. This paper presents a dynamic granular locking approach to phantom protection in R-trees. To the best of our knowledge, this paper provides the first solution to the phantom problem in multidimensional access methods based on granular locking.




IDFL
Kaushik Chakrabarti
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Concurrency Control in R-trees

Abstract
The nature and types of information in a dynamic battlefield environment ranges from georeferenced satellite images and terrain elevation to maps containing terrain features like roads, enemy unit deployment, activities and targets to spatio-temporal objects like logistic, tactical, and collection management plans. Efficient processing of queries over such objects in a database requires support of access paths using an effective multidimensional data structure. While a large body of research exists on multidimensional data structures (grid files, R-trees, hB-trees to name a few), none of them have been fully integrated into any commercial-strength database management system. One of the primary reasons is the lack of effective techniques to support concurrent access and manipulations to data using these data structures. This paper identifies problems in supporting concurrent operations over multidimensional data structures and sketches solutions in the context of R-trees. R-tree is a popular multidimensional data structure that has been incorporated into Illustra data management system and has also been implemented as part of the Shore object management system.




IDFL
Kaushik Chakrabarti
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Local Dimensionality Reduction: a new approach to indexing high dimensional spaces

Abstract
Many emerging application domains require database systems to support efficient access over highly multidimensional data sets. The current state-of-the-art technique for indexing high dimensional data is to first reduce the dimensionality of the data using principal components analysis and then index the reduced-dimensionality space using a multidimensional index structure. The above technique, referred to as global dimensionality reduction (GDR), works well when the data set is globally correlated—that is, when most of the variation in the data can be captured by a few dimensions. In practice, data sets are often not globally correlated. In such cases, reducing the data dimensionality using GDR causes significant loss of distance information, resulting in a large number of false positives and, hence, a high query cost. Even when a global correlation does not exist, there may exist subsets of data that are locally correlated. In this paper we propose a technique called local dimensionality reduction (LDR) that tries to find local correlations in the data and performs dimensionality reduction on the locally correlated clusters of data individually. We develop an index structure that exploits the correlated clusters to efficiently support point, range, and k-nearest neighbor queries over high dimensional data sets. Our experiments on synthetic as well as real-life data sets show that our technique (1) reduces the dimensionality of the data with significantly lower loss in distance information compared to GDR and (2) significantly outperforms the GDR, original space indexing and linear scan techniques in terms of the query cost for both synthetic and real-life data sets.




IDFL
Kaushik Chakrabarti, Sharad Mehrotra, Michael Ortega, and Kriengkrai Porkaew

E-mail: sharad@ics.uci.edu
WebMARS: a Multimedia Search Engine for the World Wide Web

Abstract
With advances in digital multimedia and Web technologies, there is an increasing trend in documents over the Web to incorporate several media types such as text, images, audio and video. While some prototype as well as commercial Web search engines (e.g., Alta Vista) have started to incorporate image search extensions, few systems have properly explored the use of all available information in different media types for retrieval into a single integrated framework. To address this limitation, in the MARS project, we have developed a web search engine, WebMARS, that exploits both textual information and visual information for HTML document retrieval. The integrated retrieval framework allows WebMARS to exploit both the textual (e.g., words) and visual content (e.g., images) of Web pages in answering user's queries. WebMARS not only searches over multiple media types but also utilizes cross-media information to improve the search performance.




IDFL
Kaushik Chakrabarti, Sharad Mehrotra, Michael Ortega, Kriengkrai Porkaew,
and Robert Winkler


E-mail: sharad@ics.uci.edu
Processing Uncertainty Queries in Database Management Systems

Abstract
Emerging applications, including many military applications, require explicit mechanisms to represent and process uncertainty in queries and in the data stored in databases. Most current approaches to supporting uncertainty in queries layer a reasoning component on top of existing relational database management systems (DBMSs) which resolves the uncertainty in queries outside of the DBMS. While the layered approach is attractive due to its simplicity and since it requires minimal extensions to existing DBMS technology, it has some fundamental shortcomings which limit its usefulness to only simplistic applications. This paper proposes an extended relational model together with a suitably extended relational algebra as an alternative mechanism to incorporating uncertainty in queries. In contrast to the layered approach, the proposed model allows uncertainty to permeate database processing overcoming many of its limitations. The paper identifies challenging research issues that we are currently addressing in developing the proposed framework.




IDFL
Kaushik Chakrabarti,
Michael Ortega-Binderberger,
Kriengkrai Porkaew, Peng Zuo, and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Similar Shape Retrieval in MARS

Abstract
This paper presents a novel approach to representing 2-d shapes that adaptively models different portions of the shape at different resolutions, having higher resolution where it improves the quality of the representation and lower resolution elsewhere. The proposed representation is invariant to scale, translation and rotation. The representation is amenable to indexing using existing multidimensional index structures and can thus support efficient similarity retrieval. Our experiments show that the adaptive resolution technique performs significantly better compared to the fixed resolution approach previously proposed in the literature.




IDFL
Kaushik Chakrabarti, Kriengkrai Porkaew and Sharad Mehrotra

E-mail: sharad@ics.uci.edu
Efficient Query Refinement in Multimedia Databases

Abstract
Increasing application demands are pushing database management systems (DBMSs) towards providing adequate and efficient support for content-based retrieval over multimedia objects (e.g., images, video, audio, time-series, spatial and spatio-temporal data). Recently, several powerful models for multimedia similarity retrieval have been proposed. An important aspect of these models is the notion of query refinement: a technique that allows the users to interactively specify their information need to the system by providing relevance ranking on example objects. Query refinement has several motivations. First, the 'starting' query may only partially capture the user's information need. The user may find better examples among the answers returned to the starting query which then become the basis of the 'refined' query. Second, multimedia objects are represented as a collection of features. The relative importance of these features in computing the similarity between objects (inter-feature weights) as well as the notion of similarity (or distance) with respect to each of these features (intra-feature weights) is subjective to the user and is not known right away. Query refinement helps the system to dynamically tune these weights to best suit the user's perception based on her feedback. While there has been a lot of research on models for similarity retrieval and query refinement, to the best of our knowledge, no research exists on developing techniques to support query refinement efficiently in a DBMS. Done naively, each 'refined' query can be treated as a 'starting' query and evaluated from scratch. This paper explores alternative approaches that significantly improve the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. As a result, most of the execution cost can be saved by appropriately exploiting the information generated during the previous iterations of the query. Our experiments over a large image/text collection (COREL dataset) show that the proposed techniques provide significant improvement in performance.




IDFL
Kaushik Chakrabarti, Kriengkrai Porkaew and Sharad Mehrotra

E-mail: sharad@ics.uci.edu
Refining Top-K Selection Queries based on User Feedback

Abstract
In many applications, users specify target values for certain attributes or features, without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries; that is, which objects are relevant to the user and which are not depends on the perception of the user. Because of the subjective nature of top-k queries, the answers returned by the system to a user query often does not satisfy the user's need right away; either because the weights and the distance functions associated with the features do not accurately capture the user's perception or because the specified target values do not fully capture her information need or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each refined query can be treated as a starting query and evaluated from scratch. This paper explores alternative approaches that significantly reduce the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia dataset show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naïve approach and is more than an order of magnitude faster than a simple sequence scan.




IDFL
Kaushik Chakrabarti,
Kriengkrai Porkaew, Michael Ortega
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Evaluating Refined Queries in Top-k Retrieval Systems

Abstract
In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of ``top k'' objects that best match the specified feature values. User subjectivity is an important aspect of such queries i.e. which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-$k$ queries, the answers returned by the system to an user query often do not satisfy the user's need right away; either because the weights and the distance functions associated with the features do not accurately capture the user's perception or because the specified target values do not fully capture her information need or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each `refined' query can be treated as a `starting' query and evaluated from scratch. This paper explores alternative approaches that significantly improve the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia dataset show that the proposed techniques save more than 80% of the execution cost of refined queries over the naive approach and is more than an order of magnitude faster than a simple sequential scan.




IDFL
Kexiang Hu, Sharad Mehrotra, and Simon Kaplan

E-mail: sharad@ics.uci.edu
An Optimized Two-Safe Approach to Maintaining Remote Backup Systems

Abstract
In a remote backup system, transaction processing takes place at the primary and the log records generated at the primary are propagated to the remote backup which uses them to reconstruct a recent state of the database at the primary. In the event of a primary failure, the backup system takes over transaction processing without causing users to observe a breach in service. Existing remote backup algorithms can be classified as either 2-safe or 1-safe based on whether or not the commitment of the transaction at the primary is synchronized with the propagation of its log records to the backup. Both approaches have limitations. While 1-safe approaches offer higher primary system throughput and lower transaction response time, unlike 2-safe approaches, they do not guarantee persistence of transactions during backup takeover. In this paper, we present an optimized 2-safe approach to maintaining remote backup systems that combines the desirable features of both 1-safe and 2-safe approaches - high throughput (similar to 1-safe) without risking the loss of transactions (similar to 2-safe). Algorithms are developed for multiple processor primary and backup system environments.




IDFL
Thomas S. Huang, Sharad Mehrotra,
and Kannan Ramchandran


E-mail: huang@ifp.uiuc.edu
Multimedia Analysis and Retrieval System (MARS) Project

Abstract
To address the emerging needs of applications that require access to and retrieval of multimedia objects, we have started a Multimedia Analysis and Retrieval Systems (MARS) project at the University of Illinois. The project brings together researchers interested in the fields of computer vision, compression, information management and database systems with the singular goal of developing an effective multimedia database management system. As a first step towards the project, we have designed and implemented an image retrieval system. This paper describes the novel approaches towards image segmentation, representation, browsing, and retrieval supported by the developed system. Also described are the directions of future research we are pursuing as part of the MARS project.




IDFL
Thomas S. Huang, Sharad Mehrotra,
and Yueting Zhuang


E-mail: huang@ifp.uiuc.edu
A Conceptual Framework for Multimedia Reasoning

Abstract
Up to now, AI technology is dominated by the Physical Symbolic System(PSS), in which symbolic information is used as the medium for reasoning. In these approaches, information other than symbol, such as image, graphics, and even video should first be represented by symbols, and after reasoning, the symbolic result is again changed into its original media form.

In this paper, we will propose a new form of reasoning method called multimedia reasoning (MR), a kind of reasoning that is based on the different media such as text, image, video, audio and whatever. By introducing the concept of multimedia transformation theory (MTT), it presents a conceptual framework for multimedia reasoning. In the end, it discusses the importance and potentials of MR in military application.




IDFL
Jitendra Kothari, Ed Grossman,
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Neighborhoods: A Framework for Enabling Web-based Synchronous Collaboration and Hierarchical Navigation

Abstract
The World-Wide Web (WWW) is an extremely effective mechanism for sharing information throughout the world via a web of links. These links allow anyone with a connection to the Internet to unearth large amounts of information on multitudes of topics. However, access to this information is asynchronous, with no way for users to interact with each other in real time. We have developed Neighborhoods to support synchronous interaction between users. Related documents can be grouped together into a Neighborhood, which acts as a meeting place where Web users can find and contact others with similar interests. Neighborhoods are based on underlying protocols for creating collections of documents and for establishing collaborative sessions. The collection protocol allows material on the Web to be organized into categories and documents. The neighborhoods protocol adds to this a method for specifying and establishing new collaborative sessions, as well as locating existing ones.




IDFL
I. Lazaridis, S. Mehrotra, K. Porkaew,
and R. Winkler


E-mail: sharad@ics.uci.edu
Database Support for Situational Awareness

Abstract
Providing adequate database support for interactive 3D Visualization and Situational Awareness (SA) is one of the keys to making such systems scale to the complexity of real-world scenarios. It is desirable that SA systems be highly realistic and intuitive to use. From a database perspective these requirements translate into (i) support for large amount of heterogeneous spatio-temporal data (to provide a convincing virtual environment, approximating the complexity of reality), (ii) support for highly dynamic situations in which the data may change frequently, (iii) support for mechanisms to deal with inherent imprecision in the underlying information, and, (iv) powerful query interfaces that allow users to pose varied and interesting queries that are answered with minimum time delay. In this chapter, we outline emerging research results in the field, especially in the context of the SATURN project, pointing to their direct applicability to the Situational Awareness task. We describe techniques for the representation of dynamic and mobile objects and their integration into a database system by reviewing appropriate indexing and query processing techniques. We also deal with spatial aggregate queries and present new techniques to compute their answers in a time-critical environment.




IDFL
Iosif Lazaridis and Sharad Mehrotra

E-mail: sharad@ics.uci.edu
Incorporating Aggregate Queries in Interactive Visualization

Abstract
Spatial aggregate queries involve specifying a region of space and asking for the value of some aggregate function of a quantity for which we have measurements for this given region. An example of such a query is "average rainfall in California". The aggregate function is average, the interesting quantity is rainfall and the spatial region is California. In the context of Virtual Geographic Information System (VGIS) such queries can be extremely interesting. As the user performs a fly-through over the virtual world, it may be worthwhile to pose queries like "highest terrain elevation in my vicinity", "average wind velocity for the next 100 miles of my flight path at a 1 mile resolution" or "total number of vehicles within 10 miles of my position". Since such queries have to aggregate over a significant amount of data while the visualization goes on, it is imperative that they be answered extremely fast so that their outcome can be presented to the user concurrently with his navigation in the virtual environment.

This paper discusses a new data structure, Multi-Resolution-Aggregate Tree (MRA-tree) that can be used to approximately answer spatial aggregate queries effectively. Information about the quantity of interest is stored at various levels of a spatial hierarchy introduced by a space-partitioning tree data structure. Nodes of this structure (MRA-tree) are visited iteratively whenever a query is posed. Our technique handles all the common SQL-type aggregates (MIN, MAX, SUM, COUTN, AVG). We specify how to estimate the aggregate using the nodes of the MRA-tree we have visited, and how to give tight 100% intervals of confidence on the actual value of the aggregate. We also propose a tree traversal strategy so as to reduce the error maximally as more tree nodes are explored. In the experiments we have conducted using an MRA-quadtree using both real and synthetic data sets, we have shown the validity of our approach for fast computation of spatial aggregates even for exact answering, indicating that it can be used in the context of a performance sensitive visualization environment. We are currently incorporating the technique developed into VGIS.




IDFL
Jing Liu and Sharad Mehrotra

E-mail: sharad@ics.uci.edu
Performance Evaluation of SATURN's Index Structures

Abstract
In this document, we discuss improvement on the performance of some commonly arising queries in visualizationn and analysis of flight simulation dataset. The dataset consists of motion information of airplanes and events that occur during the simulation. The simulation datasets are large, consist of complex dynamic spatio-temporal information, and visualization of these datasets requires complex spatio-temporal queries over motion information of objects. Existing data management techniques of indexing and query processing have numerous limitations when dealing with such spatio-temporal datasets since they do not provide native support for representation, indexing and querying of motion information. To address some of these limitations, in the Saturn system being developed at UCI, we have been exploring techniques to represent and index mobile objects. The goal of the Saturn system is to provide scaleable technology for handling queries over mobile objects such as those that arise in the visualization of the Boeing data set.

To illustrate the improvements that may result by utilizing Saturn techniques, in this paper we provide a performance comparison of some queries. The data represented and indexed using data structures and access methods developed in Saturn, is compared to a representation and indexing using existing commercial strength database management system such as Informix.

The remainder of the document is developed as follows. In the following section we discuss the Boeing data set and the types of queries that arise during visualization of the queries considered. We choose 2 queries to illustrate the improvements in performance that may result by using Saturn's techniques. The following section discusses the implementation of the queries using both Informix and Saturn's data structures. Finally, the last section provides a performance comparison..




IDFL
Sharad Mehrotra, Yong Rui,
Kaushik Chakrabarti,
M. Ortega-Binderberger,
and Thomas Huang


E-mail: sharad@ics.uci.edu
Multimedia Analysis and Retrieval System

Abstract
With the advances in storage technology and the advent of the World Wide Web, there has been an explosion in the amount and complexity of digital information being generated, analyzed, stored, accessed and transmitted. Most of this data is multimedia in nature, including digital images, video, audio and simple text data. To make use of this vast amount of multimedia data, we need to develop techniques to efficiently retrieve information over large multimedia repositories based on their content. Due to the difficulty in capturing the content of multimedia objects using textual annotations and the non-scalability of the approach to large data sets (due to a high degree of manual effort required in defining the annotations), the approach based on supporting content-based retrieval over visual features has become a promising research direction. This is evidenced by several prototypes and commercial systems that have been built recently.

Most existing approaches take a "vision-centric" view in which objects are represented using automatically extracted visual features and the system allows users to retrieve objects based directly on the extracted features (e.g., retrieve all images containing a lot of "blue"). In contrast, in the Multimedia Analysis and Retrieval System (MARS) being built within our group at the University of Illinois, we have adopted a more "information-centric" approach. A user expresses his information needs in the form of a higher-level query (e.g., retrieve images similar in content to a given set of images) which is then mapped to lower-level image features by the system possibly with the help of user interaction. Mapping the user query to a set of features extracted from textual documents have been extensively studied in the information retrieval (IR) literature. In developing the MARS system, we have generalized these techniques for content-based retrieval over image features. In the remainder of the paper, we describe some of the problems that arise in such a generalization and the approaches we have taken to address the problems. We conclude with the directions of future research being pursued in the MARS project.




IDFL
Sharad Mehrotra, Yong Rui,
M. Ortega-Binderberger,
and Thomas Huang


E-mail: sharad@ics.uci.edu
Supporting Content-Based Queries over Images in MARS

Abstract
To address the emerging needs of applications that require access to and retrieval of multimedia objects, we have started a Multimedia Analysis and Retrieval Systems (MARS) project at the University of Illinois. As a first step towards the project, we have designed and implemented an image analysis and retrieval system. In this paper, we concentrate on the retrieval subsystem of MARS and its support for content-based queries over image databases. Content-based retrieval techniques have been extensively studied for textual documents in the area of automatic information retrieval. This paper describes how these techniques can be adapted for ranked retrieval over image databases. Specifically, we discuss the ranking and retrieval algorithms developed in MARS based on the Boolean retrieval model and describe the results of our experiments in determining the resulting retrieval effectiveness.




IDFL
Michael Ortega, Kaushik Chakrabarti,
Kriengkrai Porkaew,
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Cross Media Validation in a Multimedia Retrieval System

Abstract
Conventional retrieval systems are commonly measured by their performance in terms of precision and recall. While the merit of these metrics is under debate, they enjoy widespread use. A central problem with these metrics is the implicit requirement of a small collection for which few experts can construct queries and determine relevant and non relevant sets. Increased collection sizes force us to investigate alternate metrics to determine the quality of retrieval. The Multimedia Analysis and Retrieval (MARS) research group at the University of Illinois explores multimedia retrieval. While information retrieval is not a solved problem, some domains have significant progress in retrieval quality. We propose to exploit spatial and temporal associations between media to use a "reference" media and a "test" media. An automatic technique can compare the results and determine how much agreement exists between test and reference media, thus validating the test media retrieval algorithm under a correctness assumption of the reference media.




IDFL
Michael Ortega, Kriengkrai Porkaew
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Retrieval of Documents using Multiple Media Types

Abstract
While there are many textual and image retrieval systems, few have explored the granularity of the retrieval unit and the use of all available information for retrieval. This paper presents our work on using textual and image retrieval, fusing the results and providing document retrieval that uses visual and textual information from documents. A query refinement technique is also shown that blurs the line between browsing and searching and integrates both into the same framework.





IDFL
Michael Ortega, Kriengkrai Porkaew
and Sharad Mehrotra


E-mail: sharad@ics.uci.edu
Information Retrieval over Multimedia Documents

Abstract
While there are many textual and image retrieval systems, few have explored the granularity of the retrieval unit and the use of all available information for retrieval. This paper presents our work on using textual and image retrieval, fusing the results and providing document retrieval that uses visual and textual information from documents. A query refinement technique is also shown that blurs the line between browsing and searching and integrates both into the same framework.




IDFL
Michael Ortega, Yong Rui,
Kaushik Chakrabarti, Sharad Mehrotra, and Thomas Huang


E-mail: sharad@ics.uci.edu
Supporting Similarity Queries in MARS

Abstract
To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) in our group at the University of Illinois. In this paper, we concentrate on the retrieval subsystem of MARS and its support for content-based queries over image databases. Content-based retrieval techniques have been extensively studied for textual documents in the area of automatic information retrieval. This paper describes how these techniques can be adapted for ranked retrieval over image databases. Specifically, we discuss the ranking and retrieval algorithms developed in MARS based on the Boolean retrieval model and describe the results of our experiments that demonstrate the effectiveness of the developed model for image retrieval.




IDFL
M. Ortega-Binderberger, K. Chakrabarti, S. Mehrotra

E-mail: sharad@ics.uci.edu
Database Support for Multimedia Applications

Abstract not available.





IDFL
Michael Ortega-Binderberger

E-mail: sharad@ics.uci.edu
WEBMARS: A Multimedia Search Engine for The World Wide

Abstract not available.





IDFL
K. Porkaew, K. Chakrabarti,
and S. Mehrotra


E-mail: sharad@ics.uci.edu
Query Refinement for Multimedia Retrieval and its Evaluation Techniques in MARS

Abstract
Advances in image processing, database management, and information retrieval has resulted in content-based multimedia retrieval to emerge as an important area of research. Typical content-based retrieval systems allow users to specify queries by providing examples of objects similar to the ones they wish to retrieve. Due to the subjective nature of retrieval, it is unlikely that the answers to the 'starting query' will satisfy the user's information need. Rather, among answers retrieved, the user may find one or more objects that are closer to what she has in mind compared to the original examples.

In the Multimedia Analysis and Retrieval System (MARS), we have explored query refinement techniques to modify the query based on the relevance feedback of the user on the retrieved objects. Query refinement in MARS consists of query reweighting (QR) and query modification (QM) techniques. QR learns the user's notion of similarity between objects and adjusts the weights of different components of the query. QM, on the other hand, uses the feedback information to change the query representation to better suit the user's information need. In [5, 2] query point movement (QPM) approach to QM is explored in which a query is represented by a single point in each feature space. At each iteration, the query point is moved to the centroid of the points marked relevant by the user. In this paper, we study a different approach to QM based on query expansion (QEX) which, at each iteration, uses a clustering technique to identify a set of (one or more) objects to be added to the query representation. We study efficient query processing techniques to implement the QEX approach as well as efficient techniques to execute refined queries for both QEX and QPM models. Our experimental results show that query expansion significantly outperforms query point movement both in terms of retrieval effectiveness and execution cost for all visual features used in MARS.




IDFL
Kriengkrai Porkaew

E-mail: sharad@ics.uci.edu
Database Support for Similarity Retrieval and Querying Mobile Objects

Abstract
Increasingly emerging applications require data management systems to support novel access mechanisms over complex data types. This dissertation studies two such access mechanisms - similarity queries and queries over dynamic objects. The motivation for studying similarity queries arises from applications that require imprecise data to be restored and retrieved in databases. Such applications include semi-structured data retrieval such as XML, time-series retrieval, data exploration and multimedia retrieval. The problem of similarity retrieval is studied in the context of multimedia information. The techniques for data representation, indexing, retrieval, and query refinement are developed. The focus of the thesis is on efficient mechanisms to support content based retrieval and query refinement in databases. The motivation for studying similarity queries arises from applications that require dynamic objects to be stored and retrieved in databases. Dynamic objects - e.g., temperature, humidity, wind velocity, mobile objects - are objects whose values changes continuous as a function of time without explicit updates to databases. Examples of applications that interact with dynamic objects are visualization of mobile objects, storm tracking, traffic monitoring. The problem of querying over dynamic objects is studied in the context of mobile objects. The focus of this part is on efficient mechanisms to support various types of queries over mobile objects in databases. Although the two parts of the thesis deal with two different problems, we establish that the underlying techniques for indexing and query processing for both problems are closely connected.




IDFL
Kriengkrai Porkaew, Sharad Mehrotra, and Robert Winkler

E-mail: sharad@ics.uci.edu
Database Support for Efficient Visualization

Abstract
Effective visualization requires efficient techniques to support spatio-temporal queries over large terrain databases. This paper concentrates on continuous queries that correspond to a virtual mobile object visualizing the motions of other mobile objects in a dynamic environment. Continuous queries arise naturally during fly-through in 3-D visualization. A naive approach to evaluating such queries is to repeatedly submit one query per visualized frame to the database. Since subsequent queries have a high degree of overlap with the previous one (due to the continuity of motion), a large amount of computation will be wasted. This paper proposes two alternative mechanisms that take advantage of the continuity in the sequence of queries in order to optimize the evaluation cost of the queries.




IDFL
Kriengkrai Porkaew, Sharad Mehrotra, and Hu Yu

E-mail: sharad@ics.uci.edu
Continuous Queries in Moving Object Databases to Support Efficient Visualization

Abstract
Increasingly application domains require database management systems to represent mobile objects and to support motion-specific queries. An important type of query in such domains is a continuous query which consists of a sequence of instantaneous queries, one for each point of time t' > t, where t is the time the query is initially posed to the database. An example of a continuous query is to monitor objects within a specified distance of an object x, which itself may be mobile, starting at a given time t. A naïve approach to evaluating continuous queries is to repeatedly submit instantaneous queries to the database, one for each point of time t' > t. Since subsequent queries have a high degree of overlap with the previous one (due to the continuity of motion), a large amount of computation will be wasted. This paper proposes two alternative mechanisms that attempt to reuse the answers returned by previous queries in evaluating subsequent queries thereby optimizing the evaluation of continuous queries. Experiments conducted over a real-life dataset consisting of mobile objects – the AHAS data containing Army battle excersises – is used to validate the efficiency of the developed approaches.




IDFL
Yong Rui, Thomas S. Huang,
and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Content-Based Image Retrieval with Relevance Feedback in MARS

Technology advances in the areas of Image Processing (IP) and Information Retrieval (IR) have evolved separately for a long time. However, efficient content-based image retrieval systems require the integration of the two. There is an urgent need to develop an integration mechanism to link the image retrieval model to text retrieval model, such that the well established text retrieval techniques can be used. An approach of mapping image feature vector (IP domain) to weighted-term vector (IR domain) is proposed. The relevance feedback technique from the IR domain is used to demonstrate the effectiveness of this mapping. Experimental results show that the image retrieval precision increases considerably with the help of relevance feedback.




IDFL
Yong Rui, Thomas S. Huang,
and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
MARS and Its Applications to MPEG-7

Abstract
To address the emerging needs of access to and retrieval of multimedia objects in many applications, we have started a Multimedia Analysis and Retrieval Systems project at the University of Illinois. This project addresses three main aspects in Multimedia Information Retrieval, i.e. feature extraction, multimedia object description, and retrieval algorithm. Although MPEG-7 will concentrate only on multimedia object description, such a goal will be better accomplished if its interfaces to feature extraction and retrieval algorithm are appropriate defined. In this proposal, we will first give a brief overview of the MARS system. Then we propose a multimedia object model for MPEG-7's content description interface. The proposed model allows information abstraction at various semantic levels. To better model human perception subjectivity to multimedia data, relevance feedback is integrated into the retrieval process. Our experimental results show that the proposed multimedia object model and retrieval model are general enough for modeling and specific enough to adapt to user's information need.





IDFL
Yong Rui, Thomas S. Huang,
and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Exploring Video Structure Beyond the Shots

Abstract
While existing shot-based video analysis approaches provide users with better access to the video than the raw data stream does, they are still not sufficient for meaningful video browsing and retrieval, since: (1) the shots in a long video are still too many to be presented to the user; (2) shots do not capture the underlying semantic structure of the video, based on which the user may wish to browse/retrieve the video. To explore video structure at a semantic level, this paper presents an effective approach for video scene structure construction, in which shots are grouped into semantic-related scenes. The output of the proposed algorithm provides a structured video that greatly facilitates user's access. Experiments based on real-world movie videos validate the effectiveness of the proposed approach.




IDFL
Yong Rui, Thomas S. Huang,
and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Browsing and Retrieving Video Content in a Unified Framework

Abstract
In this paper, we first review the recent research progress in video analysis, representation, browsing, and retrieval. Motivated by the mechanism used to access book's content, we then present novel techniques for constructing video Table-of-Contents and index to facilitate accessing video's content. We further explore the relationship between video browsing and retrieval and propose a unified framework to incorporate both entities in a seamless way. Preliminary research results justify our proposed framework for providing access to videos based on their content.




IDFL
Yong Rui, Thomas S. Huang,
Sharad Mehrotra,
and M. Ortega-Binderberger


E-mail: huang@ifp.uiuc.edu
Automatic Matching Tool Selection Using Relevance Feedback in MARS

Abstract
For a given visual feature, due to the diversity of human's subjective judgment, a visual information retrieval system that supports a single prefixed similarity measure will result in poor retrieval performance. To address this problem, this paper proposes the concept of similarity matching toolkit which consists of different similarity measures simulating human's perceptions of the given feature from different aspects. The toolkit supports a feedback-driven tool selection mechanism which adapts to the similarity measure that best fits the user's perception.

To illustrate the advantage of the proposed toolkit approach, we apply it to shape-based image retrieval. The paper describes a shape matching toolkit consisting of four transformation-invariant and computationally efficient matching tools and describes how relevance feedback can be used for automatic tool selection. Experimental results validate the flexibility of the matching toolkit and show the effectiveness of the relevance feedback for shape matching tool selection.




IDFL
Yong Rui, Thomas S. Huang,
Sharad Mehrotra,
and M. Ortega-Binderberger


E-mail: huang@ifp.uiuc.edu
Human Perception Subjectivity and Relevance Feedback in Multimedia Informational Retrieval

Abstract
Content-based multimedia information retrieval (MIR) has become one of the most active research areas in the past few years. While the existing approaches establish the basis of MIR, techniques for incorporating human perception subjectivity into the retrieval process have not been fully investigated. To address this problem, this paper introduces an integrated relevance feedback architecture for MIR, which dynamically captures the user's perception subjectivity. During the retrieval process, the user's perception subjectivity is modeled simultaneously at various levels, by using dynamically updated weights based on the user's relevance feedback. The experimental results show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely.




IDFL
Yong Rui, Thomas S. Huang,
Michael Ortega, and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval

Abstract
Content-Based Image Retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Spcifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: (1) the gap between high level concepts and low level features; (2) subjectivity of human perception of visual content.

This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user's high level query and perception subjectivity are captured by dynamically updated weights based on the user's feedback. The experimental results over more than 70,000 images show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely.




IDFL
Yong Rui, Michael Ortega,
Thomas S. Huang, and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Information Retrieval Beyond the Text Document

Abstract
With the expansion of the Internet, searching for information goes beyond the boundary of physical libraries. Millions of documents of various media types, such as text, image, video, audio, graphics, and animation, are available around the world and linked by the Internet.

Unfortunately, the state of the art of search engines for media types other than text lags far behind their text counterparts. To address this situation, we have developed the Multimedia Analysis and Retrieval System (MARS). This paper reports some of the progress made over the years towards exploring Information Retrieval beyond the text domain. In particular, the following aspects of MARS are addressed in the paper: visual feature extraction, retrieval models, query reformulation techniques, efficient execution speed performance and user interface considerations. Extensive experimental results are reported to validate the proposed approaches.




IDFL
Prasad Sistla, Ouri Wolfson, Sam Chamberlain, and Son Dao

E-mail: wolfson@ouri.eecs.uic.edu
Querying the Uncertain Position of Moving Objects

Abstract
In this paper we propose a data model for representing moving objects with uncertain positions in database systems. It is called the Moving Objects Spatio-Temporal (MOST) data model. We also propose Future Temporal Logic (FTL) as the query language for the MOST model, and devise an algorithm for processing FTL queries in MOST.




IDFL
Prasad Sistla, Ouri Wolfson, and Yixiu Huang

E-mail: wolfson@ouri.eecs.uic.edu
Minimization of Communication Cost Through Caching in Mobile Environments

Abstract
Users of mobile computers will soon have online access to a large number of databases via wireless networks. Because of limited bandwidth, wireless communication is more expensive than wire communication. In this paper we present and analyze various static and dynamic data allocation methods. The objective is to optimize the communication cost between a mobile computer and the stationary computer that stores the online database. Analysis is performed in two cost models. One is connection (or time) based, as in cellular telephones, where the user is charged per minute of connection. The other is message based, as in packet radio networks, where the user is charged per message. Our analysis addresses both, the average case and the worst case for determining the best allocation method.




IDFL
Prasad Sistla, Ouri Wolfson,
Yelena Yesha, and Robert Sloan


E-mail: wolfson@ouri.eecs.uic.edu
Towards a Theory of Cost Management for Digital Libraries

Abstract
One of the features that distinguishes digital libraries from traditional databases is new cost models for client-access to intellectual property. Clients will pay for accessing data items in digital libraries, and we believe that optimizing these costs will be as important as optimizing performance in traditional databases. In this paper we discuss cost models and protocols for accessing digital libraries, with the objective of determining the minimum cost protocol for each model.

We expect that in the future information appliances will come equipped with a cost optimizer, in the same way that today computers come with a built-in operating system. The paper makes the initial steps towards a theory and practice of intellectual property cost management.




IDFL
James Tayeb, Ozgu Ulusoy,
and Ouri Wolfson


E-mail: wolfson@ouri.eecs.uic.edu
A Quadtree-Based Dynamic Attribute Indexing Method

Abstract
Dynamic attributes are attributes that change continuously over time making it impractical to issue explicit updates for every change. In this paper, we adapt a variant of the quadtree structure to solve the problem of indexing dynamic attributes. The approach is based on the key idea of using a linear function of time for each dynamic attribute that allows us to predict its value in the future. We contribute an algorithm for regenerating the quadtreee-based index periodically that minimizes CPU and disk access cost. We also provide an experimental study of performance focusing on query processing and index update overheads.




IDFL
O. Wolfson, A. Lelescu, and B. Xu

E-mail: wolfson@ouri.eecs.uic.edu
Retrieval of Collaborative Work from Multimedia Databases Using Relevance Feedback

Abstract
In this paper we address the problem of retrieving stored multimedia presentations using relevance feedback. We model multimedia presentations using a crisp relation or object-oriented database, augmented with a text attribute. We also introduce a language for retrieval by content from such databases. The language is based on fuzzy logic. We also introduce a method for query refinement that uses relevance feedback provided by the user.




IDFL
O.Wolfson, P. Sistla, B. Xu, J. Zhou,
and S. Chamberlain


E-mail: wolfson@ouri.eecs.uic.edu
Tracking Moving Objects Using Database Technology in DOMINO

Abstract
Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pick-up a customer); or for a trucking company database a typical query may be: retrieve the trucks that are currently within 1 mile of truck ABT312 (which needs assistance); or for a database representing the current location of objects in a battlefield a typical query may be: retrieve the friendly helicopters that are in a given region, or, retrieve the friendly helicopters that are expected to enter the region within the next 10 minutes. The queries may originate from the moving objects, or from stationary users. We will refer to applications with the above characteristics as moving-objects-database (MOD) applications, and to queries as the ones mentioned above as MOD queries.

In the military MOD applications arise in the context of the digital battlefield, and in the civilian industry they arise in transportation systems. For example, Omnitracs developed by Qualcomm is a commercial system used by the transportation industry, which enables MOD functionality. It provides location management by connecting vehicles (e.g. trucks), via satellites, to company databases. The vehicles are equipped with a Global Positioning System (GPS), and they automatically and periodically report their location.




IDFL
Ouri Wolfson, Sam Chamberlain,
Son Dao, Liqin Jiang


E-mail: wolfson@ouri.eecs.uic.edu
Location Management in Moving Object Databases

Abstract
In this paper we first introduce Moving Objects Databases and their related research problems. Then we concentrate on a particular problem, namely reducing the information cost associated with a trip taken by some moving object (e.g. a vehicle). The information cost of a trip consists of the overhead of position-update messages, average uncertainty and the deviation of the database position from the actual position of the object. We introduce three position update policies, namely immediate linear policy (il), plain dead reckoning (pdr) and adaptive dead reckoning (adr). We show that adr has a lower information cost than pdr.




IDFL
Ouri Wolfson and Yixiu Huang

E-mail: wolfson@ouri.eecs.uic.edu
Competitive Analysis of Caching in Distributed Databases

Abstract
This paper makes two contributions. First, we introduce a model for evaluating the performance of data allocation and replication algorithms in distributed databases. The model is comprehensive in the sense that it accounts for I/O cost, for communication cost, and because of reliability considerations, for limits on the minimum number of copies of the object. The model captures existing replica-management algorithms, such as read-one-write-all, quorum-consensus, etc. These algorithms are static in the sense that, in the absence of failures, the copies of each object are allocated to a fixed set of processors.

In modern distributed databases, particularly in mobile computing environments, processors will dynamically store objects in their local database and will relinquish them. Therefore, as a second contribution of this paper, we introduce an algorithm for automatic dynamic allocation of replicas to processors. Then, using the new model, we compare the performance of the dynamic allocation algorithm. As a result, we obtain the relationship between the communication cost and I/O cost for which static allocation is superior to dynamic allocation, and the relationships for which dynamic allocation is superior.




IDFL
Ouri Wolfson, Bo Xu, Sam Chamberlain,
and Liqin Jiang


E-mail: wolfson@ouri.eecs.uic.edu
Moving Objects Databases: Issues and Solutions

Abstract
Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pick-up a customer). In the military, moving objects database applications arise in the context of the digital battlefield, and in the civilian industry they arise in transportation systems.

Currently, moving objects database applications are being developed in an ad hoc fashion. Database Management System (DBMS) technology provides a potential foundation upon which to develop these applications, however, DBMS's are currently not used for this purpose. The reason is that there is a critical set of capabilities that are needed by moving objects database applications and are lacking in existing DBMS's. The objective of our Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing these capabilities on top of existing DBMS's. In this paper we describe the problems and our proposed solutions.




IDFL
Ouri Wolfson, Bo Xu, Sam Chamberlain,
and Liqin Jiang


E-mail: wolfson@ouri.eecs.uic.edu
Challenges and Approaches in Motion Databases

Abstract
Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pick-up a customer). In the military, moving objects database applications arise in the context of the digital battlefield, and in the civilian industry they arise in transportation systems.

Currently, moving objects database applications are being developed in an ad hoc fashion. Database Management System (DBMS) technology provides a potential foundation upon which to develop these applications, however, DBMS's are currently not used for this purpose. The reason is that there is a critical set of capabilities that are needed by moving objects database applications and are lacking in existing DBMS's. The objective of our Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing these capabilities on top of existing DBMS's. In this paper we describe the problems and our proposed solutions.




IDFL
Hu Yu, Sharad Mehrotra, Robert Winkler, Sean S. Ho, Timothy C. Gregory,
and Swati D. Allen


E-mail: sharad@ics.uci.edu
Integration of SATURN System and VGIS

Abstract
This paper reports on our efforts to integrate the SpAtio-Temporal Uncertainty ReasoNing (SATURN) system with the Virtual GIS (VGIS) system to improve the performance and scalability of VGIS and to enhance it as a collaborative planning tool. We added three new components to VGIS: a spatio-temporal object manager, a performance monitor and a task database. The spatio-temporal object manager uses SATURN's dynamic multidimensional indexing techniques to support efficient object traversal during visualization. The performance monitor adjusts the resource allocation between VGIS components and adaptively adjusts image quality to guarantee bounded visualization performance. The task database extends VGIS as a tool for collaborative planning. Performance results illustrate that the object manager and the performance monitor significantly improve VGIS performance, enabling it to scale to complex scenarios with a large number of dynamic objects.




IDFL
Y. Zhuang, S. Mehrotra, and T. Huang

E-mail: sharad@ics.uci.edu
A Multimedia Information Retrieval Model Based on Semantic and Visual Content

Abstract
Unlike text-based information retrieval model, information retrieval based on multimedia lacs a good IR model. In this paper, the goal is to present a retrieval model based on both semantic and visual content about the multimedia object. First, we survey the four basic IR models: boolean model, cluster-based model, vector-based model and probabilistic model. Considering the fact that a user's query will always contain both the textual information as well as the visual information, we work out an algorithm for the distance measurement.




IDFL
Yueting Zhuang, Yong Rui,
Thomas S. Huang, and Sharad Mehrotra


E-mail: huang@ifp.uiuc.edu
Applying Semantic Association to Support Content-Based Video Retrieval

Abstract
The traditional approach to video retrieval is to first annotate the video by textual information (titles and key words) and then the queries will be searched based on this keyword set. Since automatic annotation has not yet been available, this work needs great amount of labor and has been proved to be unrealistic in applications. Another approach, which seems to be at the other extreme, is to utilize the low-level video content, such as color, texture, shape, motion features and so on, in an attempt to get rid of the need of key words annotation.

We hold the view in this paper that a user preferable query form should include both the keywords and video contents. In this paper, we will explore the semantic aspect based on video TOC structuring. Close-captioning is used to extract a basic keywords set. Word-Net, an electronic lexical system, is used to provide semantic association. The approach has been applied in Web-MARS VIR and the running result has shown that the retrieval performance is greatly improved.