Philippe Salembier
Universitat Politecnica de Catalunya
 
A Position Statement for Panel 5: MPEG7 Issues
The 1998 International Workshop on Very Low Bitrate Video Coding
 
The MPEG-7 initiative intends to standardize the content description of multimedia documents. The normative parts of the standard will include Descriptors (Ds: assignment of a representation value to one or more features), Description Schemes (DSs: definition of the structure and semantics of descriptors and their relationships) and a Description Definition Language (DDL: the language used to specify Description Schemes). In this position paper, several issues related to descriptors are introduced to stimulate discussion during the panel session of VLBV98.

Following the discussions within the MPEG-7 group and related publications, it seems that there are at least two different viewpoints on the nature of the descriptors that will be useful for MPEG-7 applications. The first is held by people with a strong background in audio-visual or multimedia archiving (the "documentalist" approach), whereas the second is held by people working in audio or video processing (the "Signal processing & Pattern recognition" approach). To caricature the two approaches, a reduced set of issues is listed in the following table, together with some possible answers from both viewpoints:

 
 
| Question | Documentalist approach | "Signal processing & Pattern recognition" approach |
|---|---|---|
| Should all descriptors address the semantic level of the content? | Yes, almost all of them. In most applications, the user wants to formulate a simple query with a very specific meaning. | No. Currently, most queries are formulated using words. However, new query mechanisms will appear in the future. Some of them may be answered without addressing the semantic level of the description (a simple case is query by example). |
| What is the role of the indexing process of the document? | The goal of the indexing process is first to model the content of the document, that is, to define (or select) the relevant ontology, and then to assign semantic descriptors. In a sense, it defines all possible answers to future queries: the indexing creates a restricted universe of answers. | The objective of the indexing process is to represent the document as completely as possible. However, the representation need not be designed only for visualization. |
| What kind of queries can be answered? | It is difficult to answer queries involving semantic notions or descriptors that were not foreseen during the indexing. | It is up to the search engine to make good use of the descriptors to give the correct information to the user. |
| Where is the "intelligent" part of the process located? | Mainly in the indexing process. This process generally has to be done by experts who have a strong background in indexing and are aware of the particular application they are working for. The search process is relatively simple, since most of the time the objective is to check for the presence of a particular descriptor. | Mainly in the search engine. Most of the indexing is done by automatic analysis with a moderate amount of human supervision. A fairly large amount of processing has to be done in the search engine to match the user query with the descriptors. |
| Type of descriptors | Generally text-based or enumerated descriptors. | Besides text-based or enumerated descriptors, the type of descriptors can be quite flexible. Examples include arbitrary shapes and statistical representations of shape, texture or sound. |
| How is it possible to deal with new descriptors? | If a descriptor with completely new semantics has to be introduced, its meaning should be clearly defined and somehow registered, so that all indexing experts and search engines are aware of its existence and may use it. | Since human interaction is less relevant, one simply has to tell the indexing application and the search engines how to instantiate this particular descriptor. For a large number of descriptors, one solution is to transmit the extraction algorithm. |
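
To make the contrast concrete, the following minimal Python sketch places the two search styles side by side. It is an illustration only and not part of any MPEG-7 proposal: the clip names, the registered vocabulary, the 4-bin color histograms and the function names `search_by_descriptor` and `query_by_example` are all hypothetical, and Euclidean distance is just one possible matching criterion.

```python
import math

# Documentalist view: each document carries enumerated, text-based descriptors
# assigned by an indexing expert from a registered vocabulary (hypothetical data).
REGISTERED_TERMS = {"interview", "football", "outdoor", "news"}

semantic_index = {
    "clip_a": {"news", "interview"},
    "clip_b": {"football", "outdoor"},
    "clip_c": {"news", "outdoor"},
}

def search_by_descriptor(index, term):
    """Return documents annotated with `term`; the search is a presence check."""
    if term not in REGISTERED_TERMS:
        # A notion that was not foreseen at indexing time cannot be answered.
        raise KeyError(f"descriptor {term!r} was never registered")
    return sorted(doc for doc, terms in index.items() if term in terms)

# Signal-processing view: each document carries low-level features produced
# by automatic analysis (here, made-up 4-bin color histograms).
feature_index = {
    "clip_a": [0.70, 0.10, 0.10, 0.10],
    "clip_b": [0.10, 0.60, 0.20, 0.10],
    "clip_c": [0.60, 0.20, 0.10, 0.10],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def query_by_example(index, example, top_k=2):
    """Rank documents by feature distance to the example, closest first."""
    ranked = sorted(index, key=lambda doc: euclidean(index[doc], example))
    return ranked[:top_k]

if __name__ == "__main__":
    print(search_by_descriptor(semantic_index, "news"))
    print(query_by_example(feature_index, [0.68, 0.12, 0.10, 0.10]))
```

In the first function nearly all of the work has been done beforehand by the indexer, and a term outside the registered vocabulary simply cannot be queried. In the second, the index stores no meaning at all, so the quality of the answer depends entirely on how the search engine compares features.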
 
As can be seen, both viewpoints have strengths and weaknesses. Based on the technical contributions to be received at the beginning of 1999, MPEG-7 will have to define an appropriate compromise between these two extreme positions. In this position paper, the viewpoints are deliberately contrasted to stimulate discussion.