Adam T. Lindsay
Riverland Research Excelsiorlaan
 
A Position Statement for Panel 5: MPEG7 Issues
The 1998 International Workshop on Very Low Bitrate Video Coding
 
MPEG-7 is an ambitious standardization effort from the Moving Picture Experts Group. It seeks to standardize a common interface for describing multimedia materials. Once agreement is reached (scheduled for the end of 2000), the standard will enable many different applications worldwide and an unprecedented degree of interoperability.

Before the international community can start technical work in this direction, let alone reach the promised interoperability, it must first resolve one issue that has far-reaching consequences for the whole standard. It affects the scope of the standard, the technical work, and the ultimate success and adoption of the standard. This issue, unfortunately, is not technical but human in nature: MPEG-7 must reconcile the approaches favored by the different communities attracted by the work.

Structuralist vs. Perceptualist

To characterize the problem as a social one, a clash of multiple scientific and engineering communities, is not to say that it is without technical aspects. Indeed, it is the different technical insights, the different ways of formulating the challenge presented by MPEG-7, that cause the most difficulty within MPEG-7. The most striking difference is that between the database community and the signal processing community, which I categorize as the Structuralists and the Perceptualists, respectively.

In broad strokes, the Structuralists, those from the database world and those who need high-level descriptions, believe that MPEG-7 needs only to provide a standardized structure to the international community. The Perceptualists, primarily those who have been involved in image analysis, see the only path to success in standardizing the descriptors, that is, the representations of the content. Of course, these are caricatures. Still, the Perceptualists must learn that many potential problems are easily solved by imposing a structure on the standard. The Structuralists, however, must learn that a structure does not solve everything.

I will now sketch a few scenarios of where the standard might go without the proper balance between the two communities.

Bootstrapping Semantics

Some Structuralists assert that the only thing that needs to be standardized within MPEG-7 is the Description Definition Language (DDL), which would provide a solid SGML-like underpinning for users of the standard to create their own Description Schemes (DS's). It is true that devising such a language is a formidable task in itself, and could easily take the two to three years allotted to the standardization effort. It is also true that such a language offers great flexibility to many users of the standard, and ensures that it will last well into the future. We all must take these benefits and drawbacks to heart, but the truth remains that the DDL alone will not be enough for a standard.

The problem with providing just a DDL to create description schemes is that it does not ensure interoperability. In fact, it may well prevent any degree of the interoperability expected of the standard. The DS's are structured combinations of Descriptors (D's), which correspond to perceptual features. Many Structuralists are loath to reveal what is in their data structures, and wish to let the market compete on the D's, the representations themselves. If the Descriptors, the basic units of semantics in MPEG-7, are kept private, there is little to enable one computer to "understand" another.

For example, a DS may express that an image must contain objects, and that objects have many attributes, one of which is "color." The means of expressing this structure (the DDL), and even the structure (DS) itself, may be standardized, but many Structuralists wish to compete in the marketplace based on providing the best representation for "color." The DS expressed in the DDL may be perfectly parsable, but the D's at the bottom of this tree (or whatever other) structure must also be agreed upon for any information about the image to be exchanged. Without standardized descriptors, interoperability is lost, and MPEG-7 becomes a Tower of Babel, with a multitude of solutions claiming to be compliant with a standard, but not sharing enough information for that standard to be useful.
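
The "color" scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the two vendor encodings, and the "std:rgb" format identifier are all hypothetical and are not taken from any MPEG-7 draft. Both vendors share the same structure, yet the receiver can compare leaves only when it knows the leaf format.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Descriptor:
    """A leaf value in a description, tagged with the format that encodes it."""
    format_id: str  # hypothetical identifier for the representation
    value: Any


@dataclass
class ObjectDescription:
    """A toy Description Scheme: an object with a single 'color' attribute."""
    color: Descriptor


def color_distance(a: ObjectDescription, b: ObjectDescription) -> Optional[float]:
    """Compare two colors, or return None when they cannot be interpreted."""
    if a.color.format_id != b.color.format_id:
        return None  # the structure parses, but the leaves are not comparable
    if a.color.format_id != "std:rgb":
        return None  # a private format this receiver does not understand
    # Euclidean distance between two RGB triples in the agreed format.
    return sum((x - y) ** 2 for x, y in zip(a.color.value, b.color.value)) ** 0.5


# Same structure, private leaf formats: nothing can be exchanged.
vendor_a = ObjectDescription(Descriptor("vendorA:hist", [0.9, 0.1, 0.0]))
vendor_b = ObjectDescription(Descriptor("vendorB:name", "crimson"))

# Same structure, agreed leaf format: the comparison is meaningful.
std_a = ObjectDescription(Descriptor("std:rgb", (255, 0, 0)))
std_b = ObjectDescription(Descriptor("std:rgb", (255, 3, 4)))

print(color_distance(vendor_a, vendor_b))
print(color_distance(std_a, std_b))
```

In this sketch the two vendor descriptions are both well-formed, but only the pair that shares a known leaf format yields a usable answer.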

Finding Flexibility

So, it is clear that Descriptors are necessary. But they cannot do the job of standardization alone. Providing a structure solves many problems that a bottom-up description method might encounter. Structure allows novel combinations of features to be explored. It provides a framework for referring to parts of a whole document, or to a series of individual documents, where one might otherwise have to standardize different descriptors for each possible combination.

Flexibility is also welcome in ensuring the longevity of the standard. It would not be wise to standardize Description Schemes for a given task, and not allow for different related applications to modify them for their own specific uses.

Without flexibility, MPEG-7 risks eventual obsolescence and narrowness of scope. If a powerful DDL is not provided, there is a very real possibility that other parties will extend the standard for their own uses, thereby eliminating the strength of a true industrial standard.

Scope and Conclusion

This last issue of flexibility thus raises the obvious question, "Who is MPEG-7 for?" Do we follow a specific, bottom-up approach for a few identified domains? Or do we keep the solution generic, allowing anyone to create their own MPEG-7 solution? If it is not clear already, my answer is both and neither. MPEG-7 should make a strong showing in some core applications, establishing DS's and variants that would serve the video, image, music, and sound indexing communities well, allowing a good number of initial products to target those basic standards. MPEG-7 should also provide a level of genericity (in the D's) and power (in the DDL) that will allow specialized communities (such as medical imaging) who wish to adapt the standard to their uses to do so.

Ultimately, there should be a generic set of Descriptors for audio and visual features, and a specific set of Description Schemes which serve specific applications. A single DDL allows one to create new DS's from existing Descriptors. If a feature cannot be captured through structuring generic D's into a novel DS, it may be adopted in a second phase of the standard, or perhaps registered through a registration body. Such registration must be closely monitored, as it, too, could lead to forced incompatibilities, and a variety of competing, but incompatible, descriptors.
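
As a rough illustration of this division of labor, the following Python sketch treats a Description Scheme as a nested dictionary whose leaves name the Descriptors they use, and reports which leaves fall outside a generic set. Everything here is an assumption made for the example: the descriptor names, the dictionary encoding, and the function `missing_descriptors` are hypothetical and do not reflect any actual DDL syntax.

```python
# Hypothetical generic Descriptors; the names are illustrative only.
GENERIC_DESCRIPTORS = {"color", "texture", "shape", "pitch", "loudness"}


def missing_descriptors(scheme: dict) -> set:
    """Return the leaf Descriptor names in a scheme that are not generic.

    A scheme is a nested dict; a string leaf names the Descriptor it uses.
    """
    missing = set()
    for child in scheme.values():
        if isinstance(child, dict):
            missing |= missing_descriptors(child)  # recurse into sub-structure
        elif child not in GENERIC_DESCRIPTORS:
            missing.add(child)
    return missing


# A new Description Scheme built only from existing generic Descriptors.
video_shot = {
    "keyframe": {"dominant": "color", "surface": "texture"},
    "soundtrack": {"level": "loudness"},
}

# A specialized scheme that needs a feature the generic set does not cover.
medical_scan = {"region": {"outline": "shape", "density": "tissue_density"}}

print(missing_descriptors(video_shot))
print(missing_descriptors(medical_scan))
```

A scheme with no missing leaves is exchangeable as it stands; any name the function reports would be a candidate for a later phase of the standard or for registration.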

For me, interoperability should be the foremost goal for MPEG-7. The standard offers a tantalizing glimpse of it, and that promise should not be dashed upon the rocks of competitive closedness. For this compatibility to become widespread, the other goals should be to provide a core set of specific standards, allowing early adoption in a given community, and the flexibility for that community to spread, grow, and include other special interests.