Narendra Ahuja
University of Illinois at Urbana-Champaign
A Position Statement for Panel 1: Image/Video feature extraction and segmentation
The 1998 International Workshop on Very Low Bitrate Video Coding


For a low-level segmentation system to perform satisfactorily, it must have the following properties:

A. Shape and Topological Invariance: The regions should be correctly detected regardless of their shapes and relative placement. For example, an edge point must be detected at its true location regardless of whether the edge in its vicinity is straight, is curved, or even contains a corner or a vertex where multiple regions meet.

B. Photometric Scaling: It should be possible to detect all regions which are in contrast to their surround, regardless of the actual degree of intra-region homogeneity and the value of the contrast. Regions having large contrast may be associated with higher scales.

C. Spatial Scaling: It should be possible to detect all regions regardless of their shapes and sizes. Higher scales may be associated with larger regions.

D. Stability and Automatic Scale Selection: Image structures associated with different scales correspond to segmentations that are locally invariant to changes in geometric and contrast sensitivities. Since the contrasts and sizes of regions contained in an arbitrary image are a priori unknown, they should be identified automatically.
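The stability criterion in property D can be illustrated with a minimal sketch. It is not the method advocated in this statement: it assumes a toy 1D signal and a deliberately naive segmenter that places a boundary wherever adjacent samples differ by more than a contrast threshold. The names `segment_1d` and `stable_segmentations`, the signal values, and the threshold sweep are all hypothetical choices made for illustration. The point is only that segmentations which persist over a range of contrast sensitivities can be found automatically by sweeping the sensitivity and grouping identical results.

```python
def segment_1d(signal, contrast):
    """Toy segmenter (an assumption, not the author's method): mark a
    boundary at index i when |signal[i] - signal[i-1]| exceeds `contrast`."""
    return tuple(i for i in range(1, len(signal))
                 if abs(signal[i] - signal[i - 1]) > contrast)


def stable_segmentations(signal, contrasts):
    """Sweep the contrast sensitivity and group consecutive values that
    produce the same segmentation. Returns a list of
    (boundaries, [contrast values]) pairs in sweep order."""
    runs = []
    for c in contrasts:
        seg = segment_1d(signal, c)
        if runs and runs[-1][0] == seg:
            runs[-1][1].append(c)      # segmentation unchanged: extend the run
        else:
            runs.append((seg, [c]))    # segmentation changed: start a new run
    return runs


# Hypothetical signal: low-contrast structure (steps of 10) nested inside
# a high-contrast structure (a step of 50 at index 6).
signal = [10, 10, 20, 20, 10, 10, 60, 60, 70, 70, 60, 60]
runs = stable_segmentations(signal, range(0, 60, 5))

for boundaries, values in runs:
    print(f"boundaries={boundaries} stable for contrast in "
          f"[{values[0]}, {values[-1]}]")
```

In this toy case the sweep yields three runs: all five boundaries at the lowest sensitivities, the single high-contrast boundary over a long interval of thresholds, and no boundaries at all once the threshold reaches the largest step. The first two correspond to two photometric scales, and neither threshold interval had to be supplied in advance.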

The majority of the work on segmentation has been on edge detection, effectively achieved through convolutions using different models of edge profile and geometry. However, not all image types can be captured by a finite set of tractable models. Since any convolution kernel must incorporate a template for the expected edge, no linear, convolution-based approach can avoid the limitations resulting from the use of such models. Similar arguments can be made about the bottom-up character of scale detection and models of region interiors. Therefore, segmentation should be viewed as a (bottom-up) grouping process rather than as (top-down) model fitting.
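The template argument can be made concrete with a small sketch. It assumes a 3x3 Prewitt kernel as the straight-edge template and two synthetic binary images; the kernel choice, the image size, and the helper name `response` are illustrative assumptions, not taken from this statement. The same kernel that responds fully to a straight vertical edge responds more weakly at the corner of a region, because the corner does not match the geometry built into the template.

```python
# Prewitt kernel for vertical edges: a template for a straight step edge.
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]


def response(image, row, col, kernel=PREWITT_X):
    """Correlate a 3x3 kernel with the image patch centred at (row, col)."""
    return sum(kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
               for dr in (-1, 0, 1)
               for dc in (-1, 0, 1))


SIZE, C = 8, 4  # hypothetical image size and edge position

# Straight vertical edge: dark for col < C, bright for col >= C.
straight = [[1 if col >= C else 0 for col in range(SIZE)]
            for _ in range(SIZE)]

# Corner: bright only where both row >= C and col >= C.
corner = [[1 if (row >= C and col >= C) else 0 for col in range(SIZE)]
          for row in range(SIZE)]

straight_response = response(straight, C, C)  # on the straight edge
corner_response = response(corner, C, C)      # at the corner pixel

print(straight_response, corner_response)
```

At the corner pixel only two of the three kernel rows see the step, so the response drops from 3 to 2 even though the local contrast is identical. A fixed threshold tuned for straight edges would therefore treat the two locations differently, which is the kind of model-induced limitation described above.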