Munchurl Kim
ETRI, Broadcasting Technology Department
 
A Position Statement for Panel 1: Image/Video feature extraction and segmentation
The 1998 International Workshop on Very Low Bitrate Video Coding
 

Image segmentation techniques are important tools for content-based image coding, manipulation of image content, and interactive multimedia applications. Segmentation usually divides an image into semantic regions that can be treated as objects. These semantically segmented objects can then be coded so that object-based manipulation of image content becomes possible in interactive multimedia applications. For example, MPEG-4, which is currently being standardized, aims to provide core techniques for object-based manipulation of audio-visual content. Another application is image indexing, where segmentation facilitates object-based indexing in which each homogeneous region of an image can be represented individually.

Automatic segmentation: As an example of segmenting images into semantic regions, the frames of a video can be partitioned into moving objects and a still background. Automatic segmentation of moving objects (foreground) from the background is an important tool for generating VOPs (Video Object Planes in MPEG-4). However, it often fails to produce satisfactory results across a variety of image types, because automatic segmentation is an ill-posed problem.
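To make the foreground/background partition concrete, the following is a minimal sketch of one simple automatic method, change detection by frame differencing. The statement above does not name a particular algorithm, so this technique, the function name `change_mask`, and the threshold value are assumptions chosen purely for illustration. Frames are assumed to be grayscale and stored as nested lists of 0-255 intensities.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def change_mask(prev: Frame, curr: Frame, threshold: int = 20) -> Frame:
    """Label a pixel 1 (moving foreground) when its intensity changed by
    more than `threshold` between two frames, else 0 (still background).

    The threshold is a hypothetical tuning parameter, not a value taken
    from the position statement.
    """
    if len(prev) != len(curr) or any(
        len(p) != len(c) for p, c in zip(prev, curr)
    ):
        raise ValueError("frames must have identical dimensions")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]
```

A fixed threshold of this kind is sensitive to noise, illumination changes, and camera motion, which is one practical reason a single automatic method rarely gives satisfactory results for every image type.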

Semi-automatic segmentation: As an alternative, semi-automatic segmentation, a user-assisted technique, can be considered. At the initial stage of segmentation, the user marks the boundaries of the objects of interest, thereby supplying human cognitive information. The user-selected objects are then continuously separated from the unselected areas as the image sequence evolves over time. A possible approach has two processing steps: intra-frame segmentation and inter-frame segmentation. First, intra-frame segmentation is applied to the first frame of the image sequence and to any frame that contains newly appeared video objects or a scene cut; in these frames, the user manually defines or segments the newly appeared video objects. Inter-frame segmentation is then applied to the consecutive frames that follow such a frame, where the user-defined video objects are segmented automatically by object tracking. The resulting partition therefore achieves temporal coherence of object labeling, i.e. object correspondence, and maintains segmentation similarity between successive frames. As a result, more reliable segmentation results can be obtained than with automatic segmentation methods. Semi-automatic segmentation is an attractive approach for non-real-time applications.
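The two-step scheme can be sketched as follows. The user-drawn mask stands in for the intra-frame step, and the inter-frame step is approximated by a purely translational tracker that searches a small window for the shift minimizing the sum of absolute differences over the object pixels. The statement only says "object tracking", so this tracker, the function names `estimate_shift` and `propagate_mask`, and the `search` parameter are assumptions for illustration; a real tracker would also need to handle deformation and occlusion.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities
Mask = List[List[int]]   # 0 = background, k > 0 = label of object k


def estimate_shift(prev: Frame, curr: Frame, mask: Mask,
                   search: int = 2) -> Tuple[int, int]:
    """Return the (dy, dx) within +/-search that best aligns the masked
    object of `prev` with `curr` (minimum sum of absolute differences)."""
    h, w = len(prev), len(prev[0])
    obj = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not obj:
        return (0, 0)
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost, valid = 0, True
            for y, x in obj:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w):
                    valid = False  # object would leave the frame
                    break
                cost += abs(curr[ny][nx] - prev[y][x])
            if valid and cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def propagate_mask(mask: Mask, shift: Tuple[int, int]) -> Mask:
    """Move every labeled pixel by `shift`, keeping its object label so
    that labels stay consistent from frame to frame."""
    dy, dx = shift
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if mask[y][x] and 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = mask[y][x]
    return out
```

In use, the mask marked by the user on the first frame would be passed through `estimate_shift` and `propagate_mask` for each following frame, and the user would be asked for a new mask only when a new object appears or a scene cut occurs.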

Video Partition: Besides automatic and semi-automatic segmentation of images and video, in which each frame is partitioned into regions in the spatial dimension, the term 'segmentation' can also refer to segmentation of video in the temporal dimension, based on shot or scene boundaries, for video indexing. Video consists of a series of image frames in the temporal dimension and can be partitioned into shots. A shot is a set of consecutive frames captured by a single camera operation. Shot boundaries are usually detected by pixel-based, statistics-based, transform-based, feature-based, or histogram-based approaches. While shots are delimited by physical boundaries, scenes are separated by semantic boundaries. A scene may consist of visually similar and temporally close shots, or of shots within the same event (e.g. a dialogue). Scene-based segmentation of video allows a better semantic representation that captures the underlying story in the video. While much research has focused on shot-based video segmentation (shot boundary detection), relatively little effort has been devoted to scene-based segmentation.
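As an illustration of the histogram-based family of shot boundary detectors, here is a minimal sketch that compares normalized gray-level histograms of consecutive frames and reports a boundary when their L1 distance exceeds a threshold. The bin count, the threshold, and the function names `gray_histogram` and `shot_boundaries` are assumptions made for this example and are not taken from the statement.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of a frame (entries sum to 1)."""
    hist = [0] * bins
    for row in frame:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
    total = sum(hist) or 1  # guard against an empty frame
    return [count / total for count in hist]


def shot_boundaries(frames: List[Frame], threshold: float = 0.5,
                    bins: int = 16) -> List[int]:
    """Return each index i at which frames[i] is taken to start a new shot.

    The L1 distance between two normalized histograms lies in [0, 2];
    `threshold` is a hypothetical cut-off within that range.
    """
    hists = [gray_histogram(f, bins) for f in frames]
    boundaries = []
    for i in range(1, len(hists)):
        distance = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if distance > threshold:
            boundaries.append(i)
    return boundaries
```

Because a histogram discards pixel positions, this kind of detector tolerates object motion within a shot, but it can miss a cut between two shots with similar intensity distributions. It also finds only physical shot boundaries; grouping shots into scenes would require an additional, semantic similarity measure.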