Indexed on: 27 Dec '11
Published on: 27 Dec '11
Published in: Pattern Recognition Letters
This paper presents a method to extract a part-based model of an observed scene from a video sequence. Independent motion is a strong cue that two points belong to different "rigid" entities. Conversely, points that move together throughout the whole video belong together and define a "rigid" object or part. Successfully tracked features provide the trajectories of salient points in the scene. A triangulated graph connects the salient points and encodes their local neighborhood in the first frame. The variation in length of each triangle edge over the sequence is used to label the edge as relevant (lying on an object) or separating (connecting different objects). A subsequent grouping process uses the motion of the triangles marked as relevant as a cue to identify the "rigid" parts of the foreground or the background. The choice of the motion-based grouping criterion depends on the type of motion: in the image plane or out of the image plane. The result is a hierarchical description (graph pyramid) of the scene, in which each vertex in the top level of the pyramid represents a "rigid" part of the foreground or the background and encloses the salient features used to describe it. Promising experimental results show the potential of the approach.
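The edge-labeling idea can be illustrated with a minimal sketch. It is not the paper's implementation: the variation measure (relative spread of edge length), the `threshold` value, the function names, and the grouping by connected components of relevant edges are all assumptions made for illustration, and the edge list is supplied by hand instead of being computed by a triangulation. The sketch covers only motion in the image plane, where a rigid motion preserves the distance between two points on the same part.

```python
import math


def edge_length_variation(traj_a, traj_b):
    """Relative spread (max - min) / mean of the distance between two
    tracked points over all frames. This measure is an assumption."""
    lengths = [math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(traj_a, traj_b)]
    mean = sum(lengths) / len(lengths)
    return (max(lengths) - min(lengths)) / mean if mean > 0 else 0.0


def label_edges(trajectories, edges, threshold=0.1):
    """Label each edge 'relevant' (length roughly constant) or
    'separating' (length varies). The threshold is hypothetical."""
    labels = {}
    for i, j in edges:
        variation = edge_length_variation(trajectories[i], trajectories[j])
        labels[(i, j)] = "relevant" if variation <= threshold else "separating"
    return labels


def group_points(num_points, labels):
    """Group points joined by relevant edges using union-find.
    A simplified stand-in for the paper's triangle-based grouping."""
    parent = list(range(num_points))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), label in labels.items():
        if label == "relevant":
            parent[find(i)] = find(j)

    groups = {}
    for p in range(num_points):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Toy data: points 0 and 1 are static, points 2 and 3 translate together.
trajectories = [
    [(0, 0), (0, 0), (0, 0)],
    [(1, 0), (1, 0), (1, 0)],
    [(0, 5), (5, 5), (10, 5)],
    [(1, 5), (6, 5), (11, 5)],
]
edges = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

labels = label_edges(trajectories, edges)
print(group_points(len(trajectories), labels))  # [[0, 1], [2, 3]]
```

In this toy example the edges within each pair keep a constant length and are labeled relevant, while the edges between the static pair and the moving pair stretch and are labeled separating, so the two pairs end up in separate groups.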