This work was published at the International Symposium on Visual Computing, November 2010 (link)
In addition to the distributed camera network interface project listed above, this is the other main project I have worked on in graduate school. The goal of this project was to develop a
method to reliably learn scene entry and exit locations in video. More generally, given a camera viewing an area (scene), the goal is to automatically learn the areas in the camera
view where objects (e.g., people, cars) enter and exit the scene. These entry and exit "zones" can correspond to anything from a doorway to the edge of the camera
view. The resulting method is novel in that it models and exploits the behavioral consistency of the objects in the scene when determining whether a plausible entry/exit "zone" is
likely to be a real entry/exit. Further, the proposed approach works with weak tracking input (most existing approaches require more reliable "strong" tracks). This project consists of two main parts. The
first part involves making sense of the weak tracking data (learning "entities") so that it can be used to learn entry and exit regions. The second part involves clustering a set of potential entry and
exit regions, and then scoring them using behavior-based reliability metrics to obtain a final set of entry/exit regions.
Part 1 (learning entities)
As with most scene understanding algorithms, this approach uses tracking data as input. Strong tracking algorithms (ones that track objects reliably as they move through the scene) do not work well in busy urban environments, and
can be computationally expensive. To compensate for these issues, a "weak" tracking algorithm is used instead. Rather than attempting to track entire objects as strong trackers do, weak trackers track salient feature points (e.g., areas of high texture or corner points). Such
trackers work well in crowded scenes, but rather than producing a single track as an object moves through the scene, a weak tracker produces a set of short and frequently broken "tracklets". These weak tracks are then clustered into "entities"
using a modified version of mean-shift clustering on the frame-level track observations (see the paper for more details). This transforms the set of weak tracks into a set of "entity" tracks (see below).
The idea behind using object tracks is to cluster track start points to form entry clusters, and track end points to form exit clusters. This is not possible with a weak tracker alone, because its tracks are very fragmented. The process of learning entity tracks, however, drastically reduces the noise of the weak tracker. To demonstrate this, the image below displays the weak tracker start observations and the corresponding entity track start observations (after transforming the weak tracks to entity tracks).
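To make the clustering idea concrete, here is a minimal sketch of clustering track start points into candidate entry clusters. It uses plain flat-kernel mean-shift, not the modified version from the paper, and the function name `mean_shift`, the `bandwidth` value, and the sample coordinates are all assumptions chosen for illustration.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-3):
    """Plain flat-kernel mean-shift on 2D points (illustrative, not the paper's variant)."""
    modes = []
    for px, py in points:
        x, y = px, py
        for _ in range(max_iter):
            # Average all points within `bandwidth` of the current estimate.
            near = [(qx, qy) for qx, qy in points
                    if math.hypot(qx - x, qy - y) <= bandwidth]
            if not near:
                break
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))

    # Merge modes that converged to (nearly) the same location.
    centers, labels = [], []
    for mx, my in modes:
        for i, (cx, cy) in enumerate(centers):
            if math.hypot(mx - cx, my - cy) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append((mx, my))
            labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical track start points (pixel coordinates) around two entries.
starts = [(10, 10), (11, 9), (9, 11), (100, 100), (101, 99), (99, 101)]
centers, labels = mean_shift(starts, bandwidth=5.0)
print(centers, labels)
```

With fragmented weak tracks, start points are scattered across the whole scene, so clusters like these only become meaningful once the tracks have been merged into entity tracks.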
Part 2 (learning entries and exits)
Using the new set of entity tracks, the entity entry and entity exit observations are clustered using mean-shift clustering, and
a convex-hull area reduction technique is used to obtain tighter clusters and a more generalized cluster shape (see paper). This results in a
set of potential entry and exit regions. These potential regions are then scored using two behavioral reliability metrics: directional consistency and
interaction consistency. Directional consistency constrains the regions to be somewhat directional, and enforces that objects leave an entry or enter an exit in a semi-directional manner
(e.g., objects don't spawn in the scene and leave/enter an entry/exit in all directions). Interaction consistency evaluates how other entity tracks in the scene
intersect each potential entry and exit region. This enforces the idea that if an entry exists with objects leaving the entry in some direction, other objects in the scene
should not intersect the region in the same direction that objects enter into the scene from the entry (the entry/exit region should not be a "through state"). An example of how the
described reliability metrics can be used to reduce a set of potential entry/exit regions to reliable entry/exit regions is shown below.
In the above reliable entry image, the arrows denote the most common angle at which objects entered the scene from each entry. Additional results are shown below.
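As a rough illustration of the two reliability metrics described above, the sketch below scores a candidate entry region from movement angles. The formulas are assumptions, not the definitions from the paper: directional consistency is approximated by the mean resultant length from circular statistics, and interaction consistency by the fraction of crossing tracks that do not move along the entry's own direction. The function names and the tolerance `tol` are hypothetical.

```python
import math

def directional_consistency(angles_rad):
    """Return (score, mean_angle). Score is in [0, 1]: 1 means every object
    moves in the same direction, values near 0 mean directions are spread out."""
    if not angles_rad:
        return 0.0, None
    c = sum(math.cos(a) for a in angles_rad) / len(angles_rad)
    s = sum(math.sin(a) for a in angles_rad) / len(angles_rad)
    return math.hypot(c, s), math.atan2(s, c)

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def interaction_consistency(entry_angle, crossing_angles, tol=math.pi / 4):
    """Fraction of tracks crossing the region that do NOT move along the
    entry's direction. A low score suggests a "through state"."""
    if not crossing_angles:
        return 1.0
    same = sum(1 for a in crossing_angles
               if angle_diff(a, entry_angle) <= tol)
    return 1.0 - same / len(crossing_angles)

# Hypothetical angles (radians) of objects leaving a candidate entry.
score, mean_angle = directional_consistency([0.0, 0.1, -0.1])
# Hypothetical angles of other entity tracks that cross the same region.
through = interaction_consistency(mean_angle, [math.pi, math.pi / 2])
print(score, mean_angle, through)
```

Under these assumptions, a region would be kept as a reliable entry only if both scores are high. The mean angle returned by `directional_consistency` stands in for the kind of direction the arrows depict, although a histogram mode would be closer to "most common".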
