Advances in VLSI technology will, over the next decade,
allow us to build a new generation of vision systems that rely on large numbers of inexpensive cameras
and a distributed network of high-performance processors. Networks of distributed smart cameras are an
emerging enabling technology for a broad range of important information technology applications, including
human and animal recognition, surveillance, motion analysis, smart conference rooms, and
face detection. By having access to scenes from multiple directions, such networks have the
potential to synthesize far more complete views than single-camera systems. However, this distributed
nature, coupled with the inherent challenges associated with real-time video processing,
greatly complicates the development of effective algorithms, architectures, and software. An integrated
research program including video, design tools, and embedded system architecture is
required to understand how new generations of smart camera hardware can
be best utilized.
This project develops new techniques for distributed smart camera
networks through an integrated exploration of distributed algorithms, embedded architectures, and software
synthesis techniques: we are developing new architectures and tools designed to handle modern
video algorithms, and new algorithms that both leverage distributed architectures
and can be compiled into efficient implementations.
In this project, we are investigating a series of complex smart
camera algorithms and applications, specifically, human gesture recognition; self-calibration of the
distributed camera network; detection, tracking and fusion of trajectories using distributed cameras;
view synthesis using image based visual hulls; gait-based human recognition; and human activity analysis.
Through analysis of these applications, we are exploring domain-specific programming models
and software synthesis techniques to automate their translation into efficient implementations. By
translating domain-specific, formal models of distributed signal processing systems into streamlined
procedural language (e.g., C) implementations, these synthesis techniques are complementary
to the growing body of work on embedded processor code generation
techniques.
This research leads to new embedded architectures for distributed
smart camera networks, and to video processing algorithms that are tailored to the opportunities and
constraints associated with these architectures. The research also leads to a better understanding of
relationships among distributed signal processing, embedded multiprocessors, and low power/low latency
operation, and develops synthesis tools that use this understanding to help automate the implementation
of smart camera applications.
Our research program poses challenges in video analysis algorithms,
embedded system architecture, and synthesis tools for embedded systems.
Performing recognition across multiple cameras requires fusing data at
appropriate points in the processing chain in order to create a model of the scene in world coordinates rather
than camera coordinates. We cannot assume that we can mosaic several images into a single image and
then apply traditional single-camera methods. For example, we use multiple cameras to avoid occlusion, with different
cameras providing different views of the subject. A mosaicked view would not adequately
capture the relationships between mutually-occluding elements.
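The coordinate-fusion step above can be sketched minimally: each camera reports a point in its own frame, and the camera extrinsics map those observations into a common world frame where they can be fused. The rotation matrices, translation vectors, and point values below are illustrative placeholders, not project data:

```python
import math

def mat_vec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def cam_to_world(p_cam, R, t):
    """Map a camera-frame point into world coordinates: R @ p + t."""
    Rp = mat_vec(R, p_cam)
    return [Rp[i] + t[i] for i in range(3)]

def world_to_cam(p_world, R, t):
    """Inverse mapping: R^T @ (p - t)."""
    d = [p_world[i] - t[i] for i in range(3)]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose
    return mat_vec(Rt, d)

# Illustrative extrinsics for two cameras viewing the scene from
# different directions (placeholder values).
R1 = [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]]
t1 = [0.0, 0.0, 0.0]
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
R2 = [[c, 0, s], [0, 1.0, 0], [-s, 0, c]]
t2 = [5.0, 0.0, 5.0]

# The same subject as seen by each camera, in that camera's own frame.
p_world = [1.0, 2.0, 3.0]
p1 = world_to_cam(p_world, R1, t1)
p2 = world_to_cam(p_world, R2, t2)

# Fusing in world coordinates: both observations map to the same point,
# so mutually occluding elements stay distinct rather than being
# flattened into one mosaicked image plane.
w1 = cam_to_world(p1, R1, t1)
w2 = cam_to_world(p2, R2, t2)
assert all(abs(a - b) < 1e-9 for a, b in zip(w1, w2))
```

In a real network the extrinsics would come from the self-calibration step investigated in this project, rather than being given a priori.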
In addition, these video algorithms must run in real time and with low
latency: they must not only sustain high frame rates but also deliver their
results with low delay. Distributed smart cameras will be used in many closed-loop applications;
excessive processing delay will make it difficult to use the analysis results to control other
actions.
Distributed smart cameras pose several challenges for embedded system
architectures. Smart cameras process large volumes of data and perform a great deal of computation on that
data. The distributed system architecture must be designed to put appropriate amounts of computation
at the proper spots in the processing pipeline; the network must be designed to handle real-time
requirements. The entire system must also be architected to minimize power consumption—
excessive power consumption will cause nodes to run too hot, increasing installation and maintenance
costs.
Tools that help designers map applications onto distributed embedded
systems are critical to the long-term viability of this approach. The programming models aspect of this work
seeks to develop software synthesis and model-based design technology for video processing applications.
Distributed systems are complex heterogeneous systems that are much harder to program than are
the workstations or PCs traditionally used in video research. Because multiple applications can be
run on a single distributed system, developing tools for a distributed smart camera will allow us to
use the same architecture for many video applications. Tools will also be portable to other distributed
signal processing domains.
Tools and model-based design also provide high-confidence implementations
in the face of complex algorithms that must meet multiple implementation goals (throughput, latency, power
consumption). Model-based design refers to the design of applications using components that
have well-defined behavior and interact through well-defined models of computation. By software
synthesis, we mean the automated derivation of software implementations (e.g., C programs)
from model-based representations. Model-based design is used widely for algorithm
development and simulation of one-dimensional signal processing applications, such as those
used in speech/audio processing and wireless communication. However, its use for software synthesis
and video applications is limited by the expressive power of existing model-based DSP
design tools. Such design tools do not adequately support control flow and multidimensional DSP
operations. As a result, engineers are typically forced to use model-based design tools for early,
subsystem-level algorithm exploration, and then manually develop and integrate the subsystem-level
implementations for the final production code. This manual step is error-prone: it leads to
inconsistencies between the model-based specifications and the production code, and it negates useful
properties, such as bounded memory, deadlock avoidance, and local synchrony, that are
offered by the model-based specifications.
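The bounded-memory property of such model-based specifications can be sketched with a toy synchronous-dataflow (SDF) graph: because each actor produces and consumes fixed token counts per firing, a periodic schedule with bounded buffers can be computed before the system runs. The actor names, rates, and schedule below are illustrative assumptions, not taken from any existing tool:

```python
from collections import deque

# Toy SDF graph:  capture --(2)--> filter --(1)--> fuse
# capture produces 2 tokens per firing; filter consumes 2 and
# produces 1; fuse consumes 1.
buffers = {("capture", "filter"): deque(), ("filter", "fuse"): deque()}
results = []

def capture():
    # Produces 2 tokens per firing (e.g., a stereo pair of frames).
    buffers[("capture", "filter")].extend(["frame", "frame"])

def filt():
    # Consumes 2 tokens, produces 1 (e.g., disparity from a pair).
    b_in = buffers[("capture", "filter")]
    pair = (b_in.popleft(), b_in.popleft())
    buffers[("filter", "fuse")].append(("disparity", pair))

def fuse():
    # Consumes 1 token per firing.
    results.append(buffers[("filter", "fuse")].popleft())

# The SDF balance equations (2*q_capture = 2*q_filter, 1*q_filter =
# 1*q_fuse) admit the repetitions vector (1, 1, 1), so one valid
# single-processor periodic schedule is simply:
schedule = [capture, filt, fuse]

for _ in range(3):            # run three periods of the static schedule
    for actor in schedule:
        actor()

# Bounded memory: every buffer returns to its initial (empty) state at
# the end of each period, so the schedule can run forever without
# unbounded buffer growth, and deadlock freedom is checked statically.
assert all(len(b) == 0 for b in buffers.values())
assert len(results) == 3
```

A software synthesis tool of the kind described here would derive such a schedule from the model and emit the equivalent loop as C code with statically sized buffers.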