Relational Perception takes a holistic view of perception: it integrates new sensors, existing detection and classification algorithms and mid-level vision through first-order logic and graphical models. Entities are detected and recognized based on frame-to-frame similarity, logical relations and in-frame context. This approach lets us harness many existing, complementary algorithms for entity detection and classification, and bring high-level reasoning to bear in the form of first-order logical relations.
One of the key challenges in making relational perception practical is making inference and learning computationally efficient enough for real-time, and even low-power, inference. In a person recognition task this is complicated because new labels can be introduced at any time, requiring bursty model re-calibration. We have developed an asynchronous learning/inference architecture that accommodates the infrequent addition of new labels while performing efficient online person recognition at interactive speeds.
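The abstract does not describe how the asynchronous architecture is built. The following minimal Python sketch shows one common pattern, under stated assumptions: inference always reads an immutable model snapshot, while label additions are queued to a background thread that rebuilds the model and swaps the reference. The nearest-centroid `CentroidModel` and the `AsyncRecognizer` class are hypothetical stand-ins, not the authors' graphical model.

```python
import math
import queue
import threading


class CentroidModel:
    """Hypothetical stand-in model: one mean feature vector per label."""

    def __init__(self, examples):
        # examples: dict mapping label -> list of feature vectors
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in examples.items()
            if vecs
        }

    def predict(self, x):
        # Nearest centroid by Euclidean distance; None if no labels yet.
        if not self.centroids:
            return None
        return min(self.centroids, key=lambda lbl: math.dist(x, self.centroids[lbl]))


class AsyncRecognizer:
    """Hypothetical sketch: online inference never waits for re-calibration."""

    def __init__(self):
        self._examples = {}  # touched only by the worker thread
        self._model = CentroidModel({})
        self._jobs = queue.Queue()
        threading.Thread(target=self._retrain_loop, daemon=True).start()

    def recognize(self, x):
        # Read whatever snapshot is current; this never blocks on retraining.
        return self._model.predict(x)

    def add_label(self, label, vectors):
        # Cheap on the caller's thread: copy the data and queue a job.
        self._jobs.put((label, [list(v) for v in vectors]))

    def _retrain_loop(self):
        while True:
            label, vectors = self._jobs.get()
            self._examples.setdefault(label, []).extend(vectors)
            # Build a fresh model, then publish it with a single reference
            # assignment (atomic under CPython's GIL).
            self._model = CentroidModel({k: list(v) for k, v in self._examples.items()})
            self._jobs.task_done()

    def wait_until_idle(self):
        # Block until every queued re-calibration has been published.
        self._jobs.join()
```

Because `recognize` never waits on retraining, a burst of `add_label` calls only delays when new labels become visible, not the per-frame latency. A real system would also need to decide how stale a snapshot may become before recognition results are unreliable.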
Person recognition takes into account multiple levels of cross-frame similarity (face pixel intensity, face and torso color) as well as in-frame and high-level logical context, such as the mutual exclusivity of labels within the same frame (two people visible in one frame cannot share an identity).
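To illustrate what the mutual-exclusivity constraint does, here is a small Python sketch. It assumes each detection's cues have already been reduced to per-label log-scores (the linear weighting in `combine_cues` is an assumption), and it finds the best joint labeling of one frame by brute-force enumeration, treating a hypothetical `unknown` label as the only one that may repeat. The authors perform inference with first-order logic and graphical models; this enumeration is a simplified stand-in that is only practical for a few detections per frame.

```python
import itertools

UNKNOWN = "unknown"  # hypothetical catch-all label that may repeat in a frame


def combine_cues(cue_scores, weights):
    """Weighted sum of per-cue log-similarity scores (hypothetical weighting)."""
    return sum(weights[cue] * score for cue, score in cue_scores.items())


def label_frame(scores, labels):
    """Best joint labeling of one frame under mutual exclusivity.

    scores[i][lbl] is the combined log-score of detection i taking label lbl
    (each dict must also contain UNKNOWN). Known labels may be used at most
    once per frame. Brute force: exponential in the number of detections.
    """
    candidates = list(labels) + [UNKNOWN]
    best, best_score = None, float("-inf")
    for assignment in itertools.product(candidates, repeat=len(scores)):
        known = [a for a in assignment if a != UNKNOWN]
        if len(known) != len(set(known)):
            continue  # two detections share a known label: not allowed
        total = sum(scores[i][a] for i, a in enumerate(assignment))
        if total > best_score:
            best, best_score = assignment, total
    return best
```

For example, if two faces in one frame both score highest for `alice`, independent per-face decisions would assign `alice` twice; `label_frame` instead keeps `alice` for the stronger match and gives the other detection its best remaining label.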
Authors: Denver Dash (Intel Labs Pittsburgh), Tran Q Long (Georgia Institute of Technology), Anton Chechetka (Carnegie Mellon University), Matthai Philipose (Intel Labs Seattle)
Inexpensive new vision sensors that overlay depth on each image pixel have the potential to revolutionize computer vision. In this project, we have been developing fast techniques that exploit the depth channel of these sensors to perform entity detection and, ultimately, fast activity recognition from an ego-centric perspective. We show that simple probabilistic models of shape descriptors, combined with the absolute scale obtained from depth cameras, lead to robust, fast and accurate human detectors even when the humans are facing away from the camera. Furthermore, we show that efficient foreground masking is possible for detecting objects in hand from an ego-centric perspective, enabling activity recognition at speeds that would be difficult to achieve without depth.
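The abstract does not specify the detector or the masking rule, so the following Python sketch only illustrates the two depth-derived quantities it mentions, under stated assumptions. `metric_height` applies the standard pinhole relation (real size = pixel size × depth / focal length in pixels) to recover absolute scale, `plausible_person` gates candidates by an assumed adult height range of 1.0 to 2.1 m, and `hand_mask` keeps pixels closer than an assumed arm's reach of 0.7 m. All function names and thresholds are hypothetical.

```python
def metric_height(pixel_height, depth_m, focal_px):
    """Pinhole model: real height (m) = pixel height * depth / focal length (px)."""
    return pixel_height * depth_m / focal_px


def plausible_person(pixel_height, depth_m, focal_px, min_h=1.0, max_h=2.1):
    """Keep a candidate only if its metric height fits an assumed human range."""
    return min_h <= metric_height(pixel_height, depth_m, focal_px) <= max_h


def hand_mask(depth_m, max_reach=0.7):
    """Ego-centric foreground mask: True for valid pixels within arm's reach.

    depth_m is a 2-D list of depths in meters; 0 marks a missing reading.
    """
    return [[0.0 < d <= max_reach for d in row] for row in depth_m]
```

With an assumed focal length of 525 px, a 157.5 px tall silhouette corresponds to about 0.3 m at a depth of 1 m but about 1.05 m at 3.5 m. Absolute scale is what lets a detector reject a small nearby object whose image outline happens to look person-shaped.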
Authors: Denver Dash (Intel Labs Pittsburgh), Sidd Srinivasa (Intel Labs Pittsburgh), Archana Asokan (Georgia Institute of Technology), James Rehg (Georgia Institute of Technology), Tudor Achim (Carnegie Mellon University)