Relational Perception:
Integrating new sensors, computer vision and first-order logic


Relational Perception takes a holistic view of perception: it integrates new sensors, existing detection and classification algorithms, and mid-level vision through first-order logic and graphical models. Entities are detected and recognized based on frame-to-frame similarity, logical relations, and in-frame context. This approach lets us harness many existing, complementary algorithms for entity detection and classification, and bring high-level reasoning to bear in the form of first-order logical relations.




Online Relational Perception for Person Recognition

One of the key challenges in making relational perception practical is making inference and learning computationally efficient enough for real-time and even low-power operation. In a person recognition task, this is complicated by the fact that new labels can be introduced at any time, each requiring a burst of model re-calibration. We have developed an asynchronous learning/inference architecture that accommodates the infrequent addition of new labels while performing efficient online person recognition at interactive speeds. Person recognition takes into account multiple levels of cross-frame similarity (face pixel intensity, face and torso color) as well as in-frame and high-level logical context, such as mutual exclusivity of labels in the same frame.
Authors: Denver Dash (Intel Labs Pittsburgh), Tran Q Long (Georgia Institute of Technology), Anton Chechetka (Carnegie Mellon University), Matthai Philipose (Intel Labs Seattle)
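
The mutual-exclusivity constraint mentioned above can be illustrated with a small sketch. This is not the project's model: it simply assumes each detection in a frame already has a similarity score for every known label, and it picks the one-to-one labeling with the highest total score by brute force. The function name `assign_labels`, the score values, and the label names are all hypothetical.

```python
from itertools import permutations


def assign_labels(scores, labels):
    """Pick the best labeling in which no label is used twice in a frame.

    scores[i][j] is an assumed similarity between detection i and labels[j].
    Brute force over permutations is only practical for a handful of people.
    """
    n = len(scores)
    if n > len(labels):
        raise ValueError("more detections than known labels")

    best_total, best_perm = float("-inf"), ()
    # Each permutation assigns a distinct label index to every detection.
    for perm in permutations(range(len(labels)), n):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return [labels[j] for j in best_perm]


# Hypothetical frame with two detections and two known people.
frame_scores = [
    [0.90, 0.80],  # detection 0: looks like alice, nearly as much like bob
    [0.85, 0.30],  # detection 1: looks like alice, not much like bob
]
print(assign_labels(frame_scores, ["alice", "bob"]))
```

Labeling each detection independently would call both detections "alice". Under the exclusivity constraint, the joint labeling gives "alice" to the detection that has no good alternative and "bob" to the other.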



Real-Time Entity Detection and Activity Recognition with RGB-D Cameras

The development of inexpensive new vision sensors that overlay depth on each image pixel has the potential to revolutionize computer vision. In this project, we have been developing fast techniques for exploiting the depth channel in these sensors to perform entity detection and, ultimately, fast activity recognition from an ego-centric perspective. We show that simple probabilistic models of shape descriptors, together with the absolute scale obtained from depth cameras, lead to robust, fast and accurate human detectors, even when the humans are facing away from the camera. Furthermore, we show that efficient foreground masking is possible for detecting objects in hand from an ego-centric perspective, allowing activity recognition at speeds that would be difficult to achieve without depth.
Authors: Denver Dash (Intel Labs Pittsburgh), Sidd Srinivasa (Intel Labs Pittsburgh), Archana Asokan (Georgia Institute of Technology), James Rehg (Georgia Institute of Technology), Tudor Achim (Carnegie Mellon University)
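
To make the role of absolute scale concrete, here is a minimal sketch of how depth turns a pixel measurement into a metric one. It assumes an ideal pinhole camera. The focal length of 525 pixels, the height range used as a plausibility check, and the function names are illustrative assumptions, not values or code from the project.

```python
def metric_height(pixel_height, depth_m, focal_length_px):
    """Convert an image-space height to meters under a pinhole camera model.

    An object of height H at distance Z projects to h = f * H / Z pixels,
    so H = h * Z / f.
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return pixel_height * depth_m / focal_length_px


def plausible_human(pixel_height, depth_m, focal_length_px=525.0,
                    min_height_m=1.0, max_height_m=2.2):
    """Cheap scale test: reject candidates whose metric height is not human-sized.

    The default focal length and height bounds are assumed for illustration.
    """
    height = metric_height(pixel_height, depth_m, focal_length_px)
    return min_height_m <= height <= max_height_m


# A 300-pixel-tall candidate at 3 m is about 1.71 m tall: plausible.
print(plausible_human(300, 3.0))
# The same 300 pixels at 0.5 m is under 0.3 m tall: rejected.
print(plausible_human(300, 0.5))
```

A test like this needs no appearance cues, which is one reason scale information can help when a person is facing away from the camera.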

Researchers


Collaborators


Students