Scalable Low-latency Interactive Perception on Streaming Data

The SLIPstream project aims to enable interactive applications driven by real-time processing of high-rate streaming data. Examples include unconstrained gesture recognition based on frame-rate spatio-temporal event detection, and robot control and actuation based on real-time object recognition in video streams. Such interactive and actuation applications are computationally demanding: their vision systems must deliver both high computational throughput and low latency. A key component of SLIPstream that addresses these requirements is Sprout, a runtime for parallel stream processing that distributes vision tasks across a cluster of compute nodes. Unlike traditional parallelization schemes, Sprout does not simply replicate processing stages to maximize throughput; instead, it combines selective replication with careful refactoring of tasks to minimize latency. The SLIPstream project is investigating techniques for runtime adaptation and application refactoring, as well as ways to construct algorithms that are amenable to these techniques.

An important application focus of the SLIPstream project is creating novel natural user interfaces, such as multimodal gesture/speech interfaces in which the user points to a device in the environment and then controls it with voice commands. Another area of interest is simultaneously processing input from hundreds of video streams, such as those generated in a virtualized reality studio. Because many computer vision problems become easier as sensor sampling density increases (whether spatially or temporally), we seek to enable real-time 3D reconstruction for large-scale multi-user environments, where people can interact with each other and with the space without props, awkward wearable tracking devices, or motion-capture markers.

SLIPstream spans both major research thrusts at Intel Labs Pittsburgh: Cloud Computing Systems (CCS) and Embedded Real-time Intelligent Systems (ERIS).

Publications

  • P. Matikainen, P. Pillai, L. Mummert, R. Sukthankar. Prop-Free Pointing Detection in Dynamic Cluttered Environments. To appear in IEEE International Conference on Automatic Face and Gesture Recognition (FG), March 2011.
  • P. Matikainen, M. Hebert, R. Sukthankar. Representing Pairwise Spatial and Temporal Relations for Action Recognition. Proceedings of the European Conference on Computer Vision (ECCV), September 2010.
  • Q. Zhu, B. Kveton, L. Mummert, P. Pillai. Automatic Tuning of Interactive Perception Applications. 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010.
  • M. Chen, L. Mummert, P. Pillai, A. Hauptmann, R. Sukthankar. Controlling Your TV With Gestures. Multimedia Information Retrieval (demo), March 2010.
  • M. Chen, L. Mummert, P. Pillai, A. Hauptmann, R. Sukthankar. Exploiting Multi-Level Parallelism for Low-Latency Activity Recognition in Streaming Video. First ACM Conference on Multimedia Systems, February 2010.
  • P. Matikainen, M. Hebert, R. Sukthankar. Trajectons: Action Recognition Through the Motion Analysis of Tracked Features. ICCV Workshop on Video-oriented Object and Event Classification, October 2009.
  • P. Pillai, L. Mummert, S. Schlosser, R. Sukthankar, C. Helfrich. SLIPstream: Scalable Low-latency Interactive Perception on Streaming Data. The 19th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2009.
  • J. Campbell, L. Mummert, R. Sukthankar. Video Monitoring of Honey Bee Colonies at the Hive Entrance. ICPR Workshop on Visual Observation and Analysis of Animal and Insect Behavior (VAIB), December 2008.
  • P. Matikainen, R. Sukthankar, M. Hebert, Y. Ke. Fast Motion Consistency through Matrix Quantization. Proceedings of the British Machine Vision Conference (BMVC), 2008.
  • Y. Ke, R. Sukthankar, M. Hebert. Event Detection in Crowded Videos. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.