Schwarz, J., Marais, C., Leyvand, T., Hudson, S., Mankoff, J. Combining Body Pose, Gaze and Motion to Determine Intention to Interact in Vision-Based Interfaces. In Proceedings of the 32nd Annual SIGCHI Conference on Human Factors in Computing Systems (Toronto, Canada, April 26 – May 1, 2014). CHI ’14. ACM, New York, NY.
paper | video summary | slides
Vision-based interfaces, such as those made popular by the Microsoft Kinect, suffer from the Midas Touch problem: every user motion can be interpreted as an interaction. In response, we developed an algorithm that combines facial features, body pose, and motion to approximate a user’s intention to interact with the system. We show how this can be used to determine when to pay attention to a user’s actions and when to ignore them. To demonstrate the value of our approach, we present results from a 30-person lab study conducted to compare four engagement algorithms in single- and multi-user scenarios. We found that combining intention to interact with a “raise an open hand in front of you” gesture yielded the best results: this combined approach offers a 12% improvement in accuracy and a 20% reduction in time to engage over the baseline “wave to engage” gesture currently used on the Xbox 360.
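To illustrate the general idea only (this is not the paper’s actual model), the following minimal Python sketch combines facial, body-pose, and motion cues into a single intention score and uses it to gate an explicit “raise an open hand” gesture. All names, weights, and the threshold are hypothetical stand-ins for the learned model described in the paper.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    # Hypothetical per-frame features, assumed normalized to [0, 1] by
    # upstream skeleton and face tracking from a depth camera.
    facing_screen: float         # how directly the face/gaze points at the display
    body_toward_screen: float    # how squarely the shoulders face the display
    motion_toward_screen: float  # recent movement toward the sensor
    hand_raised_open: bool       # whether an open hand is raised in front of the body


# Hand-tuned weights used purely for illustration; the paper estimates
# intention from data rather than with a fixed linear rule like this.
WEIGHTS = {"face": 0.5, "body": 0.3, "motion": 0.2}
ENGAGE_THRESHOLD = 0.7


def intention_score(f: FrameFeatures) -> float:
    """Combine facial, body-pose, and motion cues into one intention score."""
    return (WEIGHTS["face"] * f.facing_screen
            + WEIGHTS["body"] * f.body_toward_screen
            + WEIGHTS["motion"] * f.motion_toward_screen)


def should_engage(f: FrameFeatures) -> bool:
    """Gate the explicit gesture on inferred intention, so incidental hand
    raises from users who are not attending to the screen are ignored."""
    return f.hand_raised_open and intention_score(f) >= ENGAGE_THRESHOLD


if __name__ == "__main__":
    passerby = FrameFeatures(0.2, 0.3, 0.1, True)  # hand raised, but not attending
    engaged = FrameFeatures(0.9, 0.8, 0.6, True)   # facing the screen, hand raised
    print(should_engage(passerby))  # False
    print(should_engage(engaged))   # True
```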