Enabling Voice-Accompanying Hand-to-Face Gesture Recognition with Cross-Device Sensing

Zisu Li, Chen Liang, Yuntao Wang, Yue Qin, Chun Yu, Yukang Yan, Mingming Fan, Yuanchun Shi

Published at ACM CHI 2023

Best Paper Honorable Mention Award

Abstract

Gestures that accompany the voice are essential for voice interaction, conveying complementary semantics for interaction purposes such as wake-up state and input modality. In this paper, we investigated voice-accompanying hand-to-face (VAHF) gestures for voice interaction. We targeted hand-to-face gestures because they relate closely to speech and yield significant acoustic features (e.g., impeding voice propagation). We conducted a user study to explore the design space of VAHF gestures: we first gathered candidate gestures and then applied a structural analysis to them along different dimensions (e.g., contact position and type), yielding a total of 8 VAHF gestures with good usability and the least confusion. To facilitate VAHF gesture recognition, we proposed a novel cross-device sensing method that leverages heterogeneous channels (vocal, ultrasound, and IMU) of data from commodity devices (earbuds, watches, and rings). Our recognition model achieved an accuracy of 97.3% for recognizing 3 gestures and 91.5% for recognizing 8 gestures (excluding the "empty" gesture), demonstrating its high applicability. Quantitative analysis also shed light on the recognition capability of each sensor channel and of their different combinations. Finally, we illustrated feasible use cases and their design principles to demonstrate the applicability of our system in various scenarios.

Bibtex

@inproceedings{zisu2023,
  author = {Li, Zisu and Liang, Chen and Wang, Yuntao and Qin, Yue and Yu, Chun and Yan, Yukang and Fan, Mingming and Shi, Yuanchun},
  title = {Enabling Voice-Accompanying Hand-to-Face Gesture Recognition with Cross-Device Sensing},
  year = {2023},
  isbn = {9781450394215},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3544548.3581008},
  doi = {10.1145/3544548.3581008},
  abstract = {},
  booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems},
  articleno = {313},
  numpages = {17},
  keywords = {sensor fusion, acoustic sensing, hand gestures},
  location = {Hamburg, Germany},
  series = {CHI '23}
}