FrownOnError: Interrupting Responses from Smart Speakers by Facial Expressions

Yukang Yan, Chun Yu, Wengrui Zheng, Ruining Tang, Xuhai Xu, Yuanchun Shi.
Published at ACM CHI 2020
  • Best Paper Honorable Mention Award

Abstract

In conversations with smart speakers, misunderstandings of users' requests lead to erroneous responses. We propose FrownOnError, a novel interaction technique that lets users interrupt such responses with intentional but natural facial expressions. The technique builds on a natural human tendency: our facial expressions change when we receive unexpected responses. We conducted a first user study (N=12) to understand users' intuitive reactions to correct and incorrect responses. The results reveal a significant difference in the frequency and intensity of users' facial expressions between the two conditions, and show that frowning and raising eyebrows are intuitive to perform and easy to control. A second user study (N=16) evaluated the user experience and interruption efficiency of FrownOnError, and a third user study (N=12) explored suitable conversation recovery strategies after an interruption. Our results show that FrownOnError can be detected accurately (precision: 97.4%, recall: 97.6%), provides more timely interruption than the baseline methods of wake-up word and button press, and is rated by users as the most intuitive and the easiest to perform.
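For readers unfamiliar with the two detection metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed from detection counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the paper, and the sketch does not represent the authors' evaluation code.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw detection counts.

    precision = TP / (TP + FP): share of detected expressions that were intended.
    recall    = TP / (TP + FN): share of intended expressions that were detected.
    """
    detected = true_positives + false_positives
    intended = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / intended if intended else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not results from the paper).
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.3f}, recall={r:.3f}")  # prints precision=0.900, recall=0.750
```

In an interruption setting, high precision means the speaker is rarely stopped by an unintended expression, while high recall means an intended frown or eyebrow raise is rarely missed.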

BibTeX

@inproceedings{yukang2020,
  author    = {Yan, Yukang and Yu, Chun and Zheng, Wengrui and Tang, Ruining and Xu, Xuhai and Shi, Yuanchun},
  title     = {FrownOnError: Interrupting Responses from Smart Speakers by Facial Expressions},
  year      = {2020},
  isbn      = {9781450367080},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3313831.3376810},
  doi       = {10.1145/3313831.3376810},
  abstract  = {},
  booktitle = {Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems},
  pages     = {1--14},
  numpages  = {14},
  keywords  = {voice user interface, conversation interruption, facial expression},
  location  = {Honolulu, HI, USA},
  series    = {CHI '20}
}