Generative Artificial Intelligence’s Utility for Accessibility

With the recent rapid rise in Generative Artificial Intelligence (GAI) tools, it is imperative that we understand their impact on people with disabilities, both positive and negative. However, although we know that AI in general poses both risks and opportunities for people with disabilities, little is known specifically about GAI in particular.

To address this, we conducted a three-month autoethnography of our use of GAI to meet personal and professional needs as a team of researchers with and without disabilities. Our findings demonstrate a wide variety of potential accessibility-related uses for GAI while also highlighting concerns around verifiability, training data, ableism, and false promises.

Glazko, K. S., Yamagami, M., Desai, A., Mack, K. A., Potluri, V., Xu, X., & Mankoff, J. An Autoethnographic Case Study of Generative Artificial Intelligence’s Utility for Accessibility. ASSETS 2023. https://dl.acm.org/doi/abs/10.1145/3597638.3614548

News: Can AI help boost accessibility? These researchers tested it for themselves

Presentation (starts at about 20mins)

Personalized behavior modeling: depression detection

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Seo Woosuk, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen,
Kasey G. Creswell, Creswell, J. David, Afsaneh Doryab, Paula S. Nurius, Eve Riskin, Anind K. Dey, & Jennifer Mankoff. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proc. ACM interact. mob. wearable ubiquitous technol., Article 41. (March 21) 27pages.

The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interoperability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.

TypeOut: Just-in-Time Self-Affirmation for Reducing Phone Use

Smartphone overuse is related to a variety of issues such as lack of sleep and anxiety. We explore the application of Self-Affirmation Theory on smartphone overuse intervention in a just-in-time manner. We present TypeOut, a just-in-time intervention technique that integrates two components: an in-situ typing-based unlock process to improve user engagement, and self-affirmation-based typing content to enhance effectiveness. We hypothesize that the integration of typing and self-affirmation content can better reduce smartphone overuse. We conducted a 10-week within-subject field experiment (N=54) and compared TypeOut against two baselines: one only showing the self-affirmation content (a common notification-based intervention), and one only requiring typing non-semantic content (a state-of-the-art method). TypeOut reduces app usage by over 50%, and both app opening frequency and usage duration by over 25%, all significantly outperforming baselines. TypeOut can potentially be used in other domains where an intervention may benefit from integrating self-affirmation exercises with an engaging just-in-time mechanism.

Typeout: Leveraging just-in-time self-affirmation for smartphone overuse reduction. Xuhai Xu, Tianyuan Zou, Xiao Han, Yanzhang Li, Ruolin Wang, Tianyi Yuan, Yuntao Wang, Yuanchun Shi, Jennifer Mankoff,and Anind K. Dey. 2022. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). ACM, New York, NY, USA.

College during COVID

Mental health of UW students during Spring 2020 varied tremendously: the challenges of online learning during the pandemic were entwined with social isolation, family demands and socioeconomic pressures. In this context, individual differences in coping mechanisms had a big impact. The findings of this paper underline the need for interventions oriented towards problem-focused coping and suggest opportunities for peer role modeling.

College from home during COVID-19: A mixed-methods study of heterogeneous experiences. Morris ME, Kuehn KS, Brown J, Nurius PS, Zhang H, Sefidgar YS, Xuhai X, Riskin EA, Dey A, Consolvo S, Mankoff JC. (2021) PLoS ONE 16(6): e0251580. (reported in UW News and the Hechtinger Report)

A lineplot showing anxiousness (Y axis, varying from 0 to 4) over time (X axis). Each student in the study is plotted as a different line over each day of the quarter. The plot overall looks very messy, but two things are clear; Every student has a very different trajectory from every other, with all of them going up and down multiple times. And the average, overall, shown is a fit line, is fairly low and slightly increasing (from about .75 to just under 1).
Heterogeneity in individuals’ levels of anxiety (reported in ESM). Individual trajectories of anxiety are shown in different line types and colors (dotted versus solid lines represent different participants). Although the mean level of anxiety is 1 on a scale of 0–4, the significant variation in responses invites examination of individuals and subgroups.

This mixed-method study examined the experiences of college students during the COVID-19 pandemic through surveys, experience sampling data collected over two academic quarters (Spring 2019 n1 = 253; Spring 2020 n2 = 147), and semi-structured interviews with 27 undergraduate students. 

There were no marked changes in mean levels of depressive symptoms, anxiety, stress, or loneliness between 2019 and 2020, or over the course of the Spring 2020 term. Students in both the 2019 and 2020 cohort who indicated psychosocial vulnerability at the initial assessment showed worse psychosocial functioning throughout the entire Spring term relative to other students. However, rates of distress increased faster in 2020 than in 2019 for these individuals. Across individuals, homogeneity of variance tests and multi-level models revealed significant heterogeneity, suggesting the need to examine not just means but the variations in individuals’ experiences. 

Thematic analysis of interviews characterizes these varied experiences, describing the contexts for students’ challenges and strategies. This analysis highlights the interweaving of psychosocial and academic distress: Challenges such as isolation from peers, lack of interactivity with instructors, and difficulty adjusting to family needs had both an emotional and academic toll. Strategies for adjusting to this new context included initiating remote study and hangout sessions with peers, as well as self-learning. In these and other strategies, students used technologies in different ways and for different purposes than they had previously. Supporting qualitative insight about adaptive responses were quantitative findings that students who used more problem-focused forms of coping reported fewer mental health symptoms over the course of the pandemic, even though they perceived their stress as more severe. 

Example quotes:

I like to build things and stuff like that. I like to see it in person and feel it. So the fact that everything was online…. I’m just basically reading all the time. I just couldn’t learn that way

Insomnia has been pretty hard for me . . .  I would spend a lot of time lying in bed not doing anything when I had a lot of homework to do the next day. So then I would become stressed about whether I’ll be able to finish that homework or not.”

“It was challenging … being independent and then being pushed back home. It’s a huge change because now you have more rules again”

For a few of my classes I feel like actually [I] was self-learning because sometimes it’s hard to sit through hours of lectures and watch it.”

I would initiate… we have a study group chat and every day I would be like ‘Hey I’m going to be on at this time starting at this time.’ So then I gave them time to all have the room open for Zoom and stuff. Okay and then any time after that they can join and then said I [would] wait like maybe 30 minutes or even an hour…. And then people join and then we work maybe … till midnight, a little bit past midnight

Interaction via Wireless Earbuds

Xuhai XuHaitian ShiXin YiWenjia LiuYukang YanYuanchun ShiAlex Mariakakis, Jennifer Mankoff, Anind K. Dey:
EarBuddy: Enabling On-Face Interaction via Wireless Earbuds. CHI 2020: 1-14

Past research regarding on-body interaction typically requires custom sensors, limiting their scalability and generalizability. We propose EarBuddy, a real-time system that leverages the microphone in commercial wireless earbuds to detect tapping and sliding gestures near the face and ears. We develop a design space to generate 27 valid gestures and conducted a user study (N=16) to select the eight gestures that were optimal for both human preference and microphone detectability. We collected a dataset on those eight gestures (N=20) and trained deep learning models for gesture detection and classification. Our optimized classifier achieved an accuracy of 95.3%. Finally, we conducted a user study (N=12) to evaluate EarBuddy’s usability. Our results show that EarBuddy can facilitate novel interaction and that users feel very positively about the system. EarBuddy provides a new eyes-free, socially acceptable input method that is compatible with commercial wireless earbuds and has the potential for scalability and generalizability

HulaMove: Waist Interaction

Xuhai XuJiahao LiTianyi YuanLiang HeXin LiuYukang YanYuntao WangYuanchun Shi, Jennifer Mankoff, Anind K. Dey:
HulaMove: Using Commodity IMU for Waist Interaction. CHI 2021: 503:1-503:16

We present HulaMove, a novel interaction technique that leverages the movement of the waist as a new eyes-free and hands-free input method for both the physical world and the virtual world. We first conducted a user study (N=12) to understand users’ ability to control their waist. We found that users could easily discriminate eight shifting directions and two rotating orientations, and quickly confirm actions by returning to the original position (quick return). We developed a design space with eight gestures for waist interaction based on the results and implemented an IMU-based real-time system. Using a hierarchical machine learning model, our system could recognize waist gestures at an accuracy of 97.5%. Finally, we conducted a second user study (N=12) for usability testing in both real-world scenarios and virtual reality settings. Our usability study indicated that HulaMove significantly reduced interaction time by 41.8% compared to a touch screen method, and greatly improved users’ sense of presence in the virtual world. This novel technique provides an additional input method when users’ eyes or hands are busy, accelerates users’ daily operations, and augments their immersive experience in the virtual world.

Detecting Depression △

A series of research projects based on the UWEXP study have focused on detecting depression in various ways. Three such papers are listed below.

Xuhai XuPrerna ChikersalJanine M. DutcherYasaman S. SefidgarWoosuk SeoMichael J. TumminiaDaniella K. VillalbaSheldon CohenKasey G. CreswellJ. David CreswellAfsaneh DoryabPaula S. NuriusEve A. RiskinAnind K. Dey, Jennifer Mankoff:
Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5(1): 41:1-41:27 (2021)

The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.

Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella Villaalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey Creswell, David Creswell, Jennifer Mankoff and Anind K. Dey. IMWUT, Article No 116. 10.1145/3351274

The rate of depression in college students is rising, which is known to increase suicide risk, lower academic performance and double the likelihood of dropping out. Researchers have used passive mobile sensing technology to assess mental health. Existing work on finding relationships between mobile sensing and depression, as well as identifying depression via sensing features, mainly utilize single data channels or simply concatenate multiple channels. There is an opportunity to identify better features by reasoning about co-occurrence across multiple sensing channels. We present a new method to extract contextually filtered features on passively collected, time-series data from mobile devices via rule mining algorithms. We first employ association rule mining algorithms on two different user groups (e.g., depression vs. non-depression). We then introduce a new metric to select a subset of rules that identifies distinguishing behavior patterns between the two groups. Finally, we consider co-occurrence across the features that comprise the rules in a feature extraction stage to obtain contextually filtered features with which to train classifiers. Our results reveal that the best model with these features significantly outperforms a standard model that uses unimodal features by an average of 9.7% across a variety of metrics. We further verified the generalizability of our approach on a second dataset, and achieved very similar results.

Chikersal, P., Doryab, A., Tumminia, M., Villalba, D., Dutcher, J., Liu, X., Cohen, S., Creswell, K., Mankoff, J., Creswell, D., Goel, M., & Dey, A. “Detecting Depression and Predicting its Onset Using Longitudinal Symptoms Captured by Passive Sensing: A Machine Learning Approach With Robust Feature Selection.” ACM Transactions on Computer-Human Interaction (TOCHI), 2020.

We present a machine learning approach that uses data from smartphones and ftness trackers of 138 college students to identify students that experienced depressive symptoms at the end of the semester and students whose depressive symptoms worsened over the semester. Our novel approach is a feature extraction technique that allows us to select meaningful features indicative of depressive symptoms from longitudinal data. It allows us to detect the presence of post-semester depressive symptoms with an accuracy of 85.7% and change in symptom severity with an accuracy of 85.4%. It also predicts these outcomes with an accuracy of >80%, 11-15 weeks before the end of the semester, allowing ample time for preemptive interventions. Our work has signifcant implications for the detection of health outcomes using longitudinal behavioral data and limited ground truth. By detecting change and predicting symptoms several weeks before their onset, our work also has implications for preventing depression.

Shows barchart of import of different features onetecting change in depression
Bar chart shows value of baseline, bluetooth, calls, campus map, location, phone usage, sleep and step features on detecting change in depression. the best set leads to 85.4% accuracy; all features except bluetooth and calls improve on baseline accuracy of 65.9%

Clench Interaction: Biting As Input

Shows human faced diagraming where the clench sensor should be placed between the teeth; the settings for correctly sensing clench, and the hardware platform used.
Xuhai Xu, Chun Yu, Anind K. Dey, Jennifer Mankoff
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19)
People eat every day and biting is one of the most fundamental and natural actions that they perform on a daily basis. Existing work has explored tooth click location and jaw movement as input techniques, however clenching has the potential to add control to this input channel. We propose clench interaction that leverages clenching as an actively controlled physiological signal that can facilitate interactions. We conducted a user study to investigate users’ ability to control their clench force. We found that users can easily discriminate three force levels, and that they can quickly confirm actions by unclenching (quick release). We developed a design space for clench interaction based on the results and investigated the usability of the clench interface. Participants preferred the clench over baselines and indicated a willingness to use clench-based interactions. This novel technique can provide an additional input method in cases where users’ eyes or hands are busy, augment immersive experiences such as virtual/augmented reality, and assist individuals with disabilities.