Notably Inaccessible

Venkatesh Potluri, Sudheesh Singanamalla, Nussara Tieanklin, Jennifer Mankoff: Notably Inaccessible – Data Driven Understanding of Data Science Notebook (In)Accessibility. ASSETS 2023: 13:1-13:19

Computational notebooks are tools that help people explore, analyze data, and create stories about that data. They are the most popular choice for data scientists. People use software like Jupyter, Datalore, and Google Colab to work with these notebooks in universities and companies.

There is a lot of research on how data scientists use these notebooks and how to help them work together better. But there is not much information about the problems faced by blind and visually impaired (BVI) users. BVI users have difficulty using these notebooks because:

  • The interfaces are not accessible.
  • The way data is shown is not user-friendly for them.
  • Popular libraries do not provide outputs they can use.

We analyzed 100,000 Jupyter notebooks to find accessibility problems. We looked for issues that affect how these notebooks are created and read. From our study, we give advice on how to make notebooks more accessible. We suggest ways for people to write better notebooks and changes to make the notebook software work better for everyone.
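As a rough illustration of what one automated check over a notebook could look like, the sketch below counts markdown images that have no alternative text. It is not the analysis pipeline from the paper: the function name, the regular expression, and the choice to inspect only markdown cells are assumptions made for this example. It relies only on the fact that a Jupyter notebook file is JSON with a list of cells, each carrying a `cell_type` and a `source`.

```python
import json
import re

# Markdown image syntax is ![alt text](url); an empty alt group means no description.
MD_IMAGE = re.compile(r"!\[(?P<alt>[^\]]*)\]\([^)]*\)")


def count_images_missing_alt(notebook_json):
    """Return (images_total, images_missing_alt) over the markdown cells of a notebook."""
    notebook = json.loads(notebook_json)
    total = 0
    missing = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as either one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for match in MD_IMAGE.finditer(text):
            total += 1
            if not match.group("alt").strip():
                missing += 1
    return total, missing


# Hypothetical two-cell notebook, made up for this example.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown",
         "source": ["![](plot.png)\n", "![Sales by month](sales.png)"]},
        {"cell_type": "code", "source": "print('hi')"},
    ]
})
print(count_images_missing_alt(example))  # (2, 1)
```

A fuller check would also need to look at figures produced by code cells, which this sketch does not attempt.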

Domain Specific Metaheuristic Optimization

For non-technical domain experts and designers it can be a substantial challenge to create designs that meet domain specific goals. This presents an opportunity to create specialized tools that produce optimized designs in the domain. However, implementing domain specific optimization methods requires a rare combination of programming and domain expertise. Creating flexible design tools with re-configurable optimizers that can tackle a variety of problems in a domain requires even more domain and programming expertise. We present OPTIMISM, a toolkit which enables programmers and domain experts to collaboratively implement an optimization component of design tools. OPTIMISM supports the implementation of metaheuristic optimization methods by factoring them into easy to implement and reuse components: objectives that measure desirable qualities in the domain, modifiers which make useful changes to designs, design and modifier selectors which determine how the optimizer steps through the search space, and stopping criteria that determine when to return results. Implementing optimizers with OPTIMISM shifts the burden of domain expertise from programmers to domain experts.
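To make the factoring described above concrete, here is a minimal sketch of a hill-climbing optimizer split into an objective, modifiers, a modifier selector, a design selector, and a stopping criterion. It does not use the OPTIMISM API. Every name in it (`optimize`, `sphere`, `nudge`, `shrink`) is hypothetical, and the design is assumed to be a plain list of numbers.

```python
import random


def optimize(initial, objective, modifiers, stop, seed=0):
    """Hill climbing assembled from interchangeable components.

    objective(design) -> float, lower is better
    modifiers: functions (design, rng) -> new design
    stop(iteration, best_score) -> True when the search should end
    """
    rng = random.Random(seed)
    best, best_score = initial, objective(initial)
    iteration = 0
    while not stop(iteration, best_score):
        modifier = rng.choice(modifiers)      # modifier selector: uniform random
        candidate = modifier(best, rng)
        score = objective(candidate)
        if score < best_score:                # design selector: keep the best so far
            best, best_score = candidate, score
        iteration += 1
    return best


def sphere(design):
    """Toy objective: squared distance from the origin."""
    return sum(x * x for x in design)


def nudge(design, rng):
    """Modifier: perturb one randomly chosen parameter."""
    out = list(design)
    i = rng.randrange(len(out))
    out[i] += rng.uniform(-0.5, 0.5)
    return out


def shrink(design, rng):
    """Modifier: scale every parameter toward zero."""
    return [x * 0.9 for x in design]


result = optimize(
    [3.0, -2.0],
    sphere,
    [nudge, shrink],
    stop=lambda iteration, score: iteration >= 200 or score < 1e-6,
)
print(sphere(result))
```

In this arrangement a domain expert could supply a new objective or modifier without touching the search loop, which is the division of labor the abstract describes.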

Megan Hofmann, Nayha Auradkar, Jessica Birchfield, Jerry Cao, Autumn G. Hughes, Gene S.-H. Kim, Shriya Kurpad, Kathryn J. Lum, Kelly Mack, Anisha Nilakantan, Margaret Ellen Seehorn, Emily Warnock, Jennifer Mankoff, Scott E. Hudson: OPTIMISM: Enabling Collaborative Implementation of Domain Specific Metaheuristic Optimization. CHI 2023: 709:1-709:19

Personalized behavior modeling: depression detection

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve Riskin, Anind K. Dey, & Jennifer Mankoff. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Article 41 (March 2021), 27 pages.

The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics.
Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.
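The final step described above, combining results from different time epochs by majority voting, can be sketched as follows. This is a minimal illustration and not the published implementation. It assumes each epoch yields a binary prediction (1 for depressive symptoms, 0 otherwise); the epoch names and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def majority_vote(epoch_predictions, tie_label=0):
    """Combine binary per-epoch predictions (1 = depressive symptoms) by majority.

    epoch_predictions maps an epoch name to 0 or 1.
    tie_label is returned when both labels receive the same number of votes
    (an assumption of this sketch; the abstract does not say how ties are handled).
    """
    counts = Counter(epoch_predictions.values())
    if counts[1] == counts[0]:
        return tie_label
    return 1 if counts[1] > counts[0] else 0


# Hypothetical per-epoch outputs for one student.
votes = {"weekday_morning": 1, "weekday_evening": 1, "weekend_night": 0}
print(majority_vote(votes))  # 1
```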

Practices and Needs of Mobile Sensing Researchers

Passive mobile sensing for the purpose of human state modeling is a fast-growing area. It has been applied to solve a wide range of behavior-related problems, including physical and mental health monitoring, affective computing, activity recognition, routine modeling, etc. However, in spite of the emerging literature that has investigated a wide range of application scenarios, there is little work focusing on the lessons learned by researchers, and on guidance for researchers to this approach. How do researchers conduct these types of research studies? Is there any established common practice when applying mobile sensing across different application areas? What are the pain points and needs that they frequently encounter? Answering these questions is an important step in the maturing of this growing sub-field of ubiquitous computing, and can benefit a wide range of audiences. It can serve to educate researchers who have growing interests in this area but have little to no previous experience. Intermediate researchers may also find the results interesting and helpful for reference to improve their skills. Moreover, it can further shed light on the design guidelines for a future toolkit that could facilitate research processes being used. In this paper, we fill this gap and answer these questions by conducting semi-structured interviews with ten experienced researchers from four countries to understand their practices and pain points when conducting their research. Our results reveal a common pipeline that researchers have adopted, and identify major challenges that do not appear in published work but that researchers often encounter. Based on the results of our interviews, we discuss practical suggestions for novice researchers and high-level design principles for a toolkit that can accelerate passive mobile sensing research.

Understanding practices and needs of researchers in human state modeling by passive mobile sensing. Xu, Xuhai, Jennifer Mankoff, and Anind K. Dey. CCF Transactions on Pervasive Computing and Interaction (2021): 1-23.

Detecting Depression

A series of research projects based on the UWEXP study have focused on detecting depression in various ways. Three such papers are listed below.

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve A. Riskin, Anind K. Dey, Jennifer Mankoff:
Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5(1): 41:1-41:27 (2021)

The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. 
Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.

Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella Villalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey Creswell, David Creswell, Jennifer Mankoff and Anind K. Dey. IMWUT, Article No 116. 10.1145/3351274

The rate of depression in college students is rising, which is known to increase suicide risk, lower academic performance and double the likelihood of dropping out. Researchers have used passive mobile sensing technology to assess mental health. Existing work on finding relationships between mobile sensing and depression, as well as identifying depression via sensing features, mainly utilizes single data channels or simply concatenates multiple channels. There is an opportunity to identify better features by reasoning about co-occurrence across multiple sensing channels. We present a new method to extract contextually filtered features on passively collected, time-series data from mobile devices via rule mining algorithms. We first employ association rule mining algorithms on two different user groups (e.g., depression vs. non-depression). We then introduce a new metric to select a subset of rules that identifies distinguishing behavior patterns between the two groups. Finally, we consider co-occurrence across the features that comprise the rules in a feature extraction stage to obtain contextually filtered features with which to train classifiers. Our results reveal that the best model with these features significantly outperforms a standard model that uses unimodal features by an average of 9.7% across a variety of metrics. We further verified the generalizability of our approach on a second dataset, and achieved very similar results.

Chikersal, P., Doryab, A., Tumminia, M., Villalba, D., Dutcher, J., Liu, X., Cohen, S., Creswell, K., Mankoff, J., Creswell, D., Goel, M., & Dey, A. “Detecting Depression and Predicting its Onset Using Longitudinal Symptoms Captured by Passive Sensing: A Machine Learning Approach With Robust Feature Selection.” ACM Transactions on Computer-Human Interaction (TOCHI), 2020.

We present a machine learning approach that uses data from smartphones and fitness trackers of 138 college students to identify students that experienced depressive symptoms at the end of the semester and students whose depressive symptoms worsened over the semester. Our novel approach is a feature extraction technique that allows us to select meaningful features indicative of depressive symptoms from longitudinal data. It allows us to detect the presence of post-semester depressive symptoms with an accuracy of 85.7% and change in symptom severity with an accuracy of 85.4%. It also predicts these outcomes with an accuracy of >80%, 11-15 weeks before the end of the semester, allowing ample time for preemptive interventions. Our work has significant implications for the detection of health outcomes using longitudinal behavioral data and limited ground truth. By detecting change and predicting symptoms several weeks before their onset, our work also has implications for preventing depression.

Figure: Bar chart showing the value of baseline, Bluetooth, calls, campus map, location, phone usage, sleep, and step features for detecting change in depression. The best feature set leads to 85.4% accuracy; all features except Bluetooth and calls improve on the baseline accuracy of 65.9%.

Detecting Loneliness

Feelings of loneliness are associated with poor physical and mental health. Detection of loneliness through passive sensing on personal devices can lead to the development of interventions aimed at decreasing rates of loneliness.

Doryab, Afsaneh, et al. “Identifying Behavioral Phenotypes of Loneliness and Social Isolation with Passive Sensing: Statistical Analysis, Data Mining and Machine Learning of Smartphone and Fitbit Data.” JMIR mHealth and uHealth 7.7 (2019): e13209.

Objective: The aim of this study was to explore the potential of using passive sensing to infer levels of loneliness and to identify the corresponding behavioral patterns.

Methods: Data were collected from smartphones and Fitbits (Flex 2) of 160 college students over a semester. The participants completed the University of California, Los Angeles (UCLA) loneliness questionnaire at the beginning and end of the semester. For a classification purpose, the scores were categorized into high (questionnaire score>40) and low (≤40) levels of loneliness. Daily features were extracted from both devices to capture activity and mobility, communication and phone usage, and sleep behaviors. The features were then averaged to generate semester-level features. We used 3 analytic methods: (1) statistical analysis to provide an overview of loneliness in college students, (2) data mining using the Apriori algorithm to extract behavior patterns associated with loneliness, and (3) machine learning classification to infer the level of loneliness and the change in levels of loneliness using an ensemble of gradient boosting and logistic regression algorithms with feature selection in a leave-one-student-out cross-validation manner.

Results: The average loneliness score from the presurveys and postsurveys was above 43 (presurvey SD 9.4 and postsurvey SD 10.4), and the majority of participants fell into the high loneliness category (scores above 40) with 63.8% (102/160) in the presurvey and 58.8% (94/160) in the postsurvey. Scores greater than 1 standard deviation above the mean were observed in 12.5% (20/160) of the participants in both pre- and postsurvey scores. The majority of scores, however, fell between 1 standard deviation below and above the mean (pre=66.9% [107/160] and post=73.1% [117/160]).

Our machine learning pipeline achieved an accuracy of 80.2% in detecting the binary level of loneliness and an 88.4% accuracy in detecting change in the loneliness level. The mining of associations between classifier-selected behavioral features and loneliness indicated that compared with students with low loneliness, students with high levels of loneliness were spending less time outside of campus during evening hours on weekends and spending less time in places for social events in the evening on weekdays (support=17% and confidence=92%). The analysis also indicated that more activity and less sedentary behavior, especially in the evening, was associated with a decrease in levels of loneliness from the beginning of the semester to the end of it (support=31% and confidence=92%).
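For readers unfamiliar with the support and confidence figures quoted above, the sketch below computes both for a single association rule using their standard definitions: support is the fraction of all records that contain both sides of the rule, and confidence is the fraction of records containing the antecedent that also contain the consequent. The records and item names are invented for this example and are not data from the study.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of sets of items, one set per student
    antecedent, consequent: sets of items
    """
    n = len(records)
    with_antecedent = sum(1 for r in records if antecedent <= r)
    with_both = sum(1 for r in records if antecedent <= r and consequent <= r)
    support = with_both / n if n else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Made-up records: each set lists behaviors and the loneliness level of one student.
students = [
    {"low_evening_offcampus", "high_loneliness"},
    {"low_evening_offcampus", "high_loneliness"},
    {"low_evening_offcampus", "low_loneliness"},
    {"high_evening_offcampus", "low_loneliness"},
]
support, confidence = support_confidence(
    students, {"low_evening_offcampus"}, {"high_loneliness"}
)
print(support, round(confidence, 3))  # 0.5 0.667
```

Read this way, a rule with support=17% and confidence=92% covers 17% of all students, and 92% of the students who show the behavior pattern also fall in the associated loneliness group.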

Conclusions: Passive sensing has the potential for detecting loneliness in college students and identifying the associated behavioral patterns. These findings highlight intervention opportunities through mobile technology to reduce the impact of loneliness on individuals’ health and well-being.

News: Smartphones and Fitbits can spot loneliness in its tracks, Science 101

Orson (Xuhai) Xu (PhD, co-advised with Anind Dey)

Orson is a Ph.D. student working with Jennifer Mankoff and Anind K. Dey in the Information School at the University of Washington – Seattle. Prior to joining UW, he obtained his Bachelor’s degrees in Industrial Engineering (major) and Computer Science (minor) from Tsinghua University in 2018. While at Tsinghua, he received a Best Paper Honorable Mention Award (CHI 2018), the Person of the Year Award, and Outstanding Undergraduate Awards. His research focuses on two aspects at the intersection of human-computer interaction, ubiquitous computing and machine learning: 1) the modeling of human behavior such as routine behavior and 2) novel interaction techniques.

Visit Orson’s homepage at :

Some recent projects (see more)


Jennifer Mankoff, Dimeji Onafuwa, Kirstin Early, Nidhi Vyas, Vikram Kamath:
Understanding the Needs of Prospective Tenants. COMPASS 2018: 36:1-36:10

EDigs is a research project group at Carnegie Mellon University working on sustainability. Our research is focused on helping people find a perfect rental through machine learning and user research.

We sometimes study how our members use EDigs in order to learn how to build software support for successful social communities.

Screenshot of the eDigs website showing a mobile app, Facebook and Twitter feeds, and information about it.

Modeling Human Routines

Modeling and Understanding Human Routine Behavior

Human routines are blueprints of behavior, which allow people to accomplish their purposeful repetitive tasks and activities. People express their routines through actions that they perform in the particular situations that triggered those actions. An ability to model routines and understand the situations in which they are likely to occur could allow technology to help people improve their bad habits, inexpert behavior, and other suboptimal routines. In this project we explore generalizable routine modeling approaches that encode patterns of routine behavior in ways that allow systems, such as smart agents, to classify, predict, and reason about human actions under the inherent uncertainty present in human behavior. Such technologies can have a positive effect on society by making people healthier, safer, and more efficient in their routine tasks.


Modeling and Understanding Human Routine Behavior
Nikola Banovic, Tofi Buzali, Fanny Chevalier, Jennifer Mankoff, and Anind K. Dey
In Proceedings of the 2016 ACM annual conference on Human Factors in Computing Systems (CHI ’16). ACM, New York, NY, USA.
Honorable Mention Award

Dynamic question ordering

In recent years, surveys have been shifting online, offering the possibility for adaptive questions, where later questions depend on responses to earlier questions. We present a general framework for dynamically ordering questions, based on previous responses, to engage respondents, improving survey completion and imputation of unknown items. Our work considers two scenarios for data collection from survey-takers. In the first, we want to maximize survey completion (and the quality of necessary imputations) and so we focus on ordering questions to engage the respondent and collect hopefully all the information we seek, or at least the information that most characterizes the respondent so imputed values will be accurate. In the second scenario, our goal is to give the respondent a personalized prediction, based on information they provide. Since it is possible to give a reasonable prediction with only a subset of questions, we are not concerned with motivating the user to answer all questions. Instead, we want to order questions so that the user provides information that most reduces the uncertainty of our prediction, while not being too burdensome to answer.
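The second scenario can be illustrated with a small sketch that picks the next question by expected reduction in prediction uncertainty per unit of respondent burden. This is not the authors' algorithm. It assumes a naive Bayes model over two hypothetical outcome labels, yes/no questions, and made-up probabilities and costs; all names and numbers below are invented for the example.

```python
import math

# Hypothetical model: prior over the predicted label, and P(answer is yes | label).
PRIOR = {"high": 0.5, "low": 0.5}
LIKELIHOOD = {
    "owns_car": {"high": 0.9, "low": 0.2},
    "large_home": {"high": 0.7, "low": 0.4},
    "works_from_home": {"high": 0.5, "low": 0.5},
}
# Hypothetical burden of answering each question (higher is more burdensome).
COST = {"owns_car": 2.0, "large_home": 1.0, "works_from_home": 1.0}


def posterior(answers):
    """P(label | answers so far), where answers maps question -> True/False."""
    scores = {}
    for label, prior in PRIOR.items():
        p = prior
        for question, answer in answers.items():
            p_yes = LIKELIHOOD[question][label]
            p *= p_yes if answer else 1.0 - p_yes
        scores[label] = p
    total = sum(scores.values())
    return {label: p / total for label, p in scores.items()}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_gain(question, answers):
    """Expected drop in entropy of the prediction if this question is asked next."""
    current = posterior(answers)
    p_yes = sum(current[label] * LIKELIHOOD[question][label] for label in current)
    gain = entropy(current)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior({**answers, question: answer}))
    return gain


def next_question(answers):
    """Pick the unanswered question with the best gain-to-cost ratio, or None."""
    remaining = [q for q in LIKELIHOOD if q not in answers]
    if not remaining:
        return None
    return max(remaining, key=lambda q: expected_gain(q, answers) / COST[q])


print(next_question({}))                  # owns_car
print(next_question({"owns_car": True}))  # large_home
```

Because the posterior is recomputed after every response, the ranking of the remaining questions can change as answers arrive, and a question that tells the model nothing (here `works_from_home`) is always asked last.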

Kirstin Early, Stephen E. Fienberg, Jennifer Mankoff. (2016). Test time feature ordering with FOCUS: Interactive predictions with minimal user burden. In Proceedings of 2016 ACM Conference on Pervasive and Ubiquitous Computing. Honorable Mention: Top 5% of submissions. Talk slides.