Detecting Depression

A series of research projects based on the UWEXP study have focused on detecting depression in various ways. Three such papers are listed below.

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve A. Riskin, Anind K. Dey, Jennifer Mankoff:
Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5(1): 41:1-41:27 (2021)

The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. 
Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future.
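
The final step described in the abstract, combining per-epoch outputs by majority voting, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the epoch names, the 0/1 labels, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

def majority_vote(epoch_predictions):
    """Return the label predicted by the most epochs.

    epoch_predictions maps a hypothetical epoch name to a 0/1 label.
    Ties are broken toward 0 (no depressive symptoms); that choice is
    an assumption of this sketch, not something the paper specifies.
    """
    counts = Counter(epoch_predictions.values())
    return 1 if counts[1] > counts[0] else 0

# Hypothetical per-epoch outputs for one student.
predictions = {"morning": 1, "afternoon": 0, "evening": 1, "night": 1, "weekend": 0}
final_label = majority_vote(predictions)
print(final_label)
```

Here three of the five hypothetical epochs vote for label 1, so the combined prediction is 1.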

Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella Villalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey Creswell, David Creswell, Jennifer Mankoff and Anind K. Dey. IMWUT, Article No 116. 10.1145/3351274

The rate of depression in college students is rising, which is known to increase suicide risk, lower academic performance and double the likelihood of dropping out. Researchers have used passive mobile sensing technology to assess mental health. Existing work on finding relationships between mobile sensing and depression, as well as identifying depression via sensing features, mainly utilizes single data channels or simply concatenates multiple channels. There is an opportunity to identify better features by reasoning about co-occurrence across multiple sensing channels. We present a new method to extract contextually filtered features on passively collected, time-series data from mobile devices via rule mining algorithms. We first employ association rule mining algorithms on two different user groups (e.g., depression vs. non-depression). We then introduce a new metric to select a subset of rules that identifies distinguishing behavior patterns between the two groups. Finally, we consider co-occurrence across the features that comprise the rules in a feature extraction stage to obtain contextually filtered features with which to train classifiers. Our results reveal that the best model with these features significantly outperforms a standard model that uses unimodal features by an average of 9.7% across a variety of metrics. We further verified the generalizability of our approach on a second dataset, and achieved very similar results.

Chikersal, P., Doryab, A., Tumminia, M., Villalba, D., Dutcher, J., Liu, X., Cohen, S., Creswell, K., Mankoff, J., Creswell, D., Goel, M., & Dey, A. “Detecting Depression and Predicting its Onset Using Longitudinal Symptoms Captured by Passive Sensing: A Machine Learning Approach With Robust Feature Selection.” ACM Transactions on Computer-Human Interaction (TOCHI), 2020.

We present a machine learning approach that uses data from smartphones and fitness trackers of 138 college students to identify students that experienced depressive symptoms at the end of the semester and students whose depressive symptoms worsened over the semester. Our novel approach is a feature extraction technique that allows us to select meaningful features indicative of depressive symptoms from longitudinal data. It allows us to detect the presence of post-semester depressive symptoms with an accuracy of 85.7% and change in symptom severity with an accuracy of 85.4%. It also predicts these outcomes with an accuracy of >80%, 11-15 weeks before the end of the semester, allowing ample time for preemptive interventions. Our work has significant implications for the detection of health outcomes using longitudinal behavioral data and limited ground truth. By detecting change and predicting symptoms several weeks before their onset, our work also has implications for preventing depression.

Bar chart of the importance of different features for detecting change in depression.
Bar chart showing the value of baseline, Bluetooth, calls, campus map, location, phone usage, sleep, and step features for detecting change in depression. The best set leads to 85.4% accuracy; all features except Bluetooth and calls improve on the baseline accuracy of 65.9%.

Gender in Online Doctor Reviews

Dunivin Z, Zadunayski L, Baskota U, Siek K, Mankoff J. Gender, Soft Skills, and Patient Experience in Online Physician Reviews: A Large-Scale Text Analysis. Journal of Medical Internet Research. 2020;22(7):e14455.

This study examines 154,305 Google reviews from across the United States for all medical specialties. Many patients use online physician reviews, but the effects of gender on review content are not well understood. Reviewer gender was inferred from names.

Reviews were coded for overall patient experience (negative or positive) by collapsing a 5-star scale and for general categories (process, positive/negative soft skills). We estimated binary regression models to examine relationships between physician rating, patient experience themes, physician gender, and reviewer gender.

We found considerable bias against female physicians: Reviews of female physicians were considerably more negative than those of male physicians (OR 1.99; P<.001). Critiques of female physicians more often focused on soft skills such as amicability, disrespect and candor. Negative reviews typically contain words such as “rude,” “arrogant,” and “condescending.”

Reviews written by female patients were also more likely to mention disrespect (OR 1.27, P<.001), but female patients were less likely to report disrespect from female doctors than expected.
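
For readers unfamiliar with odds ratios such as the OR 1.99 reported above, the following sketch shows how an unadjusted odds ratio is computed from a 2x2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the study itself estimated its odds ratios with regression models, which this example does not reproduce.

```python
def odds_ratio(exposed_events, exposed_non_events, control_events, control_non_events):
    """Unadjusted odds ratio from a 2x2 table of counts."""
    exposed_odds = exposed_events / exposed_non_events
    control_odds = control_events / control_non_events
    return exposed_odds / control_odds

# Hypothetical counts (not study data): negative vs. positive reviews
# for two groups of physicians.
or_value = odds_ratio(
    exposed_events=200, exposed_non_events=800,   # group A: negative, positive
    control_events=100, control_non_events=900,   # group B: negative, positive
)
print(round(or_value, 2))
```

An odds ratio above 1 means the odds of a negative review are higher in the first group than in the second.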

Finally, patient experiences with the bureaucratic process also impacted reviews. This includes issues like cost of care. Overall, lower patient satisfaction is correlated with high physician dominance (e.g., poor information sharing or using medical jargon).

Limitations of our work include the lack of definitive (or non-binary) information about gender; and the fact that we do not know about the actual outcomes of treatment for reviewers.

Even so, it seems critical that readers attend to who the reviewers are when reading online reviews. Review sites may also want to provide information about gender differences, control for gender when presenting composite ratings for physicians, and help users write less biased reviews. Reviewers should be aware of their own gender biases and assess reviews for this (http://slowe.github.io/genderbias/).

Detecting Loneliness

Feelings of loneliness are associated with poor physical and mental health. Detection of loneliness through passive sensing on personal devices can lead to the development of interventions aimed at decreasing rates of loneliness.

Doryab, Afsaneh, et al. “Identifying Behavioral Phenotypes of Loneliness and Social Isolation with Passive Sensing: Statistical Analysis, Data Mining and Machine Learning of Smartphone and Fitbit Data.” JMIR mHealth and uHealth 7.7 (2019): e13209.

Objective: The aim of this study was to explore the potential of using passive sensing to infer levels of loneliness and to identify the corresponding behavioral patterns.

Methods: Data were collected from smartphones and Fitbits (Flex 2) of 160 college students over a semester. The participants completed the University of California, Los Angeles (UCLA) loneliness questionnaire at the beginning and end of the semester. For a classification purpose, the scores were categorized into high (questionnaire score>40) and low (≤40) levels of loneliness. Daily features were extracted from both devices to capture activity and mobility, communication and phone usage, and sleep behaviors. The features were then averaged to generate semester-level features. We used 3 analytic methods: (1) statistical analysis to provide an overview of loneliness in college students, (2) data mining using the Apriori algorithm to extract behavior patterns associated with loneliness, and (3) machine learning classification to infer the level of loneliness and the change in levels of loneliness using an ensemble of gradient boosting and logistic regression algorithms with feature selection in a leave-one-student-out cross-validation manner.
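
Two steps from the methods, the high/low split at a score of 40 and the leave-one-student-out evaluation scheme, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the student IDs and scores are hypothetical, and the model training that would happen inside each fold is omitted.

```python
def loneliness_level(score, threshold=40):
    """Map a UCLA loneliness score to 'high' (>40) or 'low' (<=40)."""
    return "high" if score > threshold else "low"

def leave_one_student_out(student_ids):
    """Yield (train_ids, test_id) pairs, holding out one student per fold."""
    for held_out in student_ids:
        train = [s for s in student_ids if s != held_out]
        yield train, held_out

# Hypothetical student IDs and scores, for illustration only.
scores = {"s1": 38, "s2": 40, "s3": 41, "s4": 55}
labels = {sid: loneliness_level(score) for sid, score in scores.items()}
folds = list(leave_one_student_out(list(scores)))

for train_ids, test_id in folds:
    print(test_id, labels[test_id], train_ids)
```

Holding out one student at a time means each model is evaluated on a student whose data it never saw during training.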

Results: The average loneliness score from the presurveys and postsurveys was above 43 (presurvey SD 9.4 and postsurvey SD 10.4), and the majority of participants fell into the high loneliness category (scores above 40) with 63.8% (102/160) in the presurvey and 58.8% (94/160) in the postsurvey. Scores greater than 1 standard deviation above the mean were observed in 12.5% (20/160) of the participants in both pre- and postsurvey scores. The majority of scores, however, fell between 1 standard deviation below and above the mean (pre=66.9% [107/160] and post=73.1% [117/160]).
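
The percentages in the results can be checked directly from the reported counts. The helper below is only a convenience for that check; rounding half up to one decimal place is an assumption about how the figures were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

def percent(count, total):
    """Percentage rounded half up to one decimal place."""
    value = Decimal(100 * count) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Counts taken from the results paragraph above (out of 160 participants).
reported = {
    "high loneliness, presurvey": 102,
    "high loneliness, postsurvey": 94,
    "more than 1 SD above mean": 20,
    "within 1 SD, presurvey": 107,
    "within 1 SD, postsurvey": 117,
}
for name, count in reported.items():
    print(name, percent(count, 160))
```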

Our machine learning pipeline achieved an accuracy of 80.2% in detecting the binary level of loneliness and an 88.4% accuracy in detecting change in the loneliness level. The mining of associations between classifier-selected behavioral features and loneliness indicated that compared with students with low loneliness, students with high levels of loneliness were spending less time outside of campus during evening hours on weekends and spending less time in places for social events in the evening on weekdays (support=17% and confidence=92%). The analysis also indicated that more activity and less sedentary behavior, especially in the evening, was associated with a decrease in levels of loneliness from the beginning of the semester to the end of it (support=31% and confidence=92%).
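
The support and confidence values quoted above are the standard measures for association rules. The sketch below computes them for a single rule over a handful of hypothetical records; the item names are invented for illustration and the data are not from the study.

```python
def support_and_confidence(records, antecedent, consequent):
    """Support = share of records containing antecedent and consequent;
    confidence = share of antecedent records that also contain the consequent.

    Each record is a set of items (discretized behaviors plus an outcome label).
    """
    n_total = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical records, one per student.
records = [
    {"low_evening_off_campus", "high_loneliness"},
    {"low_evening_off_campus", "high_loneliness"},
    {"low_evening_off_campus", "low_loneliness"},
    {"high_evening_off_campus", "low_loneliness"},
    {"high_evening_off_campus", "high_loneliness"},
]
support, confidence = support_and_confidence(
    records, {"low_evening_off_campus"}, {"high_loneliness"}
)
print(support, confidence)
```

A rule with modest support but high confidence, like those reported above, applies to a minority of students but is rarely wrong when it does apply.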

Conclusions: Passive sensing has the potential for detecting loneliness in college students and identifying the associated behavioral patterns. These findings highlight intervention opportunities through mobile technology to reduce the impact of loneliness on individuals’ health and well-being.

News: Smartphones and Fitbits can spot loneliness in its tracks, Science 101

Passively-sensing Discrimination

See the UW News article featuring this study!

A deeper understanding of how discrimination impacts psychological health and well-being of students would allow us to better protect individuals at risk and support those who encounter discrimination. While the link between discrimination and diminished psychological and physical well-being is well established, existing research largely focuses on chronic discrimination and long-term outcomes. A better understanding of the short-term behavioral correlates of discrimination events could help us to concretely quantify the experience, which in turn could support policy and intervention design. In this paper we specifically examine, for the first time, what behaviors change and in what ways in relation to discrimination. We use actively-reported and passively-measured markers of health and well-being in a sample of 209 first-year college students over the course of two academic quarters. We examine changes in indicators of psychological state in relation to reports of unfair treatment in terms of five categories of behaviors: physical activity, phone usage, social interaction, mobility, and sleep. We find that students who encounter unfair treatment become more physically active, interact more with their phone in the morning, make more calls in the evening, and spend less time in bed on the day of the event. Some of these patterns continue the next day.

Passively-sensed Behavioral Correlates of Discrimination Events in College Students. Yasaman S. Sefidgar, Woosuk Seo, Kevin S. Kuehn, Tim Althoff, Anne Browning, Eve Ann Riskin, Paula S. Nurius, Anind K Dey, Jennifer Mankoff. CSCW 2019.

A bar plot sorted by number of reports, with about 100 reports of unfair treatment based on national origin, 90 based on intelligence, 70 based on gender, 60 based on appearance, 50 on age, 45 on sexual orientation, 35 on major, 30 on weight, 30 on height, 20 on income, 10 on disability, 10 on religion, and 10 on learning
Breakdown of 448 reports of unfair treatment by type. National, Orientation, and Learning refer to ancestry or national origin, sexual orientation, and learning disability respectively. See Table 3 for details of all categories. Participants were able to report multiple incidents of unfair treatment, possibly of different types, in each report. As described in the paper, we do not have data on unfair treatment based on race.
A heatplot showing sensor data collected by day in 5 categories: Activity, screen, locations, fitbit, and calls.
A heatplot showing compliance with sensor data collection. Sensor data availability for each day of the study is shown in terms of the number of participants whose data is available on a given day. Weeks of the study are marked on the horizontal axis while different sensors appear on the vertical axis. Important calendar dates (e.g., start / end of the quarter and exam periods) are highlighted, as are the weeks of daily surveys. The brighter the cells for a sensor, the larger the number of people contributing data for that sensor. As expected, event-based sensors (e.g., calls) are not as bright as continuously sampled sensors (e.g., location). There was a technical issue in the data collection application in the middle of the study, visible as a dark vertical line around the beginning of April.
A diagram showing compliance in surveys, organized by week of study. One line shows compliance in the large surveys given at pre, mid and post, which drops from 99% to 94% to 84%. The other line shows average weekly compliance in EMAs, which goes up in the second week to 93% but then drops slowly (with some variability) to 89%
Timeline and completion rate of pre, mid, and post questionnaires as well as EMA surveys. Y axis shows the completion rates and is narrowed to the range 50-100%. The completion rate of pre, mid, and post questionnaires are percentages of the original pool of 209 participants, whereas EMA completion rates are based on the 176 participants who completed the study. EMA completion rates are computed as the average completion rate of the surveys administered in a certain week of the study. School-related events (i.e., start and end of quarters as well as exam periods) are marked. Dark blue bars (Daily Survey) show the weeks when participants answered surveys every day, four times a day.
Barplot showing significance of morning screen use, calls, minutes asleep, time in bed, range of activities, number of steps, anxiety, depression, and frustration on the day before, of, and after unfair treatment. All but minutes asleep are significant at p=.05 or below on the day of discrimination, but this drops off after.
Patterns of feature significance from the day before to two days after the discrimination event. The shortest bars represent the highest significance values (e.g., depressed and frustrated on day 0; depressed on day 1; morning screen use on day 2). There are no significant differences the day before. Most short-term relationships exist on the day of the event; a few appear on the next day (day 1). On the third day, one significant difference, repeated from the first day, is observed.

Lyme Disease’s Heterogeneous Impact

An ongoing, and very personal thread of research that our group engages in (due to my own journey with Lyme Disease, which I occasionally blog about here) is research into the impacts of Lyme Disease and opportunities for helping to support patients with Lyme Disease. From a patient perspective, Lyme disease is as tough to deal with as many other more well known conditions [1].

Lyme disease can be difficult to navigate because of the disagreements about its diagnosis and the disease process. In addition, it is woefully underfunded and understudied, given that the CDC estimates around 300,000 new cases occur per year (similar to the rate of breast cancer) [2].

Bar chart showing that Lyme disease is woefully under studied.

As an HCI researcher, I started out trying to understand the relationship that Lyme Disease patients have with digital technologies. For example, we studied the impact of conflicting information online on patients [3] and how patients self-mediate the accessibility of online content [4]. It is my hope to eventually begin exploring technologies that can improve quality of life as well.

However, one thing patients need right away is peer-reviewed evidence about the impact that Lyme disease has on patients (e.g. [1]) and the value of treatment for patients (e.g. [2]). Here, as a technologist, the opportunity is to work with big data (thousands of patient reports) to unpack trends and model outcomes in new ways. That research is still in the formative stages, but in our most recent publication [2] we use straightforward subgroup analysis to demonstrate that treatment effectiveness is not adequately captured simply by looking at averages.

This chart shows that there is a large subgroup (about a third) of respondents to our survey who reported positive response to treatment, even though the average response was not positive.
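
The point that an average can mask a responding subgroup is easy to see with a toy example. The scores below are hypothetical (positive means improvement) and are not survey data; they are chosen so that about a third of respondents improve while the overall mean is negative.

```python
from statistics import mean

# Hypothetical change-in-symptom scores (positive = improvement); not study data.
responses = [3, 2, 4, -2, -1, -3, -2, -1, -2]

overall_mean = mean(responses)
responders = [r for r in responses if r > 0]
responder_share = len(responders) / len(responses)

print(f"overall mean: {overall_mean:.2f}")
print(f"share reporting improvement: {responder_share:.0%}")
print(f"mean among responders: {mean(responders)}")
```

Reporting only the overall mean would suggest the treatment does not help, even though a third of this hypothetical sample improved substantially.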

There are many opportunities and much need for further data analysis here, including documenting the impact of differences such as gender on treatment (and access to treatment), developing interventions that can help patients to track symptoms, manage interaction within and between doctors, and navigate accessibility and access issues.

[1] Johnson, L., Wilcox, S., Mankoff, J., & Stricker, R. B. (2014). Severity of chronic Lyme disease compared to other chronic conditions: a quality of life survey. PeerJ, 2, e322.

[2] Johnson, L., Shapiro, M. & Mankoff, J. Removing the mask of average treatment effects in chronic Lyme Disease research using big data and subgroup analysis.

[3] Mankoff, J., Kuksenok, K., Kiesler, S., Rode, J. A., & Waldman, K. (2011, May). Competing online viewpoints and models of chronic illness. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 589-598). ACM.

[4] Kuksenok, K., Brooks, M., & Mankoff, J. (2013, April). Accessible online content creation by end users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 59-68). ACM.

 

Hypertension recognition through overnight Heart Rate Variability sensing

Ni, H., Cho, S., Mankoff, J., & Yang, J. (2017). Automated recognition of hypertension through overnight continuous HRV monitoring. Journal of Ambient Intelligence and Humanized Computing, 1-13.

Hypertension is a common and chronic disease, characterized by high blood pressure. Since hypertension often has no warning signs or symptoms, many cases remain undiagnosed. Untreated or sub-optimally controlled hypertension may lead to cardiovascular, cerebrovascular and renal morbidity and mortality, along with dysfunction of the autonomic nervous system. Therefore, it could be quite valuable to predict or provide early warnings about hypertension. Heart rate variability (HRV) analysis has emerged as the most valuable non-invasive test to assess autonomic nervous system function, and has great potential for detecting hypertension. However, HRV indicators may be subtle and present at random, resulting in two challenges: how to support continuous monitoring for hours at a time while being unobtrusive, and how to efficiently analyze the collected data to minimize data collection and user burden. In this paper, we present a machine learning-based approach for detecting hypertension, using a waist belt continuous sensing system that is worn overnight. Using 24 hypertension patients and 24 healthy controls, we demonstrate that our approach can differentiate hypertension patients from healthy controls with 93.33% accuracy. This represents a promising approach for performing hypertension classification in the field, and we expect to improve its performance with data from a larger number of hypertensive subjects monitored by the proposed pervasive sensors.
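
To give a sense of what HRV analysis involves, the sketch below computes two widely used time-domain HRV measures, SDNN and RMSSD, from a short list of RR intervals. Whether the paper uses these particular features is an assumption; the abstract does not say, and the RR values here are hypothetical.

```python
from math import sqrt
from statistics import stdev

def sdnn(rr_ms):
    """Sample standard deviation of RR intervals (ms)."""
    return stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals in milliseconds.
rr = [800, 810, 790, 805, 795]
print(f"SDNN:  {sdnn(rr):.2f} ms")
print(f"RMSSD: {rmssd(rr):.2f} ms")
```

In a classification pipeline, measures like these would typically be computed over windows of the overnight recording and passed to a classifier as features.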

Infant Oxygen Monitoring

Hospitalized children on continuous oxygen monitors generate >40,000 data points per patient each day. These data do not show context or reveal trends over time, techniques proven to improve comprehension and use. Management of oxygen in hospitalized patients is suboptimal—premature infants spend >40% of each day outside of evidence-based oxygen saturation ranges and weaning oxygen is delayed in infants with bronchiolitis who are physiologically ready. Data visualizations may improve user knowledge of data trends and inform better decisions in managing supplemental oxygen delivery.
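
A statistic such as the share of each day spent outside a target saturation range can be derived from the raw monitor readings. The sketch below assumes evenly spaced samples; the readings and the 90-95 range are placeholders chosen for illustration, not the evidence-based ranges referred to above.

```python
def fraction_outside_range(spo2_samples, low, high):
    """Share of evenly spaced SpO2 readings that fall outside [low, high]."""
    outside = sum(1 for s in spo2_samples if s < low or s > high)
    return outside / len(spo2_samples)

# Hypothetical readings and target range, chosen only for illustration.
samples = [93, 95, 97, 89, 88, 94, 96, 99, 92, 91]
share = fraction_outside_range(samples, low=90, high=95)
print(f"{share:.0%} of readings outside the target range")
```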

First, we studied the workflows and breakdowns for nurses and respiratory therapists (RTs) in the supplemental oxygen delivery of infants with respiratory disease. Secondly, using end-user design we developed a data display that informed decision-making in this context. Our ultimate goal is to improve the overall work process using a combination of visualization and machine learning.

Visualization mockup for displaying O2 saturation over time to nurses.

Severity of Chronic Lyme Disease

Johnson, L., Wilcox, S., Mankoff, J., & Stricker, R. B. (2014). Severity of chronic Lyme disease compared to other chronic conditions: a quality of life survey. PeerJ, 2, e322.

The Centers for Disease Control and Prevention (CDC) health-related quality of life (HRQoL) indicators are widely used in the general population to determine the burden of disease, identify health needs, and direct public health policy. These indicators also allow the burden of illness to be compared across different diseases. Although Lyme disease has recently been acknowledged as a major health threat in the USA with more than 300,000 new cases per year, no comprehensive assessment of the health burden of this tickborne disease is available. This study assesses the HRQoL of patients with chronic Lyme disease (CLD) and compares the severity of CLD to other chronic conditions.


Competing Online Viewpoints and Models of Chronic Illness

People with chronic health problems use online resources to understand and manage their condition, but many such resources can present competing and confusing viewpoints. We surveyed and interviewed people experiencing prolonged symptoms after a Lyme disease diagnosis. We explore how competing viewpoints in online content affect participants’ understanding of their disease. Our results illustrate how chronically ill people search for information and support, and work to help others over time. Participant identity and beliefs about their illness evolved, and this led many to take on new roles, creating content and advising others who were sick. What we learned about online content creation suggests a need for designs that support this journey and engage with complex issues surrounding online health resources.

Jennifer Mankoff, Kit Kuksenok, Sara B. Kiesler, Jennifer A. Rode, Kelly Waldman:
Competing online viewpoints and models of chronic illness. CHI 2011: 589-598