The Data Pipeline

I’ve been teaching a course on using data for three years now, and it feels ‘finished’ enough that it is perhaps worth writing about. When I began the course, I had three goals: (1) exploring the human side of data, meaning the ways in which what we know about people impacts our ability to make, process, interpret, and use data; (2) making the course accessible to a broad variety of people (not just programmers); and (3) organizing the course around modules that would produce nice portfolio items. The result is the course at data.cmubi.org. While it has evolved over the years, it has always included at least a few beginner programmers, and the projects have been interesting opportunities for students to explore issues like interactive machine learning, data visualization, and topic areas they care a great deal about.

A big emphasis in the course is on data cleaning: deeply understanding the flaws in your data, from bias in data collection to missing values in data files. Many (hopefully most) of the projects below have significant sections documenting their sources and their efforts and decision making around this topic.

Another big emphasis in the course is on understanding what the data will be used for, and by whom. Tied to this, we talk extensively about intelligibility in machine learning, the importance of narrative in visualization (and visualization in general), and the importance of defining the question you are answering.

Here are some of the highlights over the last three years:


Bus bunching is a phenomenon that can affect bus wait times. One of my 2016 students has been collecting data on it and studying it extensively. His final project in the class drew on this data set to explore visual representations of the phenomenon.

 

Yelp data is always an area of interest. In 2014 … In 2015 students explored which state has the best pizza :). In 2016, the ‘Bon Yinzers’ developed a wonderful series of visualizations of factors that affect the popularity of Pittsburgh restaurants. They uncovered some interesting phenomena, such as the unexpectedly off-cycle check-in times of the most active Yelp users in Pittsburgh.

San Francisco Crime Alert explores the likelihood of different types of crime in different San Francisco neighborhoods. Their prediction algorithm gives you a way to explore the prevalence of major and minor crime by time of year, time of day, and location.

In 2015, a group collected and analyzed a data set of tweets by potential ISIS supporters, with the goal of ultimately engaging others in helping to label such data and in understanding how ISIS supporter accounts differ from other accounts with sometimes similar tweets (e.g. news accounts or bloggers).

Often, students' goals are more about policy than about end users. In 2015, Healt$care explored the quality of healthcare across the U.S. and its relationship to dollars spent, in a highly visual fashion.

 

In 2014, a group asked: what jobs are popular in which parts of the US? Again, a combination of data visualization and prediction supports exploration of the question. A similar approach was taken by another 2014 group that collected data about movie piracy and its relationship to DVD release strategies.

Sadly, not all of the older projects still work (web standards change so fast!). I wish I could provide links to work such as the Reddit AMA visualization pictured here.

[Screenshot: Reddit AMA visualization]

 

AMIA trip report

I have been curious about AMIA for some time, and was even invited to be part of a panel submission to it this year. So when I realized it was only a few hours’ drive away, I took advantage of the proximity to plan a last-minute trip. It has been an interesting experience and well worth attending. Although it is a very large conference, the people attending seem friendly and open, and I was welcomed in particular by two wonderful women I met, Bonnie Kaplan and Patti Brennan. The sessions are an intriguing combination of computer science, medicine, and clinical practice (with the quality of, and knowledge about, each varying with the presence and expertise of appropriate collaborators).

I attended sessions on Monday, Tuesday, and Wednesday. The theme that stood out to me more than any other across my experiences here was the great diversity of stakeholders that are (and especially that should be) considered in the design of effective health IT.

Some very interesting observations came out of the large-scale analysis of clinical data that I saw discussed on Monday. For example, a lot of attention is being paid to data privacy (although one person commented that it is commonly misunderstood: “Uniqueness is not synonymous with being identified”), and particularly to how to scrub data so that it can “get past IRB” for further analysis. One interesting approach, taken by N. Shah (Learning Practice-based Evidence from unstructured Clinical Notes; Session S22), is to extract the terms (features) and use those instead of the data itself. Of course, a limitation here is that you have to think of the features ahead of time.
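A minimal Python sketch of the “extract features instead of sharing notes” idea follows. The vocabulary, the example notes, and the function name `extract_term_features` are all hypothetical; this is not Shah's pipeline, only an illustration of the trade-off: anything outside the predefined vocabulary is simply lost.

```python
import re

# Hypothetical vocabulary chosen ahead of time; any term not listed is lost.
VOCABULARY = ["chest pain", "shortness of breath", "diabetes", "smoker"]

def extract_term_features(note, vocabulary=VOCABULARY):
    """Reduce a free-text note to counts of predefined terms, discarding the text."""
    text = note.lower()
    features = {}
    for term in vocabulary:
        # Whole-word matches only, so "smoker" does not match "nonsmoker".
        pattern = r"\b" + re.escape(term) + r"\b"
        features[term] = len(re.findall(pattern, text))
    return features

notes = [
    "Pt reports chest pain and shortness of breath. Former smoker.",
    "History of diabetes. Denies chest pain.",
]
feature_rows = [extract_term_features(n) for n in notes]
for row in feature_rows:
    print(row)
```

Note that the second note still counts “chest pain” even though the patient denies it, which is one reason clinical NLP systems handle negation explicitly; a fixed term list cannot.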

Another interesting topic that came up repeatedly is the importance of defining the timeline of the data as much as the timeline of the person. Questions that need to be answered include: what is time zero in the data being analyzed (and what might be missing as a result)? What is the exit cause, or end moment, for each person in the database (and who is being left out, and what bias results)? Related is the observation that in general “sick people make more data.” To this I would add that if you attempt to address these biases by collecting more information, you potentially introduce selection bias in the subjects, and you must consider the burden that sensing places on the data producer. Connected to this are the ongoing questions about the benefits and problems of a single unique identifier as a way of connecting health information.
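The timeline questions above can be made concrete with a small Python sketch. It assumes hypothetical event labels, takes each person's first record as time zero, treats death or transfer as the exit cause, and marks anyone with no exit event as censored at the end of the study window; a real study would choose these definitions deliberately rather than by default. It also counts records per person, since sick people make more data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    patient_id: str
    day: date
    event: str  # hypothetical labels such as "visit", "lab", "death", "transfer"

# Hypothetical choice of which events end a person's time in the data.
EXIT_EVENTS = {"death", "transfer"}

def define_timelines(records, study_end):
    """Return time zero, end moment, exit cause, and censoring status per patient."""
    by_patient = {}
    for r in records:
        by_patient.setdefault(r.patient_id, []).append(r)

    timelines = {}
    for pid, recs in by_patient.items():
        recs.sort(key=lambda r: r.day)
        time_zero = recs[0].day  # first observation, not necessarily onset of illness
        exit_rec = next((r for r in recs if r.event in EXIT_EVENTS), None)
        if exit_rec is not None:
            end, cause, censored = exit_rec.day, exit_rec.event, False
        else:
            # No exit observed: we only know the person was still present at study_end.
            end, cause, censored = study_end, None, True
        timelines[pid] = {
            "time_zero": time_zero,
            "end": end,
            "exit_cause": cause,
            "censored": censored,
            "follow_up_days": (end - time_zero).days,
            "n_records": len(recs),  # sicker people tend to generate more records
        }
    return timelines

records = [
    Record("a", date(2012, 1, 1), "visit"),
    Record("a", date(2012, 1, 15), "lab"),
    Record("a", date(2012, 3, 1), "death"),
    Record("b", date(2012, 6, 1), "visit"),
]
timelines = define_timelines(records, study_end=date(2012, 12, 31))
```

Patient "b" is censored: the data cannot say whether they recovered, moved away, or simply stopped generating records.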

A last observation from Monday concerns what public data sets are out there that we should make ourselves aware of. For example, MIT has a big data medical initiative (see also http://groups.csail.mit.edu/medg/) and may have a clinical notes data set associated with it (I am still looking for this).

On Tuesday I started the day with S44: year in review (D. Masys). I missed the very start of it, but came in when he was talking about studies of IT’s use in improving clinical practice, such as a study showing that reminding clinicians to do their work better (“physician alerts” “embedded in EHR systems”, etc.) improves patient outcomes, or perhaps only improves process, with the observation that we should measure both. Interestingly to me, the possibility of also improving process and outcomes by organizing the work of caregivers (and reminding them of things) was missing from this discussion.

Dr. Masys then moved on to unexpected consequences of IT that had been published: in one study, adding virtual reality caused “surgeon blindness” to some information; another reported missed lab results; and another documented alert fatigue (drug-drug interaction alerts see 90% overrides…). Given the difficulty of publishing negative results, it would be interesting to explore this particular set of work for tips. It was also interesting to hear his critique of questionable results, particularly his repeated mentions of Hawthorne effects, which arise because so many interventions are compared to care as usual (rather than to an equal-intensity control condition). Another way of phrasing this is to ask at what cost the intervention works (and/or how we “adjust for the intensity of the intervention”).

Another category Dr. Masys explored that interested me was health applications of mobile electronics. Briefly: one study looked at chronic widespread pain … reduced ‘catastrophizing’; four looked at text messaging vs. telephone appointment reminders; and others covered the effectiveness of a short message reminder in increasing follow-up compliance, the text4baby mobile health program, and the Cameroon Mobile Phone SMS (CAMPS) trial (PLoS One).

Dr. Masys then moved on to the practice of clinical informatics and bioinformatics (outside “the world of RCTs”). This focused on new methods that might be interesting. I particularly want to follow up on: one of the few studies that looked at multiple stakeholders, with the goal of reducing unintended negative consequences; the use of registries to run low-cost, very large trials; the use of a private key derived from DNA data to encrypt that same data; the creation of a 2D barcode summarizing patient genetic variants that affect the dose or choice of a drug; and a demonstration that diagnostic accuracy was as good on a tiny mobile phone screen as on a big screen.

The last category reviewed by Dr. Masys was editor's choice publications from JAMIA, the Journal of Biomedical Informatics (JBI), and the Diana Forsythe award. Almost all of these seem worth reviewing in more depth, particularly the JAMIA articles on scientific research in the age of omics (which explores the need to increase the accountability of scientists for the quality of their research), web-scale pharmacovigilance (which used public search engine logs to detect novel drug-drug interactions), and CPOEs decreasing medication errors (a meta-study that basically concluded, without realizing it, that CPOEs would work better if we had only applied basic principles from contextual inquiry!). The JBI articles include Rothman's, which developed a continuous measure of patient condition that predicted hospital readmission and mortality independent of disease (how does this compare with patient-reported health status?); Weiskopf's, which documented the relative incompleteness of EHR data across the charts studied; Friedman’s overview of the NLP state of the art and prospects for significant progress (a summary of a workshop); Post’s article on tools for analytics of EHR data; and Valizadegan’s article on learning classification models from multiple experts who may disagree (relevant to my interest in multiple viewpoints).

Next, I attended a panel about Diana Forsythe (obit; some pubs; edited works), an ethnographer who had a big impact on the field of medical informatics (and others as well). She has passed away, and perhaps only a small number of people read her work, but it had an enormous influence on those who encountered her writing on methods, research topics, and so on. One panelist compared her to Arthur Kleinman (who helped draw the distinction between the abstraction of disease and the human experience of illness, and between treatment and healing). Some of the most interesting parts of the discussion focused on how the field is changing over time, prompted by a question from Katie Siek: first getting data into the computer, then computers into the hospitals, now making them work correctly for people; what comes after that? Another interesting comment was that the physician's authority is based in part on the ability to diagnose (which conveys all sorts of societal benefits). This points to the role of the physician (where no diagnosis exists yet, human creativity is especially needed) versus IT (which can handle more well-defined situations). However, with respect to healing, maybe the power of physicians lies in listening as much as in diagnosing (also something computers can’t do, right?). Other topics that came up included the importance of the patient voice and patient empowerment and participation.

After lunch with a friend from high school, I attended S66 (User centered design for patients and clinicians). In line with the hopes of the Forsythe panel, I saw a mixture of techniques here, including qualitative analysis. Unfortunately, what I did not see was technology innovation (which may point to a difference in vocabulary regarding what “user centered design” means). However, the qualitative methods seemed strong. One interesting talk explored the issues in information transfer from the hospital to home health care nurses, a nice example of some of the breakdowns that occur between stakeholders in the caregiver community. More and more, however, I find myself wondering why so much of the work here focuses only on caregivers with some sort of medical degree (as opposed to the full ecology of caregivers). I was pleased to see low-income settings represented, including work exploring the potential of mobile technology to help with reminders to attend appointments and other reminders, and a series of three studies on health routines culminating in a mobile snack application (published at PervasiveHealth) by Katie Siek and collaborators. One nice aspect of this project was that the same application had differing interfaces for different stakeholders (e.g. teenagers vs. parents).

I started to attend the crowdsourcing session after the break, but it did not appear to have much in terms of actual crowdsourcing. An open area for health informatics? Instead, I went on to S71, family health history & health literacy. The most interesting paper in the session, to me, looked at health literacy in low-SES communities (by many co-authors including Suzanne Bakken). In particular, they have data from 4,500 households, which they would like to visualize back to the participants to support increased health literacy. Their exploration of visualization options was very detailed and user centered, and resulted in the website GetHealthyHeights.org (which doesn’t seem to be live at the moment). However, I am concerned that their goals for what people will get out of the visualizations are very general. It would be interesting to explore whether a higher-level narrative could be provided to help with this. Similarly, does it make sense to present “typical” cases rather than specific data?

On Wednesday I began with S86: late-breaking abstracts on machine learning in relation to EMRs. This session had some interesting exploration of tools as well as some patient-focused work. One study looked at prediction of mobility improvements for older adults receiving home health care, by subgrouping 270k patients and looking at factors associated with the subgroups. Steps included de-identification, data standardization, accounting for confounding factors, division into subgroups, and then data mining (clustering and pattern mining) to look at factors that affected individual and group scores. This is an interesting take on what belongs in the data “pipeline”, and it goes beyond some of the things I’ve been thinking are needed for lowering barriers to data science. Another study looked at decision support for pre-operative medication management (an interesting problem when I consider some of the difficulties faced by the many doctors coordinating my mother-in-law’s care recently). This work was heuristic in nature (a surprising amount of work here still focuses on heuristics over more statistically based approaches). From this work, however, I noticed another trend: the need to connect many different types of information together (such as published work on drugs, clinical notes, and patient history).
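The ordering of steps in the mobility study can be sketched in Python as follows. Everything here is a hypothetical stand-in: the field names, the identifier list, stratifying on a single confounder, and a per-subgroup mean in place of the clustering and pattern mining the authors used. Dropping name fields is also far from real de-identification.

```python
import statistics

# Hypothetical direct identifiers; real de-identification covers far more.
IDENTIFIERS = {"name", "mrn"}

def deidentify(rows):
    """Drop direct identifier fields from each record."""
    return [{k: v for k, v in row.items() if k not in IDENTIFIERS} for row in rows]

def standardize(rows, fields):
    """Convert each listed numeric field to a z-score so features share a scale."""
    out = [dict(row) for row in rows]
    for f in fields:
        values = [row[f] for row in rows]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        for row in out:
            row[f] = (row[f] - mean) / sd if sd else 0.0
    return out

def subgroup(rows, key):
    """Split records into subgroups on a categorical field (e.g. a confounder)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups

def summarize(groups, outcome):
    """Mean outcome per subgroup; a placeholder for clustering/pattern mining."""
    return {g: statistics.mean(r[outcome] for r in members) for g, members in groups.items()}

patients = [
    {"name": "A", "mrn": "001", "age": 70, "lives_alone": True, "mobility_gain": 2.0},
    {"name": "B", "mrn": "002", "age": 80, "lives_alone": False, "mobility_gain": 4.0},
    {"name": "C", "mrn": "003", "age": 90, "lives_alone": True, "mobility_gain": 1.0},
]
clean = standardize(deidentify(patients), ["age"])
result = summarize(subgroup(clean, "lives_alone"), "mobility_gain")
```

Keeping each step as its own function makes the order of operations explicit, which is exactly where choices like “de-identify before or after standardizing?” become visible.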

The last session I attended was S92, one of the few sessions focused specifically on patients (and not very well attended…). The first talk was about creating materials for patient consumption, supporting access to EHRs, two-way secure messaging, and customized healthcare recommendations. They focused especially on summarizing medication information concisely. The second was about a national network for comparative effectiveness. Maybe this is the crowdsourcing of health IT? This was focus-group-based research (a surprisingly popular method across AMIA given how little support there is for this method in HCI) exploring user attitudes about data sharing. Interestingly, the work presented here ignored a long history of research on trust in computing, e.g. from Cliff Nass, the e-commerce literature, and so on. However, the data was nicely nuanced in exploring a variety of ethical issues and acknowledging the relative sophistication of the group members with respect to these issues. The issues raised are complex: who benefits, who owns the data, how the bureaucracy would function, and how to manage authorization given that future studies aren’t always known yet (and opt-in vs. opt-out approaches). I wonder how a market for research would function (think Kickstarter, but I donate my data instead of money…). The next paper looked, through a disparities lens, at what predicted whether people think EHRs are important both for themselves and for their providers.

The closing plenary was given by Mary Czerwinski (pubs) from Microsoft Research. I always enjoy her talks and this was no exception. Her focus was on her work with affective systems relating to stress management. Her presentation included a system for giving clinicians feedback about their empathy in consults with patients, as well as a system for giving parents reminders when they were too stressed to remember the key interactions that could help their ADHD kids. Interestingly, in the parent case, (1) the training itself is helpful and (2) the timing is really important: you need to predict that a stressful situation is building to intervene successfully (I would love to use this at home :). She ended by talking about a project submitted to CHI 2014 that used machine learning to make stress management suggestions based on things people already do (e.g. visit your favorite social network, take a walk, etc.). One of the most interesting questions was whether emotional state could predict mistakes in coding data (or other tasks).

Would I go back a second time? Maybe … It is a potentially valuable setting for networking with physicians, and the technical work is deep enough to be of interest (though the data sets are not as broad as I’d like to see). It’s a field that seems willing to accept HCI and to grow and change over time. And the people are really great. The publishing model is problematic (high acceptance rates; archival), and I think it at times affected the stage of the work that was presented. What was missing from this conference? Crowdsourcing, quantified-self research, patient websites like PatientsLikeMe, patient-produced data (such as support group posts), and significant interactive technology innovation outside the hospital silo. In the end, the trip was definitely worthwhile.

Here are some observations about people who might be of interest to other HCI professionals working in healthcare. I also noticed that MITRE had a big presence here, perhaps because of their recent federally funded research center. In no particular order, here are some people I spoke with and/or heard about while at AMIA 2013:


Patti Brennan (some pubs) is the person who introduced me to or told me about many of the people below, and generally welcomed me to AMIA. She studies health care in the home and takes a multi-stakeholder perspective on this. A breath of fresh air in a conference that has been very focused on things that happen inside the physician/hospital silo.

Bonnie Kaplan is at the Center for Medical Informatics in the Yale School of Medicine. Her research focuses on “Ethical, legal, social, and organizational issues involving information technologies in health care, including electronic health and medical records, privacy, and changing roles of patients and clinicians.”

Mike Sanders is from www.seekersolutions.com, based in B.C. (Canada), which provides support for sharing information among nurses, caregivers, and patients.

Amy Franklin, from UT Health Sciences Center, has done qualitative work exploring unplanned decision making using ethnographic methods. Her focus seems to be primarily on caregivers, though the concepts might well transfer to patients.

Dave Kaufman is a cognitive scientist at ASU who studies HCI and health, including “conceptual understanding of biomedical information and decision making by lay people.” His studies of mental models and miscommunication in the context of patient handoff seem particularly relevant to the question of how the multi-stakeholder system involved in dealing with illness functions.

Paul Tang (Palo Alto Medical Foundation) is a national leader in the area of electronic health records and patient-facing access to healthcare information.

Danny Sands (bio; some pubs) is a doctor and entrepreneur who founded the Society for Participatory Medicine. He focuses on doctor-patient communication and related tools, including studies of ways to improve patient-doctor email communication.

Dave deBronkart (e-patient Dave, whose primary physician was Dr. Sands during his major encounter with the healthcare system) is best summarized by his TED talk “Let Patients Help” (here’s his blog post on AMIA 2013).

George Demiris from the University of Washington studies “design and evaluation of home based technologies for older adults and patients with chronic conditions and disabilities, smart homes and ambient assisted living applications and the use of telehealth in home care and hospice.” His projects seem focused on elders, both healthy and sick. One innovative project explored the use of Skype to bring homebound patients into the hospice team's discussions.

Mary Goldstein worked on temporal visualization of patient data (KNAVE-II) and generally “studies innovative methods of implementing evidence-based clinical practice guidelines for quality improvement,” including decision support.

Mark Musen studies “mechanisms by which computers can assist in the development of large, electronic biomedical knowledge bases. Emphasis is placed on new methods for the automated generation of computer-based tools that end-users can use to enter knowledge of specific biomedical content.” He created the Protégé knowledge base framework and ontology editing system.

Carol Friedman does “both basic and applied research in the area of natural language processing, specializing in the medical domain,” including creating the MedLEE system (“a general natural language extraction and encoding system for the clinical domain”). Her NLP overview paper was mentioned in the year in review above.

Suzanne Bakken (pubs) has been doing very interesting work in low-income communities around Columbia, where she is particularly interested in communicating the data back to the data producers rather than just focusing on its use by data consumers.

Henry Feldman (pubs), who was an IT professional before becoming a physician, has some very interesting thoughts on open charts, leading to the “Open Notes” project.

Bradley Malin (pubs) is a former CMU student focused on privacy who has moved into the health domain and is currently on the faculty at Vanderbilt. His work provides a welcome and necessary theoretical dive into exactly how private various approaches to de-identifying patient data are. For example, his 2010 JAMIA article showed that “More than 96% of 2800 patients’ records are shown to be uniquely identified by their diagnosis codes with respect to a population of 1.2 million patients.”
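To make that uniqueness measure concrete, here is a minimal Python sketch that computes the fraction of sampled patients whose set of diagnosis codes occurs exactly once in a larger population. The toy data, the code values, and the choice to treat each record as an unordered set of distinct codes are assumptions for illustration, not the method of the cited article. As the Monday comment noted, a unique record is not the same as an identified one; it still has to be linked to a person using outside knowledge.

```python
from collections import Counter

def fraction_unique(sample_ids, population):
    """Share of sampled patients whose diagnosis-code set appears exactly once in the population."""
    counts = Counter(frozenset(codes) for codes in population.values())
    unique = sum(1 for pid in sample_ids if counts[frozenset(population[pid])] == 1)
    return unique / len(sample_ids)

# Hypothetical toy population: patient id -> diagnosis codes (order and repeats ignored).
population = {
    "p1": ["250.00", "401.9"],
    "p2": ["401.9", "250.00"],            # same set as p1, so neither is unique
    "p3": ["401.9"],
    "p4": ["250.00", "401.9", "493.90"],
    "p5": ["493.90"],
}
share = fraction_unique(["p1", "p3", "p4"], population)
print(share)
```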


Jina Huh (pubs) studies social media for health. One of her recent publications looked at health video bloggers as a source of social support for patients. She shares my interest in integrating clinical perspectives into peer-produced data.

Katie Siek (pubs), who recently joined the faculty at Indiana, does a combination of HCI and health research, mostly focusing on pervasive computing technologies. One presentation by her group at AMIA this year focused on a mobile snacking advice application that presented different views to different stakeholders.

Madhu Reddy (some pubs) trained at UC Irvine under Paul Dourish and Wanda Pratt and brings a qualitative perspective to AMIA (he was on the Diana Forsythe panel, for instance). He studies “collaboration and coordination in healthcare settings.”

Kathy Kim spoke in the last session I attended about her investigations of patient views on a large data-sharing network to support research, but she also does work that is very patient-centered (e.g. mobile platforms for youth).

Steve Downs works in decision support as well as policy around “how families and society value health outcomes in children.”

Chris Gibbons (some pubs) focuses on health disparities (e.g. barriers to inclusion in clinical trials and the potential of eHealth systems).

Data Collection & Analytics Tools?

I have become fascinated recently with the question of the role that data has in supporting analysis, action, and reflection. Actually, it would be more accurate to say that I’ve become aware recently that this is an intrinsic driver in much of the work I do, and thus it has become something I want to reflect on more directly. In this post, I want to explore some of the tools others have already built that might support analytics, machine learning, and so on. If you know of something I’ve missed, feel free to share it in the comments! So, in no particular order:

  • Hazy provides a small handful of key primitives for data analysis. These include Victor, which “uses RDBMS to solve a large class of statistical data analysis problems (supervised machine learning using incremental gradient algorithms),” and WisCi (and DeepDive, its successor), which is “an encyclopedia powered by machines, for the people.” RapidMiner is a similar tool that has been used by thousands. It is open source and supports data analysis and mining.
  • Protégé is “a suite of tools to construct domain models and knowledge-based applications with ontologies,” including tools for visualization and manipulation.
  • NELL learns over time from the web. It has been running since 2010 and has “accumulated over 50 million candidate beliefs.”  A similar system is
  • Ohmage and Ushahidi are open-source citizen sensing platforms (think citizen-based data collection). Both support mobile and web-based data entry. This stands in contrast to things like Mechanical Turk, which is a paid service; games and other dual-impact systems like Peekaboom (von Ahn et al.), which can label objects in an image using crowd labor; and systems like Kylin (Hoffmann et al.), which simultaneously accelerates community content creation and information extraction.
  • WEKA and LightSide support GUI-based machine learning (WEKA requires some expertise and comes with a whole textbook, while LightSide is built on WEKA but simplifies aspects of it, and specializes in mining textual data). For more text mining support, check out Coh-Metrix, which “calculates the coherence of texts on a wide range of measures. It replaces common readability formulas by applying the latest in computational linguistics and linking this to the latest research in psycholinguistics.” Similarly, LIWC (not free) supports linguistic analysis by providing a dictionary and a way to compare a new text against that dictionary, analyzing the presence of 70 language dimensions ranging from negative emotions to causal words.
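The dictionary-based approach behind tools like LIWC can be sketched in a few lines of Python. The two categories and their word lists below are hypothetical stand-ins (LIWC's own dictionary is not free and is far larger); the sketch only shows the idea of reporting, for each category, the percentage of words in a text that match the category's entries, with a trailing `*` treated as a prefix wildcard.

```python
import re

# Hypothetical mini-dictionary; a trailing "*" matches any word with that prefix.
DICTIONARY = {
    "negemo": {"sad", "angry", "hurt*"},
    "causal": {"because", "effect", "hence"},
}

def matches(word, entry):
    """True if `word` matches a dictionary entry, honoring prefix wildcards."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_percentages(text, dictionary=DICTIONARY):
    """Percentage of words in `text` that fall into each dictionary category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(any(matches(w, e) for e in entries) for w in words) / len(words)
        for category, entries in dictionary.items()
    }

scores = category_percentages("I was sad because it hurts")
print(scores)
```

Normalizing by word count (rather than reporting raw hits) is what lets texts of different lengths be compared on the same scale.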

Deployed tools and products aside, there is also a good deal of research in this area, ranging from early work such as a CAPpella (Dey et al.) and ScreenCrayons (Olsen et al.) to more recent work. Gestalt (Patel et al.) “allows developers to implement a classification pipeline,” and Kietz et al. use an analysis of RapidMiner’s many data analysis traces to automatically predict optimal KDD workflows.

von Ahn, L., Liu, R., & Blum, M. (2006). Peekaboom: A game for locating objects in images. In ACM CHI 2006.

Hoffmann, R., Amershi, S., Patel, K., Wu, F., Fogarty, J., & Weld, D. S. (2009, April). Amplifying community content creation with mixed initiative information extraction. In Proceedings of the 27th international conference on Human factors in computing systems (pp. 1849-1858). ACM.

Dey, A. K., Hamid, R., Beckmann, C., Li, I., & Hsu, D. (2004, April). a CAPpella: Programming by demonstration of context-aware applications. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 33-40). ACM.

Olsen Jr, D. R., Taufer, T., & Fails, J. A. (2004). ScreenCrayons: Annotating anything. In Proceedings of the 17th annual ACM symposium on User interface software and technology. ACM.

Patel, K., Bancroft, N., Drucker, S. M., Fogarty, J., Ko, A. J., & Landay, J. A. (2010). Gestalt: Integrated support for implementation and analysis in machine learning. In UIST 2010 (pp. 37-46).

Kietz et al. (2012). Designing KDD-workflows via HTN-planning (pp. 1–2). doi:10.3233/978-1-61499-098-7-1011

Search and Rescue and Probability Theory

A man and a dog together belaying down a rock face
Canine Search and Rescue (photo from AMRG website)

I spent a fascinating evening with the Allegheny Mountain Rescue Group today. This is a well-run organization that provides free help for search and rescue efforts in the Pittsburgh area and beyond. I was in attendance because my kids and I were looking for a way to give Gryffin (our new puppy) a job in life beyond “pet,” and we love working outdoors. Canine search and rescue sounded like a fascinating way to do this, and we wanted to learn more. During the meeting, I discovered a team of well-organized, highly trained, passionate, and committed individuals, with a healthy influx of new people interested in taking part and a strong core of experienced people who help to run things. The discussions of recent rescues were at times heart-rending, and very inspiring.

Later in the evening, during a rope training session, I started asking questions and soon learned much more about how a search operates. I discovered that about a third of searches end in mystery. Of those for which the outcome is known, there is about an even split among finding people who are injured, fine, or dead. Searches often involve multiple organizations simultaneously, and it is actually preferable to create teams that mix people from different search organizations rather than having a team that always works together. Some searches may involve volunteers as well. A large search may have as many as 500 volunteers, and if the person being sought may still be alive, the search may go on day and night. Searches can last for days. And this is what led me to one of the most unexpected facts of the evening.

I asked: How do you know when a search is over? The answer I got was that a combination of statistics and modeling is used to decide this in a fairly sophisticated fashion. A search is broken up into multiple segments, and each segment is assigned a probability that the lost person is in it. When a segment is searched, the type of search (human only, canine, helicopter, etc.), the locations searched, and a field report containing details that may not be available otherwise are used to update the probability that the person is in that segment (but was missed) or absent from it. Finally, these probabilities are combined using one or more spreadsheets to support decisions about whether (and how) to proceed. According to the people I was speaking with, a lot of this work is done by hand because it is faster than entering data into, and dealing with, more sophisticated GIS systems (though typically a computer is available at the search’s base, which may be something like a trailer with a generator). GPS systems may also be used to give searchers location information and/or to track dogs.
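The segment update described above resembles the standard Bayesian update from search theory, sketched below in Python. I don't know whether AMRG's spreadsheets use exactly this form; the segment names, the prior probabilities (POA, probability of area), and the probability of detection (POD) for the search are hypothetical. After an unsuccessful search of one segment, that segment's probability shrinks by the chance the person was there but missed, and every segment is renormalized so the total stays 1.

```python
def update_after_search(poa, searched, pod):
    """Bayesian update of segment probabilities after an unsuccessful search.

    poa: dict of segment -> prior probability the person is there (sums to 1,
         usually including an "outside the search area" segment).
    searched: the segment that was searched without finding the person.
    pod: estimated probability the search would have found the person if present.
    """
    missed = poa[searched] * (1 - pod)  # present in the segment but missed
    not_found = 1 - poa[searched] * pod  # overall chance this search finds nobody
    if not_found <= 0:
        raise ValueError("search was certain to succeed; posterior is undefined")
    return {
        segment: (missed if segment == searched else p) / not_found
        for segment, p in poa.items()
    }

poa = {"A": 0.5, "B": 0.3, "outside": 0.2}
after = update_after_search(poa, "A", pod=0.6)
print(after)
```

Segment A drops from 0.5 to about 0.29, while B rises from 0.3 to about 0.43, which is how an unsuccessful pass shifts effort toward unsearched areas. A higher POD (for example, a more reliable search type) shrinks the searched segment faster.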

Some of the challenges mentioned were the presence of conflicting information, the variability in how reliable different human searchers are, the fact that terrain may not be flat or easily represented in two dimensions, the speed of computer modeling, the difficulty of producing exact estimates of how different searchers affect the probability of finding someone, and the variable skill levels of searchers (and the need to organize large numbers of searchers, at times partly untrained). When I raised the possibility of finding technology donations such as more GPS systems, I was also told that it is critical that any technology, especially technology intended for use in the field, be ultra-simple to use (there is no time to mess with it) and consistent (i.e. searchers can all be trained once on the same thing).

Although this blog post summarizes what was really just a brief (maybe an hour long) conversation with two people, the conversation had me thinking about research opportunities. The need for good human-centered design is clear here, as is the value of being able to provide technology that can support probabilistic analysis and decision making. Although it sounds like they are not in use currently, predictive models could be applicable, and apparently a fair amount of data is gathered about each search (and searches are relatively frequent). Certainly visualization opportunities exist as well. Indeed, a recent VAST publication (Malik et al., 2011) looked specifically at visual analytics and its role in maritime resource allocation (across multiple search and rescue operations).

But the thing that especially caught my attention is the need to handle uncertain information in the face of both ignorance and conflict. I have been reading recently about Dempster-Shafer theory, which is useful when fusing multiple sources of data that may not be easily modeled with standard probabilities. Dempster-Shafer theory uses each piece of evidence to assign probability mass to sets of hypotheses (not just to single hypotheses), which lets it explicitly model ignorance. It is best interpreted as producing information about the provability of a hypothesis, which means that at times it may produce a high certainty for something that is unlikely (but more provable than the alternatives). For example, suppose two doctors disagree about which disease someone has, but share a small point of agreement: each holds a low-confidence hypothesis that the problem is a brain tumor, which both consider highly improbable (one suspects migraines, the other a concussion). That small overlap will be the most certain outcome of combining their belief models in Dempster-Shafer theory, so a brain tumor, although both doctors agree it is unlikely, would be considered most probable by the theory.
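Dempster’s rule of combination makes the two-doctor example (a version of a well-known critique usually attributed to Lotfi Zadeh) concrete. This is a minimal sketch: the masses 0.99 and 0.01 are made-up illustrative numbers, hypotheses are represented as Python frozensets, and the function name `combine` is hypothetical. Mass placed on the whole set of hypotheses is how the theory expresses ignorance.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass; masses sum to 1.
    Mass on the full set of hypotheses represents ignorance.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Both sources are consistent with the hypotheses in the overlap.
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            # The sources contradict each other on this pair of focal sets.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by discarding the conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


if __name__ == "__main__":
    tumor = frozenset({"tumor"})
    # Illustrative masses: each doctor is nearly sure of a different diagnosis.
    doctor_1 = {frozenset({"migraine"}): 0.99, tumor: 0.01}
    doctor_2 = {frozenset({"concussion"}): 0.99, tumor: 0.01}
    print(combine(doctor_1, doctor_2))
```

With these masses, 0.9999 of the combined mass falls on empty intersections (conflict), so after renormalization the remaining 0.0001 on “tumor” becomes essentially all of the belief. By contrast, a source that puts all of its mass on the full set of diagnoses (total ignorance) leaves the other source unchanged.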

One obvious next step would be to do some qualitative research and get a better picture of what really goes on in a search and rescue operation. Another possible step would be to collect a data set from one or more willing organizations (assuming the data exists) and explore algorithms that could aid decision making or make predictions using the existing data. Or then again, one could start by collecting GPS devices (I am sure some of you out there have some sitting in a box that you could donate) and explore whether there are enough similar ones (Android + Google Maps?) to meet the constraints of easy use and trainability. I don’t know yet whether I will pick up this challenge, but I certainly hope I have the opportunity to. It is a fascinating, meaningful, and technically interesting opportunity.

Malik, A., Maciejewski, R., Maule, B., & Ebert, D. S. (2011). A visual analytics process for maritime resource allocation and risk assessment. In the 2011 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 221-230.

Why study the future?

I have asked myself that question numerous times over the last several years. Why years? Because the paper that I will be presenting at CHI 2013 (Looking Past Yesterday’s Tomorrow: Using Futures Studies Methods to Extend the Research Horizon) is the 5th iteration of an idea that began at CHI 2009. It was submitted in its initial form to CHI 2011 and 2012, then to DIS 2012, Ubicomp 2012, and finally CHI 2013 (and is, I think, the winner for most iterations of one paper I’ve ever submitted). Each submission except the last one (which was accepted) sparked long and informative reviews and led to major revisions (in one case even a new study).

I am telling this story for two reasons. First, I want to explore what drove me to lead this effort despite the difficulty of succeeding. Second, I want to explore what I learned from the process that might help others publishing difficult papers.

Things we wish we’d known when planning a trip to Zürich

Welcome to Zürich ☺. If you are moving here there are a few things you might want to know, in no particular order :). This is especially geared towards folks living in ETH university housing.

Paperwork

(most of these things take around 30 mins – 60 mins once you find the right place)

  • To register, you will need to go to Kreisbüro 6 first, then Berninaplatz (a stop on the 10 tram; the Kreisbüro will give you an appointment there)
  • To leave, you will need to go to the Migrationsamt (in the city hall, next to the Fraumünster church) and register to leave
  • To get a half-pass (half off all tram and train travel), you can go to Bellevue (on the 9) and go into the office in the building that’s right at the center of the stop, OR go to the main train station and go into the “travel agency” (take a ticket and be prepared to wait a bit)
  • To get a monthly pass (free travel all month long) with your half-pass, you can go to the train station or use any of the newer (“fancy”) electronic machines (like the one at the Winkelriedstrasse tram stop). The “fancy” machines have an English button. There is no “fancy” machine at the airport, so don’t expect to renew your monthly pass there at the end of a trip back to Zürich

Money & Phones

  • I would open a bank account at PostFinance (in the post office: if you walk from the Winkelriedstrasse tram stop to the Rigiblick tram stop, you’ll see it on the left).
  • When you get bills, you typically get a “pink slip” – bring it with some cash to the post office, and you can pay it there.
  • The post office is closed over lunch
  • ETH pay can be picked up between 11 and 2 in the back right corner of the 2nd floor (I think) of the main ETH building
  • Sunrise pre-pay is the simplest mobile phone plan. You can “top it up” at any Co-op grocery store (just ask for, say, “50 CHF on my Sunrise”).
  • Sunrise pre-pay charges you 1 CHF on each day that you make a call, text or use internet (up to 3 CHF per day total). You can get an add-on plan for unlimited internet if you use it a lot, for about 10 CHF a month.

Shopping

We’re not big shoppers, so this is just the basics.

  • H&M has reasonable clothing. The big Co-op and Migros stores have inexpensive clothing options too. There are also lots of sales in the “mall” under the main train station.
  • There are a number of farmers’ markets worth checking out.

Things to check out that you might otherwise miss

  • Feminist Zürich: The Labyrinth and Feminist Tours of Zürich
  • The rooftop swimming pool & spa (“Thermalbad Zürich”)
  • Dolder ice rink 
  • Swimming in the clear cool clean lake of Zürich (‘nough said)
  • Tour the archeological ruins of Zürich (register at City Hall to get a “key to the city” and a map). Takes time to get the key, so this is really only for folks living here.
  • Lots of wonderful places to walk on the Zürichberg (look for the life-sized elephant fountain in the woods) and the Uetliberg. Enjoy them.
  • There are lots of festivals in Zürich and Switzerland worth checking out: Basel Fasnacht in the spring, the independence day parade in mid August, etc. Google to find them. Don’t necessarily confine yourself to Switzerland; for example, Austria has numerous “balls” in dance season (winter).

English speakers

  • The expat forums are a great place to find advice about all sorts of stuff
  • There are some great meetup groups for childless expats; they do all sorts of sporty stuff in the mountains, if you’re into that. They tend to keep separate from the Swiss.
  • If you prefer to mix with the locals, try a yoga class, join an orchestra, etc. The downside is that you have to speak some German, and it helps if you’re working on understanding Swiss German.
  • The ETH has a tandem-partner program. You can sign up to practice German and offer to help someone with English. We had great experiences with it. They also offer German classes (once a week).
  • If you have kids, the public school has an amazing program for helping them learn German before shifting them to “regular” school. The teachers are wonderful, and for my kids at least, the class worked wonders. Just register with the school system.
  • Be prepared for younger kids (even in grades 1-3) to be done with school at noon two or more days a week, and to have no school from 12-2. Don’t worry though: Hort will feed them a warm meal and let them play/do crafts during lunch, and as late as you need on weekdays.

Doctors

There’s an English-speaking doctor’s office with long hours at the main train station. There’s also a 24-hour pharmacy there. You should receive accident insurance through ETH, and you know better than I where you get your health insurance.

Garbage

  • For ETH folks, you just buy regular garbage bags. For everyone else, there are special taxed bags.
  • Recycling: plastic goes inside the Co-op in their wall collection unit. You can find bins for metal and glass around the city. Two to four times a year (I’m unsure how often), you will find a garbage bag in your mailbox for clothing and shoes. Anything else of quality, if you put it outside, someone is likely to take it.

Other

  • The climate makes gardening easy. The abundance of green space also makes foxes quite common. As a result, you shouldn’t eat garden greens raw: foxes can leave a parasite on plants that is deadly in the rare case you catch it.
  • If you want more than that, there is a community farm near the botanical gardens that you can help out at. I’m sure there are other options if you want an actual garden bed, but a year is short.
  • We were able to get permission to garden in the non-grassy areas of our yard.
  • We went to a Tot Shabbat service at a local liberal temple, the Jüdische Liberale Gemeinde. It’s a bit out of the way in what looks like an apartment building, but the people we met were wonderful and very welcoming. Be prepared for Swiss German, though :).

Have fun!

Public school in Switzerland

During our time in Switzerland, the children attended public school. An important goal for us was that they would learn the language, and the public school system supported this. The school the children were assigned to was 1.5 km from our home, and we generally walked or took the bus (sometimes the children went the whole way on their own toward the end).

The athletic area and my son's school building

My son was placed in a class for second-language speakers. His class had an ever-changing group of about eight children and two main teachers. The class curriculum was ungraded and tailored to the children, who were moved into mainstream classes as they learned German (if they were staying in Switzerland longer-term). These were all grade-school children, so in addition to main lessons, they had handwork, music, and swimming classes with other teachers, along with special times for gym, art, language, and mathematics.

My daughter was placed into a Kindergarten class where a mixture of Swiss German and German was spoken. Her main teacher, who had taught at a Steiner school for 22 years before switching to a public Kindergarten, was a warm-hearted and loving woman who connected well with my daughter and supported her love of creative play as well as craft work. One day a week was spent in the woods (the entire morning) playing and cooking over a campfire. Other days were spent in the classroom and back yard. One afternoon a week, my daughter had German class along with other older students who were new to German.

School included a 1.5 hour lunch break, and ended at noon two days a week for my son. My daughter was done at noon three days a week and had the same lunch break. If parents worked during those hours, children could attend “Hort” — a sort of daycare with a kitchen (hot food is an expected part of a healthy lunch). We were skeptical about Hort at first, but my son in particular grew to love the free play and delicious food it provided, and both children often came home with crafts or stories from Hort.

Some things that stood out about the children’s experience in school, besides the overall quality of the education, were:

  • Dedicated teachers who educated in a way we loved (the principal, one of my son’s teachers, and my daughter’s teacher all had experience with Steiner education, for example, and one of my son’s teachers was also trained in art therapy)
  • Very high quality facilities (the school had its own swimming pool, for example, with a moveable floor!)
  • Quality was important throughout. At the end-of-year picnic, there was a small concert. The school had arranged for a world-class Block-Flöte (recorder) player to perform in a fairy tale retelling. The music was incredible, and the children ate up every note (and every word).

  • School in Switzerland is clearly organized on the principle that education isn’t just about getting as much information into children as quickly as possible. There seemed to be 1-2 short weeks (or whole weeks) free of school every month we were there. Overall, the Swiss do not seem to worry about the children spending hours learning each day. Between half days (every week) and frequent vacations, it is also set up for families with a parent who works part time or not at all.
  • The children learned German incredibly fast (within six weeks, I was reading and retelling portions of Harry Potter to my son entirely in German).

Upshot? I can highly recommend public school as an option for visiting families in Switzerland.

My chronic illness and academia

A friend happened to send an interesting article about chronic illness and academia my way today, and it made me realize that a post on the topic is long overdue.

For those of you who don’t know, I have Lyme disease. I was diagnosed in October 2007 (pre-tenure), but had been ill for at least a year before that. I blog about my Lyme disease at http://gotlyme.wordpress.com (inventive title, I know :)). A special section of the blog focuses on work and illness. Lyme disease was debilitating for me, and I was on disability (part time) during part of my treatment. I am now much better, but still have relapses about once a year and bad days more often. Prior to my experiences with Lyme disease, I had a very difficult repetitive strain injury that also caused severe impairment (typing at most 30 minutes a day at first, 2 hours a few years later). This occurred at the start of my PhD and lasted into the beginning of my first faculty position.

I don’t want to go into details about those illnesses here, rather I’d like to speak about the relationship between having an invisible chronic illness and being an academic. I have had more than one person approach me asking for advice and guidance as they deal with their own illness, usually by word of mouth. Illnesses like mine (and many other chronic illnesses) are mostly invisible, and it can be difficult to find information about how to cope with them. When other academics disclose their own struggles and process (such as Elyn Saks’ article about working with schizophrenia, or Gerry Gold’s article on the social context of long term disability as an option for those who are considering leaving work because of their illness) it can be eye opening and inspiring for those of us trying to find our own way forward. Equally important is access to facts, such as the AAUP’s report on how faculty members with disabilities should be accommodated.

For me personally, one of the biggest (and most positive) changes I’ve had to make has related to my approach to managing my time. Time management has been a theme since early in my graduate school career, and I have learned never to take time for granted, how to prioritize, and when to cut back. I discuss some of the things I learned on my Lyme blog: managing an unpredictable illness, and trying to manage a full time job on a half time schedule.

Another big challenge has been disclosure. It took a great deal of time for me to value the label “disabled” as a valid description of myself, and I have always been sensitive about it not only personally, but also with respect to my work. I never filed anything official with the university during my graduate school experience. I did not speak of my impairment during my job search until I reached the negotiation stage. I did, however, confront our dean at the time when I thought he wasn’t doing enough to prevent others from suffering the same preventable injury I had. At CMU, I spoke with my department chair as soon as I had a diagnosis. However, I waited to file anything official until I desperately needed a parking permit, and one close to my building. Even then I accepted second best up until the day when I almost collapsed trying to get from that spot to my office. Finally fired up, I marched (well, hobbled with my cane) straight into the dean’s office and demanded something better.

Although I have disclosed much of my experience at this point and freely speak of it when it seems pertinent, there is still one area that I rarely discuss: the cognitive impacts of my disability. At my worst, I would sometimes spend more hours each week in fog and pain than out of it. Thanks to my very supportive husband, who would take the kids whenever the clouds cleared, I could grab my computer and do essential thought-work at those times. Even so, I read the proof of one journal paper (submitted before my diagnosis) in horror when it came back a year later. I have experienced moments in meetings with students when I could not find the words to express my thoughts, been reminded that I had just said the opposite of what I meant, and experienced large black spots in my memory. These moments (though thankfully mostly behind me) were experienced as fearful signs that I might not be able to continue as an academic. Most difficult is when they occur in public contexts, such as the difficulties I had at a talk I gave a few years back, and the program committee meeting that led to a blog post on the difficulties of re-integrating into academia after a long pause.

But what has defined the positive side of my experience, more than anything, is the support of those around me. I will never forget hearing that my advisor almost got into a fight defending the truth of my claims that I could not type. Or the day that an angry stranger mumbled about supporting the disabled as he opened a door for me when the push button failed and my hands lacked the strength (this led to my eventual acceptance of the label disabled and a related lifelong interest in assistive technology research). I have had long phone conversations with a colleague in our field who experienced chronic fatigue. I have been given a role as a collaborator when I sorely needed it. Instant Message chats galore have enabled remote and close colleagues to help me work through difficult patches and decide strategy. I have been driven home countless times by a close friend when I ended my day too weak to bike or walk, and been given writing aids (as a graduate student with RSI) and more recently teaching leaves, co-instructors, extra TA support, classes scheduled around my disability, and control over my tenure clock.

If you experience a chronic illness, I encourage you to go after the support you deserve, accept help, and seek advice. Drop me a note any time. Check out the Chronic Illness and Academia forum at the Chronicle of Higher Education. Talk to the people you trust, and get advice about when to say more. Stand up for yourself when you need it, and if people are not supportive, find other friends. Above all, know that it is possible to be both disabled and an academic, if that is the path you choose.

Learning languages

I’ve mentioned before that one of my sabbatical goals was to learn a new language (Hindi). I am not fluent, but I think I came a fair way with it, and I want to comment on the role of different technologies and approaches in our successes (and failures) as a family to learn the three languages that we tackled on this trip.

One of the most useful technologies we employed was the Rosetta Stone software. The kids loved Rosetta Stone, which we started using almost as soon as the sabbatical was approved to get them familiar with Hindi. They spent about 30 minutes at a time on it at the beginning. At our peak, this happened almost daily (after we left Pittsburgh but before we were settled in India). Eventually we hired a tutor (a wonderful friend now) to come for about an hour most days instead. The kids were far more resistant to being tutored than they were to using the software, but I feel we covered much more ground in those hours. We made up all sorts of games, retold fairy tales, played shop, and generally did our best to make it child friendly.

Hindi was a relatively hard language to learn (new alphabet, different sentence structures, and so on). Once we got past the vocabulary phase, progress was slowish. Still, by the end of the fall we could have whole conversations in Hindi as a family. The kids were not alone in learning the language: Anind and I were trying very hard to learn it as well, and we tried to speak it at meals, with our Indian driver, and so on. So between the tutoring and the daily practice opportunities, they used Rosetta Stone less and less.

Rosetta Stone was not a pure success. It required the right context to be used: enough motivation, and not too much other support. We almost never used the German Rosetta Stone I bought, and yet, of course, the kids are far more fluent in German (they are immersed, unlike with Hindi, and it is a much easier language for them to learn). Use of Rosetta Stone is rare at this point, and it is mostly me who uses it.

You get free tutoring through the online package with Rosetta Stone, along with access to online games. The games are a fun way to practice, but slow. When possible, I sign up and have a session with an online tutor. The session is based on the material I’m currently covering in the software. However, they only let the kids do it when there are no other remote participants, which can be hard to arrange at popular times in the early stages of learning a language.

As a computer scientist, I cannot help but be impressed by the software. It is dedicated to learning language through immersion, and the authors have done an excellent job of maintaining that throughout the software and the tutoring sessions. It uses speech recognition to check pronunciation, and provides multimedia support for learning. And it works: if you put the time in, you learn. To my mind, it’s a success as an educational tool and an interactive tool. It supposedly has a social side as well, though as a Hindi learner I was one of few and could not take advantage of it. I’d be curious to see what it’s like.

It has always seemed such a shame to me that learning multiple languages is not the norm in the United States. During our travels we met 10-year-olds who spoke 5 or 6 languages, all with fluent ease. They never needed to touch a piece of software or a tutor. The world has become so small, yet so many of us in the United States fail to give our children the gift of understanding and the mastery of complexity that comes with learning multiple languages. Most of my Swiss cousins have raised children who are bi- or trilingual, and without the errors that plague even my immersed children. There is no substitute for that level of early exposure.

More technology?

I just came across a call to arms by Kristina Höök, “A Cry for More Tech at CHI!” in Interactions this month. I was so glad to see her writing about this and I hope the article bears fruit. She talks about the ways in which technology can inspire design — I would argue also enable design — and why alternate forms of publications should be given archival value as one way of supporting tech research (such as videos, demos, etc). Imagine if videos at CHI were valued as much as videos at SIGGRAPH!

I don’t want to say much more because I hope you go and read what she had to say, but I will note that her sentiment doesn’t just apply to CHI. How about other conferences? And a question I’ve been asking myself recently — what can I do to encourage more tech in my own research group? The answer, I’ve discovered, is to uncover what technology research means to me. That I will say more about.

As I mentioned in a recent post about my sabbatical goals, I have spent some time recently trying to reposition myself. I am and will always be driven by real-world problems, usually ones that come out of personal experience. However, if that is the sole driving force in my work, why am I not a sociologist? Or a politician? Or an anthropologist? Why don’t I run a non-profit? Why am I a computer scientist? The answer that I keep coming around to is that I like to build re-usable solutions to problems, solutions that are (ideally) bigger than the problem I started with. In addition, I believe that technology has value in part because it can solve problems in new ways, sometimes better ways, if we are innovative about how we use it, rigorous, and willing to push the boundary of what technology is capable of. So I am also driven by a wish to build systems, hard systems, systems that do things that have not been done before or create new ways of doing things. In fact, when I look back over the technical work I’ve done, after years of trying (for every job search, tenure review, and so on) I think I finally have put my finger on the unifying theme in my work.

I have always, in some way or another, built what I think of now as data-driven interfaces. I’m certainly not the first person to use this term. Nor is it confined to my area (HCI). But to me it describes one of the most important roles that technology has to play in the world. Many of the most revolutionary impacts of technology have centered around its ability to show information (think of spreadsheets and visualization tools), share information (think of everything from email to the web to Facebook), and process information (think of the work in context-aware and ubiquitous computing, machine learning, and so on). And my own inspiration has been similar: my honors thesis as an undergraduate centered around exposing what was going on inside of programs; my PhD work on managing the uncertainty that arises when recognizing sensed input; my PhD students have developed/are developing tools for building ambient and peripheral displays (a form of visualization), rapidly prototyping and field testing ubicomp apps (providing essential data about what information belongs in the application in the first place), measuring and predicting which users have difficulty with an interface (a type of information processing), handling uncertain input within the user interface toolkit, and sharing information about energy use in the home. Of course when you have a hammer, everything looks like a nail (and there are many aspects of each of these projects that I left out so they would look more nail-shaped), but I do believe there’s a theme here.

Having recognized the theme, a natural question that I, as a toolkit builder, find myself asking is: What is hard about building these sorts of interfaces, especially in situations where we expect people to use the resulting applications? I think there are many unsolved problems, along the entire pipeline from deciding what data to use all the way through to acting on it in some way. Ideally, this should all be put together in a fashion that scaffolds the process as much as possible and enables communication of constraints and other information from one end of the pipeline to the other and back.

The idea of a pipeline for data-driven, interactive systems leads to a whole host of interesting questions I hope to begin answering. What should be communicated within the system? What should be communicated to the user? How does all of this change the way we construct toolkits at the input level? What about the output level? What abilities might we want to give end users with respect to data-driven interfaces? How do we help people build effective classifiers when they have not studied machine learning? How do we help people to select and integrate visualization techniques? What new sensors can we construct and what should we sense? When and how might we involve people (i.e. the crowd) in gathering, labeling, extracting features from, interpreting, even visualizing information? How do we trade this off with machines? And finally, how does the interactive nature of the end systems affect the way we should answer any of these questions?


Jennifer Mankoff | University of Washington