Data Collection & Analytics Tools?

I have become fascinated recently with the question of the role that data has in supporting analysis, action, and reflection. Actually, it would be more accurate to say that I’ve become aware recently that this is an intrinsic driver in much of the work I do, and thus it has become something I want to reflect on more directly. In this post, I want to explore some of the tools others have already built that might support analytics, machine learning, and so on. If you know of something I’ve missed, feel free to share it in the comments! So, in no particular order:

  • Hazy provides a small handful of key primitives for data analysis. These include Victor, which “uses RDBMS to solve a large class of statistical data analysis problems (supervised machine learning using incremental gradient algorithms),” and WisCi (and DeepDive, its successor), which is “an encyclopedia powered by machines, for the people.” RapidMiner is a similar tool that has been used by thousands. It is open source and supports data analysis and mining.
  • Protégé is “a suite of tools to construct domain models and knowledge-based applications with ontologies,” including support for visualizing and manipulating them.
  • NELL learns over time from the web. It has been running since 2010 and has “accumulated over 50 million candidate beliefs.”  A similar system is
  • Ohmage and Ushahidi are open source citizen sensing platforms (think citizen-based data collection). Both support mobile and web-based data entry. This stands in contrast to Mechanical Turk, which is a paid service; to games and other dual-impact systems like Peekaboom (von Ahn et al.), which can label objects in images using crowd labor; and to systems like Kylin (Hoffmann et al.), which simultaneously accelerates community content creation and information extraction.
  • WEKA and LightSide support GUI-based machine learning. WEKA requires some expertise and comes with a whole textbook; LightSide is built on WEKA but simplifies aspects of it and specializes in mining textual data. For more text mining support, check out Coh-Metrix, which “calculates the coherence of texts on a wide range of measures. It replaces common readability formulas by applying the latest in computational linguistics and linking this to the latest research in psycholinguistics.” Similarly, LIWC (not free) supports linguistic analysis by providing a dictionary and a way to compare a new text against it, measuring the presence of 70 language dimensions, from negative emotions to causal words.
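LIWC’s own dictionary and scoring are proprietary, so the following is only a minimal sketch of the general dictionary-based idea described above: each category is a set of words, and a text is scored by the percentage of its words that fall into each category. The two categories and their word lists (`CATEGORIES`) are made up for illustration and are not LIWC’s.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; it is NOT LIWC's
# (proprietary) dictionary, which covers many more categories and words.
CATEGORIES = {
    "negative_emotion": {"sad", "angry", "afraid", "hurt"},
    "causal": {"because", "hence", "therefore", "effect"},
}


def category_percentages(text, categories=CATEGORIES):
    """Return, per category, the percentage of words in `text` found in that category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    counts = Counter(words)
    return {
        name: 100.0 * sum(counts[word] for word in vocabulary) / len(words)
        for name, vocabulary in categories.items()
    }


print(category_percentages("I was sad because the dog was hurt."))
# {'negative_emotion': 25.0, 'causal': 12.5}
```

Real tools add refinements this sketch skips, such as wildcard stems and words that belong to several categories at once.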

Deployed tools and products aside, there is also a body of research in this area, ranging from early work such as a CAPpella (Dey et al.) and ScreenCrayons (Olsen et al.) to more recent work. For example, Gestalt (Patel et al.) “allows developers to implement a classification pipeline,” and Kietz et al. use an analysis of RapidMiner’s many data analysis traces to automatically predict optimal KDD workflows.
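The Victor description above mentions incremental gradient algorithms for supervised learning, and Gestalt is about building classification pipelines. As a rough illustration of the incremental idea only (this is not code from Victor, Gestalt, RapidMiner, or any other tool mentioned here), the sketch below trains a logistic-regression classifier by updating its weights after each example rather than after a full pass over the data. The toy data, learning rate, and epoch count are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_incremental(examples, epochs=500, learning_rate=0.1, seed=0):
    """Fit logistic-regression weights with incremental (stochastic) gradient descent.

    `examples` is a list of (features, label) pairs with labels 0 or 1.
    The weights are updated after every single example.
    """
    rng = random.Random(seed)
    data = list(examples)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # derivative of the log-loss with respect to z
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# A toy "pipeline": hand-made feature vectors -> train -> classify new points.
training = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 0.9], 1), ([0.9, 1.0], 1)]
weights, bias = train_incremental(training)
print([predict(weights, bias, x) for x in ([0.1, 0.0], [1.0, 1.0])])
```

Updating per example is what lets this style of algorithm stream over rows (for instance, rows coming out of a database scan) without holding the whole data set in memory.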

von Ahn, L., Liu, R., & Blum, M. (2006). Peekaboom: A game for locating objects in images. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2006). ACM.

Hoffmann, R., Amershi, S., Patel, K., Wu, F., Fogarty, J., & Weld, D. S. (2009, April). Amplifying community content creation with mixed initiative information extraction. In Proceedings of the 27th international conference on Human factors in computing systems (pp. 1849-1858). ACM.

Dey, A. K., Hamid, R., Beckmann, C., Li, I., & Hsu, D. (2004, April). a CAPpella: Programming by demonstration of context-aware applications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 33-40). ACM.

Olsen, D. R., Jr., Taufer, T., & Fails, J. A. (2004). ScreenCrayons: Annotating anything. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (UIST 2004). ACM.

Patel, K., Bancroft, N., Drucker, S. M., Fogarty, J., Ko, A. J., & Landay, J. A. (2010). Gestalt: Integrated support for implementation and analysis in machine learning. In Proceedings of UIST 2010 (pp. 37-46). ACM.

Kietz et al. (2012). Designing KDD-Workflows via HTN-Planning, 1–2. doi:10.3233/978-1-61499-098-7-1011

Search and Rescue and Probability Theory

Canine Search and Rescue (photo from AMRG website)

I spent a fascinating evening with the Allegheny Mountain Rescue Group today. This is a well-run organization that provides free help for search and rescue efforts in the Pittsburgh area and beyond. I attended because my kids and I were looking for a way to give Gryffin (our new puppy) a job in life beyond “pet,” and we love to work in the outdoors. Canine search and rescue sounded like an intriguing way to do this, and we wanted to learn more. During the meeting, I discovered a team of well-organized, highly trained, passionate, and committed individuals, with a healthy influx of new people interested in taking part and a strong core of experienced people who help to run things. The discussions of recent rescues were at times heart-rending, and very inspiring.

Later in the evening, during a rope training session, I started asking questions and soon learned much more about how a search operates. I discovered that about a third of searches end in mystery. Of those for which the outcome is known, there is roughly an even split among finding people who are injured, fine, or dead. Searches often involve multiple organizations simultaneously, and it is actually preferable to create teams that mix people from different search organizations rather than having a team that always works together. Some searches involve volunteers as well; a large search may have as many as 500 volunteers, and if the missing person may still be alive, it may go on day and night. Searches can last for days. This is what led me to one of the most unexpected facts of the evening.

I asked: How do you know when a search is over? The answer was that a combination of statistics and modeling is used to decide this in a fairly sophisticated fashion. The search area is broken up into multiple segments, and each segment is assigned a probability that the lost person is in it. When a segment is searched, the type of search (human only, canine, helicopter, etc.), the locations searched, and a field report containing details that may not be available otherwise are used to update the probability that the person is in that segment but was missed, versus absent from it. Finally, these probabilities are combined, apparently in one or more spreadsheets, to help support decisions about whether (and how) to proceed. According to the people I was speaking with, much of this work is done by hand because it is faster than entering data into more sophisticated GIS systems (though typically a computer is available at the search’s base, which may be something like a trailer with a generator). GPS devices may also be used to give searchers location information and/or to track dogs.
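The post doesn’t say exactly what the spreadsheets compute. One common textbook approach in search theory, sketched below under that assumption, treats each segment’s probability as a probability of area (POA) and each search as having a probability of detection (POD). An unsuccessful search then lowers the searched segment’s probability and raises every other segment’s through Bayes’ rule. The segment names, the POD values, and the assumption that repeated passes over a segment are independent are all illustrative, not taken from the rescue group.

```python
def update_after_empty_search(poa, searched, pod):
    """Bayesian update of segment probabilities after a search that found no one.

    poa: dict mapping segment name -> probability of area (POA), i.e. the
         probability the missing person is in that segment. Include a
         "rest of world" entry so the values sum to 1.
    searched: the segment that was searched without success.
    pod: estimated probability of detection (POD) for that search, in [0, 1].
    """
    # Probability of an empty search: the person was elsewhere, or was in
    # the searched segment but was missed.
    p_empty = 1.0 - poa[searched] * pod
    return {
        segment: (p * (1.0 - pod) if segment == searched else p) / p_empty
        for segment, p in poa.items()
    }


def cumulative_pod(pods):
    """Combined POD of several passes over one segment, assuming independent passes."""
    p_all_missed = 1.0
    for pod in pods:
        p_all_missed *= 1.0 - pod
    return 1.0 - p_all_missed


poa = {"A": 0.5, "B": 0.3, "rest_of_world": 0.2}
# Hypothetical: a canine team (POD 0.5) and a hasty human team (POD 0.2) cover A.
pod_a = cumulative_pod([0.5, 0.2])  # 1 - 0.5 * 0.8 = 0.6
updated = update_after_empty_search(poa, "A", pod_a)
print({segment: round(p, 3) for segment, p in updated.items()})
# {'A': 0.286, 'B': 0.429, 'rest_of_world': 0.286}
```

The hard part in practice is exactly what the searchers described: estimating a believable POD for each team, terrain, and search type.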

Some of the challenges mentioned were conflicting information; variability in how reliable different human searchers are; terrain that may not be flat or easily represented in two dimensions; the speed of computer modeling; the difficulty of producing exact estimates of how different searchers affect the probability of finding someone; and the variable skill levels of searchers (along with the need to organize large numbers of searchers, at times partly untrained). When I raised the possibility of finding technology donations such as more GPS devices, I was also told that it is critical that any technology, especially technology intended for use in the field, be ultra simple to use (there is no time to mess with it) and consistent (i.e., searchers can all be trained once on the same thing).

Although this blog post summarizes what was really just a brief (maybe hour-long) conversation with two people, the conversation had me thinking about research opportunities. The need for good human-centered design is clear here, as is the value of technology that can support probabilistic analysis and decision making. Although it sounds like predictive models are not currently in use, they could be applicable: apparently a fair amount of data is gathered about each search, and searches are relatively frequent. Visualization opportunities certainly exist as well. Indeed, a recent VAST publication (Malik et al., 2011) looked specifically at visual analytics and its role in maritime resource allocation (across multiple search and rescue operations).

But the thing that especially caught my attention is the need to handle uncertain information in the face of both ignorance and conflict. I have been reading recently about Dempster-Shafer theory, which is useful when fusing multiple sources of data that may not be easily modeled with standard probabilities. In Dempster-Shafer theory, each source of evidence assigns probability mass to sets of hypotheses rather than only to single hypotheses, which lets the theory model ignorance explicitly (mass on the set of all hypotheses means “I don’t know”). It is best interpreted as producing information about the provability of a hypothesis, which means that at times it may produce high certainty for something that is unlikely (but more provable than the alternatives). For example, suppose two doctors disagree about which disease someone has, but share a small point of agreement: both hold a low-confidence hypothesis that the problem is a brain tumor, which each considers highly improbable (one suspects migraines, the other a concussion). That small overlap will be the most certain outcome of combining their belief models with Dempster’s rule, so a brain tumor, although both doctors agree it is unlikely, would be considered the most probable diagnosis by the theory.
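To make the two-doctor example concrete, here is a minimal sketch of Dempster’s rule of combination, the standard way Dempster-Shafer theory fuses two sources. Mass functions are represented as dictionaries from sets of hypotheses to masses; the 0.99/0.01 and 0.9/0.1 numbers are illustrative assumptions, not from any real case.

```python
from itertools import product

FRAME = frozenset({"migraine", "concussion", "tumor"})


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a subset of FRAME)
    to a mass; the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # the two sources contradict each other
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Dempster's normalization: discard the conflicting mass and rescale.
    return {hyps: mass / (1.0 - conflict) for hyps, mass in combined.items()}


# The two-doctor example: each is 99% sure of a different diagnosis.
doctor_1 = {frozenset({"migraine"}): 0.99, frozenset({"tumor"}): 0.01}
doctor_2 = {frozenset({"concussion"}): 0.99, frozenset({"tumor"}): 0.01}
print(combine(doctor_1, doctor_2))  # all mass ends up on {'tumor'}

# Explicit ignorance: mass on the whole frame means "I don't know".
unsure_1 = {frozenset({"migraine"}): 0.9, FRAME: 0.1}
unsure_2 = {frozenset({"concussion"}): 0.9, FRAME: 0.1}
print(combine(unsure_1, unsure_2))
```

In the first case the 1% tumor hypotheses are the only non-conflicting overlap, so after normalization essentially all of the mass lands on the tumor. In the second case, where each doctor reserves 10% for “don’t know,” the result is roughly 0.47 each for migraine and concussion, with about 0.05 left on the whole frame.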

One obvious next step would be to do some qualitative research and get a better picture of what really goes on in a search and rescue operation. Another possible step would be to collect a data set from one or more willing organizations (assuming the data exists) and explore algorithms that could aid decision making or make predictions using the existing data. Or then again, one could start by collecting GPS devices (I am sure some of you out there must have some sitting in a box that you could donate) and explore whether there are enough similar ones (Android + Google Maps?) to meet the constraints of easy use and trainability. I don’t know yet whether I will pick up this challenge, but I certainly hope I get the chance. It is a fascinating, meaningful, and technically interesting opportunity.

Malik, A., Maciejewski, R., Maule, B., & Ebert, D. S. (2011). A visual analytics process for maritime resource allocation and risk assessment. In the 2011 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 221-230.

Why study the future?

I have asked myself that question numerous times over the last several years. Why years? Because the paper I will be presenting at CHI 2013 (Looking Past Yesterday’s Tomorrow: Using Futures Studies Methods to Extend the Research Horizon) is the fifth iteration of an idea that began at CHI 2009. It was submitted in its initial form to CHI 2011 and 2012, then to DIS 2012, Ubicomp 2012, and finally CHI 2013 (which makes it, I think, the winner for most iterations of any paper I’ve ever submitted). Each submission sparked long and informative reviews and led to major revisions (in one case even a new study), except the last one, which was accepted.

I am telling this story for two reasons. First, I want to explore what drove me to lead this effort despite the difficulty of succeeding. Second, I want to explore what I learned from the process that might help others trying to publish difficult papers.

Things we wish we’d known when planning a trip to Zürich

Welcome to Zürich ☺. If you are moving here, there are a few things you might want to know, in no particular order :). This is especially geared toward folks living in ETH university housing.

Paperwork

(most of these things take around 30 to 60 minutes once you find the right place)

  • To register, you will need to go to Kreisbüro 6 first, then Berninaplatz (a stop on the 10 tram; the Kreisbüro will give you an appointment there)
  • To leave, you will need to go to the Migrationsamt (in the city hall, next to the Fraumünster church) and register your departure
  • To get a half-pass (half off all tram and train travel), you can go to Bellevue (on the 9) and go into the office in the building right at the center of the stop, OR go to the main train station and go into the “travel agency” (take a ticket and be prepared to wait a bit)
  • To get a monthly pass (free travel all month long) with your half pass, you can go to the train station or use any of the newer (“fancy”) electronic machines (like the one at the Winkelriedstrasse tram stop). The “fancy” machines have an English button. There is no “fancy” machine at the airport, so don’t expect to renew your monthly pass there when you fly back into Zürich at the end of a trip

Money & Phones

  • I would open a bank account with PostFinance (at the post office; if you walk from the Winkelriedstrasse tram stop to the Rigiblick tram stop, you’ll see it on the left).
  • When you get bills, you typically get a “pink slip” – bring it with some cash to the post office, and you can pay it there.
  • The post office is closed over lunch
  • ETH pay can be picked up between 11 and 2 in the back right corner of the 2nd floor (I think) of the main ETH building
  • Sunrise pre-pay is the simplest mobile phone plan. You can “top it up” at any Co-op grocery store (just ask for, say, “50 CHF on my Sunrise”).
  • Sunrise pre-pay charges you 1 CHF on each day that you make a call, send a text, or use the internet (up to 3 CHF per day total). You can get an add-on plan for unlimited internet if you use it a lot, for about 10 CHF a month.

Shopping

We’re not big shoppers, so this is just the basics.

  • H&M has reasonable clothing. The big Co-op and Migros stores have inexpensive clothing options too. There are also lots of sales in the “mall” under the main train station.
  • There are a number of farmers’ markets worth checking out.

Things to check out that you might otherwise miss

  • Feminist Zürich: the labyrinth and Feminist Tours of Zürich
  • The rooftop swimming pool & spa (“Thermalbad Zürich”)
  • Dolder ice rink 
  • Swimming in the clear cool clean lake of Zürich (‘nough said)
  • Tour the archeological ruins of Zürich (register at City Hall to get a “key to the city” and a map). Takes time to get the key, so this is really only for folks living here.
  • Lots of wonderful places to walk on the Zürichberg (look for the life-sized elephant fountain in the woods) and the Uetliberg. Enjoy them.
  • There are lots of festivals in Zürich and Switzerland worth checking out: Basel Fasnacht in the spring, the independence day parade in mid August, and so on. Google to find them. Don’t necessarily confine yourself to Switzerland; for example, Austria has numerous “balls” in dance season (winter).

English speakers

  • The expat forums are a great place to find advice about all sorts of stuff
  • There are some great meetup groups for childless expats. They do all sorts of sporty stuff in the mountains, if you’re into that. They tend to keep separate from the Swiss.
  • If you prefer to mix with the locals, try a yoga class, join an orchestra, etc. The downside is that you have to speak some German, and it helps if you’re working on understanding Swiss German.
  • The ETH has a tandem-partner program. You can sign up to practice German and offer to help someone with English. We had great experiences with it. They also offer German classes (once a week).
  • If you have kids, the public school has an amazing program for helping them learn German before shifting them to “regular” school. The teachers are wonderful and, for my kids at least, the class worked wonders. Just register with the school system.
  • Be prepared for younger kids (even in grades 1 to 3) to be done with school at noon two or more days a week, and to have no school from 12 to 2. Don’t worry, though: Hort will feed them a warm meal and let them play or do crafts during lunch, and will keep them as late as you need on weekdays.

Doctors

There’s an English-speaking doctor’s office with long hours at the main train station. There’s also a 24-hour pharmacy there. You should receive accident insurance through ETH; you know better than I do where you get your health insurance.

Garbage

  • For ETH folks, you just buy regular garbage bags. For everyone else, there are special taxed bags.
  • Recycling: plastic goes inside the Co-op in their wall collection unit. You can find bins for metal and glass around the city. Two to four times a year (I’m unsure how often), you will find a bag in your mailbox for clothing and shoes. As for anything else of quality: if you put it outside, someone is likely to take it.

Other

  • The climate makes gardening easy. The abundance of green space also makes foxes quite common. As a result, you can’t safely eat garden greens raw: foxes can leave a parasite on plants that is deadly in the rare cases when someone catches it.
  • If you want more than that, there is a community farm near the botanical gardens that you can help out at. I’m sure there are other options if you want an actual garden bed, but a year is short.
  • We were able to get permission to garden in the non-grassy areas of our yard.
  • We went to a Tot Shabbat service at a local liberal temple, the Jüdische Liberale Gemeinde. It’s a bit out of the way in what looks like an apartment building, but the people we met were wonderful and very welcoming. Be prepared for Swiss German, though :).

Have fun!

Public school in Switzerland

During our time in Switzerland, the children attended public school. An important goal for us was that they would learn the language, and the public school system supported this. The school the children were assigned to was 1.5 km from our home, and we generally walked or took the bus (sometimes the children went the whole way on their own toward the end).

The athletic area and my son’s school building

My son was placed in a class for second-language speakers. His class had an ever-changing group of about eight children and two main teachers. The curriculum was ungraded and tailored to the children, who were moved into mainstream classes as they learned German (if they were staying in Switzerland longer term). These were all grade-school children, so in addition to their main lessons, they had handwork, music, and swimming classes with other teachers, along with special times for gym, art, language, and mathematics.

My daughter was placed in a Kindergarten class where a mixture of Swiss German and German was spoken. Her main teacher, who had taught at a Steiner school for 22 years before switching to a public Kindergarten, was a warm-hearted and loving woman who connected well with my daughter and supported her love of creative play as well as craft work. One day a week was spent in the woods (the entire morning) playing and cooking over a campfire. Other days were spent in the classroom and back yard. One afternoon a week, my daughter had German class along with other older students who were new to German.

School included a 1.5 hour lunch break, and ended at noon two days a week for my son. My daughter was done at noon three days a week and had the same lunch break. If parents worked during those hours, children could attend “Hort” — a sort of daycare with a kitchen (hot food is an expected part of a healthy lunch). We were skeptical about Hort at first, but my son in particular grew to love the free play and delicious food it provided, and both children often came home with crafts or stories from Hort.

Some things that stood out about the children’s experience in school, besides the overall quality of the education, were:

  • Dedicated teachers who educated in a way we loved (for example, the principal, one of my son’s teachers, and my daughter’s teacher all had experience with Steiner education, and one of my son’s teachers was also trained in art therapy)
  • Very high quality facilities (the school had its own swimming pool, for example, with a moveable floor!)
  • Quality was important throughout. At the end-of-year picnic, there was a small concert. The school had arranged for a world-class Block-Flöte (recorder) player to perform in a fairy tale retelling. The music was incredible, and the children ate up every note (and every word).
    The Block-Flöte player at the concert

  • School in Switzerland is clearly organized on the principle that education isn’t just about getting as much information into children as quickly as possible. Every month we were there seemed to include one or two shortened weeks, or whole weeks, without school. Overall, the Swiss do not seem concerned with having children spend many hours learning each day. Between half days (every week) and frequent vacations, the system is also set up for families with a parent who works part time or not at all.
  • The children learned German incredibly fast (within six weeks, I was reading and retelling portions of Harry Potter to my son entirely in German).

Upshot? I can highly recommend public school as an option for visiting families in Switzerland.

My chronic illness and academia

A friend happened to send an interesting article  about chronic illness and academia my way today, and it made me realize that a post on the topic is long past due.

For those of you who don’t know, I have Lyme disease. I was diagnosed in October 2007 (pre-tenure), but had been ill for at least a year before that. I blog about my Lyme disease at http://gotlyme.wordpress.com (inventive title, I know :)). A special section of the blog focuses on work and illness. Lyme disease was debilitating for me, and I was on (part-time) disability during part of my treatment. I am now much better, but still have relapses about once a year and bad days more often. Prior to my experiences with Lyme disease, I had a very difficult repetitive strain injury that also caused severe impairment (at first I could type at most 30 minutes a day, and 2 hours a few years later). This occurred at the start of my PhD and lasted into the beginning of my first faculty position.

I don’t want to go into details about those illnesses here; rather, I’d like to speak about the relationship between having an invisible chronic illness and being an academic. More than one person has approached me, usually through word of mouth, asking for advice and guidance as they deal with their own illness. Illnesses like mine (and many other chronic illnesses) are mostly invisible, and it can be difficult to find information about how to cope with them. When other academics disclose their own struggles and process (such as Elyn Saks’ article about working with schizophrenia, or Gerry Gold’s article on the social context of long-term disability as an option for those who are considering leaving work because of their illness), it can be eye-opening and inspiring for those of us trying to find our own way forward. Equally important is access to facts, such as the AAUP’s report on how faculty members with disabilities should be accommodated.

For me personally, one of the biggest (and most positive) changes I’ve had to make has related to my approach to managing my time. Time management has been a theme since early in my graduate school career, and I have learned never to take time for granted, how to prioritize, and when to cut back. I discuss some of the things I learned on my Lyme blog: managing an unpredictable illness, and trying to manage a full-time job on a half-time schedule.

Another big challenge has been disclosure. It took a great deal of time for me to value the label “disabled” as a valid description of myself, and I have always been sensitive about it, not only personally but also with respect to my work. I never filed anything official with the university during my graduate school experience. I did not speak of my impairment during my job search until I reached the negotiation stage. I did, however, confront our dean at the time when I thought he wasn’t doing enough to prevent others from suffering the same preventable injury I had. At CMU, I spoke with my department chair as soon as I had a diagnosis. However, I waited to file anything official until I desperately needed a parking permit, and one close to my building. Even then, I accepted second best until the day I almost collapsed trying to get from that spot to my office. Finally fired up, I marched (well, hobbled with my cane) straight into the dean’s office and demanded something better.

Although I have disclosed much of my experience at this point and freely speak of it when it seems pertinent, there is still one area that I rarely discuss: the cognitive impacts of my disability. At my worst, I would sometimes spend more hours each week in fog and pain than out of it. Thanks to my very supportive husband, I would grab my computer and do essential thought-work whenever the clouds cleared, and he’d take the kids so I could do so. Even so, I read one journal paper (submitted before my diagnosis) in horror when the proof came back a year later. I have experienced moments in meetings with students when I could not find the words to express my thoughts, been reminded that I had just said the opposite of what I meant, and experienced large black spots in my memory. These moments (though thankfully mostly behind me) were experienced as fearful signs that I might not be able to continue as an academic. Most difficult is when they occur in public contexts, such as the difficulties I had at a talk I gave a few years back, and the program committee meeting that led to a blog post on the difficulties of re-integrating into academia after a long pause.

But what has defined the positive side of my experience, more than anything, is the support of those around me. I will never forget hearing that my advisor almost got into a fight defending the truth of my claims that I could not type. Or the day that an angry stranger mumbled about supporting the disabled as he opened a door for me when the push button failed and my hands lacked the strength (this led to my eventual acceptance of the label “disabled” and a related lifelong interest in assistive technology research). I have had long phone conversations with a colleague in our field who experienced chronic fatigue. I have been given a role as a collaborator when I sorely needed it. Instant message chats galore have enabled remote and close colleagues to help me work through difficult patches and decide strategy. I have been driven home countless times by a close friend when I ended my day too weak to bike or walk. I have been given writing aids (as a graduate student with RSI) and, more recently, teaching leaves, co-instructors, extra TA support, classes scheduled around my disability, and control over my tenure clock.

If you experience a chronic illness, I encourage you to go after the support you deserve, accept help, and seek advice. Drop me a note any time. Check out the Chronic Illness and Academia forum at the Chronicle of Higher Education. Talk to the people you trust, and get advice about when to say more. Stand up for yourself when you need it, and if people are not supportive, find other friends. Above all, know that it is possible to be both disabled and an academic, if that is the path you choose.

Learning languages

I’ve mentioned before that one of my sabbatical goals was to learn a new language (Hindi). I am not fluent, but I think I came a fair way with it, and I want to comment on the role of different technologies and approaches in our successes (and failures) as a family to learn the three languages that we tackled on this trip.

One of the most useful technologies we employed was the Rosetta Stone software. The kids loved Rosetta Stone, which we started using almost as soon as the sabbatical was approved to get them familiar with Hindi. They spent about 30 minutes at a time on it at the beginning. At our peak, this happened almost daily (after we left Pittsburgh but before we were settled in India). Eventually we hired a tutor (a wonderful friend now) to come for about an hour most days instead. The kids were far more resistant to being tutored than to using the software, but I feel we covered much more ground in those hours. We made up all sorts of games, retold fairy tales, played shop, and generally did our best to make it child friendly.

Hindi was a relatively hard language to learn (new alphabet, different sentence structures, and so on). Once we got past the vocabulary phase,  progress was slowish. Still, by the end of the fall we could have whole conversations in Hindi as a family. The kids were not alone in learning the language: Anind and I were trying very hard to learn it as well and we tried to speak it at meals, with our Indian driver, and so on. So between the tutoring and the daily practice opportunities, they used Rosetta Stone less and less.

Rosetta Stone was not an unqualified success. It required the right context to be used: enough motivation, and not too much other support. We almost never used the German Rosetta Stone I bought, and of course the kids are far more fluent in German (they are immersed, unlike with Hindi, and it is a much easier language for them to learn). Use of Rosetta Stone is rare at this point, and mostly by me.

You get free tutoring through the online package with Rosetta Stone, along with access to online games. The games are a fun way to practice, but slow. When possible, I sign up for a session with an online tutor, which is based on the material I’m currently covering in the software. However, they only let the kids do it when there are no other remote participants, which is sometimes hard to find at popular times in the early stages of learning a language.

As a computer scientist, I cannot help but be impressed by the software. It is dedicated to learning language through immersion, and the authors have done an excellent job of maintaining that throughout the software and the tutoring sessions. It uses speech recognition to check pronunciation and provides multimedia support for learning. And it works: if you put the time in, you learn. To my mind, it’s a success as both an educational tool and an interactive tool. It supposedly has a social side as well, though as a Hindi learner I was one of few and could not take advantage of it. I’d be curious to see what it’s like.

It has always seemed such a shame to me that learning multiple languages is not the norm in the United States. During our travels we met 10-year-olds who spoke 5 or 6 languages, all with fluent ease. They never needed to touch a piece of software or a tutor. The world has become so small, yet so many of us in the United States fail to give our children the gift of understanding and the mastery of complexity that comes with learning multiple languages. Most of my Swiss cousins have raised children who are bilingual or trilingual, and without the errors that plague even my immersed children. There is no substitute for that level of early exposure.