The iData Course Website
The increasing availability of data has created a sea change in the way we build interactive systems. It is now easy to access information about a person’s activities, online and offline, about the state of the world around them, and about the activities of other people connected to them either directly or through their use of shared resources. This information can help to contextualize interaction, support inference, provide recommendations, or be directly investigated by users themselves.
The goal of this course is to provide you with the tools to build data-driven interactive systems and explore the new opportunities enabled by this data through a combination of guest lectures, discussion of current literature, and practical skills development. Over the course of the semester, you will learn about the entire data pipeline, from collecting and analyzing data to interacting with it. The class also includes a midterm and final project focused on student interests. In 2016 I posted a description of some of the projects students had engaged in over the first three years of the class.
This course requires comfort with programming, as the required projects make use of (at a minimum) Python, SQL, CSS, and JavaScript (including D3). A series of “project bytes” helps to lay the groundwork for larger group projects.
The learning goals of the course are as follows:
- To introduce basic concepts in data collection including data formats, parsing and sources of data
- To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
- To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification
- To introduce basic concepts in data visualization including what makes a good visualization and the use of interaction in visualization
- To provide practical applied examples of the data pipeline through an examination of current literature
- To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications
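As a small illustration of the data-problem goals above (parsing, incomplete data, and outliers), here is a minimal sketch using only the Python standard library. The sample data, the function names `load_steps` and `flag_outliers`, and the 3.5 threshold are assumptions made for this example and are not taken from the course materials. Outliers are flagged with a median-based (MAD) modified z-score, which is one common choice among several.

```python
import csv
import io
import statistics

# Hypothetical sample data: one missing value (Tue) and one implausible reading (Fri).
RAW = """day,steps
Mon,8200
Tue,
Wed,7900
Thu,8100
Fri,120000
Sat,8300
"""


def load_steps(text):
    """Parse CSV text into (day, steps) pairs, using None for missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = record["steps"].strip()
        rows.append((record["day"], int(value) if value else None))
    return rows


def flag_outliers(rows, threshold=3.5):
    """Return days whose value lies far from the median (modified z-score)."""
    values = [v for _, v in rows if v is not None]
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []
    return [
        day
        for day, v in rows
        if v is not None and 0.6745 * abs(v - median) / mad > threshold
    ]


rows = load_steps(RAW)
missing = [day for day, v in rows if v is None]
outliers = flag_outliers(rows)
print("missing:", missing)
print("outliers:", outliers)
```

A median-based rule is used here because a single extreme value can inflate the mean and standard deviation enough to hide itself. Whether to drop, correct, or keep a flagged value depends on the data set.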
Prerequisites:
The class will involve programming and debugging. If required by your background, it is possible to minimize the programming you do for projects (in which case you will be expected to spend more time on other factors such as beautiful visual designs). However, you should not take the course if you find programming or debugging extremely difficult, because you will have to master several very different programming languages and concepts in very short order (projects make use of web programming frameworks including Flask, Bootstrap, Ajax, jQuery, D3, Google Appspot; and multiple languages including Python, JavaScript and SQL).
Projects:
The course is project oriented. It includes 1-2 self-defined projects along with 4-6 smaller “project bytes” designed to provide the stepping stones needed to complete the larger projects. Some of the specific skills that will be covered in projects include:
- Display data from an API (such as the Twitter API) on a website you create
- Create a mashup of data from multiple web APIs
- Create an interactive visualization of a data set
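To give a flavor of the mashup skill listed above, here is a minimal sketch that joins two JSON payloads on a shared key. The payloads, field names, and the `mashup` function are hypothetical stand-ins for responses from two web APIs. No real API is called, and actual course projects may be structured quite differently.

```python
import json

# Hypothetical payloads standing in for responses from two different web APIs.
WEATHER_JSON = """[
  {"city": "Pittsburgh", "temp_c": 4},
  {"city": "Seattle", "temp_c": 9}
]"""
EVENTS_JSON = """[
  {"city": "Seattle", "event": "Data meetup"},
  {"city": "Seattle", "event": "Hack night"},
  {"city": "Boston", "event": "Viz talk"}
]"""


def mashup(weather_text, events_text):
    """Attach the temperature for each event's city, or None if unknown."""
    # Index one source by the shared key so the join is a dictionary lookup.
    temps = {w["city"]: w["temp_c"] for w in json.loads(weather_text)}
    combined = []
    for e in json.loads(events_text):
        combined.append(
            {"city": e["city"], "event": e["event"], "temp_c": temps.get(e["city"])}
        )
    return combined


result = mashup(WEATHER_JSON, EVENTS_JSON)
for row in result:
    print(row)
```

Events in cities without a weather record are kept with `temp_c` set to `None` rather than dropped, so gaps between the two sources stay visible in the combined data.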
Project documentation and source code can be found on GitHub. The course is in transition, so for now refer to the original project descriptions and repository. To see the plan for the future, take a look at the idata repository.
Readings:
The following books are recommended:
- Interactive Data Visualization for the Web (Free online version)
- Doing Data Science (Schutt & O’Neil) — based on the very successful Columbia course on data science taught by Schutt (uses R and Python)
These books may also be useful:
- Visualize This (Nathan Yau) (uses R and Python)
- Programming Google App Engine, Charles Severance (uses Python, plus add-ons like JavaScript)
- Python for Data Analysis, Wes McKinney (Python)
Brief (and Tentative) List of Topics Covered:
Concepts
- Structured vs unstructured data
- Dealing with heterogeneous data
- Sampling and bias in data collection
- Sensed data
- Mobile data
- Data transformation and analysis
- Information visualization
- Current research in information-driven interfaces
Skills
- Getting Web data
- Dealing with APIs and OAuth
- Getting access to mobile data
- Common data formats
- Data parsing
- Common problems with data
- Tools for analyzing data
- Tools for visualizing data