Data Driven Detroit (D3) launched the 2014 One D Scorecard in May (read more about that here). Today, we’re writing to share more about our process for making this interactive data tool through a Q & A with NiJeL, a team of data scientists and developers that D3 collaborated with to build out the Scorecard. But first, a little context about the project:
In its third iteration, the new Scorecard makes exciting strides in data management and presentation. Working with NiJeL, we focused our resources and energy on two key components of development: an administrative tool for data management on the back end; and an interface powered by interactive data visualizations. We also revamped our data by updating and curating indicators, creating indices for each of the five Priority Areas, and incorporating a data deep-dive using original Opportunity Mapping research from the Kirwan Institute.
Let’s dig into four questions with Lela Prashad and JD Godchaux of the NiJeL team.
D3: Way back when we first started working together, we warned you that we were managing the One D Scorecard data through Excel workbooks, a single workbook per indicator (over 50 workbooks at one time!). It was a bit of a data nightmare in a few ways, especially when it came time for annual updates or when we needed to compare individual indicators across geographies. How did you sift through our data and come to the new centralized solution we’re using today?
NiJeL: Good question! We realized early on that D3 staff needed an easy way to update each indicator whenever its source dataset is updated, rather than updating everything at once, say on an annual basis. We also wanted an automated way to add new data to the One D Scorecard website as soon as an indicator is updated, and the only reasonable way to do that was to build a database to house all of the indicators. So we built a MySQL database and modified Xataface, an open-source framework that adds a simple admin interface to a MySQL database.
Once we had homed in on these tools, we went through each Excel workbook and added each indicator to the MySQL database, slightly reorienting the data from the Excel sheets to make them easier to use. We then wrote two scripts: one that pulls all of the relevant indicators for each region and packages them together so the website can build its visualizations, and another that calculates the five Priority Area index values and the overall One D Index score for each region.
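To make the "reorient, then package" step concrete, here is a minimal sketch. It assumes a hypothetical wide layout for each indicator workbook (one row per region, one column per year), flattens it into one long row per region, indicator and year, and then groups those rows by region into a JSON-ready structure for the front end. The sheet layout, field names and numbers are all illustrative assumptions, not the actual D3 workbooks or NiJeL's scripts.

```python
import json

# Hypothetical wide layout for ONE indicator workbook: one row per region,
# one column per year. Values are made up for illustration only.
WIDE_SHEET = {
    "indicator": "median_household_income",
    "priority_area": "Economy",
    "rows": {
        "Region A": {2011: 48000, 2012: 49500},
        "Region B": {2011: 47000, 2012: None},  # empty cell in the workbook
    },
}


def to_long_rows(sheet):
    """Flatten a wide sheet into one row per (region, indicator, year) cell."""
    rows = []
    for region, by_year in sheet["rows"].items():
        for year, value in by_year.items():
            if value is None:  # skip empty cells instead of storing nulls
                continue
            rows.append({
                "region": region,
                "indicator": sheet["indicator"],
                "priority_area": sheet["priority_area"],
                "year": year,
                "value": value,
            })
    return rows


def package_by_region(rows):
    """Group long rows into {region: {indicator: {year: value}}} for the website."""
    packaged = {}
    for row in rows:
        region = packaged.setdefault(row["region"], {})
        region.setdefault(row["indicator"], {})[row["year"]] = row["value"]
    return packaged


rows = to_long_rows(WIDE_SHEET)
payload = package_by_region(rows)
print(json.dumps(payload, sort_keys=True))
```

The long format is what makes the later steps easy. Adding a year or a region is just more rows, and comparing one indicator across geographies becomes a simple filter rather than a trip through dozens of workbooks.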
Now D3 staff can update any specific indicator by uploading a CSV (comma-separated values) file containing the new data. After the new data are added, the web app updates the site visualizations once there is a critical mass of data for a given year to make the index calculations meaningful. We’re hoping this will be a big improvement!
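The post doesn't say exactly how the Xataface-based upload is wired up, so the following is only a sketch of the general idea: read a CSV with `region,year,value` columns and write the rows into an indicator table keyed on region, indicator and year. Re-uploading a corrected file therefore overwrites rows instead of duplicating them. SQLite stands in for MySQL so the example runs anywhere. In MySQL, the equivalent overwrite is usually written with `INSERT ... ON DUPLICATE KEY UPDATE`. The table name, columns and sample values are hypothetical.

```python
import csv
import io
import sqlite3

# SQLite stands in for the MySQL database; the table and columns are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS indicator_values (
    region    TEXT NOT NULL,
    indicator TEXT NOT NULL,
    year      INTEGER NOT NULL,
    value     REAL,
    PRIMARY KEY (region, indicator, year)
)
"""


def load_indicator_csv(conn, indicator, csv_text):
    """Insert or overwrite one indicator's values from CSV text (columns: region,year,value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["region"], indicator, int(r["year"]), float(r["value"])) for r in reader]
    # INSERT OR REPLACE: a re-upload for the same region/indicator/year replaces the old row.
    conn.executemany(
        "INSERT OR REPLACE INTO indicator_values (region, indicator, year, value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Illustrative upload with made-up values.
upload = "region,year,value\nRegion A,2012,0.82\nRegion B,2012,0.77\n"
count = load_indicator_csv(conn, "hs_graduation_rate", upload)

# A corrected file for one region overwrites that row rather than adding a second one.
load_indicator_csv(conn, "hs_graduation_rate", "region,year,value\nRegion A,2012,0.84\n")
```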
D3: While older iterations of the Scorecard ranked regions based on their performance on a single indicator, we took that a few steps further this year by using indices. An index lets us roll up the individual indicators that make up a Priority Area into a single summary score, and then roll up those five Priority Area scores into a One D Index score for each region, which makes comparisons comprehensive and straightforward. Our favorite feature is that each index is calculated on the fly and recognizes when too few of its indicators have been updated to justify updating the index itself. Can you share a bit about the process of programming these analytical features into the highly visual front end?
NiJeL: Of course! As you mentioned, we want to be smart about how we calculate the index scores for each of the five Priority Areas and the overall One D Index, and we need to do that while new data are continually being added to the database. To accomplish this, we programmatically check the group of indicators in each Priority Area and determine whether more than 50% of them have data for the year in question. For instance, the Economy Priority Area has 7 indicators, so if 4 or more of them have data available for 2012, we calculate an Economy Index for 2012.
However, before we calculate a One D Index and include that year in our visualizations, every Priority Area must clear the 50% threshold. Once that happens, the indicators and Priority Area indices are added to the visualizations and the data become available for download.
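The two threshold rules can be sketched in a few lines. A Priority Area index is computed only when strictly more than half of its indicators have data for the year (4 of 7 for Economy), and a One D Index is computed only when every Priority Area clears that bar. The post doesn't describe how indicators are combined. This sketch assumes the scores are already normalized to a common scale and simply averages them, which may differ from D3's actual method. The area names other than Economy are placeholders.

```python
from statistics import mean


def priority_area_index(indicator_scores, year):
    """Return the Priority Area index for `year`, or None if too few indicators have data.

    `indicator_scores` maps indicator name -> {year: score}. ASSUMPTION: scores are
    already normalized, and the index is their plain average.
    """
    available = [scores[year] for scores in indicator_scores.values() if year in scores]
    # Require strictly more than half of the area's indicators (e.g. 4 of 7, not 3 of 6).
    if len(available) * 2 <= len(indicator_scores):
        return None
    return mean(available)


def one_d_index(areas, year):
    """Return (One D Index or None, per-area indices); None unless every area clears the threshold."""
    area_values = {name: priority_area_index(scores, year) for name, scores in areas.items()}
    if any(value is None for value in area_values.values()):
        return None, area_values
    # ASSUMPTION: the One D Index is the plain average of the Priority Area indices.
    return mean(area_values.values()), area_values


# Economy has 7 indicators; 4 with 2012 data clears the bar, 3 does not. Scores are made up.
economy = {f"econ_{i}": ({2012: 60.0} if i < 4 else {}) for i in range(7)}
sparse_economy = {f"econ_{i}": ({2012: 60.0} if i < 3 else {}) for i in range(7)}

# Economy plus four placeholder Priority Areas (real area names omitted).
areas = {"Economy": economy, **{f"area_{n}": economy for n in range(2, 6)}}
index, by_area = one_d_index(areas, 2012)
```

Comparing `len(available) * 2` against the indicator count keeps the "more than 50%" test in integer arithmetic, so exactly half (3 of 6, say) does not count as enough.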
NiJeL: Well, we had the distinct advantage of working with two people, Ms. Hartman and D3 Project Manager Jessica McInchak, who were both interested in interactive web design and in building interactive visualizations. Both actually contributed to the codebase for the One D Scorecard, which is extremely rare for a designer and a project manager, and their excitement about the opportunity made for a fantastic working relationship.
Ms. Hartman’s designs for many of the elements in the One D Scorecard were inspired by visualizations live on other websites, so when it came time to build them, we did have some examples to look at, though most were written with other tools such as Raphael. We chose D3.js mainly for its flexibility: it let us build the visualizations as closely as possible to what Ms. Hartman designed. The most challenging aspect of building from static wireframes like hers is understanding the intended interactions and the transitions between states within a particular visualization. It’s hard for a designer to draw out intended interactions, and hard for a developer to follow through on them, but our close collaboration with Ms. Hartman and Ms. McInchak minimized any differences in how we built the interactions.
It’s tough to pick a favorite chart, since they all had their challenges and rewards, but we’d have to say the “array of pinwheels” visualization (where viewers can see each region’s pinwheel in a grid of rows and columns) was our favorite to build.
In this visualization, the pinwheels load with the region that has the highest One D Index value in the upper-left corner, and the remaining regions follow in descending order from left to right and top to bottom. Visitors can reorder the pinwheels by selecting a specific Priority Area from the “Organize By” drop-down menu. Building this chart required extensive use of D3.js transitions, which let us be creative in how we moved from one state to another. When visitors select a different Priority Area (or a different year of data), we effectively run three separate transitions on each regional pinwheel. First, we change the color of each pinwheel slice, showing the selected Priority Area at full opacity and making the other slices almost fully transparent. At the same time, we resize the pie slices if a visitor has selected a different year with the time slider. Finally, we reorder the pinwheels in descending order of the chosen index, but that transition only starts after a 1-second delay so the first set of transitions can complete. Building these transitions so that visitors can clearly compare differences across regions and indices was the most challenging and most fun part of development.
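The reordering step boils down to a small layout calculation. Sort the regions by the selected index in descending order, then fill a grid left to right and top to bottom, so rank `r` lands in row `r // columns` and column `r % columns`. The sketch below shows only that calculation, with made-up region names and values and an assumed alphabetical tie-break. In the D3.js chart, cells like these would typically feed a `selection.transition().delay(1000)` that moves each pinwheel, but that front-end code isn't reproduced here.

```python
def pinwheel_grid_positions(index_values, columns):
    """Map each region to a (row, column) cell, highest index value in the upper-left.

    Regions fill left to right, then top to bottom, in descending order of the
    selected index. ASSUMPTION: ties are broken alphabetically so the layout is stable.
    """
    ordered = sorted(index_values, key=lambda region: (-index_values[region], region))
    # divmod(rank, columns) -> (row, column) for a row-major grid.
    return {region: divmod(rank, columns) for rank, region in enumerate(ordered)}


# Made-up index values for four regions, laid out three per row.
positions = pinwheel_grid_positions(
    {"Region A": 52.1, "Region B": 61.4, "Region C": 47.9, "Region D": 58.0},
    columns=3,
)
```

Switching the “Organize By” selection would just mean calling the same function with a different index's values and animating each pinwheel from its old cell to its new one.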
D3: The 2014 One D Scorecard presented a lot of opportunities to collaborate. Not only did we work with your team around development, but we also partnered with the Kirwan Institute to integrate their Southeast Michigan Regional Opportunity Mapping initiative. Kirwan’s original research was presented through static maps of the overall index scores. What was your motivation and method for interactively mapping both the index and individual indicators?
NiJeL: Crosslet is designed specifically to let visitors explore one or more variables, see how they are connected, and see those connections represented on a map and in a simple frequency-distribution bar chart. For instance, if a visitor selects the median household income variable and then selects the income range of $0 – $50,000, they see only the geographies (census tracts) with median household incomes of $50,000 or less. They also see the frequency distributions of the Opportunity Index, high school completion rates, and vacant property rates for those tracts. Clearly, the distribution of each of the other variables is skewed toward the negative end of the spectrum when we select that income range. However, if we click the selected range and drag it toward higher incomes, we can see the frequency distributions of the other variables shift toward the more positive end of the spectrum along with income, and the map shows exactly which census tracts fit the new criteria. Visitors can also select a range of any other variable on the map to filter these data further. We think this gives visitors a great entry point for exploring these data and drawing their own conclusions about the drivers of opportunity in the Detroit metro region.
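The brushing behavior described above is a form of cross-filtering. A range selected on one variable filters the set of tracts, and the histograms of every other variable are recomputed over what remains. Here is a minimal sketch of that idea. It does not reflect Crosslet's internals, and the tract IDs, variable names, values and bin edges are invented for illustration.

```python
from bisect import bisect_right


def filter_tracts(tracts, variable, low, high):
    """Keep tracts whose value for `variable` falls within [low, high]."""
    return [t for t in tracts if low <= t[variable] <= high]


def histogram(tracts, variable, bin_edges):
    """Count tracts per bin; bin i covers [bin_edges[i], bin_edges[i + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for t in tracts:
        i = bisect_right(bin_edges, t[variable]) - 1
        if 0 <= i < len(counts):  # values outside the edges are ignored
            counts[i] += 1
    return counts


# Made-up census tracts for illustration only.
tracts = [
    {"tract": "T1", "median_income": 32000, "hs_completion": 0.71},
    {"tract": "T2", "median_income": 45000, "hs_completion": 0.78},
    {"tract": "T3", "median_income": 68000, "hs_completion": 0.90},
    {"tract": "T4", "median_income": 91000, "hs_completion": 0.95},
]
EDGES = [0.6, 0.8, 1.0]  # hypothetical bins for high school completion rate

# Select $0 - $50,000, then "drag" the range up to $50,001 - $100,000.
low_income = filter_tracts(tracts, "median_income", 0, 50000)
higher = filter_tracts(tracts, "median_income", 50001, 100000)
```

With these sample values, the completion-rate histogram shifts from the lower bin for the low-income selection to the upper bin for the higher range. Chaining `filter_tracts` calls on additional variables mirrors selecting ranges on more than one variable.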
The new One D Scorecard is a powerful tool for its users to access data through visualizations, but it’s also a powerful data management system for D3 to maintain and scale these datasets into the future. And we couldn’t have built it without the awesome team at NiJeL. We’re already counting down the months until the newest annual data indicators are released so we can do our first update!
If you’re interested in talking more about code and collaboration for the One D Scorecard or beyond, connect with D3’s Project Lead Jessica McInchak at email@example.com.