Visualizing 60 Years of Environmental Data
In the summer of 2017, I worked with the Bernhardt Lab and the Cary Institute of Ecosystem Studies. During this time, I led the design and development of an RShiny dashboard that visualizes a large data set comprising more than 60 years of environmental data produced at the Hubbard Brook Experimental Forest. I worked under the direction of Dr. Emily Bernhardt and Dr. Gene Likens, recipient of the 2001 National Medal of Science.
Too much data! With 60+ years of weekly measurements, from precipitation volume to streamwater chemistry, hydrologists who work at the Hubbard Brook Experimental Forest needed a tool that would let them explore the data easily without a strong background in statistical programming languages.
When defining the problem, it was clear that our target audience was hydrologists and environmental scientists who want to explore environmental data for hypothesis testing and hypothesis generation.
Understanding the problem
I facilitated stakeholder interviews with hydrologists Matt Ross and Richard Marinos, as well as with Dr. Emily Bernhardt. These interviews helped me properly define the problem, understand their needs, and make suggestions for the best ways to move forward.
At this stage, I also read various environmental science papers and textbook chapters to understand how the target audience talks about data, the ways in which they usually present it, and the types of visualizations they find useful.
Ideation and Prototyping
After getting insights from the stakeholders, my team and I held two separate brainstorming sessions where we generated ideas that I later prototyped. During the ideation stage I relied on my notepad, two big whiteboards, and Adobe Illustrator. We also took advantage of our co-working space and held multiple feedback sessions with people from relevant fields, including environmental science, statistics, and computer science.
After we came up with some ideas about how the visualization application could work, I developed a low-fidelity prototype using RShiny, HTML, and CSS. I then went through several iterations, incorporating the feedback we got from hallway testing and from the stakeholders themselves.
Key Insights and Features
The clustering of visual data
Empirical research in cognitive psychology has shown that the visual clusters formed during pattern recognition are used to reason about graphs during the cognitive integration stage. I proposed and incorporated a color mode feature that lets users choose which variable color encodes, so that the most salient visual feature matches the comparison they need. In one mode, they can compare data across watersheds (watersheds are clustered by color), while in the other mode they can compare across solutes (solutes are clustered by color). Both comparisons are valuable to hydrologists in different situations, so this feature was important for our target audience.
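The two color modes can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the dashboard's R Shiny code; the palette, the function name `assign_colors`, and the watershed and solute labels are all assumptions made for the example.

```python
from itertools import cycle

# Hypothetical palette; the dashboard's actual colors are not given in the text.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]


def assign_colors(series, mode):
    """Give every (watershed, solute) series the color of its cluster.

    mode="watershed" clusters series by watershed;
    mode="solute" clusters series by solute.
    """
    if mode not in ("watershed", "solute"):
        raise ValueError("mode must be 'watershed' or 'solute'")
    key_index = 0 if mode == "watershed" else 1
    palette = cycle(PALETTE)
    cluster_colors = {}
    assignment = {}
    for pair in series:
        key = pair[key_index]
        if key not in cluster_colors:
            # First time this cluster appears: take the next palette color.
            cluster_colors[key] = next(palette)
        assignment[pair] = cluster_colors[key]
    return assignment


# Illustrative labels only.
series = [("W1", "Ca"), ("W1", "Na"), ("W6", "Ca"), ("W6", "Na")]
by_watershed = assign_colors(series, "watershed")
by_solute = assign_colors(series, "solute")
```

In watershed mode both W1 series share one color and both W6 series share another; in solute mode the grouping flips, so the same lines are drawn but a different comparison stands out.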
Slow graphing: using animation to encourage deep thinking about data
During a feedback session, Dr. Gene Likens pointed out that in the 1960s, when he was working on the paper that would eventually demonstrate the causes of acid rain, the slow arrival of data played a central role in his exploratory practice. At the time, he would have to wait minutes, hours, and sometimes days for data points to come in. He talked about plotting each one individually and mentioned that the slow pace of this process encouraged him to examine, hypothesize about, and analyze each data point carefully. In response to his feedback, I incorporated a slow animation feature that allows researchers to watch the plot evolve gradually.
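One way to think about slow graphing is as a sequence of frames, each showing a few more points than the last. The sketch below is a Python illustration under that assumption, not the dashboard's R Shiny implementation; `reveal_frames`, the `step` parameter, and the sample values are hypothetical.

```python
def reveal_frames(points, step=1):
    """Yield progressively longer prefixes of `points`, `step` points at a time."""
    if step < 1:
        raise ValueError("step must be at least 1")
    # The final frame always contains every point, even when the length
    # of `points` is not a multiple of `step`.
    for end in range(step, len(points) + step, step):
        yield points[:end]


# Illustrative (week, value) pairs only.
observations = [(1, 4.2), (2, 4.1), (3, 4.4), (4, 4.0), (5, 4.3)]
frames = list(reveal_frames(observations, step=2))
```

A plotting loop would redraw the chart once per frame and pause between frames (for example with `time.sleep`), which gives the viewer time to consider each new point before the next one appears.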
Flexibility: letting the user decide how they want to explore the data
Since researchers and hydrologists wanted to use the tool for hypothesis formation and preliminary hypothesis testing, I designed a module that gives the user more flexibility to "play" with the data. The user can generate a scatterplot by entering one variable or a series of variables on the x and y axes. A key feature of this design is that the user can relate various solute concentrations by performing arithmetic operations on them, which, as we learned during the research phase, is commonplace in this field.
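As a rough sketch of how arithmetic on solutes could feed a scatterplot, the Python example below evaluates a user-entered expression such as "Ca + Mg" for each sample. It is an assumption-laden illustration rather than the dashboard's R Shiny code: the function names, the supported operators, the solute names, and the sample values are all made up for the example.

```python
import ast
import operator

# Only the four basic arithmetic operators are supported in this sketch.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, sample):
    """Evaluate an arithmetic expression using one sample's solute values."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return sample[node.id]  # look up the solute by name
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported element in expression")

    return walk(ast.parse(expression, mode="eval").body)


def scatter_points(samples, x_expression, y_expression):
    """Return one (x, y) pair per sample for plotting."""
    return [
        (evaluate(x_expression, s), evaluate(y_expression, s)) for s in samples
    ]


# Made-up concentrations, for illustration only.
samples = [
    {"Ca": 1.0, "Mg": 0.5, "SO4": 6.0},
    {"Ca": 2.0, "Mg": 1.0, "SO4": 4.0},
]
points = scatter_points(samples, "Ca + Mg", "SO4")
```

Parsing the expression with `ast` instead of calling `eval` keeps user input restricted to variable names, numbers, and basic arithmetic.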