08 September 2012

Data visualization

I've been thinking a lot lately about how we visualize data in biological research. As a student of both communication and science, I find the interplay between the two fields especially compelling. This interest was reinvigorated last spring by some meetings with a visiting NESCent scholar, Tyler Curtain, who introduced me to some of the preeminent and current literature in the field. Since then, I've noticed the theme coming up repeatedly in my interactions with colleagues. Folks pop into my office to ask for advice on figures, and some NESCent-associated projects like Open Tree of Life are particularly interested in advancing visualization capabilities for evolutionary biology.

As a result of this convergence of events, I jumped at the chance to take a short workshop on data visualization offered by Duke Libraries Data and GIS Services. I was struck by the breadth of options available for representing data visually, and began thinking a bit about how different figures in evolutionary biology are from those in other fields. I talked a bit with Angela Zoss, the instructor of the course, who concurred that biological disciplines tend to lag behind other sciences in adopting more effective methods of visualizing data (I would extend that claim and suggest that the sciences in general lag behind other fields of study as well).

Let's face it, folks. Figures representing biological data are notoriously difficult not only to build, but also to interpret. Dendrograms (which in the data visualization world include many types of tree structures) are very complicated in evolutionary biology, and our tendency is to cram as much information as possible into each figure. A single diagram may include tree topology, branch lengths/divergence times, taxon names (tree tips AND higher taxonomic groupings), color coding for one or more traits, etc. Additionally, I've long thought that visualizations for genomics research, especially in the comparative realm, are convoluted and difficult to understand.

Why are our figures so complicated? I think it's because biology as a science inherently includes many different variables, each of which carries a sometimes large margin of error. As scientists, we want to tell our research narrative in as unbiased a manner as possible, which means including (visually) as much data as possible. This impulse is compounded by a push to streamline figures for publication, which is further complicated by limited availability of color, space, and resolution.

But let's face it. We're never telling an unbiased story with our figures. We make decisions about inclusion of data in a study and methods of analysis even before we get to the publication stage, and cramming as much summary information as possible into a figure doesn't represent those biases. What is the goal of a figure or representation of biological data? It should be interpretable by an audience, which in journals means scientific peers. When visualizations become so specialized that only a handful of people in a field can understand them, we're working counter to the purpose of the visualization. It's not doing its job.

I advocate striking a balance between the goals of the two paragraphs above. Tell a clear story with the data, but include enough information for the audience to understand associated variance and error. As a result, I'm planning an open discussion at NESCent with folks from our informatics and science groups, in addition to folks who work on data visualization, to see how we can improve our methods of building figures.

Possible topics for contemplation include (but aren't limited to) the following:

  • tree visualization
  • deep time
  • visualizing error
  • taxonomic levels
  • trait mapping

1 comment:

Jen said...

Hey Kate,

Saw this in the NY Times this morning...