12 May 2014

Data management and maturing as a scientist.

I attended an awesome workshop last week called Data Carpentry. I've briefly mentioned attending a Software Carpentry Bootcamp a few years back, and suffice it to say that I was really excited for another two days of computational bootcamping. I even made a quick little Storify of a few of my tweets if you'd care to take a gander at some highlights.

I've spent quite a bit of time in my professional life learning to manage, analyze, and interpret data. In the 10+ years I've been conducting scientific research, my thought processes about how I relate to data have changed markedly.


  1. When I first started learning how to generate sequence data as an undergraduate, I thought the hardest part of science was collecting data, mostly because that's how I spent my time.
  2. As I started graduate school, I thought the hardest part of science was getting started on a research project. No doubt this was due in part to my adventures trying to collect plants for my dissertation.
  3. As I finished my PhD, I decided the hardest part of science was communicating scientific results to get them published, largely because I was being indoctrinated with the idea that publications demonstrate my value as a scientist.
  4. During the first part of my postdoc, I thought the hardest part was analyzing data. This is also unsurprising, as my work now is entirely computational (no data collection).
  5. The current iteration of my maturation as a scientist? I think the hardest part of science is keeping track of everything you've done and making it accessible to the rest of the community. It's a concept that encapsulates most of the previous "hardest parts," and while it doesn't often deliver tangible rewards like lines of code, pretty figures, or publications, it's satisfying and essential in its own way. Moreover, my career as a data vulture relies on the community's ability to maintain reproducible, open science.
Anyway, that's why I'm so excited about learning better ways to build/query databases, share code, and maintain data. 
