12 August 2015

Bioinformatics "Purity Test": Need more questions!

I'm prepping for the new class I'm teaching this semester, Bioinformatics for Research, and I need help! It's a graduate-level class that I'm designing to introduce our Master's students to the basic computational skills they will require to perform common research tasks.

Here are the general skills we'll be covering:

  • data/metadata organization and management
  • automation of tasks with the Unix shell
  • introductory R scripting (data parsing, simple statistics, visualizations)
  • executing command-line programs on remote HPC resources
  • version control with Git
  • small projects in genome assembly/annotation and phylogenetics (common questions for which previous grad students have asked for help)
As a part of motivating students to learn these skills, I'm developing a Bioinformatics "Purity Test," similar to the ones I remember being all the rage when I was in high school and college. Students will be presented with a list of scenarios and will calculate the percentage they've encountered. Here are the survey questions so far:
Have you ever:

  1. Tried to open a file and found it corrupted
  2. Had to recreate a file because it wasn't backed up
  3. Been unable to find a file because you forgot what it was called or where you put it
  4. Had a computer crash lose your unsaved work
  5. Had to redo work because you (or someone else) decided it needed to be done differently
  6. Been unable to redo your work because you forgot how you did it
  7. Spent hours performing the same task over and over
  8. Read a scientific paper and wondered, “How did they make those number things happen?”
  9. Created a graph/diagram in Microsoft Excel (or PowerPoint)
Calculate your score: 1 point for every "yes" answer, then divide by the total number of questions.
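For anyone who would rather script the arithmetic than do it by hand, here is a minimal sketch in Python. The function name `purity_score` and the example answers are made up for illustration, and it assumes the score is reported as a percentage, as described above.

```python
def purity_score(answers):
    """Return the share of "yes" answers as a percentage (0 to 100).

    `answers` is a list of booleans, one per question (True = "yes").
    """
    if not answers:
        raise ValueError("need at least one answer")
    # One point for every "yes" answer.
    yes_count = sum(1 for answer in answers if answer)
    # Divide by the total number of questions, then scale to a percentage.
    return 100 * yes_count / len(answers)


# Hypothetical responses to the nine questions above (True = "yes").
example_answers = [True, True, False, True, False, False, True, True, True]
print(f"Score: {purity_score(example_answers):.0f}%")
```

Because the function divides by the length of the list, it keeps working as more questions are added to the survey.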
My question for you, dear blog readers: what other questions should I include?

Permanently deleted your work by accident (from @thatdnaguy)
Frequently copy/paste to reformat documents (from @PaulBlischak)
Found two versions of the same file, but didn't know how they differ (from @nmatasci)
Forgotten your abbreviations for samples, dates, etc. in an analysis (from @nmatasci)
Been unable to use software because you can't find a computer that can run it (from @nmatasci)
Been unable to determine the required input format for a program (from @dwbapst)
Been unable to install software dependencies (from @BrownJosephW)
Had errors related to Windows vs Unix line endings (from @BrownJosephW)

24 July 2015

Getting ready for Botany 2015!

I'm heading off to Edmonton this weekend for Botany 2015. This conference is co-hosted by a handful of professional societies, a few of which I've belonged to for several years (Botanical Society of America and American Society of Plant Taxonomists). I'm excited to be returning to this conference after attending Evolution for several years. My dance card is very full for this conference, but I'm looking forward to each part:
  1. Ecological niche modeling workshop: A few of my undergraduate researchers are pursuing some projects involving species distributions and niche modeling. Pam Soltis (who was on sabbatical at NESCent when I was there) is hosting a workshop on this very topic, and I'm elated to have some time to sit down and formally learn methods using QGIS, an open-source geographic information system.
  2. Oral presentation: I'll be presenting preliminary results from my characterization of transposable elements in Agavoideae (agave/tequila, yucca, etc.). I'm really excited for this collaboration with two other early career scientists, Michael McKain and Alexandros Bousios, to finally come to fruition. You can read my presentation abstract here, and I'll be posting the slides to SlideShare after the presentation.
  3. PLANTS mentor: The Botanical Society of America sponsors undergraduates from under-represented groups to attend their annual meeting, and I'll be acting as a mentor for one of these students during the conference. He'll be giving a talk on palm evolution that sounds great! I'm looking forward to meeting him this weekend.
  4. Student career luncheon: I've been invited to speak at a luncheon for student conference attendees about careers in botany. This sounds like a great way to share my experiences in my path to research and teaching. My short talk will be followed by "speed dating," where students will be able to interact with professionals.
  5. Professional society service: I have some additional obligations to serve a professional society during the conference, which makes me feel like a Very Adult Scientist but will also keep me pretty busy!

13 July 2015

Debriefing from TACC Summer Supercomputing Institute

I had such good intentions to blog more this summer during my reprieve from class planning, but that obviously didn't work out. A short blog post describing my adventure at the Texas Advanced Computing Center (TACC) Summer Supercomputing Institute (SSI) in Austin, Texas last week seemed a good way to reinforce my lessons learned (as well as provide a convenient way to ease back into a normal work schedule). Here are the highlights:

  • Students in the class were from a variety of disciplines, including applied math, physical science, and life science. Some folks were proficient programmers; others (like me) were familiar with running programs but had little experience writing their own compiled programs (consequently, I'm looking into taking more formal courses in CS).
  • The class covered a variety of topics, including parallel programming (OpenMP and MPI), debugging/optimization, data management, and data visualization. You can see some of TACC's previous course materials here (and I hope they'll add public access for materials used for our class, some of which were new!). I was especially enthralled with the session on data management, and am intrigued by exploring Hadoop for genomics.
  • Given that this class (as well as many of TACC's other training sessions) is geared towards folks with a background in programming, I spent some time talking to folks there about whether the resource is appropriate for entry-level folks (i.e., biologists rather than computer scientists). As it turns out, the marketing and education folks there are very interested in continuing to expand the user base, including folks who may not be very proficient at the command line. To that end, I've started a GitHub repo to develop materials for folks new to TACC (which I'll rely on heavily for my graduate-level bioinformatics class this fall). These materials might also be of use to folks who access HPC resources through XSEDE, which has a campus champions program that I'm thinking of joining.
All in all, it was an intense but intellectually profitable week. Plus, I learned to make fancy figures! I don't know what it means, but the units are PARSECS! Super cool.

11 May 2015

Self evaluation for teaching an undergraduate bioinformatics course

I wrote several lengthy posts last fall and winter that reflected on my preparations to teach a new course this spring (bioinformatics lecture and lab for undergraduate biology majors, main post here). The logistics and intellectual drain of new course prep kept me from writing much about this course as it progressed. Now that the semester is nearing its end (finals are in a week and a half), I'm compelled to report back on how the class developed.

For a quick overview, check out this poster I put together for the UT Tyler teaching symposium, highlighting my experiences implementing this new course:

While each bullet point below could certainly warrant a post all by itself, I'm going ahead and outlining everything while it's still fresh in my memory, laundry-list style:

General course description:
  • Lecture met twice a week for an hour and a half on Tuesday/Thursday morning (three hours total per week)
  • Lab met once a week for three hours on Thursday night.
  • Eight students enrolled, all biology majors, mostly pre-professional.
  • Assessment consisted of weekly homework submitted via GitHub for lab and Blackboard for lecture. Students also completed a class project for lecture by researching and presenting on a topic we didn't cover in class.
  • Only pre-requisites were two semesters of introductory biology.
What worked well:
  • I tried to adopt a lecture style that minimized actual lecturing (I averaged 20 slides for an hour and a half lecture). I implemented class discussions and think-pair-share type activities, including drawing a concept map at the end of the semester to summarize.
  • For lab, my students loved R, especially working in RStudio.
  • I explored the use of analogies to explain complicated concepts in genomics and bioinformatics.
  • I used signed pre- and post-class surveys and anonymous mid-semester evaluations to gauge how students felt about the class (this is mostly how I know things were working well!).
What I'll change for next time:
  • Establishing the computational infrastructure remains challenging. I can't require my students to have their own (personal) machine for installing software. I have a computer lab for students to use during lecture and lab, but university policy constrains my ability to use these machines (e.g., I can't install software myself). I also had students log on to a remote HPC resource through TACC, but about half my students had problems accessing it. 
  • Continued from the last point, I had students use Cygwin to learn Unix/shell/bash commands, but the installation on the class computers made the path names ridiculously awful to navigate. My students agreed this was their least favorite part (which is a shame, since shell scripting is my personal workhorse for research).
  • Students appreciated not having exams, but the workload (for them and for me grading) was a bit cumbersome (I had a lecture and lab to grade for each student almost every week). I will consider using alternative assignments (weekly online quizzes, and halving the number of lecture assignments) in the future.
With all of that in mind, I'm pretty happy with how the semester finished out, and am still excited to teach Bioinformatics for Research at the graduate level this fall.

24 April 2015

Under-appreciated Texas wildflowers

In my ongoing quest to balance the computational aspect of my work, I've been working with the East Texas Master Naturalists to continue developing their herbarium collection of local, native plants. It's been a great synergistic relationship: they teach me about native species, and I've been setting up their herbarium database as a series of spreadsheets and documents in Google Drive.

We met this morning out at The Nature Center, a Texas Parks & Wildlife facility that houses the herbarium and has meeting space. This very wet spring has led to an abundance of iconic Texas wildflowers, like bluebonnets and primroses. Much to my delight, I also found some of my favorite spiderworts growing nearby. It's been several years since I did serious plant collections, but I still managed to spot them on a roadside on my way from campus this morning. I apparently haven't lost my skill at picking out the flower color and growth habit from the multitudes of flowers blooming right now. This beauty (picture to right) is a great example of Tradescantia ohiensis, one of the very widespread species of erect Tradescantia. Each individual flower only lasts a day before deliquescing (melting), but the plant will keep blooming until next fall, as long as it doesn't get fried in the Texas heat.

While visiting with my old friend T. ohiensis, I took the opportunity to scratch another itch that's been in my mind for several weeks now. I've been absolutely awestruck by the thistles growing on the roadsides this spring. The picture (to the left) doesn't do it justice, but these plants are almost five feet tall, and covered in menacing, spiky leaves. There appear to be several species of Cirsium here in Texas, and I'm looking forward to seeing more examples of these monsters.

While perhaps not as charismatic as other wildflowers, these two examples get a thumbs up from me as particularly cool plant species.

30 March 2015

Data Carpentry hackathon for genomics

I'm pleased to report back from the Data Carpentry (DC) genomics hackathon, which I attended last week with ~26 other folks at Cold Spring Harbor Laboratory in New York. The goal of this meeting was to develop modules for a DC workshop focused on analysis of next-generation sequencing and other genomic data. The original DC lessons were designed for a very general audience using ecological data, so we were tasked with outlining, organizing, and starting to write materials for a two-day workshop specifically for genomics.

Each of the following points could be thoroughly explored in its own post, but here are a few highlights from this meeting:

  • Attendees were a great mix of biology researchers and educators from a range of institutions (research intensive, primarily undergraduate), computer scientists, and assessment specialists. This meant we were pulling from a broad range of skills, and incorporating multiple perspectives in planning.
  • The length of the meeting (2.5 days) allowed us to get a running start on actually developing materials (GitHub repos here prefaced with "genomics"). In addition to "intro to Unix" material that would largely remain constant from the original DC lesson, we started developing six modules that cover a general genomics workflow: setting up a project, getting to know your data, data wrangling (QC and alignment), analysis and visualization, and cloud/HPC. I personally found it remarkable and gratifying to see so much attention paid to the initial preparatory stages of a project.
  • Numerous folks emphasized the importance of understanding your target audience. Some of these discussions related to the assumed skill level (or pre-requisites) for workshop attendees. Other conversations related to the need to accommodate particular cultural or gender issues while teaching to make the learning environment comfortable for everyone.
  • What makes DC workshops special and distinct from other courses? In developing the modules described above, we talked about the distinction between Software Carpentry and Data Carpentry, as well as if and when instructors should be expected to teach about biology (rather than computing/data analysis). The general consensus is that the focus of DC on telling a narrative about data means we should be emphasizing "best practices" for improving productivity and reproducibility, rather than advocating for particular types of analyses. That being said, there is ample opportunity during lessons to model rigorous methods, as well as provide extra resources for students to improve their skills in experimental design and statistical reasoning.
  • A particularly challenging aspect of developing such resources is assessment of student improvement following a workshop. It's challenging to evaluate how much students will retain after such a short period of time (2 days), as well as whether these skills will transfer over to their research methods. One breakout group focused on developing a strategy for surveying students prior to and directly following a workshop to measure immediate learning, as well as 3-6 months following to measure long-term gains. We targeted question formats that would address student learning in terms of the following areas: declarative knowledge (Can you recall this fact?), skills (Can you write this code?), and attitude (Will you use this skill?).

I was initially on the fence about whether to apply for the hackathon. I'm a first-year professor wallowing in the murky depths of teaching a new course, and my overtaxed brain was whispering that maybe it would cause too much stress. My gut, thankfully, doesn't always listen to my brain. Moreover, the class I'm piloting this semester is an undergraduate bioinformatics class focused on genomics, so the DC hackathon fit naturally into my preparation for the last few weeks of the semester. I'm looking forward to reporting back soon about how my semester-long class is wrapping up, as well as my first teaching experience for a Software Carpentry workshop in a few weeks.

18 December 2014

Formal address.

Right after finishing my PhD, I started preparations to move to North Carolina to begin a job as a postdoctoral researcher. My mother accompanied me on a preliminary scouting trip to find an apartment. I was baffled and a little amused when she made sure potential landlords and leasing agents knew I was "Dr." Kate Hertweck.

My title has never really felt comfortable to me. I certainly feel like I earned it, but I don't necessarily feel compelled for other folks to address me as such. I added "PhD" to my email signature, along with my affiliation, and that seemed to suit my electronic communication needs. It took over a year before I stopped laughing when people introduced me in person using it. Now that I'm a professor, I still don't introduce myself using that title. More often than not, however, I find myself needing to clarify to various folks on (and off) campus that I am, indeed, a Doctor of Philosophy.

I've taught classes as both a graduate student and postdoc, and until now I've been comfortable with students referring to me by my first name. As I'm writing the lab manual for my class next semester, though, I'm constantly second-guessing my choices in how to reference myself. The generic "your instructor" seems so sterile and unnecessary, given that I'm writing documents specifically about me and my class. But what is a better option?

Of course, I'm a resource junkie, so I took a few minutes to look at what other folks think about this topic. I grabbed blog posts from NeuroDojo and Small Pond Science and articles from Slate and Inside Higher Ed. I was really serious about learning things, so I even read the comments. Here are the options I've discovered for how students may choose to address me:
  1. Dr. Hertweck
  2. Professor Hertweck
  3. Doctor Professor Hertweck
  4. Dr. Kate
  5. Kate
  6. Dr. Hert (pronounced "hurt")
  7. Ma'am
  8. Ms. Hertweck
  9. Mrs. Hertweck
With such a plethora of options, I definitely feel like I need to at least narrow it down for students. I find the last two to be unacceptable, and #7 to be somewhat distasteful (although I often feel compelled to address other folks as such, and it's rather unavoidable here in the South). #3 has too many syllables, with #2 almost too many. #6 exists only to amuse me. But still, I'm straddling the fence over whether to prefer formal or informal names. I've even considered offering all remaining options to students, and keeping track of which they choose (I really do like collecting data).

I recognize all the arguments for different forms of address. The argument from NeuroDojo resonates with my personal philosophy of science. However, I'm a young, early-career female, so I may need to impose more authority on students. There doesn't seem to be a clear standard in my department, either. Moreover, when my mom introduced me as a doctor when looking for apartments, it actually made a difference (my application fee was waived). I dislike using that type of privilege, but I need to admit that it does occur.

I suppose I've spent a lot of time thinking about this particular topic because it represents a very tangible manifestation of my uncertainty with my new job description. What's appropriate clothing for me to wear to work? How formal should my language be? Moreover, how do all of these considerations interact with my own personal preferences and sense of self? If any of this sounds familiar, it's because I pondered the same issues of personal feelings vs. perceived expectations in my last post. I suspect that this current post will also not be the last.