Princess Tradescantia

18 November 2016

Preparing students to work with computers

As I gear up to teach bioinformatics again (lecture + lab, to undergraduates), I've been thinking about ways to ease the pain of introducing novice learners to coding and data management. I feel like I'm pretty good at helping students overcome challenges similar to those I personally faced as a newbie, but I'm also realizing that undergrads (at least at my institution) are increasingly ill-prepared with basic computer skills. In fact, some of my students access course reading materials (ebooks and PDFs) and submit homework (through Blackboard, our learning management system) exclusively through a mobile device (phone/tablet), and rarely (if ever) use a desktop computer or laptop.

To make sure everyone is on the same page, I've decided to spend a bit of time in class the first week on basic computer skills. I've consulted the Wikiversity Computer Skills page and added a few ideas of my own, and here's what I've come up with:

Operating systems: Windows vs. Mac vs. Linux, since most students are familiar with the first two but we'll be working on Linux using a virtual machine
Computer organization: User folder, Desktop, Documents, Downloads, etc.
Keyboards/typing: Location of special keys and characters on the keyboard (control, alt, etc.), importance of capitalization/quotes/spaces
Internet: remote vs. local computing (since most students don't seem to know where data used on their phone is stored, or where data processing occurs), locating/unzipping downloaded files
Word processing: Microsoft Word vs. text editors

Granted, some of these ideas will need to be reinforced as we actually start coding (like "Don't write computer code in MS Word"), but I'd like to introduce the concepts early. What am I missing? Are there other basic computer concepts we take for granted that don't really fall into the "computer science" realm but are necessary to be functional bioinformaticists? Please share in the comments!

18 January 2016

Why I sign reviews.

I have been signing my manuscript reviews since I joined NESCent as a postdoc in late 2011. That organization upheld a strong commitment to open science (as evidenced by their data archival efforts since closing their doors last year), and it was that community of scientists that both introduced me to signed reviews and helped me feel comfortable adopting the practice as my default.

There are multitudes of blog posts and articles about signing reviews. Two sources I find particularly useful are a collection of viewpoints from different scientists (from The Molecular Ecologist) and interesting data about how often reviewers sign in different subdisciplines (from Publons).

Many of the arguments in favor of signing reviews refer to improving the quality of the scientific assessment in the review, as well as keeping the tone of the review civil and constructive (as opposed to possibly derogatory). I don't believe the content of my reviews differs based on whether I sign my name to them, because I try hard to be rigorous (and not a jerk). Instead, here are the reasons that resonate with me:

1. Signed reviews are easier for authors to interpret. When I received my first signed review as an author of a manuscript, I realized how much easier it was to understand the reviewer's recommendations. By signing my reviews, I'm offering authors additional context for the comments I've provided. I include my name, title, department, university affiliation, and email, which makes me easily Google-able (they could even find this blog post!).
2. Sometimes it's pretty obvious who the reviewer is anyway. On occasion, I've received reviews as an author that are almost laughably transparent as to the identity of the "anonymous" reviewer. The science world isn't so big! If you're a taxonomic expert in a particular group of organisms, it's easy to guess that it's you catching spelling mistakes in names of subfamilies (or suggesting someone cite your papers).
3. It's gratifying to be explicitly included in the process of improving a manuscript. I work hard to be a thorough reviewer, and I hope that my effort may be rewarded by a reputation for fairness and helpful insight. I appreciate my name being included in the acknowledgements (although this did once cause some consternation with a few peers, as it wasn't clear I was a reviewer and they thought I was collaborating with their competitors). In fact, I particularly enjoy reviewing for PeerJ, as my review is published with the early version of the manuscript.

Here are a few things to keep in mind about signing your reviews:

1. Be consistent. It's easy to sign a glowing review of a manuscript that you're accepting without recommendations. It's harder to sign a review of a manuscript you're rejecting, but (for the reasons I mention above) it is arguably more important for both you and the author.
2. Ask journals before agreeing to review. Here's a tweet exchange I had with @rmflight about corresponding with journals (pure genius!):

@k8hert that is why I ask the editor first before agreeing to review, and make it a condition of reviewing
— Robert M Flight (@rmflight) January 15, 2016

3. You get used to it quickly. Once I committed to my decision to sign all reviews (for papers I accept AND reject), it only took a few reviews to feel comfortable with it. Nothing disastrous has happened, and reviewing papers feels less like working for free.
4. You can do it! Anonymous peer review is still the default in most areas of academia. In most cases, though, signing is a choice. I've heard early career scientists warned away from the practice, but I hope it helps to know that there is, in fact, a community of folks who make transparency in science a priority, and they exist at all career stages.

In short, I've signed ALL my reviews for about five years now. In that time, I've published my own papers, applied for grants, and even landed a tenure-track faculty job. I'm still quite happy with my decision.

(Thanks to @MusselDS for asking the question and prompting me to finally write this post.)

30 September 2015

Classroom informatics: Managing students, information, and communication.

One of my biggest challenges as an educator is helping students manage information. I don't believe it's my job to impart information to students, but to help them learn to think about information and integrate it into their knowledge structures more effectively. Example: I'm not trying to teach my students how genome assembly algorithms work, I'm trying to teach them how to use the instruction manuals that come with assembly software to analyze their data. Layperson example: I'm trying to teach them how to use a recipe book, rather than memorize all the recipes.

This means that I rarely give my students a list of things they need to remember. More often, I give students lists of resources they might use to find an answer or solution themselves. The end result is that students are confronted with a huge amount of information that they aren't expected to memorize in minute detail; instead, they must learn to filter and search it for the facts and strategies necessary to complete a task. As someone whose memory is complete rubbish, but who is quite efficient at managing information, I find this a much easier task. Students trained in most contemporary classrooms, however, may consider it monumental.

Essentially, I'm teaching students informatics strategies. This is certainly appropriate, given that my research and teaching specialty is bioinformatics. The trouble is that it takes more explanation to describe how to use information than to simply tell students what information they need to remember. Students then become overwhelmed by the amount of information they're given access to, because they try to process it the same way they managed the facts handed to them in previous classes. Instead of seeing information as a resource that may possibly be used, they see it as a mountain that needs to be climbed.

How do you help students think about information as a resource to reference, rather than data to retain? This seems a logical extension of the commonly touted plight of science educators: we need to focus less on memorization and more on critical (scientific) thinking. We need to teach processes and ways of thinking, rather than factoids to remember. To me, the amount of information I give my students seems like overkill, as if I'm making the assignment too easy. In truth, it's actually the opposite: I'm requiring my students to use appropriate information-filtering methods and to help themselves.

In practice, I spend a lot of time trying to remind students of the bigger picture, and pointing them towards materials that are already available that may answer questions for them. This is actually an essential but overlooked skill, not just in bioinformatics. Professors often complain about students who don't read the syllabus, which suggests that we're not doing a very good job teaching students how to help themselves. I'm starting to believe this is the real key to contemporary education: helping students utilize the sheer amount of information available to explore and innovate.

12 August 2015

Bioinformatics "Purity Test": Need more questions!

I'm prepping for the new class I'm teaching this semester, Bioinformatics for Research, and I need help! It's a graduate-level class that I'm designing to introduce our Master's students to the basic computational skills they will need to perform common research tasks.

Here are the general skills we'll be covering:

data/metadata organization and management
automation of tasks with the Unix shell
introductory R scripting (data parsing, simple statistics, visualizations)
executing command-line programs on remote HPC resources
version control with Git
small projects in genome assembly/annotation and phylogenetics (common questions for which previous grad students have asked for help)

As part of motivating students to learn these skills, I'm developing a Bioinformatics "Purity Test," similar to the ones I remember being all the rage when I was in high school and college. Students will be presented with a series of scenarios and will calculate the percentage of them they've encountered. Here are the survey questions so far:

-----

Have you ever:

Tried to open a file and found it corrupted
Had to recreate a file because it wasn't backed up
Been unable to find a file because you forgot what it was called or where you put it
Had a computer crash and lose your unsaved work
Had to redo work because you (or someone else) decided it needed to be done differently
Been unable to redo your work because you forgot how you did it
Spent hours performing the same task over and over
Read a scientific paper and wondered, “How did they make those number things happen?”
Created a graph/diagram in Microsoft Excel (or PowerPoint)

Calculate your score: 1 point for every "yes" answer, divided by the total number of questions (multiply by 100 for a percentage).

-----
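The scoring rule is just a percentage. Here's a minimal Python sketch of it, assuming each answer is recorded as True for "yes" and False for "no"; the function name `purity_score` and the sample responses are hypothetical illustrations, not part of the survey itself:

```python
def purity_score(answers):
    """Percentage of "yes" answers: 1 point per yes, divided by the number of questions."""
    if not answers:
        raise ValueError("at least one answer is required")
    points = sum(1 for answer in answers if answer)  # one point per "yes"
    return 100 * points / len(answers)


# Hypothetical responses to the nine original questions (True = "yes").
responses = [True, True, False, True, False, True, True, False, True]
print(f"Your score: {purity_score(responses):.1f}%")  # prints: Your score: 66.7%
```

Adding the reader-suggested questions just lengthens the list; the denominator grows automatically because the function divides by however many answers it receives.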

My question for you, dear blog readers: what other questions should I include?

UPDATES:
Permanently deleted your work by accident (from @thatdnaguy)
Frequently copied and pasted to reformat documents (from @PaulBlischak)
Found two versions of the same file, but didn't know how they differ (from @nmatasci)
Forgotten your abbreviations for samples, dates, etc. in an analysis (from @nmatasci)
Been unable to use software because you can't find a computer that can run it (from @nmatasci)
Been unable to determine the required input format for a program (from @dwbapst)
Been unable to install software dependencies (from @BrownJosephW)
Had errors related to Windows vs Unix line endings (from @BrownJosephW)

24 July 2015

Getting ready for Botany 2015!

I'm heading off to Edmonton this weekend for Botany 2015. This conference is co-hosted by a handful of professional societies, including a few I've belonged to for several years (Botanical Society of America and American Society of Plant Taxonomists). I'm excited to be returning to this conference after attending Evolution for several years. My dance card is very full for this conference, but I'm looking forward to each part:

Ecological niche modeling workshop: A few of my undergraduate researchers are pursuing some projects involving species distributions and niche modeling. Pam Soltis (who was on sabbatical at NESCent when I was there) is hosting a workshop on this very topic, and I'm elated to have some time to sit down and formally learn methods using QGIS, an open-source geographic information system (GIS).
Oral presentation: I'll be presenting preliminary results from my characterization of transposable elements in Agavoideae (agave/tequila, yucca, etc.). I'm really excited for this collaboration with two other early career scientists, Michael McKain and Alexandros Bousios, to finally come to fruition. You can read my presentation abstract here, and I'll post the slides to SlideShare after the presentation.
PLANTS mentor: The Botanical Society of America sponsors undergraduates from under-represented groups to attend their annual meeting, and I'll be acting as a mentor for one of these students during the conference. He'll be giving a talk on palm evolution that sounds great! I'm looking forward to meeting him this weekend.
Student career luncheon: I've been invited to speak at a luncheon for student conference attendees about careers in botany. This sounds like a great way to share my experiences in my path to research and teaching. My short talk will be followed by "speed dating," where students will be able to interact with professionals.
Professional society service: I have some additional obligations to serve a professional society during the conference, which makes me feel like a Very Adult Scientist but also will keep me pretty busy!

13 July 2015

Debriefing from TACC Summer Supercomputing Institute

I had such good intentions to blog more this summer during my reprieve from class planning, but that obviously didn't work out. A short blog post describing my adventure at the Texas Advanced Computing Center (TACC) Summer Supercomputing Institute (SSI) in Austin, Texas last week seemed a good way to reinforce my lessons learned (as well as provide a convenient way to ease back into a normal work schedule). Here are the highlights:

Students in the class were from a variety of disciplines, including applied math, physical science, and life science. Some folks were proficient programmers, others (like me) were familiar with running programs but had little experience writing their own compiled programs (consequently, I'm looking into taking more formal courses in CS).
The class covered a variety of topics, including parallel programming (OpenMP and MPI), debugging/optimization, data management, and data visualization. You can see some of TACC's previous course materials here (and I hope they'll add public access for materials used for our class, some of which were new!). I was especially enthralled with the session on data management, and am intrigued by exploring Hadoop for genomics.
Given that this class (as well as many of TACC's other training sessions) is geared towards folks with a background in programming, I spent some time talking to staff there about whether the resource is appropriate for entry-level users (i.e., biologists rather than computer scientists). As it turns out, the marketing and education folks there are very interested in continuing to expand the user base, including people who may not be very proficient at the command line. To that end, I've started a GitHub repo to develop materials for folks new to TACC (which I'll rely on heavily for my graduate-level bioinformatics class this fall). These materials might also be of use to folks who access HPC resources through XSEDE, which has a campus champions program that I'm thinking of joining.

All in all, it was an intense but intellectually profitable week. Plus, I learned to make fancy figures! I don't know what it means, but the units are PARSECS! Super cool.

11 May 2015

Self evaluation for teaching an undergraduate bioinformatics course

I wrote several lengthy posts last fall and winter that reflected on my preparations to teach a new course this spring (bioinformatics lecture and lab for undergraduate biology majors, main post here). The logistics and intellectual drain of new course prep kept me from writing much about this course as it progressed. Now that the semester is nearing its end (finals are in a week and a half), I'm compelled to report back on how the class developed.

For a quick overview, check out this poster I put together for the UT Tyler teaching symposium, highlighting my experiences implementing this new course:

Developing an undergraduate bioinformatics course from Kate Hertweck

While each bullet point below could certainly warrant a post all by itself, I'm going ahead and outlining everything while it's still fresh in my memory, laundry-list style:

General course description:

Lecture met twice a week for an hour and a half on Tuesday/Thursday morning (three hours total per week)
Lab met once a week for three hours on Thursday night.
Eight students enrolled, all biology majors, mostly pre-professional.
Assessment consisted of weekly homework submitted via GitHub for lab and Blackboard for lecture. Students also completed a class project for lecture by researching and presenting on a topic we didn't cover in class.
Only pre-requisites were two semesters of introductory biology.

What worked well:

I tried to adopt a lecture style that minimized actual lecturing (I averaged 20 slides for an hour-and-a-half lecture). I implemented class discussions and think-pair-share type activities, including drawing a concept map at the end of the semester to summarize.
For lab, my students loved R, especially working in RStudio.
I explored the use of analogies to explain complicated concepts in genomics and bioinformatics.
I used signed pre- and post-class surveys and anonymous mid-semester evaluations to gauge how students felt about the class (this is mostly how I know things were working well!).

What I'll change for next time:

Establishing the computational infrastructure remains challenging. I can't require my students to have their own (personal) machine for installing software. I have a computer lab for students to use during lecture and lab, but university policy constrains my ability to use these machines (e.g., I can't install software myself). I also had students log on to a remote HPC resource through TACC, but about half my students had problems accessing it.
Building on the last point, I had students use Cygwin to learn Unix/shell/bash commands, but the installation on the class computers made the path names ridiculously awful to navigate. My students agreed this was their least favorite part (which is a shame, since shell scripting is my personal workhorse for research).
Students appreciated not having exams, but the workload (for them and for me grading) was a bit cumbersome (I had a lecture and lab to grade for each student almost every week). I will consider using alternative assignments (weekly online quizzes, and halving the number of lecture assignments) in the future.

With all of that in mind, I'm pretty happy with how the semester finished out, and am still excited to teach Bioinformatics for Research at the graduate level this fall.
