Princess Tradescantia: 2015

30 September 2015

Classroom informatics: Managing students, information, and communication.

One of my biggest challenges as an educator is helping students manage information. I don't believe my job is to impart information to students, but rather to help them learn to think about information and integrate it into their knowledge structures more effectively. Example: I'm not trying to teach my students how genome assembly algorithms work; I'm trying to teach them how to use the instruction manuals that come with assembly software to analyze their data. Layperson example: I'm trying to teach them how to use a recipe book, rather than memorize all the recipes.

This means that I rarely give my students a list of things they need to remember. More often I give students lists of resources that they might be able to use to help themselves find an answer or solution. The end result is that students are confronted with a huge amount of information that they're expected not to memorize in minute detail, but rather to filter and search to find the facts and strategies necessary to complete a task. As someone whose memory is complete rubbish, but who is quite efficient at managing information, I find this a much easier task. Students who have been trained in most contemporary classrooms may consider this task monumental.

Essentially, I'm teaching students informatics strategies. This is certainly appropriate, given my research and teaching specialty is bioinformatics. The trouble is that it takes more explanation to describe how to use information, than to simply tell students what information they need to remember. Students then become overwhelmed at the amount of information to which they are given access, because they are trying to process the information in the same way they have managed facts given to them in previous classes. Instead of seeing information as a resource that may possibly be used, they see it as a mountain that needs to be climbed.

How do you help students think about information as a resource to reference, rather than data to retain? This seems to be a logical extension of the commonly touted plight of science educators: we need to focus less on memorization and more on critical (scientific) thinking. We need to teach processes and ways of thinking, rather than factoids and things to remember. To me, the amount of information I give my students seems like overkill, as if I'm making the assignment too easy. In truth, it's actually the opposite: I'm requiring my students to use appropriate information filtering methods, and to help themselves.

In practice, I spend a lot of time trying to remind students of the bigger picture, and pointing them towards materials that are already available that may answer questions for them. This is actually an essential but overlooked skill, not just in bioinformatics. Professors often complain about students who don't read the syllabus, which suggests that we're not doing a very good job teaching students how to help themselves. I'm starting to believe this is the real key to contemporary education: helping students utilize the sheer amount of information available to explore and innovate.

12 August 2015

Bioinformatics "Purity Test": Need more questions!

I'm prepping for the new class I'm teaching this semester, Bioinformatics for Research, and I need help! It's a graduate level class that I'm designing to introduce our Master's students to the basic computational skills they will require to perform common research tasks.

Here are the general skills we'll be covering:

data/metadata organization and management
automation of tasks with the Unix shell
introductory R scripting (data parsing, simple statistics, visualizations)
executing command-line programs on remote HPC resources
version control with Git
small projects in genome assembly/annotation and phylogenetics (common questions for which previous grad students have asked for help)

As part of motivating students to learn these skills, I'm developing a Bioinformatics "Purity Test," similar to the ones I remember being all the rage when I was in high school and college. Students will be presented with scenarios and will calculate the percentage they've encountered. Here are the survey questions so far:

-----

Have you ever:

Tried to open a file and found it corrupted
Had to recreate a file because it wasn't backed up
Been unable to find a file because you forgot what it was called or where you put it
Had a computer crash lose your unsaved work
Had to redo work because you (or someone else) decided it needed to be done differently
Been unable to redo your work because you forgot how you did it
Spent hours performing the same task over and over
Read a scientific paper and wondered, “How did they make those number things happen?”
Created a graph/diagram in Microsoft Excel (or PowerPoint)

Calculate your score: 1 point for every "yes" answer, divide by total number of questions.

-----

My question for you, dear blog readers: what other questions should I include?

UPDATES:
Permanently deleted your work by accident (from @thatdnaguy)
Frequently copied/pasted to reformat documents (from @PaulBlischak)
Found two versions of the same file, but didn't know how they differ (from @nmatasci)
Forgotten your abbreviations for samples, dates, etc in an analysis (from @nmatasci)
Been unable to use software because you can't find a computer that can run it (from @nmatasci)
Been unable to determine the required input format for a program (from @dwbapst)
Been unable to install software dependencies (from @BrownJosephW)
Had errors related to Windows vs Unix line endings (from @BrownJosephW)
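
For anyone who would rather script the tally, the scoring rule for the test (1 point for every "yes," divided by the total number of questions) can be sketched in a few lines of Python. This is only a minimal illustration: the function name purity_score and the example answers are made up for this sketch and are not part of the actual class materials.

```python
def purity_score(answers):
    """Return the fraction of "yes" answers (True values) among all answers."""
    if not answers:
        raise ValueError("at least one answer is required")
    yes_count = sum(1 for answer in answers if answer)
    return yes_count / len(answers)


# Hypothetical responses to the nine original questions (True means "yes").
example_answers = [True, True, False, True, True, False, True, False, True]

score = purity_score(example_answers)
print(f"Score: {score:.2f} ({score:.0%} of scenarios encountered)")
```

Adding the reader-suggested questions only means extending the list of answers; the calculation itself stays the same.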

24 July 2015

Getting ready for Botany 2015!

I'm heading off to Edmonton this weekend for Botany 2015. This conference is co-hosted by a handful of professional societies, two of which I've been a member of for several years (Botanical Society of America and American Society of Plant Taxonomists). I'm excited to be returning to this conference after attending Evolution for several years. My dance card is very full for this conference, but I'm looking forward to each part:

Ecological niche modeling workshop: A few of my undergraduate researchers are pursuing some projects involving species distributions and niche modeling. Pam Soltis (who was on sabbatical at NESCent when I was there) is hosting a workshop on this very topic, and I'm elated to have some time to sit down and formally learn methods using QGIS, an open-source geographic information system.
Oral presentation: I'll be presenting preliminary results from my characterization of transposable elements in Agavoideae (agave/tequila, yucca, etc). I'm really excited for this collaboration with two other early career scientists, Michael McKain and Alexandros Bousios, to finally come to fruition. You can read my presentation abstract here, and I'll be posting the slides to SlideShare after the presentation.
PLANTS mentor: The Botanical Society of America sponsors undergraduates from under-represented groups to attend their annual meeting, and I'll be acting as a mentor for one of these students during the conference. He'll be giving a talk on palm evolution that sounds great! I'm looking forward to meeting him this weekend.
Student career luncheon: I've been invited to speak at a luncheon for student conference attendees about careers in botany. This sounds like a great way to share my experiences in my path to research and teaching. My short talk will be followed by "speed dating," where students will be able to interact with professionals.
Professional society service: I have some additional obligations to serve a professional society during the conference, which makes me feel like a Very Adult Scientist but will also keep me pretty busy!

13 July 2015

Debriefing from TACC Summer Supercomputing Institute

I had such good intentions to blog more this summer during my reprieve from class planning, but that obviously didn't work out. A short blog post describing my adventure at the Texas Advanced Computing Center (TACC) Summer Supercomputing Institute (SSI) in Austin, Texas last week seemed a good way to reinforce my lessons learned (as well as provide a convenient way to ease back into a normal work schedule). Here are the highlights:

Students in the class were from a variety of disciplines, including applied math, physical science, and life science. Some folks were proficient programmers, others (like me) were familiar with running programs but had little experience writing their own compiled programs (consequently, I'm looking into taking more formal courses in CS).
The class covered a variety of topics, including parallel programming (OpenMP and MPI), debugging/optimization, data management, and data visualization. You can see some of TACC's previous course materials here (and I hope they'll add public access for materials used for our class, some of which were new!). I was especially enthralled with the session on data management, and am intrigued by exploring Hadoop for genomics.
Given that this class (as well as many of TACC's other training sessions) is geared towards folks with a background in programming, I spent some time talking to folks there about whether the resource is appropriate for entry-level folks (i.e., biologists rather than computer scientists). As it turns out, the marketing and education folks there are very interested in continuing to expand the user base, including folks who may not be very proficient at the command line. To that end, I've started a GitHub repo to develop materials for folks new to TACC (which I'll rely on heavily for my graduate-level bioinformatics class this fall). These materials might also be of use to folks who access HPC resources through XSEDE, which has a campus champions program that I'm thinking of joining.

All in all, it was an intense but intellectually profitable week. Plus, I learned to make fancy figures! I don't know what it means, but the units are PARSECS! Super cool.

11 May 2015

Self evaluation for teaching an undergraduate bioinformatics course

I wrote several lengthy posts last fall and winter that reflected on my preparations to teach a new course this spring (bioinformatics lecture and lab for undergraduate biology majors, main post here). The logistics and intellectual drain of new course prep kept me from writing much about this course as it progressed. Now that the semester is nearing its end (finals are in a week and a half), I'm compelled to report back on how the class developed.

For a quick overview, check out this poster I put together for the UT Tyler teaching symposium, highlighting my experiences implementing this new course:

Developing an undergraduate bioinformatics course from Kate Hertweck

While each bullet point below could certainly warrant a post all by itself, I'm going ahead and outlining everything while it's still fresh in my memory, laundry-list style:

General course description:

Lecture met twice a week for an hour and a half on Tuesday/Thursday morning (three hours total per week)
Lab met once a week for three hours on Thursday night.
Eight students enrolled, all biology majors, mostly pre-professional.
Assessment consisted of weekly homework submitted via GitHub for lab and Blackboard for lecture. Students also completed a class project for lecture by researching and presenting on a topic we didn't cover in class.
Only pre-requisites were two semesters of introductory biology.

What worked well:

I tried to adopt a lecture style that minimized actual lecturing (I averaged 20 slides for an hour and a half lecture). I implemented class discussions and think-pair-share type activities, including drawing a concept map at the end of the semester to summarize.
For lab, my students loved R, especially working in RStudio.
I explored the use of analogies to explain complicated concepts in genomics and bioinformatics.
I used signed pre- and post-class surveys and anonymous mid-semester evaluations to gauge how students felt about the class (this is mostly how I know things were working well!).

What I'll change for next time:

Establishing the computational infrastructure remains challenging. I can't require my students to have their own (personal) machine for installing software. I have a computer lab for students to use during lecture and lab, but university policy constrains my ability to use these machines (e.g., I can't install software myself). I also had students log on to a remote HPC resource through TACC, but about half my students had problems accessing it.
Continuing from the last point, I had students use Cygwin to learn Unix/shell/bash commands, but the installation on the class computers made the path names ridiculously awful to navigate. My students agreed this was their least favorite part (which is a shame, since shell scripting is my personal workhorse for research).
Students appreciated not having exams, but the workload (for them and for me grading) was a bit cumbersome (I had a lecture and lab to grade for each student almost every week). I will consider using alternative assignments (weekly online quizzes, and halving the number of lecture assignments) in the future.

With all of that in mind, I'm pretty happy with how the semester finished out, and am still excited to teach Bioinformatics for Research at the graduate level this fall.

24 April 2015

Under-appreciated Texas wildflowers

In my ongoing quest to balance the computational aspect of my work, I've been working with the East Texas Master Naturalists to continue developing their herbarium collection of local, native plants. It's been a great synergistic relationship: they teach me about native species, and I've been setting up their herbarium database as a series of spreadsheets and documents in Google Drive.

We met this morning out at The Nature Center, a Texas Parks & Wildlife facility that houses the herbarium and has meeting space. This very wet spring has led to an abundance of iconic Texas wildflowers, like bluebonnets and primroses. Much to my delight, I also found some of my favorite spiderworts growing nearby. It's been several years since I did serious plant collections, but I still managed to spot them on a roadside on my way from campus this morning. I apparently haven't lost my skill at picking out the flower color and growth habit from the multitudes of flowers blooming right now. This beauty (picture to right) is a great example of Tradescantia ohiensis, one of the very widespread species of erect Tradescantia. Each individual flower only lasts a day before deliquescing (melting), but the plant will keep blooming until next fall, as long as it doesn't get fried in the Texas heat.

While visiting with my old friend T. ohiensis, I took the opportunity to scratch another itch that's been in my mind for several weeks now. I've been absolutely awestruck by the thistles growing on the roadsides this spring. The picture (to the left) doesn't do it justice, but these plants are almost five feet tall, and covered in menacing, spiky leaves. There appear to be several species of Cirsium here in Texas, and I'm looking forward to seeing more examples of these monsters.

While perhaps not as charismatic as other wildflowers, these two examples get a thumbs up from me as particularly cool plant species.

30 March 2015

Data Carpentry hackathon for genomics

I'm pleased to report back from the Data Carpentry (DC) genomics hackathon, which I attended last week with ~26 other folks at Cold Spring Harbor Laboratory in New York. The goal of this meeting was to develop modules for a DC workshop focused on analysis of next-generation sequencing and other genomic data. The original DC lessons were designed for a very general audience using ecological data, so we were tasked with outlining, organizing, and starting to write materials for a two-day workshop specifically for genomics.

Each of the following points could be thoroughly explored in its own post, but here are a few highlights from this meeting:

Attendees were a great mix of biology researchers and educators from a range of institutions (research intensive, primarily undergraduate), computer scientists, and assessment specialists. This meant we were pulling from a broad range of skills, and incorporating multiple perspectives in planning.

The length of the meeting (2.5 days) allowed us to get a running start on actually developing materials (GitHub repos here prefaced with "genomics"). In addition to "intro to Unix" material that would largely remain constant from the original DC lesson, we started developing six modules that cover a general genomics workflow: setting up a project, getting to know your data, data wrangling (QC and alignment), analysis and visualization, and cloud/HPC. I personally found it remarkable and gratifying to see so much attention paid to the initial preparatory stages of a project.

Numerous folks emphasized the importance of understanding your target audience. Some of these discussions related to the assumed skill level (or pre-requisites) for workshop attendees. Other conversations related to the need to accommodate particular cultural or gender issues while teaching to make the learning environment comfortable for everyone.

What makes DC workshops special and distinct from other courses? In developing the modules described above, we talked about the distinction between Software Carpentry and Data Carpentry, as well as if and when instructors should be expected to teach about biology (rather than computing/data analysis). The general consensus is that the focus of DC on telling a narrative about data means we should be emphasizing "best practices" for improving productivity and reproducibility, rather than advocating for particular types of analyses. That being said, there is ample opportunity during lessons to model rigorous methods, as well as provide extra resources for students to improve their skills in experimental design and statistical reasoning.

A particularly challenging aspect of developing such resources is assessment of student improvement following a workshop. It's challenging to evaluate how much students will retain after such a short period of time (2 days), as well as whether these skills will transfer over to their research methods. One breakout group focused on developing a strategy for surveying students prior to and directly following a workshop to measure immediate learning, as well as 3-6 months following to measure long-term gains. We targeted question formats that would address student learning in terms of the following areas: declarative knowledge (Can you recall this fact?), skills (Can you write this code?), and attitude (Will you use this skill?).

I was initially on the fence about whether to apply for the hackathon. I'm a first-year professor wallowing in the murky depths of teaching a new course, and my overtaxed brain was whispering that maybe it would cause too much stress. My gut, thankfully, doesn't always listen to my brain. Moreover, the class I'm piloting this semester is an undergraduate bioinformatics class focused on genomics, so the DC hackathon fit naturally into my preparation for the last few weeks of the semester. I'm looking forward to reporting back soon about how my semester-long class is wrapping up, as well as my first teaching experience for a Software Carpentry workshop in a few weeks.
