Reading Genomes: A Companion to Bioinformatics and Genome Analysis

Author

Laurie Stevison

Preface

Why This Course? Why This Book?

This companion textbook supports BIOL 5860/6860: Bioinformatics and Genome Analysis, a hands-on survey of computational approaches to “reading” biological data. As biologists with genetics and computational biology prerequisites, you already know how to frame biological questions and wrangle basic code. This course builds directly on that foundation to demystify the “black box” of bioinformatics tools, showing you how to move from raw sequencing data to biological insight. We will also read primary scientific literature, which you are expected to be familiar with. If you need a refresher, please refer to Appendix A.

The field moves fast. When I was in graduate school, I learned 454 sequencing and applied it to my PhD work; that platform is now long obsolete. Rather than teaching specific tools that may change by graduation, this book emphasizes adaptability: the core concepts, file formats, quality control practices, and workflow thinking that transfer across platforms and projects. Labs progress from exploratory (“find a genome paper on your favorite organism”) to practical (GATK variant calling on HPC) to independent (semester-long group research projects where you discover and master new tools yourselves).

Who This Book Serves

This resource targets our diverse class:

  • Pre-med and future genetic counselors: Clinical relevance through variant interpretation, population genetics, and ethical considerations in human genomics.

  • Graduate students and researchers: Practical pipelines, reproducibility best practices, and project management skills for real analyses.

  • All students: Demystification of the “hidden curriculum”—writing reproducible workflows, applying for funding, and participating in grant review panels.

Although Genetics is a prerequisite, no prior genomics experience is required. Similarly, the prerequisite of BIOL 5800/6800 ensures that everyone can keep pace with Linux, R, git, and basic scripting.

How Does Computational Biology Differ from Bioinformatics?

Figure 1: Venn diagram of the fields of research within bioinformatics

Even after taking a course in Computational Biology, it can be hard to pin down how these related fields differ and how to define each one.

This figure shows how bioinformatics lies at the interface of statistics, computer science, and biology, integrating methods from all three disciplines to analyze and interpret biological data. It also highlights how related subfields overlap: biostatistics emerges from the intersection of statistics and biology, computational biology from the overlap of computer science and biology, and data science from the overlap of statistics and computer science. At the center, bioinformatics combines these quantitative, computational, and biological approaches under one umbrella.

Importantly, this course defines biology broadly, encompassing all fields of empirical research. This includes biology itself as well as biomedical research areas such as pharmacology, kinesiology, and veterinary sciences.

Student Learning Objectives

Upon completion of the course, you should be able to:

  • Interpret data formats and quality of:
    • raw genome sequence data
    • sequence alignments
    • genetic variants
  • Conduct genomic analysis using best practice pipelines
  • Analyze data using R and various bioinformatics tools
  • Critically evaluate genome analysis study designs
  • Describe in depth the basic analysis for a variety of data types
  • Develop proficiency in scientific communication skills and reproducibility of research

Labs and Projects as a Parallel Track

Throughout the semester, you will encounter two recurring threads:

  • A comparative genomics project (Appendix B) that applies the skills you are learning to a real research question.
  • A yeast genomics dataset (Labs 4, 7, 9; Appendix C) that takes you from FASTQ → BAM → VCF → IGV.

Each chapter includes “Lab connections” callouts that point to relevant labs and appendices, so you can cross‑reference concepts, commands, and datasets as you work.

Semester Schedule

| Week | Date | Class topic | Primary readings | Description | Assessments |
|---|---|---|---|---|---|
| 1 | 8-Jan | Course overview and state of the field | Chapter 1; Readings 1, 2; Appendix A | Big-picture framing of course. | Start Lab 1 |
| 2 | 13-Jan | What is a genome analysis? | Chapter 2; Reading 3 | Sequencing platforms | Start Lab 2 |
| | 15-Jan | Intro to NGS data and data QC | Chapter 3 | Genomics File Formats | |
| 3 | 20-Jan | Intro to semester-long research project | Appendix B | Project framing and expectations. | |
| | 22-Jan | Lab day – Genome browsers | Chapter 4 | Explore different browsers and customizations | Lab 3 |
| 4 | 27-Jan | Pairwise & multiple sequence alignment | Chapter 5 | Intro to pairwise/MSA concepts | |
| | 29-Jan | Genome sequence alignment | Chapter 6 | Scaling alignment concepts and algorithms | |
| 5 | 3-Feb | Project management & writing methods in bioinformatics | Chapter 7 | Hidden curriculum emphasis | GP Step 1 Due |
| | 5-Feb | Lab day – Indexing, alignment, and assessment | Appendix C | Introduce yeast dataset and practical application | Lab 4 |
| 6 | 10-Feb | Genome assembly guest lecture | Chapter 6 | Algorithms and case studies | |
| | 12-Feb | Lab day – BLAST on the command line | Ch. 5 Sequence Alignment | BLAST concepts and practice on the HPC | Lab 5 |
| 7 | 17-Feb | Various genome analysis workflows | Chapter 8 | Multi-omics workflows and human case studies | Meet with Dr. S |
| | 19-Feb | Lab day – HMMs for gene annotation | Chapter 9 | HMM intuition and gene-finding | Lab 6 |
| 8 | 24-Feb | Intro to variant calling/filtering | Chapter 10 | From alignment to variants using GATK | |
| | 26-Feb | Midterm Exam | In-class conceptual exam | Assessment of Chapters 1-7; Labs 1-5; Appendices | Exam |
| 9 | 3-Mar | Lab day – Variant filtering and QC | Appendix C | Practical filtering and interpretation | Lab 7; GP Step 2 Due |
| | 5-Mar | Beyond GATK: LLMs and other Variant Calling Approaches | Assigned Article Readings | In-depth comparison of variant calling methods | Lab 8; Annotation Report Due |
| 10 | 10-Mar | Spring break – no class | No assigned reading | | |
| | 12-Mar | Spring break – no class | No assigned reading | | |
| 11 | 17-Mar | Open lab – group project work | Appendix B | Execute Bioinformatics Plan and Setup GitHub | GP Step 3; Grad Assignment |
| | 19-Mar | Lab Day – Visualizing Genetic Variants | Appendix C | Visualizing variants in genome viewers | Lab 9 |
| 12 | 24-Mar | How Science is Funded? – Grant Funding & Review Process | Chapter 11 | Hidden curriculum emphasis | |
| | 26-Mar | Population genomics | Ch. 12 Population Genomics & Scans | Human evolutionary genomics | GP Step 4 Due |
| 13 | 31-Mar | Lab day – Human Population Genetics | Chapter 12 | Hands-on diversity and divergence analysis | Lab 10 |
| | 2-Apr | Genome scans & sliding-window analysis | Chapter 12 | Outlier scans, patterns of selection, and case studies | Proposal Reviews Due |
| 14 | 7-Apr | Grant panel day 1 | Chapter 11 | In-class panel and critique | Mock Grant Panel |
| | 9-Apr | Grant panel day 2 | Chapter 11 | In-class panel and critique | Mock Grant Panel |
| 15 | 14-Apr | Sequence motifs | Chapter 13 | Finding sequence motifs in genomes | Panel Summary Due |
| | 16-Apr | Open lab – group project work | Appendix B; Chapter 14 | Project-oriented application of course content | |
| 16 | 21-Apr | Final Exam | In-class conceptual exam | Review of Chapters 8-14; Labs 6-10 | Exam |
| | 23-Apr | Open lab – group project work | Appendix B; Chapter 14 | Project-oriented application of course content | GP Step 5 Due |
| 17 | 1-May | Final group presentations (10:30–12:30) | Appendix B | Capstone, reflections, and next steps | GP Step 6-7 Due |

Key Dates

| Assignment | Description | Due Date | Points | Percent course completed |
|---|---|---|---|---|
| Lab 1 | Find and describe a genome analysis paper | 15-Jan | 25 | 3% |
| Lab 2 | Design a Genome Project | 23-Jan | 25 | 6% |
| Lab 3 | Genome Browsers | 29-Jan | 25 | 9% |
| GP Step 1 | Project Overview | 3-Feb | 20 | 12% |
| Lab 4 | Genome Alignment/Indexing | 12-Feb | 25 | 15% |
| Lab 5 | BLAST on the command line | 19-Feb | 25 | 18% |
| GP Meeting | Meet with Dr. Stevison to discuss research plan | 20-Feb | NA | 18% |
| Lab 6 | HMMs for gene annotation | 26-Feb | 25 | 21% |
| Midterm | Exam on Chapters 1-7 | 26-Feb | 100 | 33% |
| GP Step 2 | Detailed Bioinformatics Plan | 2-Mar | 30 | 36% |
| Annotation Project | Manual Gene Annotation Report | 6-Mar | 100 | 48% |
| Lab 7 | Variant Calling and Filtering | 10-Mar | 25 | 52% |
| Lab 8 | Variant Calling Discussion | 12-Mar | 25 | 55% |
| Grad Assignment | Guide for a bioinformatics tool! | 16-Mar | NA | 55% |
| GP Step 3 | GitHub Repo with Prelim analysis | 23-Mar | 10 | 56% |
| Lab 9 | IGV and Genome Viewers | 26-Mar | 25 | 59% |
| GP Step 4 | Peer Review of Step 3 | 27-Mar | 20 | 61% |
| Panel Review | Mock Panel Reviews (due BEFORE panel) | 3-Apr | 20 | 64% |
| Lab 10 | FST, Tajima’s D and Diversity Scans | 7-Apr | 25 | 67% |
| Panel Discussion | Panel Discussion and Participation IN CLASS | 7-Apr | 30 | 70% |
| Panel Summary | Detailed Summary of Panel Discussion | 14-Apr | 25 | 73% |
| Final Exam | Exam on Chapters 8-14 | 21-Apr | 100 | 85% |
| GP Step 5 | Final GitHub Repository | 24-Apr | 50 | 92% |
| GP Step 6 | Final Presentation | 1-May | 50 | 98% |
| GP Step 7 | Peer Review of Step 6 | 1-May | 20 | 100% |

*Note: The above schedule and these deadlines are subject to change.
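
The "Percent course completed" column in the Key Dates table is not defined in the text, but the values appear to be the running total of points divided by the total points available, rounded to the nearest whole percent. The short Python sketch below reproduces the column under that assumption. The function name `percent_completed` is only illustrative, and the two items listed with NA points (GP Meeting, Grad Assignment) are assumed to contribute 0 points.

```python
from itertools import accumulate

# (assignment, points) in due-date order, copied from the Key Dates table.
# Assumption: items whose points are listed as NA count as 0 points here.
ASSIGNMENTS = [
    ("Lab 1", 25), ("Lab 2", 25), ("Lab 3", 25), ("GP Step 1", 20),
    ("Lab 4", 25), ("Lab 5", 25), ("GP Meeting", 0), ("Lab 6", 25),
    ("Midterm", 100), ("GP Step 2", 30), ("Annotation Project", 100),
    ("Lab 7", 25), ("Lab 8", 25), ("Grad Assignment", 0), ("GP Step 3", 10),
    ("Lab 9", 25), ("GP Step 4", 20), ("Panel Review", 20), ("Lab 10", 25),
    ("Panel Discussion", 30), ("Panel Summary", 25), ("Final Exam", 100),
    ("GP Step 5", 50), ("GP Step 6", 50), ("GP Step 7", 20),
]


def percent_completed(assignments):
    """Return (name, cumulative percent of total points) for each assignment."""
    points = [pts for _, pts in assignments]
    total = sum(points)
    return [
        (name, round(100 * done / total))
        for (name, _), done in zip(assignments, accumulate(points))
    ]


if __name__ == "__main__":
    for name, pct in percent_completed(ASSIGNMENTS):
        print(f"{name:<20}{pct:>4}%")
```

If deadlines or point values change, editing the `ASSIGNMENTS` list and rerunning the script gives an updated estimate of how much of the course grade has been settled by any given date.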

Final Note to Students

Bioinformatics feels like magic until you see the patterns: every pipeline starts with QC, every analysis needs reproducibility, every grant needs a clear workflow. By semester’s end, you’ll not only run these analyses but build the confidence and comfort to explain them to collaborators, reviewers, and future employers.

Let’s read some genomes together.

Dr. Laurie Stevison - Your Steward in Genomics
December 2025

Acknowledgments

Portions of this book were drafted and revised with the assistance of an AI‑based writing tool. The instructor reviewed, edited, and is responsible for the final content.


References

1. Hotaling, S., Kelley, J. L. & Frandsen, P. B. Toward a genome sequence for every animal: Where are we now? Proceedings of the National Academy of Sciences 118, e2109019118 (2021).
2. Marks, R. A., Hotaling, S., Frandsen, P. B. & VanBuren, R. Representation and participation across 20 years of plant genome sequencing. Nature Plants 7, 1571–1578 (2021).
3. Bogan, S. N. et al. Temperature and pressure shaped the evolution of antifreeze proteins in polar and deep sea zoarcoid fishes. Molecular Biology and Evolution 42, msaf219 (2025).