16 Lab02: Design a Genome Project
16.1 Lab 2
In Chapter 1, we introduced three broad perspectives on bioinformatics and genomics (cell, organism, tree of life)1. Later in the textbook, Pevsner expands these three perspectives into five basic perspectives on genome sequencing.
For this lab, you will use these five perspectives, along with your new understanding of sequencing technologies from Chapter 2, to sketch a 1–2 page narrative for a hypothetical genome project1–3. Grant writing and review are covered in more detail later, in Chapter 5. The lab is assigned now so that you have plenty of time to work on it outside of class while we move on to other topics in the course.
The five perspectives on genomics discussed in Pevsner Chapter 15 can be summarized as:
- Catalog genomic information: basic genome features (size, chromosomes, GC content, repeats, gene counts), based on sequencing, assembly, and annotation.
- Catalog comparative genomic information: whole‑genome comparisons to related species, orthologs, divergence times, lateral gene transfer, using whole‑genome alignments and genome browsers.
- Biological principles: how genome structure and variation underlie development, metabolism, behavior, and evolutionary processes (e.g., genome size evolution, polyploidization, gene birth/death).
- Human disease relevance: how genomes relate to disease in humans or plants, including SNPs, linkage and association studies, and host–pathogen interactions.
- Bioinformatics aspects: databases, software, and visualization tools that make genome analysis possible.
These map naturally onto the three perspectives in Chapter 1: cell‑scale questions, organism‑level questions, and tree‑of‑life/comparative questions1.
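As a small illustration of the first perspective (cataloging basic genome features), the sketch below computes sequence count, total length, and GC content from FASTA-formatted text. It is a minimal example under stated assumptions: the toy sequences are invented, the function names `gc_content` and `summarize_fasta` are hypothetical, and ambiguous bases such as N are simply excluded from the GC calculation. A real project would typically rely on an established toolkit rather than hand-written parsing.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def summarize_fasta(text: str) -> dict:
    """Parse FASTA text and return a few basic genome-level features."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record (the full header is used as its name).
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    genome = "".join("".join(parts) for parts in records.values())
    return {
        "n_sequences": len(records),
        "total_length": len(genome),
        "gc_content": gc_content(genome),
    }


# Toy input (invented for illustration only).
toy_fasta = """>chr1
ACGTACGT
GGCC
>chr2
ATATNN
"""

summary = summarize_fasta(toy_fasta)
print(summary)
```

Repeat content, gene counts, and the comparative perspectives require assembly, annotation, and alignment tools, which are beyond what a short sketch like this can show.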
16.1.1 The Assignment
Write a 1–2 page narrative of the genome analysis project you have designed. To guide you, begin by reflecting on the following questions:
- If you could sequence the genomes of 100 individuals from any species, which species would you choose?
- What hypotheses would you test, how would you perform data analyses, and what resources would you require in terms of hardware, software, and collaborators?
- What ethical issues might arise in sequencing these genomes?
Specifically, think about:
- Which sequencing platforms and coverage would you choose (Chapter 2)?
- Which file formats and QC steps would your workflow rely on (Chapter 3)?1–3
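When thinking about coverage, a useful back-of-the-envelope relation is that expected coverage equals the number of reads times the read length, divided by the genome size. The sketch below applies that relation to estimate how many reads a design might need. All numbers are illustrative assumptions (a 3 Gb genome, 150 bp reads, 30x target coverage, 100 individuals), not recommendations, and the function names are hypothetical.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative assumptions only; substitute values for your own species and platform.
GENOME_SIZE = 3_000_000_000   # bases (assumed)
READ_LENGTH = 150             # bases per read (assumed short-read platform)
TARGET_COVERAGE = 30          # desired average depth per individual (assumed)
N_INDIVIDUALS = 100           # from the assignment prompt

reads_per_individual = reads_needed(TARGET_COVERAGE, READ_LENGTH, GENOME_SIZE)
total_bases = N_INDIVIDUALS * reads_per_individual * READ_LENGTH

print(f"Reads per individual: {reads_per_individual:,}")
print(f"Total bases for the project: {total_bases:,}")
```

This is an average only. Real coverage varies across the genome (for example in repeats or GC-extreme regions), so a design narrative may want to justify some margin above the minimum.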
For your design narrative:
- Choose a species and justify why 100 genomes from this species would be informative.
- State one or two main hypotheses, or guiding questions if discovery-based.
- Identify which of the five perspectives (and which of the three Chapter 1 perspectives) your project emphasizes.
- Outline your sequencing strategy (platforms, approximate coverage, sample types) using concepts from Chapter 2.
- Sketch the main analysis steps (from raw FASTQ through alignment/assembly and downstream analyses) using ideas from Chapters 1 and 3.
- Briefly note any major ethical issues if humans or other sensitive species are involved.
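To make the first analysis step concrete, the sketch below reads FASTQ records and keeps only reads whose mean base quality meets a threshold. It is a simplified illustration, not a substitute for a dedicated QC tool: it assumes four-line records with Phred+33 quality encoding, the threshold of 20 is an arbitrary example value, the toy reads are invented, and the function names are hypothetical. Averaging Phred scores directly is also a simplification, since Phred is a logarithmic scale.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(text: str, min_mean_quality: float = 20.0) -> List[str]:
    """Return names of reads whose mean quality meets the threshold."""
    return [
        name
        for name, _seq, qual in parse_fastq(text)
        if mean_phred(qual) >= min_mean_quality
    ]


# Toy input (invented): 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
toy_fastq = """@read1
ACGT
+
IIII
@read2
ACGT
+
!!!!
"""

kept = filter_reads(toy_fastq)
print(kept)
```

In a design narrative it is enough to name the QC step and the kind of tool you would use. The point here is only to show what the step does to the data before alignment or assembly.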
The focus of your design narrative should be on motivation and approach. You may refer to assigned readings (e.g.,4;5;6) in your narrative, but you do not need a formal reference list.