The genome club

Image: Artwork of monkey and rat surrounded by base pairs

The list of mammals with newly generated genetic maps is growing fast.

Illustration: Brian Willse

January 12, 2005


This is the Year of the Rat. It’s also the Year of the Dog and the Opossum. Next year will be the Year of the Cow, the Rhesus Macaque, and maybe a few other assorted creatures with hot blood and fur.

They’re all joining humans, mice, and chimpanzees in the exclusive club of mammals whose whole genomes have been sequenced, giving complete and matching sets of each animal’s DNA and offering researchers the opportunity to rebuild biology and medicine from the ground up.

The technology yielding this treasure works by reading out an organism’s DNA sequence, which is held in its chromosomes as “base pairs” that join adenine with thymine or cytosine with guanine (commonly referred to by the letters A, T, C, and G). A mammalian genome contains about three billion base pairs. Scientists tackle the intimidating task of reading them all in exactly the right order with an entirely counterintuitive approach.
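Because each base has exactly one partner, either strand of the double helix fully determines the other. The snippet below is a minimal illustrative sketch of that pairing rule, not part of any sequencing pipeline; the function name `partner_strand` is hypothetical, and the reversal reflects the convention that the two strands run in opposite directions and are both written 5' to 3'.

```python
# Pairing rules described above: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand):
    """Return the complementary strand, written in the conventional 5'-to-3' direction.

    Hypothetical helper for illustration: complement every base, then reverse,
    because the partner strand runs antiparallel to the input strand.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    print(partner_strand("GATTACA"))  # prints TGTAATC
```

Applying the function twice returns the original sequence, which is another way of saying that the pairing carries no extra information beyond one strand.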

“If you think of the genome as an encyclopedia, you break up the whole encyclopedia and get lots of strings of letters. You then try to put the strings together like a puzzle,” notes Kerstin Lindblad-Toh, codirector of the Genome Sequencing and Analysis Program at the Broad Institute in Cambridge, Massachusetts.

Scientists break the DNA into fragments, read each fragment, then reassemble the pieces using supercomputers and some very, very clever programming. The researchers then repeat the process over and over to make sure they’ve created the most accurate genetic map possible. While this is a Herculean task, the efficiency and volume of current sequencing techniques “have surpassed everybody’s wildest dreams,” comments George Weinstock, codirector of the Human Genome Sequencing Center at Baylor College of Medicine in Houston.
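The “break up the encyclopedia, then solve the puzzle” idea can be illustrated with a toy greedy-overlap assembler. This is a simplified sketch under loud assumptions: reads are error-free, come from a single strand, and the genome has no repeats; the greedy merge strategy and the names `overlap` and `greedy_assemble` are illustrative choices, not the methods actually used at the Broad or Baylor centers, whose assemblers are far more sophisticated.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` letters exists.
    """
    start = 0
    while True:
        # Look for the first `min_len` letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether the rest of a, from that point on, is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Toy shotgun assembly: repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that lie entirely inside another read; they add no new letters.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left; remaining pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Shuffled fragments of the made-up sequence "GATTACAGGCT".
    fragments = ["ACAGGC", "GATTAC", "AGGCT", "TTACAG"]
    print(greedy_assemble(fragments))  # prints ['GATTACAGGCT']
```

Real genomes are full of repeated stretches, which is exactly where a greedy strategy like this one goes wrong and why reading each base several times matters.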

Taken together, the resulting mammalian DNA “parts lists” add up to more than the sum of their parts, says Jane Peterson, associate director of the National Human Genome Research Institute’s Division of Extramural Research.

These projects are opening the floodgates not just for understanding individual organisms, but for comparative genomics studies, revealing what genetic material is conserved over time, and how these genomes relate to each other—and to humans.

“It’s important to understand how this effort relates to human health,” Peterson emphasizes. “The sequenced assembly of genes at different evolutionary distances starts to give clues as to what’s important in the human genome and even why it’s important. This really is the major tool for improving understanding of the human genome.”

Mammals on the march

While organisms from yeast on up are being sequenced in large numbers, scientists are particularly interested in mapping mammalian genomes—especially those mammals employed as biological research models.

Given its lead role in medical studies, “the mouse was a no-brainer” as the National Human Genome Research Institute picked targets, Peterson remarks. A high-quality draft sequence was published in 2002. Since then, scientists have re-sequenced the mouse over and over, and they plan to release a final, highly polished map in late 2005. The rat, extensively studied in behavioral research and other investigations, was another early choice. A first draft was published this spring. Last December, scientists published the first draft of the chimp genome sequence.

Another critical model organism, the dog, has been under study in two different labs. Man’s best friend shares many diseases with humans and even lives in the same environments. Dogs come in wildly diverse breeds, often have well-documented pedigrees and veterinary records, and show different susceptibilities to specific genetic diseases. Last year, The Institute for Genomic Research in Rockville, Maryland, sequenced the genome of a poodle; this July, the Broad Institute and partners in the National Human Genome Research Institute completed a much more detailed draft of a boxer’s genome.

And it won’t end there. Scientists are hard at work to uncover the genomes of other mammals, including the rhesus macaque monkey (the major primate for biomedical research) and the cow (key not just in agriculture but in studies of everything from cardiovascular disease to reproduction). Also in the works are plans to sequence the gray short-tailed South American opossum, tammar wallaby, African elephant, European common shrew, European hedgehog, guinea pig, lesser hedgehog tenrec, nine-banded armadillo, rabbit, cat, and orangutan. “One of my headaches recently has been finding these organisms,” Lindblad-Toh says wryly.

Reading a map

It’s still early days for analyzing the initial wave of mammalian genomes. The first paper on the chimpanzee whole genome, for instance, isn’t expected until late this year or early 2005. What’s more, the mammals don’t come two by two. Analyzing male Y chromosomes can be exasperatingly difficult because of the chromosome’s intricate makeup, so most sequencing projects examine DNA samples from female animals. To fill in some of the missing Ys, Whitehead collaborates with Washington University in Saint Louis. The chimp Y in particular should give insight into human male fertility, says Jennifer Hughes, a postdoctoral researcher in the lab of Whitehead Member David Page.

Even within this exclusive club of mammals, not all animals are equal. They get quite different levels of effort and expense, particularly in the number of times their complete DNA is scanned and in the effort made to fill in the trickiest gaps in the genetic code. “It takes much more time to finish a genome than to get the first 95 percent,” as Lindblad-Toh puts it.

At one extreme, the mouse will get about the same scrutiny as the human, with each DNA base read at least seven times. At the other, wallaby DNA will be scanned only twice. Scientists at the National Human Genome Research Institute may use this abbreviated approach to sequence most of the other mammals on their to-do list. Some researchers complain that the approach will yield insufficient data, muddying the waters for interpretation. The pros and cons are still being kicked around, Peterson responds. As they start to decode the draft sequences, she adds, scientists will get a better grip on what will be most efficient.
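Why does reading each base seven times rather than twice matter so much? A standard back-of-the-envelope answer comes from the Lander-Waterman model, which assumes fragments land at uniformly random positions, so the number of reads covering any given base follows a Poisson distribution. Under that assumption, a genome read to an average depth of c leaves roughly a fraction e^(-c) of its bases never read at all. The sketch below applies this to the article’s figures, treating “seven times” and “twice” as average coverage (an interpretation, not something the article states) and using the article’s three-billion-base genome size; the function name is hypothetical.

```python
import math

# Approximate mammalian genome size quoted in the article.
GENOME_SIZE = 3_000_000_000


def fraction_uncovered(coverage):
    """Expected fraction of bases read zero times at a given average coverage.

    Assumes the Lander-Waterman / Poisson model: reads start at uniformly
    random positions, so P(a base is covered by no read) = exp(-coverage).
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    for label, c in [("~7x, mouse-style", 7.0), ("~2x, wallaby-style", 2.0)]:
        f = fraction_uncovered(c)
        missing_millions = f * GENOME_SIZE / 1e6
        print(f"{label}: {f:.2%} of bases unread, about {missing_millions:.1f} million bases")
```

Under these assumptions, sevenfold coverage leaves about 0.09 percent of the genome unread (roughly 2.7 million bases), while twofold coverage leaves about 13.5 percent (roughly 406 million bases), which helps explain why some researchers worry that the lighter approach muddies interpretation.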

Doggone important

It’s no surprise to learn that investigators hail these new cornucopias of data. This year’s dog draft is an “enormous step,” says Gustavo Aguirre, professor of medical genetics and ophthalmology at the University of Pennsylvania in Philadelphia. “It allows us to now do work that we could only have hoped to do several years ago.”

Previously, when Aguirre’s lab isolated a protein of interest, the next step was to clone the gene in the lab. Poring through a huge physical library of complementary DNA (cDNA) clones “might take as long as six months to a year,” Aguirre says. Now, the researchers usually can find the gene target by crunching through the dog sequence database.

“We’re eternally grateful for it,” says Aguirre, “but we’re not satisfied.”

Fortunately, the advances in sequencing technologies are keeping pace with scientists’ demands. “Now a single center can finish a mammalian genome in a year,” notes Baylor’s Weinstock. “Almost everything of significant value will be sequenced at some level in the next few years.”

Written by Eric Bender. This article first appeared in the fall 2004 issue of Paradigm magazine.

Whitehead Institute is a world-renowned non-profit research institution dedicated to improving human health through basic biomedical research.
Wholly independent in its governance, finances, and research programs, Whitehead shares a close affiliation with Massachusetts Institute of Technology
through its faculty, who hold joint MIT appointments.

© Whitehead Institute for Biomedical Research              455 Main Street          Cambridge, MA 02142