“SOMs” Help Analyze Thousands of Genes

March 12, 1999


CAMBRIDGE, Mass. — Using a sophisticated computer algorithm, a team of scientists at the Whitehead Institute has designed a new technique to analyze the massive amounts of data generated by DNA microarrays, also known as DNA chips. This technique will help scientists decipher how our 100,000 genes work together to keep us healthy and how diseases result when they fail.

"DNA arrays have revolutionized DNA analysis by allowing us to observe the activities of thousands of genes simultaneously," says Todd Golub, research scientist at the Whitehead/MIT Center for Genome Research. "But until now, it's been really difficult to interpret this extraordinarily complex raw data. Our technique is among the first in a new generation of tools that will speed up the analysis of the enormous amounts of genetic data emerging from laboratories worldwide."

Dr. Golub and his colleagues at the Whitehead Institute, Dana-Farber Cancer Institute, Dartmouth Medical School, and the Massachusetts Institute of Technology report their technique in the March 16 issue of the Proceedings of the National Academy of Sciences. The Whitehead/MIT Center for Genome Research is one of the flagship centers of the U.S. Human Genome Project, the effort to determine the 3 billion letters that make up the human blueprint.

"The core of the technique is an algorithm, called a self-organizing map (SOM), that takes advantage of the fact that many genes in a cell behave similarly," explains Pablo Tamayo, the lead author of the paper and research scientist at the Whitehead Institute. "Instead of having 2,000 individual genes, all doing different things, you might have 25 groups of genes doing similar things."

Tamayo compares the final product of the SOM to an executive summary for CEOs. Rather than having to read every page of a 1,000-page report, CEOs can get an overview of the report by simply reading the summary. "It's impossible to visually inspect every gene," he says. "This method produces a quick scan of what's going on with thousands of genes."
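To make the grouping idea concrete, here is a minimal sketch of a self-organizing map written in plain Python. It is not the researchers' software: the function names (`train_som`, `nearest_node`, `assign_clusters`), the learning-rate and neighborhood schedules, and the synthetic "rising" and "falling" expression profiles are all assumptions chosen for illustration. Each grid node holds a prototype profile; training repeatedly pulls the closest node, and its grid neighbors, toward a randomly chosen gene profile, so that each node ends up summarizing one group of similarly behaving genes.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_node(nodes, profile):
    """Grid position of the node whose prototype is closest to the profile."""
    return min(nodes, key=lambda pos: sq_dist(nodes[pos], profile))


def train_som(profiles, rows, cols, iterations=2000, seed=0):
    """Fit a rows x cols grid of prototype profiles (illustrative schedule)."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # One prototype per grid node, initialised from randomly chosen profiles.
    nodes = {(r, c): list(rng.choice(profiles))
             for r in range(rows) for c in range(cols)}
    start_radius = max(rows, cols) / 2.0
    for step in range(iterations):
        frac = step / iterations
        rate = 0.5 * (1.0 - frac)                       # learning rate decays
        radius = max(start_radius * (1.0 - frac), 0.5)  # neighborhood shrinks
        sample = rng.choice(profiles)
        best = nearest_node(nodes, sample)
        for pos, proto in nodes.items():
            grid_dist = math.hypot(pos[0] - best[0], pos[1] - best[1])
            if grid_dist <= radius:
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                for i in range(dim):
                    proto[i] += rate * influence * (sample[i] - proto[i])
    return nodes


def assign_clusters(nodes, profiles):
    """Map each grid node to the indices of the profiles nearest to it."""
    clusters = {pos: [] for pos in nodes}
    for idx, profile in enumerate(profiles):
        clusters[nearest_node(nodes, profile)].append(idx)
    return clusters


# Synthetic data (hypothetical, not from the study): 20 "genes" whose
# expression rises over 5 time points and 20 whose expression falls.
data_rng = random.Random(1)
rising = [[t + data_rng.gauss(0, 0.1) for t in range(5)] for _ in range(20)]
falling = [[4 - t + data_rng.gauss(0, 0.1) for t in range(5)] for _ in range(20)]
profiles = rising + falling

nodes = train_som(profiles, rows=1, cols=2)
clusters = assign_clusters(nodes, profiles)

# The "executive summary": one prototype profile and a gene count per node.
for pos, members in sorted(clusters.items()):
    print(pos, len(members), [round(v, 2) for v in nodes[pos]])
```

Reading the printed prototypes instead of all 40 profiles is the small-scale analogue of the summary Tamayo describes. A real analysis would use a larger grid and would typically normalize each gene's profile first; both choices are left out of this sketch.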

The researchers created a computer package called GENECLUSTER, which organizes the activities of thousands of genes in only minutes. To test GENECLUSTER, they analyzed the genes expressed in several models of leukemia cell growth. In many cases, the algorithm identified genes known to be important in this process, but occasionally it also identified unexpected genes. This finding suggests that the method might be useful in helping to identify the function of unknown genes. "Because genes that have similar functions are generally expressed in the same basic pattern, knowing the expression pattern of a gene could help identify its function," explains Tamayo.

SOMs have been used widely in data mining, particularly for large or messy datasets like stock market data, but this study is the first to apply them to gene analysis.

The study was supported in part by a consortium of three companies (Bristol-Myers Squibb Company; Affymetrix, Inc.; and Millennium Pharmaceuticals Inc.) that formed a unique corporate partnership to fund a five-year research program in functional genomics at the Whitehead/MIT Genome Center. It was also supported by grants from the National Institutes of Health to the Lander and Dmitrovsky labs.

The title of the paper is "Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation." The authors are:

Pablo Tamayo, Whitehead Institute for Biomedical Research, Cambridge, MA

Donna Slonim, Whitehead Institute for Biomedical Research, Cambridge, MA

Jill Mesirov, Whitehead Institute for Biomedical Research, Cambridge, MA

Qing Zhu, Dana-Farber Cancer Institute, Boston, MA

Sutisak Kitareewan, Dartmouth Medical School, Hanover, NH

Ethan Dmitrovsky, Dartmouth Medical School, Hanover, NH

Eric Lander, Whitehead Institute for Biomedical Research, Cambridge, MA, and Massachusetts Institute of Technology, Cambridge, MA

Todd Golub, Whitehead Institute for Biomedical Research, Cambridge, MA, and Dana-Farber Cancer Institute, Boston, MA


Communications and Public Affairs
Phone: 617-258-6851
Email: newsroom@wi.mit.edu

Whitehead Institute is a world-renowned non-profit research institution dedicated to improving human health through basic biomedical research.
Wholly independent in its governance, finances, and research programs, Whitehead shares a close affiliation with Massachusetts Institute of Technology
through its faculty, who hold joint MIT appointments.
