NCGR and SANBI Launch New Public Database of Expressed Human Genome

SANTA FE, N.M., March 12 -- The National Center for Genome Resources and the South African National Bioinformatics Institute today announced the launch of the Sequence Tag Alignment and Consensus Knowledgebase, or STACK, a public database of gene sequences expressed in the human genome. STACK is critical to researchers who need a unified view of the genes being discovered in the human genome.

The database satisfies a growing need to make the increasingly large volume of gene fragment data easier and more efficient to use in the analysis of human genes. Only a small fraction of the more than 50,000 human genes has been completely sequenced, and most existing gene data are available primarily as gene fragments. STACK provides an independent method for processing the gene fragments, detecting errors and creating carefully joined sets of consensus sequences for each gene.

STACK is unique among DNA sequence databases. It features expressed gene sequences organized according to tissue and provides a comprehensive representation of each gene with alignments of its expressed fragments. The algorithms used to generate the database include efficient error compensation methods that can create longer, more accurate consensus sequences.
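
As a rough illustration of why combining overlapping fragments can yield a longer, more accurate sequence, the following minimal Python sketch builds a consensus by majority vote over fragments that are assumed to be already aligned and padded with "-". This is not STACK's actual algorithm; the function name, the gap character and the sample fragments are hypothetical and chosen only for the example.

```python
from collections import Counter


def consensus(aligned_fragments, gap="-"):
    """Majority-vote consensus over pre-aligned, equal-length fragments.

    Illustrative only: assumes alignment has already been done and that
    each fragment is padded with `gap` where it does not cover a column.
    """
    if not aligned_fragments:
        return ""
    length = len(aligned_fragments[0])
    if any(len(f) != length for f in aligned_fragments):
        raise ValueError("fragments must be padded to equal length")

    result = []
    for column in zip(*aligned_fragments):
        bases = [b for b in column if b != gap]
        if not bases:
            continue  # no fragment covers this column
        # The most common base wins, so an isolated read error is outvoted.
        result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)


# Hypothetical fragments; the second carries a single-base error (T for A).
fragments = [
    "ACGTAC----",
    "--GTTCGG--",
    "----ACGGTA",
]
print(consensus(fragments))
```

In this toy case the consensus spans all ten columns, although no single fragment is longer than six bases, and the one mismatched base is outvoted by the two fragments that agree.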

Scientists at the South African institute, SANBI, together with database and biology experts at NCGR, are making STACK publicly available via NCGR’s Genome Sequence DataBase, hosted in Santa Fe. Custom computer tools for accessing, viewing and analyzing the STACK data are included.

"This new data set enriches the context in which researchers can compare newly discovered sequences. For instance, STACK potentially can be used to differentiate between different members of the same gene family or between alternative products of one gene," GSDB Manager Carol Harger said. "NCGR is enthusiastic about being the only public provider of this resource, and we look forward to working with SANBI to further enhance the data set and its functionality."

Winston Hide, director of SANBI, and postdoctoral research associate Robert Miller created a novel way to process a database of publicly available human Expressed Sequence Tags, or ESTs. SANBI scientists devised portable tools and used a system developed at the institute, running on a powerful Silicon Graphics® Origin2000™ multiprocessor server, to cluster the individual sequences and to generate alignments and consensus sequences from them. This new data set represents an easily distributable core information resource for gene discovery.

"Because STACK provides an independent resource for the analysis of disease gene candidates, alignments and consensus sequences, it can be easily integrated to solve questions about gene expression, gene hunting and polymorphisms," Hide said. "STACK has been a truly international effort, yet it has used African technology to provide information on genes that, after all, originally came from Africa."

"STACK adds an exciting tool to the demanding process of building a better understanding of genes and their relationship to disease," said Juli Nash, biology market manager at Silicon Graphics. "EST data require specially adapted tools, like STACK, and Silicon Graphics is pleased to contribute to this important project, which will benefit the global scientific community."

In addition, a commercial version of the data is available from Pangea Systems. "The type of data that STACK provides will allow researchers to finally get real value from EST data," said Charlene Son, director of product marketing at Pangea Systems. "We’re looking forward to continuing our work with SANBI to further extend the value of STACK."

Specialized access to STACK with advanced query capabilities is available through the GSDB Web site. The sequences also may be searched at the SANBI Web site, http://www.sanbi.ac.za/stack.

About the collaborators
The National Center for Genome Resources, a nonprofit organization, develops genetic services and education for science and society. The center serves the academic and commercial research communities by providing customized bioinformatics services to support genetic research.

The nonprofit South African National Bioinformatics Institute at the University of the Western Cape, near Cape Town, is devoted to providing training, education and biotechnology for redevelopment of South Africa and research into gene discovery for the African community.

Pangea Systems’ sophisticated, intuitive software applications enhance the ability of pharmaceutical and biotech companies to discover the molecular mechanism of disease. These applications integrate data with analysis and visualization tools for biological and chemical information, effectively simplifying and accelerating the drug discovery process.

Silicon Graphics, Inc. is a leading supplier of high-performance interactive computing systems. The company offers the broadest range of products in the industry -- from low-end desktop workstations to servers and high-end Cray® supercomputers.

Silicon Graphics is a registered trademark, and Origin and Origin2000 are trademarks of Silicon Graphics, Inc. Cray is a registered trademark of Cray Research, Inc., a wholly owned subsidiary of Silicon Graphics, Inc.