In recent years, explosive developments in sequencing technology have allowed the full genome complement of many organisms to be determined (Genomes OnLine Database: http://wit.integratedgenomics.com/GOLD/). Although most of these are microorganisms, the ultimate challenge of sequencing the human genome was finally met in 2001 (Lander et al., 2001; Venter et al., 2001).
Completion of the sequencing is just the first step in mining the treasures of the genome. Subsequent steps focus on determining the genes (both their location and structure) and on annotating putative functions for the proteins that these genes encode.
Although gene prediction remains a difficult problem for eukaryotic genomes, the tools are steadily improving, and the eukaryotic sequences already obtained are expected to vastly improve their accuracy.
Annotation is also a difficult problem, and still relies mainly on homology searches. Nevertheless, new and more accurate systems for assessing relationships and detecting remote homologies have been developed (Altschul and Koonin, 1998; Eddy, 1998). Comparative genomics may be a useful approach for predicting new functions, since it relies on genomic features (gene order conservation, co-occurrence of genes in genomes) rather than on homology.
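The co-occurrence idea mentioned above can be illustrated with a small sketch. It assumes that each gene family is described by the set of genomes in which it is present, and it scores pairs of families by the Jaccard similarity of those sets; families that are consistently present or absent together are candidates for a functional link. The gene and genome names, the `PROFILES` table, and the 0.75 threshold are all hypothetical and chosen only for illustration; this is not the method of any particular published tool.

```python
from itertools import combinations

# Hypothetical presence/absence data: gene family -> genomes that contain it.
PROFILES = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome3", "genome5"},
    "geneD": {"genome1", "genome2", "genome3", "genome4"},
}


def jaccard(first, second):
    """Jaccard similarity of two genome sets (1.0 means identical profiles)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def co_occurring_pairs(profiles, threshold=0.75):
    """Return (gene, gene, score) for pairs whose profiles reach the threshold."""
    pairs = []
    for gene_x, gene_y in combinations(sorted(profiles), 2):
        score = jaccard(profiles[gene_x], profiles[gene_y])
        if score >= threshold:
            pairs.append((gene_x, gene_y, score))
    return pairs


if __name__ == "__main__":
    for gene_x, gene_y, score in co_occurring_pairs(PROFILES):
        print(f"{gene_x} <-> {gene_y}: {score:.2f}")
```

In this toy data set, geneA and geneB share an identical profile, so they receive the highest score, whereas geneC never co-occurs with either of them. A real analysis would need many more genomes and some correction for their phylogenetic relatedness.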
Analysis of genomes has allowed us to gain many insights into the lifestyles of organisms, especially for prokaryotes. Prokaryotic genomes have turned out to be very flexible. The amount of lateral gene transfer between organisms is much higher than expected, thus offering a new vision of the evolution of prokaryotes and compromising the classical approach of assessing relationships between organisms by studying gene phylogenies (Doolittle, 1999). Several mechanisms for adaptation to the environment have also been determined, one of the most striking examples being the impressive machinery devoted to repairing DNA damage in the bacterium Deinococcus radiodurans, which allows it to grow under high levels of gamma radiation (White et al., 1999). High levels of unique, unknown genes have been detected in several bacteria, representing up to 40% of the genome in some of them (Iliopoulos et al., 2000), which indicates high plasticity and the ability to generate new genes and functions. It is likely that these genes hold the key to adaptation to different environments.
On the clinical side, sequencing of individual genomes has allowed identification of new candidate targets for antibiotics and vaccines. Mechanisms for generating antigenic variation have also been identified, allowing improvement in the design of new drugs (Fraser et al., 2000).
Regarding the human genome, the availability of the genome sequence allows the creation of dense physical maps of markers. These maps are of great help in the quest for genes responsible for diseases, since they allow the determination of regions where the candidate gene for the disease is likely to be located. Among these markers, SNPs (Single Nucleotide Polymorphisms, single-nucleotide variations in the genome; http://snp.cshl.org) are especially useful. In some cases these variations themselves are found to be the cause of diseases, or to shape individual responses to drugs. The study of the latter effect is known as pharmacogenomics, whose goal is to find correlations between therapeutic responses to drugs and the genetic profiles of patients (Collins, 1999).
Comparison of the information contained in different genomes is known as comparative genomics. Comparative genomics multiplies the value of the information gathered for genomes, allowing analysis of the organisms from a global perspective, something that was unthinkable only a few years ago. Even if we are still at the starting point of these comparisons, very relevant information has already been extracted. Some of the present and future uses of comparative genomics are:
We should not forget that having complete genomes is also a crucial step for the development of other technologies. Functional studies using microarrays, or proteomics using mass spectrometry, are greatly impeded when genomic sequences are not available. Genomics and sequencing hold the key to the revolution in the study of living systems.
© 2002 ALMA Bioinformatics, SL. All rights reserved.