In recent years, rapid advances in sequencing technology have made it possible to determine the complete genome sequence of many organisms (Genomes OnLine Database). Although most of these are microorganisms, the ultimate challenge of sequencing the human genome was finally met in 2001 (Lander et al., 2001; Venter et al., 2001).

Completion of the sequencing is just the first step in mining the treasures of the genome. Subsequent steps focus on determining the genes (both their location and structure) and on annotating putative functions for the proteins that these genes encode.

Although gene prediction remains a difficult problem for eukaryotic genomes, the tools are steadily improving, and the eukaryotic sequences already obtained are expected to improve their accuracy considerably.

Annotation is also a difficult problem, and still relies mainly on homology searches. Nevertheless, new and more accurate systems for assessing relationships and detecting remote homologies have been developed (Altschul and Koonin, 1998; Eddy, 1998). Comparative genomics may be a useful approach for predicting new functions, since it relies on genomic features (gene order conservation, co-occurrence of genes in genomes) rather than on homology.

Individual genomes

Analysis of genomes has allowed us to gain many insights into the lifestyles of organisms, especially for prokaryotes. Prokaryotic genomes have turned out to be very flexible. The amount of lateral gene transfer between organisms is much higher than expected, which offers a new view of prokaryotic evolution and undermines the classical approach of assessing relationships between organisms by studying gene phylogenies (Doolittle, 1999). Several mechanisms for adaptation to the environment have also been determined. One of the most striking examples is the impressive machinery devoted to repairing DNA damage in the bacterium Deinococcus radiodurans, which allows it to grow under high levels of gamma radiation (White et al., 1999). High levels of unique, unknown genes have been detected in several bacteria, representing up to 40% of the genome in some of them (Iliopoulos et al., 2000), which indicates high plasticity and the ability to generate new genes and functions. It is likely that these genes hold the key to adaptation to different environments.

On the clinical side, sequencing of individual genomes has allowed identification of new candidate targets for antibiotics and vaccines. Mechanisms for generating antigenic variation have also been identified, allowing improvement in the design of new drugs (Fraser et al., 2000).

Regarding the human genome, the availability of the genome sequence allows the creation of dense physical maps of markers. These maps are of great help in the quest for genes responsible for diseases, since they allow the determination of regions where the candidate gene for the disease is likely to be located. Among these markers, SNPs (Single Nucleotide Polymorphisms, single-nucleotide variations in the genome) are especially useful. In some cases these variations themselves are found to be the cause of diseases, or to shape individual responses to drugs. The study of the latter is known as pharmacogenomics, where the goal is to find correlations between therapeutic responses to drugs and the genetic profiles of patients (Collins, 1999).
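The kind of correlation that pharmacogenomics looks for can be illustrated with a minimal sketch. It assumes a single SNP and a yes/no drug response; the patient records, the function names and the choice of an odds ratio as the measure of association are all illustrative assumptions, not taken from the studies cited above.

```python
from collections import Counter

# Hypothetical toy data: (genotype at one SNP, did the patient respond to the drug?)
patients = [
    ("AA", True), ("AA", True), ("AA", True), ("AA", False),
    ("AG", True), ("AG", False), ("AG", False),
    ("GG", False), ("GG", False), ("GG", True),
]

def carrier_table(records, allele):
    """Count patients by (carries allele, responded) to form a 2x2 table."""
    counts = Counter()
    for genotype, responded in records:
        counts[(allele in genotype, responded)] += 1
    return counts

def odds_ratio(counts):
    """Odds of responding for carriers relative to non-carriers.

    0.5 is added to every cell (a common continuity correction) so that
    an empty cell does not cause a division by zero.
    """
    a = counts[(True, True)] + 0.5    # carriers who responded
    b = counts[(True, False)] + 0.5   # carriers who did not respond
    c = counts[(False, True)] + 0.5   # non-carriers who responded
    d = counts[(False, False)] + 0.5  # non-carriers who did not respond
    return (a * d) / (b * c)

table = carrier_table(patients, "G")
ratio = odds_ratio(table)
print(f"odds ratio for carriers of G: {ratio:.2f}")
```

A ratio well below 1, as in this toy data, would suggest that carriers of the allele respond less often. A real study would need far more patients and a significance test before drawing any conclusion.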

Comparative genomics

Comparison of the information contained in different genomes is known as comparative genomics. Comparative genomics multiplies the value of the information gathered for genomes, allowing analysis of the organisms from a global perspective, something that was unthinkable only a few years ago. Even if we are still at the starting point of these comparisons, very relevant information has already been extracted. Some of the present and future uses of comparative genomics are:

  • Aiding annotation of genomes by finding new genes, intron/exon boundaries and putative functions, based on similarity to other genomes. Some initiatives to annotate the human genome based on mouse data are already underway.

  • Studying disease-linked genes in model organisms. For nearly every human gene, a mouse homologue has been identified. In addition, more than 70 percent of human genes have a homologue in the roundworm C. elegans. The study of disease-linked genes is therefore greatly facilitated by taking advantage of these model organisms (O'Brien et al., 1999).

  • New ways for predicting function and interactions between proteins, using genomic information. These methods do not make use of homology relationships between proteins. Instead, they use genomic properties to infer relationships between genes (and therefore between the proteins encoded by them). Properties such as synteny (conservation of gene order), co-occurrence in genomes or gene fusion are useful tools for making predictions (Huynen et al., 2000; Tamames, 2001).

  • Finding genes that are specific to particular microbes and could become new targets for drugs. A complementary approach is finding the genes they have in common, the minimal genome, which are assumed to be essential for the development of the organism.

  • Studying the structure of pathways in different organisms to identify possible variants and/or missing genes.

  • Studying the evolution of organisms in terms of the evolution of their genomes. This can become an alternative to the use of classical phylogenetic methods for assessing the relationship between species (Tamames, 2001).

  • Studying non-sequenced genomes via comparison with sequenced genomes using microarray technology (Akman and Aksoy, 2001).
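Several of the uses listed above (co-occurrence of genes, specific genes, common genes) can be derived from a simple table recording which genes are present in which genomes. The following is a minimal sketch under that assumption. The genome names, the gene sets and the function names are hypothetical toy values, and requiring identical presence/absence profiles is a deliberately strict simplification of the co-occurrence idea.

```python
# Hypothetical toy table: genome name -> set of gene families found in it.
genomes = {
    "genome_A": {"dnaA", "recA", "trpA", "trpB", "toxX"},
    "genome_B": {"dnaA", "recA", "trpA", "trpB"},
    "genome_C": {"dnaA", "recA"},
    "genome_D": {"dnaA", "recA", "trpA", "trpB", "resY"},
}

def core_genes(table):
    """Genes present in every genome (a crude stand-in for a minimal genome)."""
    return set.intersection(*table.values())

def specific_genes(table, name):
    """Genes found in the named genome and in no other genome of the table."""
    others = set().union(*(g for other, g in table.items() if other != name))
    return table[name] - others

def profile(table, gene):
    """Presence/absence pattern of a gene across genomes, in sorted name order."""
    return tuple(gene in table[name] for name in sorted(table))

def co_occurring_pairs(table):
    """Pairs of genes with identical profiles.

    Genes present in all genomes, or in only one, are skipped because
    their profiles carry no information about co-occurrence.
    """
    all_genes = sorted(set().union(*table.values()))
    n = len(table)
    pairs = []
    for i, g1 in enumerate(all_genes):
        p1 = profile(table, g1)
        if not 1 < sum(p1) < n:
            continue
        for g2 in all_genes[i + 1:]:
            if profile(table, g2) == p1:
                pairs.append((g1, g2))
    return pairs

print(sorted(core_genes(genomes)))
print(sorted(specific_genes(genomes, "genome_A")))
print(co_occurring_pairs(genomes))
```

In this toy table the two genes that always appear and disappear together are flagged as candidates for a functional link, without any sequence comparison between them. Practical methods tolerate imperfect matches between profiles and weigh them against the relatedness of the genomes.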


We should not forget that having complete genomes is also a crucial step for the development of other technologies. Functional studies using microarrays, or proteomics using mass spectrometry, are greatly impeded when genomic sequences are not available. Genomics and sequencing hold the key to the revolution in the study of living systems.


References

  • Akman, L., and Aksoy, S. (2001). A novel application of gene arrays: Escherichia coli array provides insight into the biology of the obligate endosymbiont of tsetse flies. Proc Natl Acad Sci USA 98, 7546-7551.

  • Altschul, S. F., and Koonin, E. V. (1998). Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci 23, 444-447.

  • Collins, F. S. (1999). Shattuck lecture--medical and societal consequences of the Human Genome Project. N Engl J Med 341, 28-37.

  • Doolittle, W. F. (1999). Phylogenetic classification and the universal tree. Science 284, 2124-2129.

  • Eddy, S. R. (1998). Profile hidden Markov models. Bioinformatics 14, 755-763.

  • Fraser, C. M., Eisen, J. A., and Salzberg, S. L. (2000). Microbial genome sequencing. Nature 406, 799-803.

  • Huynen, M. A., Snel, B., Lathe, W., 3rd, and Bork, P. (2000). Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 10, 1204-1210.

  • Iliopoulos, I., Tsoka, S., Andrade, M. A., Janssen, P., Audit, B., Tramontano, A., Valencia, A., Leroy, C., Sander, C., and Ouzounis, C. A. (2000). Genome sequences and great expectations. Genome Biol 2, interactions0001.1-0001.3.

  • Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.

  • O'Brien, S. J., Menotti-Raymond, M., Murphy, W. J., Nash, W. G., Wienberg, J., Stanyon, R., Copeland, N. G., Jenkins, N. A., Womack, J. E., and Marshall Graves, J. A. (1999). The promise of comparative genomics in mammals. Science 286, 458-462, 479-481.

  • Tamames, J. (2001). Evolution of gene order conservation in prokaryotes. Genome Biol 2, research0020.1-0020.11.

  • Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001). The sequence of the human genome. Science 291, 1304-1351.

  • White, O., Eisen, J. A., Heidelberg, J. F., Hickey, E. K., Peterson, J. D., Dodson, R. J., Haft, D. H., Gwinn, M. L., Nelson, W. C., Richardson, D. L., et al. (1999). Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science 286, 1571-1577.

2002 ALMA Bioinformatics, SL. All rights reserved.