Genomics programs are providing the scientific community with unprecedented opportunities to study the genetic information of living organisms. The North Carolina State University Tobacco Genome Initiative (TGI, http://www.tobaccogenome.org) was started in 2002 in cooperation with Philip Morris USA to gather genetic information on Nicotiana tabacum by sequencing genomic DNA and cDNA libraries of Hicks Broadleaf, a variety present in the pedigree of many modern cultivated varieties. As more sequences accumulate in the databases, bioinformatics analyses are needed to help understand gene function and genome organization. These analyses provide the basis for describing known genes that are present in the tobacco genome and that control useful agronomic traits in crop plants. Comparing the genetic information of tobacco with that of other sequenced solanaceous and non-solanaceous plant species will also help to uncover new information on unknown and tobacco-specific genes. This effort aims to develop effective tools to accelerate and assist conventional breeding (e.g., by marker-assisted selection) in order to reduce the levels of harmful constituents and improve the flavour characteristics of tobacco leaf. Specific information on genes of selected pathways leading to the formation of carotenoid or polyphenol compounds is of paramount importance for both fundamental and applied research. Carotenoids are primary metabolites with a critical role in photosynthesis, and have been shown to generate volatile compounds upon enzymatic degradation by specific cleavage enzymes. Polyphenols and flavonoids are secondary metabolites that play a role in plant responses to biotic and abiotic stresses and have antioxidant properties in vivo.
We carried out homology searches in the TGI database, using carotenoid and flavonoid gene sequences from public sequence databases as queries, to detect tobacco orthologues among genomic DNA and expressed sequence tag (EST) sequences. Sequences related to most structural genes of both pathways were found. Data on the number of gene sequences, sequence homology, coverage of the full-length gene sequence and gene structure are presented. The existence of families of gene paralogues is also discussed.
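As an illustration only, the kind of per-gene summary described above (number of matching sequences and coverage of the full-length query) could be derived from homology search results along the following lines. This is a minimal sketch, not the procedure used in this work: it assumes the hits are available in the standard 12-column BLAST tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), and the function names, the `query_lengths` mapping and the identity and E-value thresholds are hypothetical choices made for the example.

```python
from collections import defaultdict


def merged_length(intervals):
    """Return the number of positions covered by 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def summarize_hits(tabular_lines, query_lengths,
                   min_identity=70.0, max_evalue=1e-10):
    """Summarize tabular homology hits per query gene.

    The thresholds are illustrative assumptions, not values from the study.
    Returns {query_id: {"n_sequences": int, "query_coverage": float}}.
    """
    spans = defaultdict(list)     # query id -> aligned query intervals
    subjects = defaultdict(set)   # query id -> distinct matching sequences
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, evalue = float(fields[2]), float(fields[10])
        if pident < min_identity or evalue > max_evalue:
            continue
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        spans[qseqid].append((qstart, qend))
        subjects[qseqid].add(sseqid)
    return {
        q: {
            "n_sequences": len(subjects[q]),
            "query_coverage": merged_length(spans[q]) / query_lengths[q],
        }
        for q in spans
    }
```

Overlapping alignments are merged before the coverage is computed, so that several reads matching the same region of a query gene are not counted more than once.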