Identifying compounds in complex matrices (such as smoke aerosols or biological samples) that may contain thousands of chemical entities remains a considerable analytical challenge, even when sophisticated instruments and state-of-the-art data processing software are used. Compound identification is recognized as the most challenging step in many scientific fields, including metabolomics. To increase the level of confidence in compound identification from smoke aerosols, and to automate as many as possible of the manual steps currently performed by scientists, a set of informatics tools forming a computational platform for compound identification is being developed at Philip Morris International. The first component of this platform is a chemical and spectral registration system that enables unambiguous referencing of chemicals and their spectral data. This system, entitled ‘Unique Compound and Spectra Database (UCSD)’, was developed internally and is based upon two software systems: Oracle with the Accelrys Direct chemical cartridge to manage chemical structures, and the ACD/Labs modules ChemFolder and Mass Spectra Database to record analytical spectra. The second component of the platform, entitled ‘Computer-Assisted Structure Identification (CASI)’, directly supports the compound identification process and is composed of several modules, each built to support a specific analytical chemistry workflow. Such workflows include, for example, automated non-targeted screening, comparison of the chemical composition of two mixtures, and puff-by-puff analysis, all of which are essential for the analysis of smoke aerosol samples. Furthermore, it is planned to consolidate all analytical data within an R&D scientific analytical warehouse, which will ease the comparison of the chemical composition of different samples and provide a central point for data reporting.
The challenges and pitfalls encountered during the development of this platform will be discussed.