Background: Smoking induces a variety of complex, poorly understood molecular and physiological changes in the lungs of smokers, with adenocarcinoma as one possible disease outcome. A better understanding of these interactions is needed. Objective: Mathematical modeling is a powerful approach for bringing together diverse information, including high-throughput systems biology data and targeted endpoints and biomarkers measured in in vitro, in vivo, and clinical studies. The objective is to develop an integrative, central mechanistic hypothesis of the early initiation processes in the development of lung adenocarcinoma and to build a plausible, calibrated model that includes most of the major phenomenology. Method: The computational model is developed in three major steps. First, the key biological behaviors that distinguish the earliest transition from normal cells (prior to smoke exposure) to neoplastic cells are identified. These considerations make it possible to build a model that is neither too complex nor too simplistic, yet can explain the effects of smoking. Second, the biological diagrams are translated into the corresponding ordinary differential equations. Third, quantitative biological data are acquired and analyzed to parameterize the rate equations. The data are evaluated against a rigorous set of criteria to determine whether they fall within the scope of the modeled biology and meet the quality requirements. A method has been developed for translating animal data (doses and ages) to the corresponding human situation. Finally, an aggressive optimization strategy is used to obtain a single set of parameters such that the model predictions simultaneously fit all the data well and reproduce the key biological phenomena. Results: The model is currently being built.
The biology of lung adenocarcinoma development has been investigated in detail, and a good sense of the mathematical challenges has been developed. Experience has been gained in determining which data are suitable for the surrogate strategy and in working around some poor data collection practices, e.g., reporting smoking in pack-years instead of recording the amount smoked and the duration of exposure separately. Discussion: We will discuss the model-building process and the value of a formal mathematical language for expressing complex biological knowledge, assumptions, and hypotheses. There are also important lessons to be learned regarding better clinical trial design and systems biology data collection practices.
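
Illustration: The second and third Method steps (translating a diagram into ordinary differential equations, then finding one parameter set that fits all data at once) can be sketched on a toy example. Everything in this sketch is a hypothetical assumption and is not taken from the model described above: the two-state system (normal and initiated cell fractions), the parameter names `k_init` and `k_clear`, the doses, and the synthetic "observations". A coarse grid search stands in for the optimization strategy, which the abstract does not specify.

```python
# Toy illustration only: the two-state model, parameter names, doses, and
# synthetic observations are hypothetical, not the model in the abstract.

def rhs(state, dose, k_init, k_clear):
    """Right-hand side: normal cells are initiated at a dose-dependent rate,
    and initiated cells are cleared at a first-order rate."""
    normal, initiated = state
    conversion = k_init * dose * normal
    return (-conversion, conversion - k_clear * initiated)

def rk4_step(state, h, dose, k_init, k_clear):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(s, ds, scale):
        return tuple(x + scale * dx for x, dx in zip(s, ds))
    k1 = rhs(state, dose, k_init, k_clear)
    k2 = rhs(shifted(state, k1, h / 2), dose, k_init, k_clear)
    k3 = rhs(shifted(state, k2, h / 2), dose, k_init, k_clear)
    k4 = rhs(shifted(state, k3, h), dose, k_init, k_clear)
    return tuple(
        x + h / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(dose, k_init, k_clear, times, steps_per_unit=10):
    """Return the initiated-cell fraction at each requested (ascending) time."""
    state, done, out = (1.0, 0.0), 0, []
    h = 1.0 / steps_per_unit
    for t in times:
        target = round(t * steps_per_unit)
        for _ in range(target - done):
            state = rk4_step(state, h, dose, k_init, k_clear)
        done = target
        out.append(state[1])
    return out

TIMES = [5, 10, 20, 30]      # hypothetical sampling times
DOSES = [0.5, 2.0]           # two hypothetical exposure levels
TRUE_PARAMS = (0.02, 0.05)   # used only to generate synthetic data

# Synthetic, noise-free "observations", one dataset per dose.
observations = {d: simulate(d, *TRUE_PARAMS, TIMES) for d in DOSES}

def total_error(k_init, k_clear):
    """Sum of squared residuals over all datasets simultaneously."""
    err = 0.0
    for dose, observed in observations.items():
        predicted = simulate(dose, k_init, k_clear, TIMES)
        err += sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return err

# Coarse grid search for a single parameter set shared by every dataset.
candidates = [
    (0.005 * i, 0.01 * j) for i in range(1, 11) for j in range(1, 11)
]
best = min(candidates, key=lambda p: total_error(*p))
print("best (k_init, k_clear):", best)
```

Because the error is summed over both doses before minimizing, the search returns one shared parameter set rather than a separate fit per dataset, which is the property the Method paragraph asks of the calibrated model.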