Systems toxicology intends to quantify the effect of toxic molecules in biological systems and unravel their mechanisms of toxicity. The development of advanced computational methods is required for analyzing and integrating high throughput data generated for this purpose as well as for extrapolating predictive toxicological outcomes and risk estimates. To ensure performance and reliability of the methods and verify conclusions from systems toxicology data analysis, it is important to conduct unbiased evaluations by independent third parties. As a case study, we report here the results of an independent verification of methods and data in systems toxicology by crowdsourcing. The sbv IMPROVER systems toxicology computational challenge aimed to evaluate computational methods for the development of blood-based gene expression signature classification models with the ability to predict smoking exposure status. Participants created/trained models on blood gene expression datasets including smokers/mice exposed to 3R4F (a reference cigarette) or noncurrent smokers/Sham (mice exposed to air). Participants applied their models on unseen data to predict whether subjects classify closer to smoke-exposed or nonsmoke exposed groups. The data sets also included data from subjects that had been exposed to potential modified risk tobacco products (MRTPs) or that had switched to a MRTP after exposure to conventional cigarette smoke. The scoring of anonymized participants' predictions was done using predefined metrics. The top 3 performers' methods predicted class labels with area under the precision recall scores above 0.9. Furthermore, although various computational approaches were used, the crowd's results confirmed our own data analysis outcomes with regards to the classification of MRTP-related samples. Mice exposed directly to a MRTP were classified closer to the Sham group. After switching to a MRTP, the confidence that subjects belonged to the smoke-exposed group decreased significantly. Smoking exposure gene signatures that contributed to the group separation included core set of genes highly consistent across teams such as AHRR, LRRN3, SASH1, and P2RY6. In conclusion, crowdsourcing constitutes a pertinent approach, in complement to the classical peer review process, to independently and unbiasedly verify computational methods and data for risk assessment using systems toxicology.