A new clustering method for time series to discover geographical cancer trends from 1960 to 2000
Presented at American College of Epidemiology Meeting
* This author is not affiliated with PMI.
Purpose: This research aims to build a typology of 32 countries over 41 years (1960-2000) and of 52 countries over 21 years (1980-2000) through an automatic comparative treatment of their cancer mortality time series.
Methods: To extract knowledge from the databases, we used recent original results in complex functional multivariate data mining. The resulting hierarchy proposes an order on the terminal nodes that optimizes the distances according to the initial dissimilarity matrix. We initially considered 122 countries from the World Health Organization time series data, for both sexes and 21 age classes: 53% had missing data and 5% had out-of-range figures. We had to reconstruct a common cancer denomination, since the International Classification of Diseases (ICD) varies over the years. We thus took into account the 9th ICD revision, from 1979 to 1998, and then the 10th. We also had to compute normalized figures: the age-standardized ratios use 5-year age classes, and the reference population is the Segi standard population.
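The normalization step can be illustrated with a minimal sketch of direct age standardization. This is not the authors' code: it assumes direct standardization, assumes the 21 WHO age classes have already been aggregated into 18 five-year classes (0-4, 5-9, ..., 80-84, 85+), and uses the commonly quoted Segi world standard weights per 100,000. The names `SEGI_WEIGHTS` and `age_standardized_rate` are hypothetical.

```python
# Segi world standard population (per 100,000) for 18 five-year age
# classes: 0-4, 5-9, ..., 80-84, 85+. Assumed here for illustration.
SEGI_WEIGHTS = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def age_standardized_rate(deaths, population, weights=SEGI_WEIGHTS):
    """Directly standardized mortality rate per 100,000.

    deaths[i] and population[i] are the death count and the person count
    for age class i, for one country, one year, and one sex.
    """
    if not (len(deaths) == len(population) == len(weights)):
        raise ValueError("deaths, population and weights must align")
    weighted_sum = 0.0
    for d, p, w in zip(deaths, population, weights):
        if p <= 0:
            raise ValueError("population must be positive in every class")
        # Age-specific rate, weighted by the standard population share.
        weighted_sum += (d / p) * w
    return 100000.0 * weighted_sum / sum(weights)
```

Applying such a function to every country and year would yield one comparable curve per country, which is the kind of input a curve-clustering method needs.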
Results: As an example, eight groups were finally retained for the trend typology of ‘all cancers’ on the 32 countries, and eleven groups on the 52 countries. They were displayed on a world map and characterized in terms of levels and variations of cancer mortality. A second result is the stability of the groupings for the countries that appear in both periods. For the ‘western style’ countries, a major result is that the paragons of the curve clusters fall within a much narrower interval of values in 2000 than twenty years earlier.
Conclusion: The functional multivariate pyramidal clustering of time series proved efficient and revealed underlying, interpretable clusters among cancer mortality trends across countries over the past forty years.