Novel Breast Cancer Risk Assessment Tools for Pre- and Postmenopausal Asian Women: Development and Validation in a Nationwide Mammographic Screening Cohort
Abstract
Purpose
Widely used breast cancer risk-prediction tools are based on data from Western countries, but risk factors may differ for Asian women. Hence, we aimed to develop a risk assessment tool for breast cancer in Asian women using a nationwide, population-based mammographic screening cohort.
Materials and Methods
Women aged ≥ 40 years who underwent breast cancer screening and general health examination in 2009 were included. Age, body mass index (BMI), breast density, lifestyle and reproductive factors, and comorbidities were used to develop 5-year breast cancer risk-prediction models for premenopausal (n=771,856) and postmenopausal (n=1,108,047) women at baseline. The best-fit risk prediction model was constructed using backward stepwise selection in a Cox proportional hazards model and was transformed into a risk score nomogram. The performance was assessed by discrimination and calibration.
Results
In premenopausal women, high BMI, low parity, short breastfeeding period, early age at menarche, high breast density, a history of benign breast masses, and family history of breast cancer contributed to the risk prediction of breast cancer. In postmenopausal women, age, diabetes mellitus, dyslipidemia, late-onset menopause, and hormone replacement therapy use were additional risk predictors of breast cancer. Our risk-prediction model showed a concordance statistic of 0.58 (0.57-0.59) for premenopausal women and 0.64 (0.63-0.65) for postmenopausal women. The calibration plots demonstrated good agreement between predicted and observed incidence for both models.
Conclusion
Our breast cancer risk-prediction model demonstrated performance comparable to that of models developed in Western countries, especially among postmenopausal women. This provides a foundation for implementing risk-based screening recommendations in Asian women.
Introduction
Breast cancer is an important global health issue. It is the most commonly diagnosed cancer worldwide, with an estimated 2.26 million new cases in 2020 [1]. Notably, the incidence and its trend vary by world region [2]. The incidence of breast cancer increased rapidly during the 1980s and 1990s and then stabilized in the mid-2000s in many regions, including Northern America and Europe. This may reflect changes in reproductive, hormonal, and lifestyle risk factors as well as the initiation and expansion of cancer screening programs [3]. On the other hand, the current increase in breast cancer incidence in Asian countries with a historically low incidence [4] can be attributed to changes in lifestyle and reproductive factors, including obesity, early menarche, fewer childbirths, and late menopause [5], all of which may result from Westernized lifestyles.
However, epidemiologic features of breast cancer still differ between Asian and Western countries. For instance, breast cancer incidence peaks at ages 45-49 years and premenopausal women are more often affected in Asian countries compared to Western countries [6]. Asian women tend to have more dense breasts, which is an established risk factor of breast cancer [7], but present a relatively modest association with the risk of premenopausal breast cancer compared to White, Black, or Hispanic women [8]. In addition, the different associations between obesity and breast cancer risk according to menopausal status in Asian women [9] may support the separate construction of breast cancer risk-prediction models by menopausal status.
Numerous risk assessment models for breast cancer have been pioneered in North America and Europe, incorporating various risk factors such as age, body mass index (BMI), reproductive factors (age at menarche, age at first birth, and menopausal status), family history, and atypical ductal hyperplasia (S1 Table) [10-13]. Diverse distributions of risk factors according to race/ethnicity, however, may attenuate the predictive performance in Asian women [8,14]. A previous risk-prediction study in Korean women did not construct models exclusively for pre- or postmenopausal women [15], although the risk factors for these groups are not identical.
Hence, we aimed to develop novel breast cancer risk assessment tools for pre- and postmenopausal Asian women to reflect the discrepancy in breast cancer incidences, using a nationwide, population-based mammographic screening cohort from Korea.
Materials and Methods
1. Database source
The risk-prediction model was developed using a nationwide database from the Korean National Health Insurance Service (NHIS). The NHIS is a single insurer that provides mandatory universal coverage to 97% of the Korean population and covers the remaining 3%, who have the lowest income, as Medicaid beneficiaries. The NHIS integrates various types of data, including sociodemographic, anthropometric, diagnostic, and medication data, obtained through its nationwide database supplemented by the National Breast Cancer Screening Program (NCSP) and national general health examination programs. This includes International Classification of Diseases 10th revision (ICD-10) diagnostic codes as well as a list of prescribed medications.
The NCSP, administered by the NHIS, encourages every Korean woman aged ≥ 40 years to undergo mammography and provide information on reproductive factors through a survey questionnaire administered every 2 years. This enables an analysis of mammographic breast density and detailed reproductive factors. Additionally, all individuals aged ≥ 40 years and all employees regardless of age are eligible to participate in national general health examination programs at least every 2 years. The examination includes a standardized questionnaire that evaluates past medical history and lifestyle behaviors like smoking, drinking, and physical activity.
2. Study population
We initially identified study subjects (n=2,755,730) who participated in the NHIS program for both breast cancer screening and general health examination in 2009. Subjects with missing or erroneous values (n=733,672), with a history of hysterectomy (n=100,933), or with a history of any cancer before health examination (n=34,605) were sequentially excluded. After excluding person-time entries within the first year to reduce bias related to undetected breast cancer present at baseline (n=6,617), 1,879,903 women (771,856 premenopausal; 1,108,047 postmenopausal) were included in the analysis (S2 Fig.).
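The cohort flow above can be checked with simple arithmetic. The following is a minimal sketch that applies the reported exclusions in sequence; the counts are those stated in the text, while the variable names and the short labels for each exclusion step are illustrative.

```python
# Counts as reported in the text; names and labels are illustrative only.
initial = 2_755_730
exclusions = {
    "missing or erroneous values": 733_672,
    "history of hysterectomy": 100_933,
    "history of any cancer before examination": 34_605,
    "first-year exclusion": 6_617,
}

remaining = initial
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason}: {remaining:,}")

premenopausal, postmenopausal = 771_856, 1_108_047
# The final cohort should equal the sum of the two menopausal strata.
assert remaining == premenopausal + postmenopausal == 1_879_903
```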
In this study, we developed separate breast cancer risk-prediction models for premenopausal and postmenopausal women. Menopausal status was determined through a self-administered questionnaire, which asked about menstrual periods, hysterectomy, and age at menopause. Women who provided unclear or inconsistent information on their menopausal status (e.g., those who claimed to still have menstrual periods but admitted to taking postmenopausal hormone therapeutics) were classified as postmenopausal.
To generate the development and validation datasets for each model, we randomly split the final study population; as such, 70% of the subjects were included in the development dataset, and the remaining 30% were included in the validation dataset.
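A random 70/30 split of this kind can be sketched as follows. The analyses in this study were run in SAS, so this Python version is only an illustration; the function name, the seed, and the toy ID range are assumptions, not part of the original workflow.

```python
import random

def split_development_validation(ids, development_fraction=0.7, seed=2009):
    """Randomly split subject IDs into development and validation sets.

    The seed value is arbitrary and only makes the split reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy example with 1,000 hypothetical subject IDs.
development, validation = split_development_validation(range(1000))
print(len(development), len(validation))
```

In practice the split would be performed separately within the premenopausal and postmenopausal cohorts, since each has its own model.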
This study adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.
3. Outcome
The primary outcome was the first breast cancer diagnosis. Breast cancer was considered diagnosed when both the ICD-10 code (C50) and the cancer-specific insurance claim code (V193) were present. Under the NHIS, patients with cancer pay only 5% of the total medical bill incurred for cancer-related medical care through a special co-payment reduction code for cancer (V193), enrollment in which requires a medical certificate from a physician. Therefore, the cancer diagnoses in this study are considered to be sufficiently reliable. The subjects were followed from the date of the health examination until the first breast cancer diagnosis, the censoring date, or the end of the study period (December 31, 2019), whichever came first.
4. Predictors
Potential predictors were selected based on a literature review and the availability of data. Premenopausal women were classified into three groups according to age (40-44, 45-49, and ≥ 50 years), while postmenopausal women were divided into six groups according to age (< 50, 50-54, 55-59, 60-64, 65-69, and ≥ 70 years). Additionally, the Asia-Pacific criteria were used to classify women into five groups by BMI (< 18.5, 18.5-23, 23-25, 25-30, and ≥ 30 kg/m²). Information on lifestyle-related factors was obtained using the self-administered questionnaires at enrollment and was dichotomized in the analysis. Comorbidities of participants were identified based on laboratory measures, claims, and prescription information prior to the index date (Supplementary Material).
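The five BMI groups can be expressed as a simple lookup. This is a minimal sketch; the text does not state which group a boundary value (e.g., exactly 23 kg/m²) falls into, so the lower-inclusive convention used here is an assumption, as are the function and constant names.

```python
from bisect import bisect_right

# Cutoffs in kg/m^2 from the Asia-Pacific grouping described in the text.
BMI_CUTOFFS = [18.5, 23.0, 25.0, 30.0]
BMI_LABELS = ["<18.5", "18.5-23", "23-25", "25-30", ">=30"]

def bmi_category(bmi):
    """Return the BMI group; boundary values go to the higher group (assumed)."""
    return BMI_LABELS[bisect_right(BMI_CUTOFFS, bmi)]

for value in (17.0, 18.5, 22.9, 24.0, 27.5, 30.0):
    print(value, bmi_category(value))
```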
Detailed reproductive factors were assessed, including age at menarche, number of children (0, 1, ≥ 2), lifetime duration of breastfeeding (never, ≤ 3 months, 4-12 months, or ≥ 1 year), history of benign breast mass (yes/no), mammographic breast density (< 25%, 25%-50%, 51%-75%, or 76%-100%), and family history of breast cancer (yes/no) in the breast cancer risk-prediction models for both premenopausal and postmenopausal women. However, age at menopause and use of postmenopausal hormonal replacement therapy (HRT) were considered only in the breast cancer risk-prediction model for postmenopausal women.
5. Statistical analysis
The baseline characteristics are presented as mean and standard deviation for continuous variables or number with percentage for categorical variables. Chi-square tests and Student’s t tests were used to examine the difference between the proportions or means of a pair of variables. The incidence rates of breast cancer were calculated as the number of incident cases per 1,000 person-years.
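As a worked illustration of the incidence rate definition, the sketch below divides incident cases by total person-years and scales to 1,000 person-years. The input numbers are hypothetical and are not taken from the study.

```python
def incidence_rate_per_1000(cases, person_years):
    """Incident cases per 1,000 person-years of follow-up."""
    return 1000 * cases / person_years

# Hypothetical example: 5 incident cases over 2,500 person-years.
rate = incidence_rate_per_1000(cases=5, person_years=2500)
print(f"{rate:.2f} per 1,000 person-years")
```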
A multivariable Cox proportional hazards model was employed to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the association between selected predictors and breast cancer risk. The proportional hazards assumption was evaluated using Schoenfeld residuals and Kaplan-Meier curves. The best-fit risk-prediction models were constructed using backward stepwise selection.
Model construction and validation were conducted using a maximum 5-year follow-up for each participant. Data collection through 2019 ensured comprehensive outcome capture within this 5-year timeframe, rather than implying a 10-year risk model. The 5-year horizon was chosen for its clinical relevance and ease of interpretation. Although the model’s hazard functions could theoretically be used for other intervals (e.g., 7 years), we focused on the 5-year period so that the HRs specifically reflect breast cancer risk within that interval. The extended follow-up through 2019 ensured all participants completed the full 5-year observation window, without affecting HR calculations beyond that timeframe.
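Restricting each participant to a maximum of 5 years of follow-up amounts to administrative censoring at the 5-year horizon. The sketch below shows one common way to do this; the function name and the (time, event) representation are assumptions for illustration and do not reproduce the authors' SAS code.

```python
def truncate_follow_up(time_years, event, horizon=5.0):
    """Censor follow-up at `horizon` years.

    An event occurring after the horizon is treated as censored at the
    horizon, so only events within the window count toward the model.
    """
    if time_years > horizon:
        return horizon, 0
    return time_years, event

# Hypothetical (time, event) records: 1 = breast cancer, 0 = censored.
records = [(7.2, 1), (3.1, 1), (9.0, 0), (4.5, 0)]
print([truncate_follow_up(t, e) for t, e in records])
```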
The final predictors were assigned weighted-risk scores ranging from 0 to 100 points based on the beta coefficients for each predictor in the final Cox proportional hazards model. The scores for each predictor were summed to determine the total score. The predictive model for breast cancer risk was then transformed into a risk score nomogram, representing an individual’s numerical probability of incident breast cancer.
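One conventional way to turn Cox beta coefficients into nomogram points is to scale them so that the predictor with the widest coefficient range spans 0 to 100. The sketch below follows that convention and uses the standard Cox relationship between the linear predictor and absolute risk. All coefficients, the baseline survival value, and the function names are hypothetical; the exact scaling used for the published nomogram is not described in the text.

```python
import math

# Hypothetical log hazard ratios per category; NOT the study's estimates.
betas = {
    "breast_density": {"<25%": 0.0, "25-50%": 0.2, "51-75%": 0.4, "76-100%": 0.8},
    "family_history": {"no": 0.0, "yes": 0.4},
}

def build_points(betas):
    """Scale betas so the predictor with the widest range spans 0-100 points."""
    max_range = max(max(b.values()) - min(b.values()) for b in betas.values())
    return {
        name: {
            level: round(100 * (beta - min(levels.values())) / max_range)
            for level, beta in levels.items()
        }
        for name, levels in betas.items()
    }

points = build_points(betas)

def total_score(profile):
    """Sum the points for one woman's predictor levels."""
    return sum(points[name][level] for name, level in profile.items())

def five_year_risk(linear_predictor, baseline_survival=0.995):
    """Absolute risk from a Cox model: 1 - S0(5)^exp(lp); S0 is hypothetical."""
    return 1 - baseline_survival ** math.exp(linear_predictor)

profile = {"breast_density": "51-75%", "family_history": "yes"}
print(points)
print(total_score(profile), five_year_risk(0.8))
```

Because points are a linear rescaling of the linear predictor, the total score maps monotonically to the predicted 5-year risk.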
6. Performance and validation of the risk-prediction model
To evaluate the performance of the models, discrimination and calibration were assessed. Discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC) and concordance statistic (C-statistic), which measure how well the model can distinguish between women who developed breast cancer and those who did not. Internal validation of the discrimination was conducted by calculating the bootstrap optimism-corrected area under the curve with 100 bootstrap replications. Calibration was evaluated by comparing the predicted and observed incidence rates in each decile of absolute breast cancer risk using the expected-to-observed ratio and the Hosmer-Lemeshow chi-square test.
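To make the two performance measures concrete, the sketch below computes Harrell's C-statistic for censored data and an expected-to-observed ratio on toy data. Harrell's estimator is one common form of the C-statistic; whether it matches the exact estimator used in the SAS analysis is an assumption, and all data and function names are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's C: share of comparable pairs ranked correctly by the score.

    A pair (i, j) is comparable when subject i had an event strictly
    before subject j's follow-up time. Tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

def expected_to_observed(predicted_risks, observed_events):
    """Ratio of expected events (sum of predicted risks) to observed events."""
    return sum(predicted_risks) / sum(observed_events)

# Toy data: follow-up in years, event indicator, and risk score.
times = [2, 4, 5, 5]
events = [1, 1, 0, 0]
scores = [3, 1, 2, 0]
c_stat = harrell_c(times, events, scores)
eo_ratio = expected_to_observed([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 0])
print(c_stat, eo_ratio)
```

A C-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking; an expected-to-observed ratio near 1 within each risk decile indicates good calibration.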
The statistical analyses were performed using SAS ver. 9.4 (SAS Institute Inc.). The p-values provided are two-sided, with the level of significance at 0.05.
Results
1. Baseline characteristics
During a mean follow-up of 9.22 years for pre- and 9.14 years for postmenopausal women, we identified 9,542 and 8,296 incident breast cancer cases among pre- and postmenopausal women, respectively.
Premenopausal women who developed breast cancer were younger and had fewer children, a shorter duration of breastfeeding, denser breasts, and an earlier age at menarche than the non-cancer comparison group. They also more often had a history of benign breast masses and a family history of breast cancer (Table 1).
Meanwhile, postmenopausal women who developed breast cancer were more likely to be younger and obese, to consume greater amounts of alcohol, to perform physical activity more regularly, and to have a higher prevalence of dyslipidemia compared to the non-cancer comparison group. They were also more likely to have later-onset menopause; to use more postmenopausal HRT drugs; and to have fewer children, denser breasts, and a shorter duration of breastfeeding. Finally, they more frequently had a history of benign breast masses and a family history of breast cancer (Table 2).
2. Predictors of breast cancer risk
The associations between each predictor and the breast cancer risk among pre- and postmenopausal women are shown in Table 3. Age was associated with breast cancer in postmenopausal women in an inverted U-shaped pattern, whereas no such association was found in premenopausal women. Overweight and obesity were associated with increased risk of breast cancer in both pre- and postmenopausal women, and the association was stronger among postmenopausal women. In contrast, underweight was associated with decreased risk of breast cancer in both pre- and postmenopausal women.
Lifestyle behaviors and comorbid conditions were not associated with breast cancer risk in premenopausal women. In contrast, hypertension and diabetes mellitus were associated with increased breast cancer risk in postmenopausal women.
Among reproductive factors, short breastfeeding period, early age at menarche, high breast density, history of benign breast mass, and family history of breast cancer were associated with increased risk of breast cancer in both pre- and postmenopausal women. Low parity was associated with an increased risk of breast cancer in premenopausal women, whereas late-onset menopause and use of HRT were associated with an increased risk of breast cancer in postmenopausal women.
3. Risk scores for breast cancer incidence
The risk-prediction models for breast cancer were converted into nomograms of risk scores (Fig. 1). The sum of scores (matched breast cancer risk) ranged from 0 to 419 points (4.9%) for premenopausal women (14 predictors) and from 0 to 405 points (5.6%) for postmenopausal women (16 predictors) (Fig. 2, S3 and S4 Tables). When the total risk scores were categorized into deciles, premenopausal women in the highest decile had the highest incidence rate, 3.09 per 1,000 person-years, whereas their postmenopausal counterparts had the highest incidence rate of 2.26 per 1,000 person-years (S5 Fig.).
Nomogram for 5-year risk of breast cancer to determine probability in premenopausal (A) and postmenopausal (B) women. BMI, body mass index; DM, diabetes mellitus.
4. Validation of the risk-prediction model
The AUROC of the risk-prediction model was 0.58 (95% CI, 0.57 to 0.59) for premenopausal women and 0.64 (95% CI, 0.63 to 0.65) for postmenopausal women (Fig. 3). The calibration plots indicate that the predicted and observed incidence rates of breast cancer were correlated well in both premenopausal (S6 Fig.) and postmenopausal (S7 Fig.) women.
Discussion
In the present study, we developed and validated a risk assessment tool for 5-year breast cancer risk prediction in pre- and postmenopausal women using a nationwide, population-based screening cohort from Korea. Notably, our model incorporated a wide range of risk factors for breast cancer (e.g., age, BMI, lifestyle and reproductive factors, metabolic dysfunction, and mammographic breast density) into a distinct risk assessment by menopausal status at baseline. The breast cancer risk-prediction model showed modest overall discrimination, which was better for postmenopausal women. The performance of the model was also good in terms of calibration ability.
To the best of our knowledge, only a single previous model has predicted breast cancer by menopause-stratified estimation [16] or addressed women participating in population-based screening programs [17]. Given that the nationwide screening program in Korea is the world’s largest health screening program with extensive screening items and minimal follow-up loss, our model provides highly representative risk-prediction values. In addition, considering the etiologic heterogeneity of breast cancer by race/ethnicity, our model may provide more accurate risk predictions for Asian women than those developed based on other races/ethnicities.
Our model revealed that women with a greater proportion of dense tissue had an increased risk of breast cancer in a dose–response relationship in both pre- and postmenopausal women, which is consistent with previous meta-analyses [18]. Interestingly, while the association between mammographic breast density and breast cancer risk was relatively strong in postmenopausal women, its contribution to the risk-prediction model was also relatively high in premenopausal women. Remarkably, this is the first breast cancer risk-prediction model to incorporate breast density in Asian women. To our knowledge, predictive values of mammographic breast density arranged by menopausal status have not been explored in previous risk models. Adding mammographic breast density to the breast cancer risk-prediction model improves the model performance [19]. For instance, the modified Gail model (or Breast Cancer Risk Assessment Tool [BCRAT] model), originally developed for Caucasian women aged 20-79 years, does not include breast density as a predictor. Instead, it accounts for factors such as age, age at menarche, number of previous biopsies, age at first birth, number of first-degree relatives with breast cancer, and the presence of atypical ductal hyperplasia (S1 Table). When applied to Korean women, this model showed a discriminatory accuracy of 0.55 (95% CI, 0.50 to 0.59) [11,20]. On the other hand, the Breast Cancer Surveillance Consortium (BCSC) model, which includes breast density as a predictor (age, ethnicity, breast density, number of previous biopsies, and number of first-degree relatives with breast cancer), showed a greater discriminatory accuracy (0.66; 95% CI, 0.65 to 0.67) [11] compared to the BCRAT model. However, the BCSC model was not exclusively validated for Asian women.
In contrast, the Tyrer-Cuzick model, which also included breast density as a predictor, showed a discriminatory accuracy of 0.59 (95% CI, 0.56 to 0.61) [21], although it was not validated in Asian women.
Notably, in this study, BMI showed a positive association with breast cancer risk in both pre- and postmenopausal women. However, this association was stronger and its contribution to the risk prediction was much higher among postmenopausal women. According to earlier studies of North American and European populations, obesity is an established risk factor for postmenopausal breast cancer [22], whereas it is a protective factor for premenopausal breast cancer [23], although its effect is likely restricted to estrogen receptor–positive breast cancer [24]. On the other hand, a positive association between obesity and premenopausal breast cancer has been reported in triple-negative breast cancer (TNBC) [25] and in Asian women [26]. The disparity in breast cancer subtypes and their association with obesity by race/ethnicity (i.e., there is a relatively smaller proportion of TNBC in the Asian population compared to other races/ethnicities) [27] may contribute to the heterogeneous associations between BMI and breast cancer risk in Asian and Western countries.
We observed that reproductive factors (i.e., parity, breastfeeding duration, and age at menarche) contribute to predicting breast cancer risk, irrespective of menopausal status. It is noteworthy that the association between breastfeeding duration and risk of breast cancer tended to be stronger in postmenopausal women. Previous studies have provided mixed evidence on the impact of menopause on the association between breastfeeding and breast cancer risk [28,29]. Considering the changes in reproductive factors that have occurred over the past decades in Asian women [5], our findings may help to clarify the role of these risk factors in breast cancer development.
Our prediction model may provide the background for risk-based screening recommendations. In Korea, the breast is the leading primary site of cancer. The crude incidence rate (per 100,000) for breast cancer was 103.2 in 2021, compared with 92.9 in 2019 [30], suggesting an increasing trend. While the breast cancer screening program in South Korea considers only age (≥ 40 years) when defining the target population, risk-based screening can be a promising option to maximize the effectiveness of cancer screening. For example, women aged > 70 years may choose not to undergo screening for breast cancer based on risk-based decisions made together with clinicians. This can prevent excess healthcare use and costs and reduce possible anxiety associated with waiting for examination results.
Despite its clinical implications, this study has some limitations. First, our prediction model was designed based on the population who attended NHIS-supported breast cancer screenings. People who undergo routine cancer screenings have better health behaviors but are more likely to have higher incidence rates than their unscreened counterparts. Thus, selection bias could be present but may not be large because of the high participation rate in breast cancer screenings. Second, the ability to estimate breast cancer risk among screening-naïve women is limited. Third, participants with an unclear menopausal status were regarded as postmenopausal, which may have introduced misclassification. Fourth, incident breast cancer was not identified based on a time-varying menopausal status. Furthermore, women who experience menopause earlier or later than the typical age range may not be accurately represented by the risk-prediction model. Lastly, genetic predictors such as polygenic risk score and dietary factors were not considered. Future studies incorporating more granular data, including molecular biomarkers, genetic variants, and detailed dietary assessments, could improve the predictive performance (e.g., c-index, calibration) of breast cancer risk models.
To summarize, our breast cancer risk-prediction models demonstrated predictive performance comparable to those in Western countries, particularly for postmenopausal women. These findings underscore the significance of including mammographic breast density and considering menopause when assessing breast cancer risk for Asian women, and suggest that the model could be useful for risk stratification of older Asian women who may no longer undergo breast cancer screening.
Electronic Supplementary Material
Supplementary materials are available at the Cancer Research and Treatment website (https://www.e-crt.org).
Notes
Ethical Statement
This study was approved by the Institutional Review Board of Samsung Medical Center (SMC 2017-12-039). Anonymized and de-identified information was used for analyses; therefore, informed consent was not required.
Author Contributions
Conceived and designed the analysis: Jung W, Park YMM, Shin DW.
Collected the data: Jung W, Park YMM, Park SH, Han K, Shin DW.
Contributed data or analysis tools: Jung W, Park YMM, Park SH, Han K, Shin DW.
Performed the analysis: Park SH, Han K.
Wrote the paper: Jung W, Park YMM, Park SH, Han K, Park J, Yeo Y, Lee JK, Sandler DP, Shin DW.
Conflicts of Interest
Conflict of interest relevant to this article was not reported.
Funding
This study was supported by a grant funded in 2017 (KFCR-2017-C-1) for the “research and development of self-decision aids useful for Koreans to make a decision on getting cancer screenings” by the Korean Foundation for Cancer Research and the Korean Cancer Society, Republic of Korea. This research was supported in part by a grant from the Arkansas Breast Cancer Research Program and by the Intramural Research Program of the National Institute of Environmental Health Sciences, National Institutes of Health.
