Development of omics data based survival models for four female cancers using machine learning approaches
  • Authors: SANG HaoKai; GUO ShuLi; QU Hong; ZHAO Min; QU DaCheng
  • Keywords: cancer genomics; survival analysis; Cox proportional hazards model; random survival forest; LASSO regression
  • Journal code: JCXK
  • Journal: Scientia Sinica (Vitae)
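  • Affiliations: School of Computer Science & Technology, Beijing Institute of Technology; State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automatic Control, Beijing Institute of Technology; Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University; School of Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast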
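  • Publication date: 2019-05-17 11:06
  • Publisher: Scientia Sinica Vitae (中国科学:生命科学)
  • Year: 2019
  • Volume/issue: v.49
  • Funding: National Natural Science Foundation of China (Grant No. 31671375); National Key Research and Development Program of China (Grant Nos. 2017YFC1201200, 2017YFF0207400); Research Start-up Fund of the University of the Sunshine Coast, Australia
  • Language: Chinese
  • Page: JCXK201906006
  • Number of pages: 11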
  • CN:06
  • ISSN:11-5840/Q
  • Classification number: 68-78
Abstract
Breast cancer, cervical squamous cell carcinoma, endometrial cancer, and ovarian cancer are common cancers in women. Because these cancers progress malignantly and effective means of early diagnosis and treatment are lacking, they have become the leading cause of death among female patients worldwide. To explore whether high-throughput omics data can aid the prognosis of cancer patients, this study used the clinical data and multidimensional omics data (DNA methylation, mRNA expression, miRNA expression, and array-based protein expression) of 1861 samples of four female cancers from The Cancer Genome Atlas (TCGA) project to build Cox proportional hazards models and random survival forest models that retrospectively predict patient survival. In cervical squamous cell carcinoma, models that integrated clinical data with DNA methylation and miRNA expression data predicted survival significantly better than models built on clinical data alone (the median concordance index, c-index, increased by 8.73%–15.03%). Although some omics data improved the predictive performance of survival models for specific cancers, in other cases omics data brought no significant improvement over clinical data. These results offer practical experience for systematic studies of cancer-genomics-based survival prediction and for improving the predictive accuracy of clinical survival analysis.
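The abstract reports model quality as a concordance index (c-index). As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's c-index for right-censored survival data. It is not the authors' implementation: the function name `concordance_index`, the convention that a higher score means higher predicted risk, the choice to skip pairs with tied observed times, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event; pairs with tied times are skipped in this simplification.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the patient who died earlier was ranked riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported gain of 8.73%–15.03% in median c-index describes how much better the integrated models order patients by risk than the clinical-only models.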