A Fast Algorithm for Genome-wide Association Analysis Thresholds
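  • English title: A Quick Approximation of Threshold Values for Genome-wide Association Studies
  • Authors: 野金花; 方铭; 徐艳; 高德宝; 周晓晶; 张巧生; 张莹
  • Authors (English): Ye Jinhua; Fang Ming; Xu Yan; Gao Debao; Zhou Xiaojing; Zhang Qiaosheng; Zhang Ying. College of Science, Heilongjiang Bayi Agricultural University; College of Fisheries, Jimei University; College of Science and Veterinary Medicine, Heilongjiang Bayi Agricultural University
  • Keywords: statistics; threshold; genome-wide association study; simulation
  • Journal code: HLJK
  • Journal title (English): Journal of Heilongjiang Bayi Agricultural University
  • Affiliations: College of Science, Heilongjiang Bayi Agricultural University; College of Fisheries, Jimei University; College of Animal Science and Technology, Heilongjiang Bayi Agricultural University
  • Publication date: 2019-06-20
  • Publisher: Journal of Heilongjiang Bayi Agricultural University
  • Year: 2019
  • Issue: v.31; No.152
  • Funding: 2017 Daqing City Guiding Project (Mapping loci for economic traits in pigs with the random forest method: zd-2017-76); 2017 University Cultivation Project (A hierarchical solution strategy for animal genome-wide association analysis based on a random regression model: XZR2017-13)
  • Language: Chinese
  • Page: HLJK201903020
  • Page count: 6
  • CN: 03
  • ISSN: 23-1275/S
  • Classification number: 129-134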
Abstract
In a genome-wide association study (GWAS) that tests one single nucleotide polymorphism (SNP) at a time, the distributions of the test statistics depend not only on the genetic effects but also on the SNPs themselves, so the statistics cannot be applied directly to test the null hypothesis of no genetic effect. Besides the chi-square statistic, the t, F, and standard normal statistics are commonly used to detect quantitative trait nucleotides (QTN) in small or large samples. This paper first gives the relationships among these statistics. It then applies hypothesis testing in which a nuisance parameter is present only under the alternative to quickly approximate the critical thresholds of these test statistics for GWAS. When only the statistical probabilities are available for high-throughput SNPs, chi-square statistics are constructed from those probabilities and the critical thresholds are estimated approximately. Extensive simulations show that the approximate thresholds closely match the accurately estimated ones. Finally, thresholds for different samples are simulated and compared with a threshold algorithm proposed in the literature. The simulation experiments indicate that the proposed method is fast and effective.
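The step of constructing chi-square statistics from per-SNP probabilities can be illustrated with a minimal sketch. It assumes each reported probability is a two-sided p-value that corresponds to one degree of freedom, and it relies on the standard identity that the square of a standard normal statistic follows a chi-square distribution with one degree of freedom. The function names `chi2_from_p` and `p_from_chi2` are hypothetical and are not taken from the paper. The sketch does not reproduce the paper's threshold approximation itself.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def chi2_from_p(p_value: float) -> float:
    """Return the 1-df chi-square statistic whose upper-tail probability is p_value.

    Assumption: p_value is a two-sided p-value, so |z| = -Phi^{-1}(p / 2)
    and z**2 follows a chi-square distribution with 1 degree of freedom.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    # Use the lower tail p / 2 to keep precision for very small p-values.
    z = _STD_NORMAL.inv_cdf(p_value / 2.0)
    return z * z


def p_from_chi2(chi2: float) -> float:
    """Return the upper-tail probability of a 1-df chi-square statistic."""
    if chi2 < 0.0:
        raise ValueError("chi2 must be non-negative")
    # P(chi-square(1) > x) = 2 * Phi(-sqrt(x)).
    return 2.0 * _STD_NORMAL.cdf(-(chi2 ** 0.5))


if __name__ == "__main__":
    for p in (0.05, 1e-5, 1e-8):
        print(p, chi2_from_p(p))
```

For example, `chi2_from_p(0.05)` is about 3.84, the familiar 5% critical value of the chi-square distribution with one degree of freedom. Only the Python standard library is used, so the sketch runs without SciPy.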