Research on BW Transform Index Technology in Sequence Alignment Algorithm and Its Improvement
  • Authors: ZHAO Yanan; XU Yun; CHENG Haoyu
  • Affiliations: School of Computer Science and Technology, University of Science and Technology of China; Key Laboratory of High Performance Computing of Anhui Province
  • Keywords: sequence alignment; index; Burrows-Wheeler Transform (BWT) index; next-generation sequencing; third-generation sequencing; alignment of large-scale and long sequences
  • Journal: Computer Engineering (计算机工程; journal code JSJC)
  • Publication date: 2016-01-15
  • Volume/issue: Vol. 42, No. 1 (cumulative No. 459)
  • Pages: 288-292 (5 pages)
  • Article ID: JSJC201601050
  • CN: 31-1289/TP
  • Funding: Key Program of the National Natural Science Foundation of China (61033009); National "111" Project (B07033)
  • Language: Chinese
Abstract
Sequence alignment of large-scale and long sequences is one of the most important fundamental problems in bioinformatics. This paper studies the Burrows-Wheeler Transform (BWT), the mainstream index technique in sequence alignment algorithms, and proposes a new second-order BWT index together with its implementation. Whereas the traditional BWT index is searched one character at a time, the improved index is searched two characters at a time. Experimental results show that the improved method reduces the number of loop iterations and computations in the sequence alignment algorithm, roughly halves the complexity of its index search, and improves search efficiency, which makes it especially suitable for indexing and searching long and large-scale sequences.
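The abstract contrasts the classic BWT backward search, which consumes one pattern character per iteration, with a second-order index that consumes two. This record does not describe the paper's actual data structures, so the Python below is only a minimal sketch of one possible reading, under stated assumptions: a naive suffix-array construction, a `$` sentinel, and a hypothetical second-order column `bwt2` that stores the two characters preceding each sorted suffix, with matching `C`/`Occ` tables over 2-grams. All names (`build_index`, `backward_search`, `locate`, `bwt2`) are illustrative and not taken from the paper.

```python
from collections import Counter

SENTINEL = "$"  # assumed terminator; sorts before A, C, G and T


def prefix_sums(counts):
    """C table: for each symbol, how many rows start with a smaller symbol."""
    table, total = {}, 0
    for key in sorted(counts):
        table[key] = total
        total += counts[key]
    return table


def occurrence_table(column):
    """occ[k][r] = number of times symbol k appears in column[0:r]."""
    occ = {k: [0] * (len(column) + 1) for k in set(column)}
    for r, sym in enumerate(column):
        for k in occ:
            occ[k][r + 1] = occ[k][r] + (sym == k)
    return occ


def build_index(text):
    """Build first-order and (hypothetical) second-order BWT tables.

    Naive suffix sorting, O(n^2 log n): for illustration only.
    """
    t = text + SENTINEL
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])
    # Classic BWT column: the character preceding each sorted suffix (cyclic).
    bwt1 = [t[(i - 1) % n] for i in sa]
    # Hypothetical second-order column: the two characters preceding it.
    bwt2 = [t[(i - 2) % n] + t[(i - 1) % n] for i in sa]
    # C tables count rows by the first 1 or 2 characters of each rotation.
    c1 = prefix_sums(Counter(t))
    c2 = prefix_sums(Counter(t[i] + t[(i + 1) % n] for i in range(n)))
    return sa, c1, occurrence_table(bwt1), c2, occurrence_table(bwt2)


def backward_search(index, pattern, step=1):
    """Return (lo, hi, iterations); rows lo..hi-1 hold suffixes starting with pattern.

    step=1 consumes one character per iteration (classic BWT search);
    step=2 consumes two, plus one single-character step if len(pattern) is odd.
    The pattern must not contain SENTINEL.
    """
    sa, c1, occ1, c2, occ2 = index
    lo, hi, i, iterations = 0, len(sa), len(pattern), 0
    while i > 0 and lo < hi:
        if step == 2 and i >= 2:
            sym, c, occ = pattern[i - 2:i], c2, occ2
        else:
            sym, c, occ = pattern[i - 1], c1, occ1
        iterations += 1
        if sym not in c:  # this 1- or 2-gram never occurs in the text
            return 0, 0, iterations
        # LF-style mapping: same formula for 1-grams and 2-grams.
        lo, hi = c[sym] + occ[sym][lo], c[sym] + occ[sym][hi]
        i -= len(sym)
    return lo, hi, iterations


def locate(index, pattern, step=1):
    """Sorted text positions of pattern, plus the number of search iterations."""
    lo, hi, iterations = backward_search(index, pattern, step)
    return sorted(index[0][lo:hi]), iterations


index = build_index("ACGTACGTTACG")
for p in ("ACG", "GTAC", "TT", "CAT"):
    print(p, locate(index, p, step=1), locate(index, p, step=2))
```

The two-character step is valid because suffixes that begin with the same 2-gram `ab` (neither character being the sentinel) are ordered exactly like the suffixes that start two positions later, so the usual mapping `C[ab] + Occ(ab, row)` carries over. In this sketch, `step=2` runs about ⌈m/2⌉ iterations instead of m for a pattern of length m; the trade-off is an `Occ` table keyed by up to 16 DNA 2-grams (plus sentinel pairs) rather than 4 characters, which costs more memory unless it is sampled or compressed.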