Abstract
Aligning large numbers of long sequences is one of the most important basic problems in bioinformatics. This paper studies the Burrows-Wheeler Transform (BWT), the mainstream indexing technique in sequence alignment algorithms, and proposes a new second-order BWT index method together with its implementation. Whereas the traditional BWT method looks up the index one character at a time, the improved method looks up two characters at a time. Experimental results show that the second-order BWT index reduces the number of loop iterations and calculations in the sequence alignment algorithm, lowers the complexity of its index lookup by half, and improves search efficiency. It is especially well suited to indexing and searching long and large-scale sequences.
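The difference between single-character and two-character lookup can be illustrated with a small backward-search sketch. This is a toy illustration under stated assumptions, not the paper's implementation: the abstract does not describe the actual data structures, so the function names (`build_index`, `step`, `count_first_order`, `count_second_order`) and the naive sorted-rotation construction are hypothetical. The order-2 index stores, for each row of the sorted rotation matrix, the two characters that precede it, so one lookup extends the match by two characters.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(text, k):
    """Toy order-k index (hypothetical layout, for illustration only).

    Returns (prefixes, rows, n):
      prefixes: sorted k-character prefixes of all rotations of text + "$"
      rows:     k-mer -> increasing list of rows preceded by that k-mer
      n:        length of text + "$"
    """
    t = text + "$"  # "$" sorts before A, C, G, T
    n = len(t)
    # Naive construction by sorting rotations; fine for a demo, not for genomes.
    sa = sorted(range(n), key=lambda i: t[i:] + t[:i])
    prefixes = sorted((t[i:] + t[:i])[:k] for i in range(n))
    rows = defaultdict(list)
    for row, pos in enumerate(sa):
        kmer = "".join(t[(pos - k + j) % n] for j in range(k))
        rows[kmer].append(row)
    return prefixes, rows, n


def step(index, kmer, lo, hi):
    """Extend the match to the left by kmer; [lo, hi) is the current row range."""
    prefixes, rows, _ = index
    base = bisect_left(prefixes, kmer)  # rotations whose prefix sorts before kmer
    occ = rows.get(kmer, [])
    return base + bisect_left(occ, lo), base + bisect_left(occ, hi)


def count_first_order(idx1, pattern):
    """Classic backward search: one character per step."""
    lo, hi, steps = 0, idx1[2], 0
    for ch in reversed(pattern):
        lo, hi = step(idx1, ch, lo, hi)
        steps += 1
        if lo >= hi:
            return 0, steps
    return hi - lo, steps


def count_second_order(idx1, idx2, pattern):
    """Backward search taking two characters per step where possible."""
    lo, hi, steps = 0, idx2[2], 0
    end = len(pattern)
    while end > 0 and lo < hi:
        k = 2 if end >= 2 else 1  # a leftover single character uses the order-1 index
        lo, hi = step(idx2 if k == 2 else idx1, pattern[end - k:end], lo, hi)
        end -= k
        steps += 1
    return max(hi - lo, 0), steps


text = "ACGTACGTAC"
idx1, idx2 = build_index(text, 1), build_index(text, 2)
print(count_first_order(idx1, "ACGT"))         # (occurrences, lookup steps)
print(count_second_order(idx1, idx2, "ACGT"))
```

In this sketch both searches report the same number of occurrences, while the two-character variant needs about half as many lookup steps. The cost is a larger alphabet of two-character symbols in the index, a trade-off that a real implementation would have to manage.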