Performance of DNA Sequences Compression on Multicores and GPUs using Adaptive-GenCodexMethod

Ajith Padyana, Pallav Kumar Baruah

Abstract


The DNA sequences are huge in size and the databases are growing at an exponential rate. For example, the human genome in raw format ranges from 2 to 30 Tera-bytes. The main reason for this is the invention of new species and increasing number of DNA profiles. The growth of the DNA affects the storage as well as bandwidth when these sequences need to be transferred. Applications such as DNA profiling, Real time DNA crime investigation require access to the DNA sequences in real time. The inherent property of DNA is that it contains many repeats which makes it highly compressible. However, the applications mentioned not only require good compression ratio but also needs faster compression. Multicores and GPUs can be used to perform the compression quickly. In this paper, we propose a new algorithm with a focus on the throughput along with the compression ratio. The algorithm scales well on GPUs and achieves a speedup of  11 on multicores and upto 23 on GPUs when run on M2070 Tesla card and upto 57 on K20 Kepler GPUs. We also extended this algorithm such that it adapts to the input  sequence depending on the number of consecutive repeats and accordingly chooses the
right algorithm which leads to a better compression.

Keywords


DNA Sequence; Bandwidth; Throughput; Compression Ratio; Speedup; Code byte

Full Text:

PDF

References


“DNA Sequence Compression using the Burrows-Wheeler Transform”, IEEE, 2002.

TC Bell et al., Prentice Hall, 1990.

A Grumbach & F Tahi.,“Compression of DNA sequences.”, Proceedings of the “IEEE Data Compression Conference, Snowbird, UT, USA.”,March 30April 2. 1993.

Xin Chen, Sam Kwong, Ming Li,“A Compression Algorithm for DNA Sequences.”,Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, Tokyo, Japan,April 8-11, 2000.

Xin Chen 1, Ming Li, Bin Ma, and John Tromp. “DNACompress: fast and effective DNA sequence compression”,volume 18, pages 1696-1698, 2002.

P. Raja Rajeswari & Dr Allam AppaRao, “DNABIT Compress Genome compression algorithm”, Book on Bioinformation, 2011, volume 5, pages 350-360.

P Raja Rajeswari & Dr Allam AppaRao,GENBIT Compress-Algorithm for repetitive and non repetitive DNA sequences, International Journal of Computer Science and Information Technology,2010 ,volume 2,pages 25-29.

Matsumoto,T., Sadakane,K. and Imai,H, “Biological sequence compression algorithms”, Genome Informatics Workshop, Universal Academy Press, 2002,43-52.

J. Ziv and A. Lempel,“A universal algorithm for sequential data compression ”,IEEE Trans. Inform. Theory,1977,volume -23, pages 337-343.

Ziv and Lempel,“Compression of individual sequences via variable-rate coding.”, IEEE Transactions on Information Theory, September 1978, volume 24, pages 530-536.

Ma,B., Tromp,J. and Li,M. “PatternHunter : faster and more sensitive homology search.”, Bioinformatics, 2002, pages 440-445.

Jackson, Peter, Book Chapter “Introduction To Expert Systems”, Addison Wesley, 1998 , 3rd edition,ISBN 978-0-201-87686-4.

Regina Barzilay and Daryl Mccullough and Owen Rambow and Jonathan Decristofaro and Tanya Korelsky and Benoit Lavoie and Cogentex Inc ,“A New Approach to Expert System Explanations”, In 9th International Workshop on Natural

Language Generation, 1998, pages 78-87.

Russell, Stuart J.; Norvig, Peter, “Artificial Intelligence: A modern approach”, Prentice Ha, 2003, 2nd Edition.

Nils J. Nilsson, “Artificial Intelligence: A New Synthesis”, Morgan Kaufmann Publishers, San Francisco, 1998. [16] McCorduck, Pamela, “Machines Who Think”, A. K. Peters, Ltd., 2004.

Koch, C.G., Isle, B.A. ; Butler, A.W.,“Intelligent user interface for expert systems applied to power plant maintenance and troubleshooting”, IEEE Transactions, March 1988, 3rd volume.




DOI: http://dx.doi.org/10.26713%2Fjims.v7i3.309

eISSN 0975-5748; pISSN 0974-875X