Research Article

A Comparative Study of DNA Alignment Algorithms and Performance Improvement with Different Compilation Strategies

Year 2024, Volume: 45 Issue: 4, 663 - 667, 30.12.2024
https://doi.org/10.17776/csj.1511642

Abstract

With the spread of next-generation sequencing technologies and the growth of biological data, the need to run DNA and protein sequence alignment algorithms at higher performance has increased. This study is a systematic comparison of different compilation strategies, in the Python programming language, for two well-known DNA and protein sequence alignment algorithms, Needleman-Wunsch and Smith-Waterman. The aim is to compare the performance of the pairwise alignment module of the widely used Biopython against different compilation approaches applied to our in-house software. The results show that Numba's just-in-time compilation generally delivers higher performance than PyPy and Cython compilation or the Biopython module. This work is expected to increase efficiency in software prototyping that requires large-scale sequence alignment.


A Comparative study of DNA Alignment Algorithms and Boosting Performance Using Different Compilation Strategies


Abstract

With the development of next-generation sequencing technologies, the demand for higher performance from DNA and protein sequence alignment algorithms has grown. This work is a systematic comparison of different compilation strategies for two common DNA and protein sequence alignment algorithms, Needleman-Wunsch and Smith-Waterman, implemented in the Python programming language. It investigates how the widely used Biopython pairwise alignment module performs against in-house software built with different compilation approaches. The results show that the Numba just-in-time compiler provides greater performance overall than the PyPy and Cython compilers or the Biopython module. This work may increase the efficiency of software prototyping where large-scale sequence alignment is necessary.
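For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal pure-Python sketch of their scoring recurrences. It is not the in-house software evaluated in the article: the function names `nw_score` and `sw_score`, the linear gap penalty, and the scoring values (match = 1, mismatch = -1, gap = -1) are all assumptions chosen for illustration. Only the alignment score is computed; traceback is omitted.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch recurrence).

    Scoring values are illustrative assumptions, with a linear gap penalty.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` against an empty prefix of `b`.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman recurrence).

    Same assumed scoring as nw_score, but cells are clamped at zero and
    the result is the maximum over the whole matrix.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))            # three matches and one gap
    print(sw_score("TTTACGTTT", "GGACGGG"))   # shared substring "ACG"
```

Both functions spend their time in a tight doubly nested loop over integer values. Loops of this kind are the usual target of the compilation approaches the abstract compares; adapting the sketch to any of them (for example, by switching to typed arrays) is left out here.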


Details

Primary Language English
Subjects Bioinformatic Methods Development, Sequence Analysis
Journal Section Natural Sciences
Authors

Osman Doluca 0000-0003-0412-6148

Publication Date December 30, 2024
Submission Date July 6, 2024
Acceptance Date December 3, 2024
Published in Issue Year 2024, Volume: 45 Issue: 4

Cite

APA Doluca, O. (2024). A Comparative study of DNA Alignment Algorithms and Boosting Performance Using Different Compilation Strategies. Cumhuriyet Science Journal, 45(4), 663-667. https://doi.org/10.17776/csj.1511642