Multiple sequence alignment quality comparison in T-Coffee, MUSCLE and M-Coffee based on different benchmarks

Tuğcan Korak; Fırat Aşır; Esin Işık; Nur Cengiz

doi:10.17776/csj.842265

Research Article

Year 2021, Volume: 42 Issue: 3, 526 - 535, 24.09.2021

Abstract

Multiple sequence alignment (MSA) is a fundamental step in studies that determine the evolutionary, structural and functional relationships of biological sequences or organisms. Various heuristic approaches compare more than two sequences to generate an MSA. However, not every MSA tool is suitable for every dataset. Given the importance of MSA in a wide range of relationship studies, we compared the performance of different MSA tools on various datasets. In this study, we applied three MSA tools (T-Coffee, MUSCLE and M-Coffee) to several benchmark datasets: BAliBASE, SABmark, DIRMBASE, ProteinBali and DNABali. We aimed to evaluate differences in the performance of these tools on the stated benchmarks in terms of % consistency, sum of pairs (SP) score and column score (CS), calculated with SuiteMSA. We also calculated the average of each score for each tool to compare the results. We conclude that all three tools performed best on the ProteinBali datasets (average % consistency: 29.6, 32.3, 29.7; SP: 0.74, 0.73, 0.74; CS with gaps: 0.27, 0.27, 0.26 for T-Coffee, MUSCLE and M-Coffee, respectively), whereas the lowest performance was obtained on the DIRMBASE datasets (average % consistency: 1.8, 1.1, 4.3; SP: 0.05, 0.04, 0.04; CS with gaps: 0.01, 0, 0.008 for T-Coffee, MUSCLE and M-Coffee, respectively).
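
The abstract does not define the SP and CS measures, so the following Python sketch uses the common benchmark definitions as an assumption: SP is the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment, and CS is the fraction of reference columns reproduced exactly. The function names, the `include_gaps` switch (one possible reading of "CS with gaps") and the toy sequences are hypothetical; SuiteMSA's own calculations may differ in detail.

```python
from itertools import combinations


def residue_columns(alignment):
    """For each row, list the residue index found in each column (None for a gap)."""
    indexed = []
    for row in alignment:
        count = 0
        cols = []
        for ch in row:
            if ch == "-":
                cols.append(None)
            else:
                cols.append(count)
                count += 1
        indexed.append(cols)
    return indexed


def aligned_pairs(alignment):
    """Set of residue pairs ((seq, res), (seq, res)) that share a column."""
    idx = residue_columns(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = [(s, idx[s][col]) for s in range(len(alignment))
                   if idx[s][col] is not None]
        pairs.update(combinations(present, 2))
    return pairs


def column_signatures(alignment, include_gaps=True):
    """Set of columns, each described by the residue index of every sequence."""
    idx = residue_columns(alignment)
    sigs = set()
    for col in range(len(alignment[0])):
        sig = tuple(idx[s][col] for s in range(len(alignment)))
        # Assumption: "with gaps" means gapped columns are also scored.
        if include_gaps or None not in sig:
            sigs.add(sig)
    return sigs


def sp_score(reference, test):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref)


def column_score(reference, test, include_gaps=True):
    """Fraction of reference columns reproduced exactly by the test alignment."""
    ref = column_signatures(reference, include_gaps)
    return len(ref & column_signatures(test, include_gaps)) / len(ref)


# Toy data (hypothetical): the same two sequences aligned in two ways.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]

print(sp_score(reference, test))
print(column_score(reference, test, include_gaps=True))
print(column_score(reference, test, include_gaps=False))
```

Both scores range from 0 to 1, with 1 meaning the test alignment reproduces the reference. Averaging such per-dataset scores for each tool gives summary values of the kind reported above.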

Keywords

Multiple sequence alignment, T-Coffee, MUSCLE, M-Coffee

Thanks

The authors thank Prof. Dr. Jens Allmer for his guidance in the conceptualization of this study.

Details

Primary Language	English
Subjects	Structural Biology
Journal Section	Research Article
Authors	Tuğcan Korak (0000-0003-4902-4022); Fırat Aşır (0000-0002-6384-9146); Esin Işık (0000-0001-7635-8496); Nur Cengiz (0000-0001-5248-3612)
Publication Date	September 24, 2021
Submission Date	December 17, 2020
Acceptance Date	August 29, 2021
Published in Issue	Year 2021 Volume: 42 Issue: 3

Cite

APA	Korak, T., Aşır, F., Işık, E., Cengiz, N. (2021). Multiple sequence alignment quality comparison in T-Coffee, MUSCLE and M-Coffee based on different benchmarks. Cumhuriyet Science Journal, 42(3), 526-535. https://doi.org/10.17776/csj.842265
