Research Article

Molecular pKa Prediction with Deep Learning and Chemical Fingerprints

Year 2025, Volume: 46 Issue: 2, 233 - 239, 30.06.2025
https://doi.org/10.17776/csj.1576821

Abstract

In drug discovery and design, determining molecular properties, in particular a molecule's pKa value, is essential for understanding and optimising the biological activity of drugs. In this context, in addition to traditional chemical methods, artificial intelligence techniques such as machine learning and deep learning are increasingly used to predict molecular properties and to support drug design. In this paper, we present an approach that investigates the effect of molecular properties on pKa prediction and implements this prediction using a deep learning model. The model represents each molecular structure by its molecular weight together with chemical fingerprints such as Morgan fingerprints. The dataset used in this study contains 2093 molecular data points obtained from PubChem. The presented method predicts the pKa values of these molecules with a reported accuracy of 96.66%. This can save time and money in the drug discovery and design process and provide valuable guidance for experimental studies. The paper also presents a comprehensive analysis of the training process, accuracy metrics and performance of the deep learning model. Overall, the study evaluates the impact of molecular features on pKa prediction and demonstrates the success of the deep learning model in these predictions.
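
The abstract does not spell out how the input features are built, so the following is only a minimal sketch of the general idea: a circular (Morgan-style) bit fingerprint concatenated with molecular weight. It is not the author's pipeline and not a cheminformatics library's implementation. The hand-written molecular graph for acetic acid, the 64-bit length, the radius of 2, the rounded atomic masses and all function names are assumptions chosen for illustration.

```python
import hashlib

# Rounded standard atomic masses (assumed values, illustration only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def stable_hash(text):
    """Deterministic 32-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")


def circular_fingerprint(atoms, h_counts, bonds, radius=2, n_bits=64):
    """Simplified Morgan-style fingerprint over a heavy-atom graph.

    atoms:    element symbols of the heavy atoms
    h_counts: number of attached hydrogens per heavy atom
    bonds:    (atom_index, atom_index, bond_order) tuples
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))

    # Radius 0: each atom is identified by element, heavy-atom degree, H count.
    ids = [
        stable_hash(f"{sym}|{len(neighbors[i])}|{h_counts[i]}")
        for i, sym in enumerate(atoms)
    ]
    bits = [0] * n_bits
    for ident in ids:
        bits[ident % n_bits] = 1

    # Each round folds the sorted neighbor identifiers into the atom's own
    # identifier, so the encoded environment grows by one bond per round.
    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            env = sorted((order, ids[j]) for j, order in neighbors[i])
            new_ids.append(stable_hash(f"{ids[i]}|{env}"))
        ids = new_ids
        for ident in ids:
            bits[ident % n_bits] = 1
    return bits


def molecular_weight(atoms, h_counts):
    """Sum of heavy-atom masses plus the attached hydrogens."""
    heavy = sum(ATOMIC_MASS[sym] for sym in atoms)
    return heavy + ATOMIC_MASS["H"] * sum(h_counts)


def feature_vector(atoms, h_counts, bonds, n_bits=64):
    """Fingerprint bits followed by molecular weight as the last feature."""
    bits = circular_fingerprint(atoms, h_counts, bonds, n_bits=n_bits)
    return bits + [molecular_weight(atoms, h_counts)]


# Acetic acid, CH3-C(=O)-OH, written out by hand as a heavy-atom graph.
ATOMS = ["C", "C", "O", "O"]
H_COUNTS = [3, 0, 0, 1]
BONDS = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

features = feature_vector(ATOMS, H_COUNTS, BONDS)
print(len(features))  # 64 fingerprint bits + 1 molecular weight = 65
```

In such a setup, a vector like `features` would be the per-molecule input to a regression network whose single output is the predicted pKa. In practice the fingerprint would normally come from a cheminformatics toolkit rather than a hand-rolled hash, and the molecular weight would usually be scaled before training.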


Details

Primary Language English
Subjects Quality Assurance, Chemometrics, Traceability and Metrological Chemistry
Journal Section Natural Sciences
Authors

Fatih Mehmet Avcu 0000-0002-1973-7745

Publication Date June 30, 2025
Submission Date October 31, 2024
Acceptance Date April 28, 2025
Published in Issue Year 2025, Volume: 46 Issue: 2

Cite

APA Avcu, F. M. (2025). Molecular pKa Prediction with Deep Learning and Chemical Fingerprints. Cumhuriyet Science Journal, 46(2), 233-239. https://doi.org/10.17776/csj.1576821