Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma (Request Classification with Different Vectorization and Preprocessing Methods)

Halil Arslan; İbrahim Ethem Dadaş; Yunus Emre Işık

doi:10.29130/dubited.1017422

Research Article

Year 2022, Volume 10, Issue 3, pp. 1433-1442, 31.07.2022

Abstract

In companies, correctly processing incoming requests both speeds up business processes and eliminates problems that might otherwise arise. Requests cover different topics such as development, support and troubleshooting. For these to be resolved efficiently and by the right people, they must first be routed to the relevant sub-department. This routing can be performed manually by designated staff. However, the number of incoming requests grows in direct proportion to company size, and this high volume makes the process difficult and wastes time. Automatically forwarding requests to sub-departments can significantly increase work efficiency, especially in enterprise companies serving the IT sector. Text mining and machine learning methods, which process text and make it easy to extract information from it, can be used to meet this need. In our study, we propose a system that routes the requests received by Detaysoft Danışmanlık to the correct sub-department. To measure system performance, 2103 samples consisting of real customer requests were collected and labelled. To label the collected data correctly and independently of prior assumptions, we used grounded theory, in which class labels are derived from the data. To vectorize the raw texts, we tried bag-of-words and its variants (TF, TF-IDF) as well as word embedding methods such as GloVe and Word2Vec, and compared them to determine which performs better. In addition, we analysed how stop words, and the use of word roots only, affect request classification. The analyses showed that models using the SVM algorithm classify incoming requests correctly with a performance of 79%, which can be considered good. The results are expected to shed light on future work on request classification with respect to both the vectorization and the preprocessing stages.
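The abstract compares two bag-of-words variants, TF and TF-IDF, with word embeddings such as Word2Vec and GloVe. These can be sketched in plain Python, along with the usual way of turning word embeddings into one vector per request. This is a minimal illustration under stated assumptions, not the authors' pipeline. The following are all choices made here for illustration only:

- the toy tickets;
- whitespace tokenisation;
- the unsmoothed idf formula ln(N/df);
- averaging word vectors to obtain a document vector.

```python
import math
from collections import Counter

# Hypothetical toy tickets for illustration; the study's 2103 real tickets are not reproduced here.
TICKETS = [
    "printer not working",
    "password reset request",
    "printer paper jam",
]


def build_vocabulary(docs):
    """Sorted list of unique tokens across all documents (whitespace tokenisation)."""
    return sorted({tok for doc in docs for tok in doc.split()})


def tf_vectors(docs, vocab):
    """Bag-of-words term-frequency vectors: raw count of each vocabulary term per document."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[term] for term in vocab])
    return vectors


def tfidf_vectors(docs, vocab):
    """TF-IDF with the plain idf = ln(N / df).

    Assumes vocab was built from the same docs, so every df is at least 1.
    Libraries such as scikit-learn apply a smoothed idf by default instead.
    """
    n_docs = len(docs)
    tf = tf_vectors(docs, vocab)
    # Document frequency: in how many documents each term occurs at least once.
    df = [sum(1 for row in tf if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(len(vocab))] for row in tf]


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of the words that have an embedding; zero vector if none do.

    `embeddings` is any mapping from word to a list of `dim` floats, e.g. vectors
    exported from a trained Word2Vec or GloVe model (hypothetical here).
    """
    vecs = [embeddings[tok] for tok in doc.split() if tok in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


if __name__ == "__main__":
    vocab = build_vocabulary(TICKETS)
    print(vocab)
    print(tf_vectors(TICKETS, vocab)[0])  # → [0, 1, 0, 0, 1, 0, 0, 1]
    print(tfidf_vectors(TICKETS, vocab)[0])
```

With TF-IDF, a term such as `printer` that occurs in two of the three toy tickets receives a lower weight than a term found in only one ticket. This is the intended effect of the idf factor. Plain TF weights both terms equally.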

Keywords

request classification, text vectorization, machine learning, text mining

Acknowledgements

This study is the result of work carried out at the Detaysoft R&D Center. We thank the Detaysoft R&D Center for its support.

Classification of Support Tickets using Different Vectorization and Pre-Processing Methods

Abstract

Correctly processing support tickets in IT both speeds up business processes and helps fix problems that may arise. Tickets can be forwarded manually by an expert familiar with the support tasks. However, the time this takes grows in proportion to the size of the organisation.
Forwarding incoming support tickets to sub-departments through an automated system can make task tracking, management and time use more efficient, especially in organisations that operate in many different areas. In our study, we propose a system that forwards the support tickets received by Detaysoft Danışmanlık to the correct sub-department. To train the system and evaluate its performance, 2103 support tickets from real customers were collected and labelled. To label the data correctly regardless of the approach adopted, we used Grounded Theory, so that the class labels are determined from the data itself. We vectorized the raw tickets with term frequency (TF) and term frequency-inverse document frequency (TF-IDF), and also with word embeddings such as GloVe and Word2Vec. We then compared the techniques to determine which gives better results. We also analysed how stop words (redundant words), and the use of word roots only, affect support ticket classification. The results are expected to inform future studies on support ticket classification in terms of both vectorization and preprocessing.
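The study compares preprocessing variants: keeping or removing stop words, and using full words or only their roots. These can be expressed as two switches on a tokenisation step. The sketch below is hypothetical:

- the stop-word list is invented;
- the examples are in English;
- the root step is a crude fixed-length prefix truncation (`crude_root`). It stands in for whatever morphological analyser the authors used, which the abstract does not name.

```python
# Hypothetical stop-word list for illustration; the study's actual list is not given.
STOP_WORDS = {"the", "a", "is", "to", "my", "please"}

PUNCTUATION = ".,!?;:"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation.

    Note: str.lower() is not Turkish-aware ("I" becomes "i", not dotless "ı"),
    so real Turkish tickets would need locale-specific case folding.
    """
    tokens = (tok.strip(PUNCTUATION) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


def crude_root(token, max_len=5):
    """Hypothetical stand-in for a real stemmer: keep only the first max_len characters."""
    return token[:max_len]


def preprocess(text, drop_stop_words=True, roots_only=False):
    """Apply one of the four preprocessing configurations to a ticket text."""
    tokens = tokenize(text)
    if drop_stop_words:
        tokens = remove_stop_words(tokens)
    if roots_only:
        tokens = [crude_root(tok) for tok in tokens]
    return tokens


if __name__ == "__main__":
    # Print every combination of the two switches for one example ticket.
    for drop in (False, True):
        for roots in (False, True):
            print(drop, roots, preprocess("The printer is not printing.", drop, roots))
```

Prefix truncation maps `printer` and `printing` to the same token `print`, which shrinks the vocabulary that TF or TF-IDF must cover. A proper Turkish root finder would be expected to handle suffixes more reliably than this fixed cut-off.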

Keywords

support ticket classification, text vectorization, machine learning, text mining

There are 18 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Halil Arslan (ORCID 0000-0003-3286-5159), İbrahim Ethem Dadaş, Yunus Emre Işık (ORCID 0000-0001-6176-7545)
Publication Date	July 31, 2022
Published in Issue	Year 2022 Volume: 10 Issue: 3

Cite

APA	Arslan, H., Dadaş, İ. E., & Işık, Y. E. (2022). Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma. Düzce Üniversitesi Bilim Ve Teknoloji Dergisi, 10(3), 1433-1442. https://doi.org/10.29130/dubited.1017422
AMA	Arslan H, Dadaş İE, Işık YE. Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma. DUBİTED. July 2022;10(3):1433-1442. doi:10.29130/dubited.1017422
Chicago	Arslan, Halil, İbrahim Ethem Dadaş, and Yunus Emre Işık. “Farklı Vektörleştirme Ve Ön işlem Yöntemleri Ile Talep Sınıflandırma”. Düzce Üniversitesi Bilim Ve Teknoloji Dergisi 10, no. 3 (July 2022): 1433-42. https://doi.org/10.29130/dubited.1017422.
EndNote	Arslan H, Dadaş İE, Işık YE (July 1, 2022) Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma. Düzce Üniversitesi Bilim ve Teknoloji Dergisi 10 3 1433–1442.
IEEE	H. Arslan, İ. E. Dadaş, and Y. E. Işık, “Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma”, DUBİTED, vol. 10, no. 3, pp. 1433–1442, 2022, doi: 10.29130/dubited.1017422.
ISNAD	Arslan, Halil et al. “Farklı Vektörleştirme Ve Ön işlem Yöntemleri Ile Talep Sınıflandırma”. Düzce Üniversitesi Bilim ve Teknoloji Dergisi 10/3 (July 2022), 1433-1442. https://doi.org/10.29130/dubited.1017422.
JAMA	Arslan H, Dadaş İE, Işık YE. Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma. DUBİTED. 2022;10:1433–1442.
MLA	Arslan, Halil et al. “Farklı Vektörleştirme Ve Ön işlem Yöntemleri Ile Talep Sınıflandırma”. Düzce Üniversitesi Bilim Ve Teknoloji Dergisi, vol. 10, no. 3, 2022, pp. 1433-42, doi:10.29130/dubited.1017422.
Vancouver	Arslan H, Dadaş İE, Işık YE. Farklı Vektörleştirme ve Ön işlem Yöntemleri ile Talep Sınıflandırma. DUBİTED. 2022;10(3):1433-42.
