PERBANDINGAN METODE CART DAN NAÏVE BAYES UNTUK KLASIFIKASI CUSTOMER CHURN
(Comparison of the CART and Naïve Bayes Methods for Customer Churn Classification)

Authors

  • Rahmat Ryan Adhitya, Universitas Jenderal Achmad Yani
  • Wina Witanti, Universitas Jenderal Achmad Yani
  • Rezki Yuniarti, Universitas Jenderal Achmad Yani

DOI:

https://doi.org/10.31949/infotech.v9i2.5641

Keywords:

Classification, CART, Naive Bayes, Confusion Matrix

Abstract

Classification is the process of identifying objects and assigning each one to a group or category. It can be applied to large datasets, and two commonly used classification methods are CART (Classification and Regression Trees) and Naïve Bayes. This study compares CART and Naïve Bayes by measuring accuracy, precision, recall, and F1-score under three scenarios for splitting the dataset into training and testing sets. All four metrics are computed from a confusion matrix. In the three scenarios, 70%, 80%, and 90% of the data are used for training, with the remainder used for testing. The results show that CART achieves its highest average accuracy and F1-score at 79.616% and 57.636%, respectively, while Naïve Bayes achieves its highest average accuracy and F1-score at 75.104% and 62.004%, respectively. CART is therefore the more accurate method on average, while Naïve Bayes attains the higher F1-score.
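The abstract computes accuracy, precision, recall, and F1-score from a confusion matrix. The following is a minimal Python sketch of those standard definitions, assuming binary labels with churn encoded as 1 (the positive class). The names `ConfusionMatrix`, `confusion_matrix`, and `metrics` are hypothetical helpers for illustration, not the authors' code, and the labels are toy values rather than data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; churn (label 1) is the positive class."""
    tp: int  # churners predicted as churn
    fp: int  # non-churners predicted as churn
    fn: int  # churners predicted as non-churn
    tn: int  # non-churners predicted as non-churn


def confusion_matrix(y_true, y_pred, positive=1):
    """Tally TP, FP, FN and TN by comparing true and predicted labels pairwise."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def metrics(cm):
    """Accuracy, precision, recall and F1 (0.0 whenever a denominator is zero)."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if cm.tp + cm.fp else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if cm.tp + cm.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = churn, 0 = stay).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)
print(cm)
print(scores)
```

On these toy labels the matrix is TP = 2, FP = 1, FN = 2, TN = 3, giving an accuracy of 0.625 but an F1-score of about 0.571. Because F1 ignores true negatives, it can sit well below accuracy when many churners are missed, which is the same kind of gap the abstract reports between CART's accuracy and F1-score.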
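CART builds a binary decision tree by choosing, at each node, the split that most reduces class impurity. For classification trees the conventional criterion is the Gini index, which is assumed here because the abstract does not name the criterion. The sketch below scores every candidate threshold on a single numeric feature; `gini` and `best_threshold` are hypothetical names, and the `tenure` values are illustrative, not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Best binary split `value <= threshold` on one numeric feature.

    Returns (threshold, weighted child Gini). Candidate thresholds are midpoints
    between consecutive distinct sorted values; threshold is None when no split
    lowers the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)  # start from the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Child impurities weighted by the share of records in each child.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_t, best_score


# Hypothetical toy data: months of tenure and churn label (1 = churn).
tenure = [1, 2, 3, 10, 20, 30]
churn = [1, 1, 1, 0, 0, 0]
print(best_threshold(tenure, churn))
```

Here the split `tenure <= 6.5` separates the classes perfectly, so its weighted Gini is 0.0. A full CART tree would run this search over every feature, split on the best one, and recurse on each child until a stopping rule (for example a maximum depth) is met; pruning is omitted from this sketch.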
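Naïve Bayes assigns a record to the class c that maximizes P(c) multiplied by P(x_j | c) for every feature j, assuming the features are conditionally independent given the class. The abstract does not state which Naïve Bayes variant the study used, so the sketch below assumes categorical features and Laplace (add-one) smoothing. The class name `CategoricalNaiveBayes` and the toy customer records are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v] = number of training rows of class c whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]  # distinct values per feature
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row, c):
        """Unnormalised log posterior: log P(c) + sum_j log P(x_j | c)."""
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            # Smoothing keeps unseen (value, class) pairs from zeroing the product.
            numerator = self.counts[j][c][v] + self.alpha
            denominator = self.class_counts[c] + self.alpha * len(self.values[j])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row):
        """Return the class with the highest posterior (log space avoids underflow)."""
        return max(self.classes, key=lambda c: self.log_score(row, c))


# Hypothetical toy records: (contract type, internet service) -> churn (1) or not (0).
rows = [
    ("month-to-month", "fiber"),
    ("month-to-month", "fiber"),
    ("month-to-month", "dsl"),
    ("two-year", "dsl"),
    ("two-year", "fiber"),
    ("one-year", "dsl"),
]
labels = [1, 1, 1, 0, 0, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("month-to-month", "fiber")))
print(model.predict(("two-year", "dsl")))
```

For a month-to-month fiber customer the churn class scores 0.5 × 4/6 × 3/5 = 0.2 against 0.5 × 1/6 × 2/5 ≈ 0.033 for the non-churn class, so the model predicts churn. Numeric features such as tenure would instead need discretization or a continuous likelihood (for example a Gaussian), which is outside this sketch.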

Published

04-07-2023

How to Cite

Adhitya, R. R., Witanti, W., & Yuniarti, R. (2023). PERBANDINGAN METODE CART DAN NAÏVE BAYES UNTUK KLASIFIKASI CUSTOMER CHURN. INFOTECH Journal, 9(2), 307–318. https://doi.org/10.31949/infotech.v9i2.5641

Section

Articles