Classification and Semantic Analysis of Cyberbullying on the Social Media Platform X: Integrating Web Scraping and Natural Language Processing (NLP)
DOI:
https://doi.org/10.31949/educatio.v11i2.12725
Abstract
Cyberbullying on social media, particularly on X, has become a critical issue with significant psychological impact. This study uses a semantic approach to analyze the extent to which cyberbullying still occurs on X. Data were collected in December 2024 by web scraping with Selenium, using specific categories and keywords such as “gendut” (fat) and “bodoh” (stupid). After deduplication, 700 records remained, which satisfies the Slovin criterion (margin of error 3.77%). The analysis relied on Natural Language Processing (NLP): text cleaning, lowercasing, normalization, tokenization, and stopword removal; classification with a fine-tuned BERT model to determine whether each comment constitutes cyberbullying; and mapping of keywords to 8 categories, such as “rasisme” (racism) and “sara” (SARA: ethnicity, religion, race, and intergroup relations). The results show that 55.4% of the data contained indications of cyberbullying, with the sexual category the most dominant at 26.6% and the keyword “anjing” (dog) appearing 99 times. Certain negative words showed a fluctuating temporal pattern, with cyberbullying intensity peaking in Week 4 at the highest percentage (79.1%). These findings confirm that cyberbullying remains a significant phenomenon on X; stricter content moderation policies and the development of automated, machine-learning-based detection systems are therefore needed to mitigate it more effectively.
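The preprocessing and keyword-mapping steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the normalization dictionary, the stopword list, the keyword-to-category map, and the category labels ("physical", "intellectual", "animal") are all hypothetical placeholders, since the abstract does not list the resources actually used. The fine-tuned BERT classification step is omitted.

```python
import re
from collections import Counter

# Hypothetical resources for illustration only; the study's actual
# dictionaries and category assignments are not given in the abstract.
NORMALIZATION = {"gk": "tidak", "bgt": "banget", "lo": "kamu"}
STOPWORDS = {"yang", "dan", "di", "itu", "kamu", "banget"}
KEYWORD_CATEGORIES = {
    "gendut": "physical",
    "bodoh": "intellectual",
    "anjing": "animal",
}


def preprocess(text):
    """Apply cleaning, lowercasing, normalization, tokenization, stopword removal."""
    # Text cleaning: drop URLs, mentions, and hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    # Lowercasing.
    text = text.lower()
    # Text cleaning, continued: keep only ASCII letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Normalization: replace slang or abbreviations word by word.
    text = " ".join(NORMALIZATION.get(word, word) for word in text.split())
    # Tokenization: simple whitespace split.
    tokens = text.split()
    # Stopword removal.
    return [token for token in tokens if token not in STOPWORDS]


def map_categories(tokens):
    """Count how many tokens fall into each (hypothetical) category."""
    return Counter(
        KEYWORD_CATEGORIES[token] for token in tokens if token in KEYWORD_CATEGORIES
    )


if __name__ == "__main__":
    sample = "@user Lo GENDUT bgt dan bodoh!!! https://t.co/x"
    tokens = preprocess(sample)
    print(tokens)
    print(map_categories(tokens))
```

In a fuller pipeline, the cleaned text would first be passed to the classifier, and the category counts would be aggregated only over comments labelled as cyberbullying.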
Keywords:
Cyberbullying, Semantic Analysis, Web Scraping, NLP, Social Media X
License
Copyright (c) 2025 Syifa Aulia Azzahra, Nuur Wachid Abdul Majid

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
An author who publishes in the Jurnal Educatio FKIP UNMA agrees to the following terms:
- The author retains the copyright and grants the journal the right of first publication of the work, simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 License, which allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- The author is able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book) with the acknowledgment of its initial publication in this journal.
- The author is permitted and encouraged to post his/her work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work.