INDONESIAN TWITTER HATE SPEECH AND ABUSIVE LANGUAGE DETECTION: METHODS AND ANALYSIS
DOI:
https://doi.org/10.2015/cyfors.31.2026
Keywords:
Hate speech, abusive language, multi-label classification, Indonesian Twitter, machine learning, feature extraction
Abstract
This study analyzes hate speech and abusive language on Indonesian Twitter using a multi-label classification approach. We use a cleaned and labeled dataset that categorizes various forms of hate speech and abusive language. We apply three machine learning algorithms, Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT), each combined with one of three data transformation methods: Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC). Our results indicate that RFDT with the LP transformation achieves the highest accuracy. The paper also underscores the role of text normalization and feature extraction in improving classification performance and discusses the importance of comprehensive annotation guidelines. The findings provide a foundation for future research in hate speech detection and highlight areas for improvement in data annotation and algorithm selection.
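As a rough illustration of the three transformations named in the abstract, the sketch below shows how each one reshapes a multi-label target matrix before a single-label classifier such as SVM, NB, or RFDT is trained. It is a minimal sketch under stated assumptions, not the study's implementation: the label names, the toy feature vectors, and the function names are all hypothetical, and only the Python standard library is used.

```python
from typing import Dict, List, Tuple

# Hypothetical label names, chosen only for illustration.
LABELS = ["hate_speech", "abusive", "targets_individual"]

# Toy data (assumed): 4 tweets, 2 numeric features each, 3 binary labels each.
X = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
Y = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [1, 1, 0]]


def binary_relevance(Y: List[List[int]]) -> Dict[str, List[int]]:
    """BR: one independent binary target vector per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(LABELS)}


def label_powerset(Y: List[List[int]]) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """LP: each distinct label combination becomes one class id."""
    classes: Dict[Tuple[int, ...], int] = {}
    encoded = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)  # ids assigned in order of first appearance
        encoded.append(classes[key])
    return encoded, classes


def classifier_chain_inputs(X: List[List[float]], Y: List[List[int]]):
    """CC (training time): the model for label j sees the original
    features plus the true values of labels 0..j-1."""
    chain = []
    for j in range(len(LABELS)):
        features = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        target = [y[j] for y in Y]
        chain.append((features, target))
    return chain


br_targets = binary_relevance(Y)
lp_targets, lp_classes = label_powerset(Y)
cc_inputs = classifier_chain_inputs(X, Y)
```

With BR, three separate binary classifiers would be trained on `br_targets`. With LP, one multi-class classifier would be trained on `lp_targets` and its predictions mapped back through `lp_classes`. With CC, the models are trained in order on `cc_inputs`, and at prediction time each model's output stands in for the true labels of the earlier positions.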





