INDONESIAN TWITTER HATE SPEECH AND ABUSIVE LANGUAGE DETECTION: METHODS AND ANALYSIS

Authors

  • Ana’llhaqq Suryaningprang
  • Arif Kurniawan
  • Muhammad Syaifudin Tamami
  • Muhamad Tuhfatur Roziqin
  • Anas Nasrullah
  • Irwan Siswanto

DOI:

https://doi.org/10.2015/cyfors.31.2026

Keywords:

Hate speech, abusive language, multi-label classification, Indonesian Twitter, machine learning, feature extraction

Abstract

This study analyzes hate speech and abusive language on Indonesian Twitter using a multi-label classification approach. We used a cleaned and labeled dataset that categorizes several forms of hate speech and abusive language. We applied three machine learning algorithms, Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT), together with three multi-label problem transformation methods: Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC). Among the configurations tested, RFDT with the LP transformation achieves the highest accuracy. The paper also underscores the role of text normalization and feature extraction in improving classification performance, and discusses the importance of comprehensive annotation guidelines. The findings provide a foundation for future research in hate speech detection and highlight areas for improvement in data annotation and algorithm selection.
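
To make the difference between the transformation methods concrete, the following is a minimal Python sketch of how Binary Relevance and Label Power-set reshape a multi-label target before any classifier is trained. It is not the authors' implementation. The label names, the toy label matrix, and the helper functions are all hypothetical and chosen only for illustration; the abstract does not list the actual label set. Classifier Chains, which feed earlier label predictions to later classifiers as extra features, are omitted for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical label names (assumption): the abstract does not enumerate the labels.
LABELS = ["hate_speech", "abusive"]

LabelRow = Tuple[int, ...]


def binary_relevance(y: List[LabelRow]) -> List[List[int]]:
    """BR: split an n-label problem into n independent binary targets.

    Returns one list of 0/1 targets per label (column-wise view of y).
    """
    return [list(column) for column in zip(*y)]


def label_powerset(y: List[LabelRow]) -> Tuple[List[int], Dict[LabelRow, int]]:
    """LP: treat each distinct label combination as a single class.

    Returns the class id per sample and the combination-to-class mapping.
    """
    combo_to_class: Dict[LabelRow, int] = {}
    classes: List[int] = []
    for row in y:
        key = tuple(row)
        if key not in combo_to_class:
            # Assign class ids in order of first appearance.
            combo_to_class[key] = len(combo_to_class)
        classes.append(combo_to_class[key])
    return classes, combo_to_class


def invert_powerset(class_ids: List[int],
                    combo_to_class: Dict[LabelRow, int]) -> List[LabelRow]:
    """Map predicted LP class ids back to their label combinations."""
    class_to_combo = {c: combo for combo, c in combo_to_class.items()}
    return [class_to_combo[c] for c in class_ids]


# Toy label matrix (assumption): one row per tweet, one column per label.
y = [(0, 0), (1, 1), (0, 1), (1, 1)]

br_targets = binary_relevance(y)         # one binary problem per label
lp_classes, mapping = label_powerset(y)  # one multi-class problem overall
```

Under BR, a separate binary classifier would be trained on each list in `br_targets`, so correlations between labels are ignored. Under LP, a single multi-class classifier would be trained on `lp_classes`, which preserves label co-occurrence but can only predict combinations seen during training.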

Published

09-01-2026