Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic

Title

Author

Corneel Vandelanotte, Quyen G. To, Kien G. To, Van-Anh N. Huynh, Nhung TQ Nguyen, Diep TN Ngo, Stephanie J. Alley, Anh NQ Tran, Anh NP Tran, Ngan TT Pham, Thanh X Bui

Description

Anti-vaccination attitudes have been an issue since the development of the first vaccines. The increasing use of social media as a source of health information may contribute to vaccine hesitancy because anti-vaccination content is widely available on social media, including Twitter. Being able to identify anti-vaccination tweets could provide useful information for formulating strategies to reduce anti-vaccination sentiments among different groups. This study aims to evaluate the performance of different natural language processing models in identifying anti-vaccination tweets published during the COVID-19 pandemic. We compared the performance of bidirectional encoder representations from transformers (BERT) and bidirectional long short-term memory networks with pre-trained GloVe embeddings (Bi-LSTM) with classic machine learning methods, including support vector machine (SVM) and naïve Bayes (NB). On the test set, the BERT model achieved: accuracy = 91.6%, precision = 93.4%, recall = 97.6%, F1 score = 95.5%, and AUC = 84.7%. The Bi-LSTM model achieved: accuracy = 89.8%, precision = 44.0%, recall = 47.2%, F1 score = 45.5%, and AUC = 85.8%. SVM with a linear kernel achieved: accuracy = 92.3%, precision = 19.5%, recall = 78.6%, F1 score = 31.2%, and AUC = 85.6%. Complement NB achieved: accuracy = 88.8%, precision = 23.0%, recall = 32.8%, F1 score = 27.1%, and AUC = 62.7%. In conclusion, the BERT model outperformed the Bi-LSTM, SVM, and NB models in this task. Moreover, the BERT model achieved excellent performance and can be used to identify anti-vaccination tweets in future studies.
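The F1 scores quoted in the description can be cross-checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the names `f1_score`, `reported`, and `check_reported`, and the 0.1 percentage-point tolerance, are assumptions introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as stated in the description above.
reported = {
    "BERT": (93.4, 97.6, 95.5),
    "Bi-LSTM": (44.0, 47.2, 45.5),
    "SVM (linear kernel)": (19.5, 78.6, 31.2),
    "Complement NB": (23.0, 32.8, 27.1),
}


def check_reported(rows, tolerance=0.1):
    """Recompute F1 per model and flag whether it matches the reported value.

    The tolerance (in percentage points) is an assumption that allows for
    rounding of the published precision and recall figures.
    """
    results = {}
    for name, (precision, recall, reported_f1) in rows.items():
        recomputed = f1_score(precision, recall)
        results[name] = (recomputed, abs(recomputed - reported_f1) <= tolerance)
    return results


if __name__ == "__main__":
    for name, (recomputed, consistent) in check_reported(reported).items():
        status = "consistent" if consistent else "inconsistent"
        print(f"{name}: recomputed F1 = {recomputed:.2f}% ({status})")
```

Under this tolerance, all four reported F1 scores agree with the values recomputed from the rounded precision and recall figures.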

Date

2021

Subject

neural network, deep learning, LSTM, BERT, transformer, stance analysis

Identifier

10.3390/ijerph18084069

Source

Epidemiology and Health

Publisher

Korean Society of Epidemiology

Coverage

Medicine

Files

https://socictopen.socict.org/files/to_import/pdfs/9aa64adb121a4ee7d10d2f22edffbf9d.pdf

Collection

Coronavirus

Citation

Corneel Vandelanotte, Quyen G. To, Kien G. To, Van-Anh N. Huynh, Nhung TQ Nguyen, Diep TN Ngo, Stephanie J. Alley, Anh NQ Tran, Anh NP Tran, Ngan TT Pham, Thanh X Bui, “Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic,” SOCICT Open, accessed June 10, 2026, https://www.socictopen.socict.org/items/show/6975.
