Davoudi, A., Klein, A. Z., Sarker, A., & Gonzalez-Hernandez, G. (2020). Towards Automatic Bot Detection in Twitter for Health-related Tasks. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2020, 136–141.
A significant challenge in using social media data for health research is identifying posts from “bot” accounts, that is, accounts whose content is generated automatically rather than written by human users. To address this challenge, the researchers evaluated an existing Twitter bot detection system (“Botometer”) and customized it for health-related research. In a sample of 10,417 profiles, selected from a database of more than 400 million public tweets and 100,000 users who shared their pregnancy on Twitter, two reviewers manually categorized each user as “bot”, “non-bot”, or “unavailable”. The researchers then evaluated Botometer's accuracy on the sample profiles. Botometer underperformed in identifying bots among health-related Twitter users, likely because it was originally developed for political bot detection. Its F1-score was 0.361, which indicates that it does not detect bots in health data accurately. The researchers then adapted Botometer by adding features and a statistical machine learning classifier to improve performance. The modified system achieved an F1-score of 0.700, nearly double the original score on health-related data. The modified system can be customized for other health-related social media cohorts.
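
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, assuming the “bot” class is treated as the positive class. The function name and the toy labels are hypothetical and are not taken from the study's data or code.

```python
def f1_score(true_labels, predicted_labels, positive="bot"):
    """Return the F1-score for the `positive` class (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, for illustration only.
true_labels = ["bot", "bot", "non-bot", "non-bot", "bot", "non-bot"]
predicted_labels = ["bot", "non-bot", "non-bot", "bot", "bot", "non-bot"]

score = f1_score(true_labels, predicted_labels)
print(f"F1-score: {score:.3f}")
```

Because F1 penalizes both missed bots and human accounts wrongly flagged as bots, it is a more informative measure than raw accuracy when one class is much rarer than the other.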