Authors
Sharif O, Basak M, Parvin T, Scharfstein A, Bradham A, Borodovsky JT, Lord SE, Preum SM
Purpose
This study had two goals: (1) develop a multilabel dataset of treatment information seeking events (TREAT-ISE) from real-world reports of individuals considering or using medication for opioid use disorder (MOUD); and (2) evaluate how well non-transformer models, transformer-based models, and ChatGPT identify and classify six types of information-seeking events (ISE).
Methods
To develop the TREAT-ISE dataset, 25,044 Reddit posts from the r/Suboxone subreddit (posted January 2018 to August 2022) were collected. This online forum contains anonymized content about individuals' experiences considering or using medications for opioid use disorder. Posts were screened for relevance (n=15,253 retained), and 5,083 of these were randomly selected for annotation. Experienced human annotators categorized treatment information seeking events into six categories. The TREAT-ISE dataset was then used to assess the ISE identification and classification capabilities of multiple models.
Findings
• The TREAT-ISE dataset was successfully developed using guidance from treatment experts, providing a novel resource for future opioid use disorder recovery research.
• XLNet outperformed all other models (weighted F1 [WF1] score of 0.774).
• BiGRU was the best of the non-transformer models (WF1 score of 0.702).
• ChatGPT displayed suboptimal performance overall, incorrectly classifying ISE 45% of the time.
• Errors were most common in longer texts (average word count of misclassified texts: XLNet = 140.21, ChatGPT = 128.04).
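The WF1 scores above are easier to interpret with a small worked example. The sketch below assumes WF1 means the support-weighted average of per-label F1 scores over a multilabel indicator matrix (the same idea as scikit-learn's average="weighted"); the summary does not spell out the exact definition. The three labels and the toy posts are hypothetical and are not the six TREAT-ISE categories.

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Support-weighted mean of per-label F1 for multilabel 0/1 matrices."""
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        support = tp + fn                  # number of true instances of label j
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1       # frequent labels count for more
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Hypothetical toy data: 4 posts, 3 labels; a post may carry several labels.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]

score = weighted_f1(y_true, y_pred)
print(f"WF1 = {score:.3f}")
```

Because each label's F1 is weighted by how often that label occurs, a model that does well on common categories but misses rare ones can still post a fairly high WF1, which is worth keeping in mind when comparing the scores reported here.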
Relevance
• Development of TREAT-ISE, a multilabel dataset annotated by human area experts, to be used in future research on online health discourse analysis.
• The TREAT-ISE dataset comprises social media posts from individuals considering or using MOUD, providing real-world data on the complexities of recovery.
• Comparative analysis of 10 machine learning models to characterize ISE from online discourse using the TREAT-ISE dataset demonstrated the relative effectiveness of different deep learning approaches.
Read More
Sharif, O., Basak, M., Parvin, T., Scharfstein, A., Bradham, A., Borodovsky, J.T., Lord, S.E., & Preum, S.M. (2023). Characterizing Information Seeking Events in Health-Related Social Discourse. AAAI Conference on Artificial Intelligence.
This work was funded in part by the Pilot Core of the P30 Center of Excellence grant from the National Institute on Drug Abuse [P30DA029926; PI Lisa A. Marsch].