
Challenges in Digital Epidemiology: Using Social Media for Health Research


OCTOBER 22, 2021

Graciela Gonzalez-Hernandez, PhD
Associate Professor, Division of Informatics
University of Pennsylvania

About the Presentation: Social media has grown in popularity for health-related research as it has become evident that it can be a good source of patient insights. Whether from Twitter, Reddit, Instagram, Amazon reviews, or health forums, researchers have collected and processed user comments and published many papers on different uses of social media data, with varying degrees of rigor in their study design and use of the data.

Using these data in epidemiology presents many challenges, from identifying the right cohort and reducing bias to finding the ‘needle in the haystack’. Social media data is sometimes misused, and it is frowned upon when not properly handled. I will discuss some aspects of how solid scientific principles and careful design of natural language processing methods can help ‘tame’ the noise in social media data and enable digital epidemiology.


About the Presenter: Dr. Gonzalez Hernandez is a recognized expert and leader in natural language processing (NLP) applied to bioinformatics, medical/clinical informatics, and public-health informatics, with more than 100 publications in this area. After 11 years at the Department of Biomedical Informatics at Arizona State University, she joined the University of Pennsylvania and established the Health Language Processing Lab within the Institute of Biomedical Informatics. Her recent work focuses on NLP applications for public-health monitoring and surveillance and is funded by R01 grants from the National Library of Medicine and the National Institute of Allergy and Infectious Diseases.

Her work on social media mining for pharmacovigilance has resulted in numerous publications in prestigious conferences and journals. Examples include work on adverse drug reaction (ADR) extraction in the Journal of the American Medical Informatics Association (JAMIA) and on prescription-drug abuse in Drug Safety. A Journal of Biomedical Informatics publication was selected as one of Elsevier/Atlas’s 10 articles with greatest potential social impact, an honor among papers in more than 2500 journals. Her work in this area also caught the attention of the FDA, which awarded her a grant to develop these methods for monitoring nutritional supplements.

Her work on enriching geospatial information for phylogeography, in collaboration with Dr. Matthew Scotch, uses NLP for the automatic extraction of relevant geospatial data from the literature and for linkage to GenBank records.

Dr. Gonzalez Hernandez is currently a standing member of the NLM Board of Scientific Advisors and of the NIH BDMA panel. She served as a member of the NIH BLIRC panel from 2008 to 2013. She is a regular reviewer for a number of prestigious journals and conferences, including PLoS One, PLoS Computational Biology, JAMIA, and BMC Bioinformatics. Her prior funding also included support from the Arizona Alzheimer’s Disease Center, a P30 NIA Center, where she served as director of the Data Core from 2008 to 2016.