Understanding the Use of Social Media Data in Mental Health Research During the COVID-19 Pandemic
The COVID-19 pandemic has significantly impacted mental health worldwide, prompting researchers to explore innovative methods for understanding and predicting mental healthcare needs. One such approach involves leveraging social media data, particularly from platforms like Twitter, to gain insights into public sentiment and its correlation with mental health trends. This article delves into a study approved by the A*STAR Institutional Review Board, which utilized social media data to forecast mental healthcare needs in Singapore during the pandemic.
Study Approval and Data Collection
The study received an exemption from a full Institutional Review Board (IRB) review, allowing researchers to utilize social media data obtained from approved Twitter APIs and existing anonymous public data. The research focused on the COVID-19 outbreak, collecting daily-level time-series data over an 18-month period from July 2020 to December 2021. This timeframe encompassed Singapore’s heightening and stabilizing phases during the pandemic, providing a rich dataset for analysis.
Data and Indicators Selection Criteria
The selection of data and indicators was guided by three primary criteria:
- Accessibility: The data collection methods needed to be low-cost and easily accessible, ensuring that researchers could gather the necessary information without significant barriers.
- Continuity: The data had to be available continuously for time-series statistical analysis. Traditional survey data, which is often collected at intervals of several months, was deemed unsuitable for forecasting daily mental healthcare needs.
- Validity: The data and tools employed in the study needed to demonstrate validity, either through previous studies or their application in related fields.
Situation Data and Indicators
To assess the severity of the COVID-19 situation, the researchers relied on data published by health authorities. Key indicators included daily new COVID-19 cases and deaths, as well as the number of government announcements related to the pandemic. These indicators served as comparative predictors for the study, helping researchers understand the relationship between the evolving COVID-19 situation and mental health needs.
Social Media Data and Emotion Indicators
Twitter was chosen as the primary social media source because its post content, timestamps, and user information are publicly accessible. The researchers ran keyword searches for tweets containing COVID-19-related terms such as “ncov,” “corona,” and “covid.” The dataset was restricted to tweets from Singapore-based users, ensuring relevance to the local context.
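The keyword search can be sketched as a case-insensitive substring match. This is a minimal illustration, not the study's actual pipeline: the function name `matches_covid_keywords` and the sample tweets are hypothetical, and only the three keywords quoted above are used.

```python
import re

# The three example keywords quoted in the text; the study's full list is not given here.
COVID_KEYWORDS = ("ncov", "corona", "covid")

# One case-insensitive pattern matching any keyword as a substring,
# so "COVID19" and "#coronavirus" both count as matches.
_PATTERN = re.compile("|".join(re.escape(k) for k in COVID_KEYWORDS), re.IGNORECASE)


def matches_covid_keywords(text: str) -> bool:
    """Return True if the tweet text contains any of the COVID-19 keywords."""
    return _PATTERN.search(text) is not None


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "Queue for the COVID19 swab test was long today",
    "Great laksa at the hawker centre",
    "#coronavirus updates from the ministry",
]
covid_tweets = [t for t in sample_tweets if matches_covid_keywords(t)]
```

Substring matching is deliberately broad: it would also catch unrelated uses of "corona", which is one reason a cleaning step is needed afterwards.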
To extract meaningful insights from the noisy social media data, the researchers implemented a series of data processing steps:
- Data Cleaning: This involved removing duplicate posts, clickbait content, and irrelevant tweets, such as those from influencers or spam accounts. The final dataset consisted of 140,598 tweets after filtering out potential trolls and influencer posts.
- Emotion Analysis: The researchers employed CrystalFeel, a multidimensional emotion analysis tool, to classify tweets into emotions such as joy, anger, fear, and sadness. CrystalFeel quantifies emotional intensity on a continuous scale, providing a nuanced understanding of public sentiment.
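The two steps above feed a daily time series: duplicates are dropped, then per-tweet emotion intensities are averaged by day. The sketch below assumes each tweet already carries a fear intensity score in [0, 1]. The record layout, the function names, and the scores themselves are hypothetical placeholders, not CrystalFeel output, and daily averaging is only one plausible way to aggregate.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: (tweet id, posting date, text, fear intensity in [0, 1]).
# The scores are invented placeholders, not CrystalFeel output.
tweets = [
    (1, date(2020, 7, 1), "Worried about the new covid cluster", 0.62),
    (2, date(2020, 7, 1), "Worried about the new covid cluster", 0.62),  # duplicate text
    (3, date(2020, 7, 1), "Covid rules eased, finally", 0.20),
    (4, date(2020, 7, 2), "Covid cases rising again", 0.50),
]


def drop_duplicate_texts(records):
    """Keep the first record for each normalized (stripped, lower-cased) text."""
    seen = set()
    unique = []
    for rec in records:
        key = rec[2].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def daily_mean_intensity(records):
    """Average the intensity score per calendar day, returned in date order."""
    by_day = defaultdict(list)
    for _id, day, _text, score in records:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


cleaned = drop_duplicate_texts(tweets)
daily_fear = daily_mean_intensity(cleaned)
```

Filtering out spam, troll, or influencer accounts would need account-level rules that the text does not specify, so that part is left out here.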
Mental Healthcare Needs Measures
To gauge mental healthcare needs, the study utilized behavioral data from two primary sources:
- Institute of Mental Health (IMH): The daily count of emergency room visits at IMH served as a proxy for public demand for psychiatric services.
- Mindline: An online mental health help portal launched during the pandemic, Mindline provided insights into online mental health needs. Users completed a mental health status questionnaire, which included standardized screening instruments like the PHQ-9 and GAD-7.
The study focused on analyzing trends in user visits to Mindline, particularly during “Crisis” situations, reflecting severe mental health concerns.
Statistical Analysis and Forecasting
Before conducting statistical analyses, the researchers pre-processed the aggregated data to ensure accuracy and reliability. Techniques included normalizing data, ensuring stationarity, and removing volatility and seasonality. The Augmented Dickey-Fuller Test was employed to verify the stationarity of the variables.
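Two of these pre-processing steps can be sketched with the standard library alone: z-score normalization and first differencing, a common way to remove a trend before testing for stationarity. The helper names and the visit counts are hypothetical, and the text does not say which transformations or software the researchers actually used.

```python
from statistics import mean, pstdev


def zscore(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in series]


def first_difference(series):
    """Return x[t] - x[t-1] for each t; this removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical daily visit counts with a steady upward trend (illustrative values only).
visits = [10, 12, 14, 16, 18, 20]
diffed = first_difference(visits)   # the trend disappears: every step equals 2
normalized = zscore(visits)
```

An Augmented Dickey-Fuller test would then be run on the transformed series. One widely used implementation is `adfuller` in `statsmodels.tsa.stattools`, though the tooling used in the study is not stated here.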
Granger Causality Tests
To explore the dynamic relationships between situational indicators, Twitter data, and mental healthcare needs, the researchers used Granger causality tests. This statistical method tests whether past values of one time series improve the prediction of another beyond what the second series' own past values already provide. A significant result therefore indicates predictive precedence rather than proof of causation. The results provided insights into how public sentiment, as reflected in social media, related to subsequent mental health trends.
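The logic of a Granger test can be shown by comparing two regressions: one predicting y from its own lags, and one that also includes lags of x. The sketch below computes the resulting F-statistic in plain Python. It is an illustration under stated assumptions, not the study's procedure: the function names, the default lag of 1, and the synthetic data are all hypothetical. No p-value is computed, because the standard library has no F-distribution.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_stat(x, y, lag=1):
    """F-statistic for the null 'lags of x add nothing to predicting y beyond lags of y'."""
    times = range(lag, len(y))
    targets = y[lag:]
    # Restricted model: intercept plus y's own lags.
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in times]
    # Unrestricted model: the same columns plus lags of x.
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data in which x leads y by one day (illustrative only).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [random.gauss(0, 0.5)] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 200)]
f_xy = granger_f_stat(x, y)   # does x help predict y?
f_yx = granger_f_stat(y, x)   # does y help predict x?
```

On this synthetic data the x-to-y statistic is far larger than the y-to-x statistic, which is the asymmetry the test is designed to detect. In practice a library routine such as `grangercausalitytests` in `statsmodels.tsa.stattools` also reports p-values across several lag orders.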
ARIMA Forecasting
The researchers evaluated the performance of Auto-Regressive Integrated Moving Average (ARIMA) models for forecasting mental healthcare needs based on significant variables identified through Granger causality tests. The models were assessed using error metrics such as Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), with a focus on minimizing prediction errors.
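Both error metrics are simple to compute once a forecast exists. The sketch below uses hypothetical daily visit counts and forecasts. It shows only the evaluation step, not the ARIMA fitting itself.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared forecast error."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute forecast errors."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast daily visits (illustrative values only).
actual = [20, 22, 19, 25]
forecast = [18, 22, 21, 21]

forecast_rmse = rmse(actual, forecast)  # sqrt((4 + 0 + 4 + 16) / 4) = sqrt(6)
forecast_mae = mae(actual, forecast)    # (2 + 0 + 2 + 4) / 4 = 2.0
```

RMSE weights large misses more heavily than MAE does. Here the single four-visit error pushes RMSE (about 2.45) above MAE (2.0), so reporting both gives a fuller picture of forecast quality.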
Conclusion
The study illustrates how social media data can be used to understand and predict mental healthcare needs during a global crisis. By integrating near-real-time emotion analysis with traditional healthcare indicators, researchers can gain insight into public mental health trends. This approach improves understanding of the psychological impact of the COVID-19 pandemic and could support more responsive mental health interventions in the future. In later public health crises, social media data may serve as a useful complement to conventional indicators when planning mental health services.