CONFERENCE PROCEEDING
Automation of time series analysis of Google Trends data in RStudio using autonomous AI agents
University of Medicine and Pharmacy “Iuliu Hațieganu” Cluj-Napoca, Romania
Publication date: 2024-10-17
Tob. Prev. Cessation 2024;10(Supplement 1):A44
ABSTRACT
Introduction:
Google Trends (GT) is an open-access source of data on the search interest of populations in a given country and time frame. Relative Search Volume (RSV) is the normalized measure of search interest, expressed on a scale from 1 to 100, where 100 marks peak popularity and a value of 0 indicates insufficient data.
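Google does not publish raw search counts, so the exact RSV computation cannot be reproduced. Google describes it as dividing each data point by the total searches in its region and time range and then scaling the results to a range of 0 to 100. The sketch below only illustrates that idea. It assumes hypothetical per-period search shares as input. The function name relative_search_volume, the input format, and the rule that non-zero shares are floored at 1 are assumptions chosen to mirror the scale described above. Like the other sketches here, it is a Python illustration of the concept, not the author's R code.

```python
def relative_search_volume(shares):
    """Scale per-period search shares to a Relative Search Volume index.

    shares: hypothetical fractions (searches for the term / all searches in the
    same region and period); None marks periods with insufficient data.
    Returns integers: 100 at the peak, 1 to 100 elsewhere, 0 for missing data.
    """
    valid = [s for s in shares if s is not None and s > 0]
    if not valid:
        return [0] * len(shares)  # nothing to scale against
    peak = max(valid)
    rsv = []
    for s in shares:
        if s is None or s <= 0:
            rsv.append(0)  # 0 = insufficient data, as in Google Trends
        else:
            # Assumption: non-zero shares are floored at 1 so that 0 keeps
            # its meaning of "insufficient data" (Google's rounding is unpublished).
            rsv.append(max(1, round(100 * s / peak)))
    return rsv
```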
Time series analysis characterizes data collected at regular intervals over an extended period using appropriate statistical methods, plots, and models; modeling and forecasting are regarded as its gold standard. This paper describes the development of an R program that automates the time series analysis of Google Trends data, and the use of autonomous AI agents in building it.
Methods:
Blackbox Robocoder AI, an autonomous AI agent, was used to develop, debug, and improve R code that automates the time series analysis of Google Trends data, using natural-language instructions only. The resulting code was tested in RStudio on Google Trends data for different topics and search terms in different countries and time frames.
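The test data came as Google Trends CSV exports, which must be parsed before any analysis. The following is a minimal Python sketch of that import step. It assumes the layout commonly seen in "Interest over time" downloads: optional metadata lines such as "Category: All categories", then a header row whose first column is the time variable (e.g. Month or Week) and whose remaining columns read "term: (region)", with "<1" for very small values. The function name read_gt_csv, the parameter less_than_one, and its default of 0.5 are hypothetical choices, not part of the author's script. Export layouts can change, so the header detection should be checked against real files.

```python
import csv
import io


def read_gt_csv(text, less_than_one=0.5):
    """Parse a Google Trends CSV export (assumed layout, see lead-in).

    Returns (time_name, series): the name of the time column and a dict that
    maps each cleaned term name to a list of (time_label, value) pairs.
    Empty cells become None so they can be interpolated later.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # drop blank lines
    # Assumption: the header is the first row with 2+ columns; earlier
    # single-cell rows (e.g. "Category: All categories") are metadata.
    start = next(i for i, r in enumerate(rows) if len(r) >= 2)
    header, body = rows[start], rows[start + 1:]
    time_name = header[0]
    # "smoking: (Romania)" -> "smoking"
    terms = [label.rsplit(": (", 1)[0].strip() for label in header[1:]]
    series = {term: [] for term in terms}
    for row in body:
        t = row[0]
        for term, cell in zip(terms, row[1:]):
            cell = cell.strip()
            if cell == "":
                value = None
            elif cell == "<1":
                value = less_than_one  # hypothetical stand-in for "<1"
            else:
                value = float(cell)
            series[term].append((t, value))
    return time_name, series
```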
Results:
A fully working R script was developed that imports the raw data file (.csv), identifies the variable names, and defines the time variable. It then generates the following plots for both the original and the differenced data: line, seasonal, subseries, scatter, histogram, lag, autocorrelation (ACF), and partial autocorrelation (PACF) plots. In addition, code for STL (Seasonal-Trend decomposition using Loess) was developed to decompose additive time series, or log-transformed multiplicative time series. This code replaces missing values by linear interpolation and plots and saves the results for both the original and the differenced data.
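Several of the preprocessing steps listed above reduce to a few lines each. The sketch below is plain Python with hypothetical function names. It shows linear interpolation of missing values, first and seasonal differencing, and the log transform that turns a multiplicative series (trend × seasonal × remainder) into an additive one. It also includes the sample autocorrelation function behind ACF plots. STL itself is not reimplemented here; in R it is provided by stats::stl(). Leaving leading and trailing gaps unfilled is an assumption of this sketch, not necessarily what the author's script does.

```python
import math


def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known points.

    Leading/trailing gaps stay None because they lack a neighbor on one side.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = out[a] + frac * (out[b] - out[a])
    return out


def difference(values, lag=1):
    """Lag-`lag` differences y[t] - y[t-lag]; lag=12 is a seasonal difference for monthly data."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def log_transform(values):
    """log(T * S * E) = log T + log S + log E, so a multiplicative series becomes additive.

    Assumes strictly positive values; RSV zeros would need an offset first.
    """
    return [math.log(v) for v in values]


def acf(values, max_lag):
    """Sample autocorrelation r_k = sum_t (y_t - m)(y_{t+k} - m) / sum_t (y_t - m)^2."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) * (v - m) for v in values)
    return [
        sum((values[t] - m) * (values[t + k] - m) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]
```

With monthly Google Trends data, difference(y) turns a linear trend into a constant, while difference(y, lag=12) removes a stable yearly pattern. Both are the usual inputs to the "differenced data" plots.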
Conclusions:
This paper demonstrates that autonomous AI agents can help researchers develop R scripts faster using natural language alone. However, expertise in, and understanding of, the code and the resulting statistics remain indispensable. The developed R script can characterize GT data for different keywords, time frames, and countries, producing an extensive statistical report.
CONFLICTS OF INTEREST
The author has no conflicts of interest to declare.
FUNDING