It is both pervasive and largely undetectable—attributes that alarm the public and present complex problems for scientists and elected officials.
For months now, researchers have known that the coronavirus can be spread by people with no symptoms; indeed, some studies suggest individuals are most contagious within 48 hours of being infected, when they are asymptomatic or presymptomatic. The virus also spreads through the air, the most common form of transmission; yet, absent some means of measuring the concentration at a given location, people can only make educated guesses as to how safe they are at a grocery store, post office, restaurant, or pharmacy.
Epidemiologists can answer the big questions. They can say where in the country the virus is surging or likely to surge. But they have difficulty addressing more localized questions within a city or smaller geographic area, questions that might depend on everything from the demographics of the area, to the extent of mask wearing among various populations, to the countless decisions people make about where and how to travel.
But data science can help.
“One thing we have that we didn’t before is much more data—at a lower level of granularity,” said Benjamin Bengfort, Ph.D., Program Director for the Data Science Certificate Programs at Georgetown University. “And one thing we can provide that hasn’t been provided in the past is more local studies of a way a pandemic spreads.”
Creating Data-Driven Models
Bengfort and several students from the new Certificate in Advanced Data Science program have been exploring how to create data-driven models that analyze the influence of interventions or assess risk across demographic and spatial factors. In July, Bengfort, working with Parallax Advanced Research, released open-source software as part of an ongoing study of how the state of Ohio can use information gleaned from Agent Based Simulations to target non-pharmaceutical interventions (such as school closings and mask-wearing messages) to the areas where they are most needed.
Agent Based Simulations (ABS) are computer models that simulate the combined actions of individuals and groups to assess their impact on the system as a whole. They “provide the ability to model more complex interactions that can be accumulated into a ‘robust statistical portrait’ with multifaceted views of both epidemiological effects and interventions,” Bengfort said.
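To make the idea concrete, here is a minimal sketch of an agent-based simulation in Python. It is not the software Bengfort's team released; it is a toy model in which every name and parameter (`simulate`, `p_transmit`, `contacts_per_day`, and so on) is an assumption chosen for illustration. Each agent is susceptible, infectious, or recovered, and the population-level curve emerges from many individual random contacts.

```python
import random


def simulate(n_agents=200, n_days=30, contacts_per_day=5,
             p_transmit=0.05, days_infectious=7, n_seed=3, seed=0):
    """Toy agent-based epidemic model (illustrative assumptions only).

    Returns a list with one (susceptible, infectious, recovered)
    tuple of counts per simulated day.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    state = ["S"] * n_agents           # "S", "I", or "R" for each agent
    days_left = [0] * n_agents         # remaining infectious days

    # Seed the outbreak with a few randomly chosen infectious agents.
    for i in rng.sample(range(n_agents), n_seed):
        state[i] = "I"
        days_left[i] = days_infectious

    history = []
    for day in range(n_days):
        newly_infected = set()

        # Each infectious agent meets a few random agents per day.
        for i in range(n_agents):
            if state[i] != "I":
                continue
            for _ in range(contacts_per_day):
                j = rng.randrange(n_agents)
                if state[j] == "S" and rng.random() < p_transmit:
                    newly_infected.add(j)

        # Infectious agents move one day closer to recovery.
        for i in range(n_agents):
            if state[i] == "I":
                days_left[i] -= 1
                if days_left[i] == 0:
                    state[i] = "R"

        # Agents infected today become infectious starting tomorrow.
        for j in newly_infected:
            state[j] = "I"
            days_left[j] = days_infectious

        history.append(
            (state.count("S"), state.count("I"), state.count("R"))
        )
    return history


if __name__ == "__main__":
    baseline = simulate(p_transmit=0.05)
    # A hypothetical intervention, modeled crudely as a lower
    # per-contact transmission probability.
    with_intervention = simulate(p_transmit=0.02)
    print("baseline, final day:", baseline[-1])
    print("intervention, final day:", with_intervention[-1])
```

Because each run is random, a single pair of runs proves little. Repeating the simulation over many seeds and comparing the averages is one simple way to build up the kind of statistical portrait described above.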
Computer scientists are not epidemiologists, a distinction Bengfort said is important to draw given the number of unqualified “armchair epidemiologists” who have surfaced on the Internet and elsewhere. Instead, they are using their own expertise to analyze the data, and only the data. Epidemiologists, by contrast, can use their knowledge to develop “first order models” concerning the behavior of the virus, but they cannot factor in the enormous number of variables that may affect its spread in a given area.
“That’s where data science really shines,” Bengfort said, “because there’s not a good way to employ first order models when it comes to that level of variability.”
Solving Complex Problems
One example Bengfort cites in his classes is the process used to predict solar flares.
“To do it from a first-order standpoint, you would have to come up with some theory of magnetized plasma flow and find a way to represent that behavior on the scale of the sun,” Bengfort said. “Whereas with a data-driven model, which is actually what we use to predict solar flares nowadays because we don’t have a model of magnetic plasma flows, … we learn to predict solar flares based on image information.”
The data science profession has only existed—at least in something approximating its present form—for a matter of years, not decades, Bengfort said. But as the amount of “Big Data” keeps growing exponentially, and data science continues to innovate and evolve, the field will play an increasingly vital role in all facets of society.
“Data science offers us the technological opportunity to understand the world by constructing statistical relationships from our direct observations,” Bengfort said. “Although this is not a replacement for an understanding of underlying processes, it is a starting place that can help us solve complex problems today and bootstrap scientific thought.”