(Millbrook, NY) In the Americas, primate species likely to harbor Zika - and potentially transmit the virus - are common, abundant, and often live near people. So reports a new study published today in Epidemics. Findings are based on an innovative model developed by a collaborative team of researchers from the Cary Institute of Ecosystem Studies and IBM Research through its Science for Social Good initiative.

Lead author Barbara Han, a disease ecologist at Cary Institute, explains: "When modeling disease systems, data gaps can undermine our ability to predict where people are at risk. Globally, only two primate species have been confirmed positive for Zika virus. We were interested in how a marriage of two modeling techniques could help us overcome limited data on primate biology and ecology - with the goal of identifying surveillance priorities."

The recent Zika epidemic in the Americas was one of the largest outbreaks in modern times, infecting over half a million people. Like other mosquito-borne flaviviruses, Zika circulates in the wild. In regions where mosquitoes feed on both primates and people, primates can serve as reservoirs from which the virus spills over into human populations.

The model analyzed data on flaviviruses and the traits of the primate species known to carry them, then compared those traits to the 364 primate species that occur globally. It identified known flavivirus carriers with 82% accuracy and assigned risk scores to additional primate species likely to carry Zika virus. The end product includes an interactive map that takes into account primate geographic ranges to identify hotspots where people are most at risk of Zika spillover.

Primate species in the Americas with Zika risk scores over 90% included the tufted capuchin (Cebus apella), the Venezuelan red howler (Alouatta seniculus), and the white-faced capuchin (Cebus capucinus) - species adapted to living among people in developed areas. Also on the list: white-fronted capuchins (Cebus albifrons), commonly kept as pets and captured for live trade, and squirrel monkeys (Saimiri boliviensis), which are hunted for bushmeat in parts of their range.

"These species are geographically widespread, with abundant populations that live near human population centers. They are notorious crop raiders. They're kept as pets. People display them in cities as tourist attractions and hunt them for bushmeat. In terms of disease spillover risk, this is a highly alarming result," says coauthor Subho Majumdar.

Adding to the concern: the mosquito species most likely to spread Zika are commonly found near humans, and are able to thrive in natural and altered landscapes.

The model 
To overcome data gaps, the team combined two statistical tools - multiple imputation and Bayesian multi-label machine learning - to assign each primate species a risk score indicating its potential for Zika positivity.

The pathogens 
Traits of six mosquito-borne diseases were assessed: yellow fever, dengue fever, Japanese encephalitis, St. Louis encephalitis, Zika virus, and West Nile virus. Three of these had known primate reservoirs.

The primates 
Biological and ecological traits of the 18 primate species that have tested positive for any mosquito-borne flavivirus were compared to the traits of 364 primate species that occur globally. Thirty-three features were assessed, including metabolic rate, gestation period, litter size, and behavior. Features were weighted for importance in predicting Zika positivity.

Han explains: "Like all pathogens, Zika virus has unique requirements for what it needs in an animal host. To determine which species could harbor Zika, we need to know what these traits are, which species have these traits, and which of these species can transmit the pathogen to humans. This is a lot of information, much of which is unknown."

A statistical method called Multiple Imputation by Chained Equations (MICE) was used to overcome data limitations. MICE sets computer algorithms to the task of searching through datasets of organism traits to draw connections between organisms with similar or related traits. When the algorithm encounters a missing data entry, it uses these connections to infer the missing information and fill the 'blanks' in the dataset.
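
For readers who want a feel for how chained-equation imputation fills in blanks, the following is a minimal sketch in Python, not the team's code. It assumes a toy table with just two traits per species, and the numbers are made up for illustration. Each trait is predicted in turn from the other with a simple regression line, and the cycle repeats. Full MICE goes further: it adds random draws and produces several completed datasets rather than one.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # predictor is constant: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def chained_impute(rows, n_iter=10):
    """Fill None entries in a two-column table by cycling regressions."""
    missing = [[v is None for v in row] for row in rows]
    # Start by filling each blank with the mean of the observed values.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in (0, 1)]
    data = [[col_means[j] if r[j] is None else float(r[j]) for j in (0, 1)]
            for r in rows]
    for _ in range(n_iter):
        for target in (0, 1):
            predictor = 1 - target
            # Fit only on rows where the target trait was actually observed.
            seen = [i for i in range(len(data)) if not missing[i][target]]
            a, b = fit_line([data[i][predictor] for i in seen],
                            [data[i][target] for i in seen])
            # Re-estimate the blanks from the current value of the other trait.
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = a + b * data[i][predictor]
    return data

# Hypothetical trait table: [body mass in kg, gestation in days].
# Values are invented for illustration, not taken from the study.
traits = [[3.0, 155.0], [6.5, 190.0], [0.9, 150.0], [8.0, None], [None, 160.0]]
filled = chained_impute(traits)
for row in filled:
    print([round(v, 1) for v in row])
```

Observed values are never overwritten; only the entries that were blank in the input are estimated.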

Machine learning was applied to this 'filled in' dataset to predict primate species most likely to carry Zika virus. The model produced a risk score for each species by combining flavivirus infection history and biological traits to predict the likelihood of Zika positivity.
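
To illustrate the idea of turning infection history and traits into a risk score, here is a small Python sketch. It deliberately swaps in plain logistic regression for the Bayesian multi-label method the team used, and the two features, the training values, and the function names are all hypothetical. It shows only the general shape of the step: learn from species with known status, then score a species whose status is unknown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def risk_score(w, b, x):
    """Score between 0 and 1; higher means more similar to known carriers."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features per species (invented for illustration):
# [tested positive for another flavivirus (0 or 1), a trait scaled to 0-1]
X = [[1, 0.8], [1, 0.6], [0, 0.3], [0, 0.1]]
y = [1, 1, 0, 0]  # 1 = known carrier, 0 = not known to carry

w, b = train_logistic(X, y)
unknown_species = [1, 0.7]
print(f"risk score: {risk_score(w, b, unknown_species):.2f}")
```

In this toy setup a species that resembles the known carriers receives a score above 0.5, and one that resembles the non-carriers receives a score below it.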

This method could help improve forecasting models for other disease systems, beyond Zika. Senior author Kush Varshney from IBM Research explains, "Data gaps are a reality, especially in infectious diseases that originate from wild animal hosts. Models like the one we developed can overcome some of these gaps and help pinpoint species of concern to fine-tune surveillance, forecast spillover events, and help guide efforts by the public health community."

Varshney adds, "Conducting machine learning on small-sized, incomplete, and noisy datasets to support critical decision making is a challenge shared across many industries and sectors. We will surely use the experience gained from this project in many different application areas."

Han concludes, "This research was made possible by innovations provided by the broader scientific community. We relied on primate and pathogen data collected by hundreds of field researchers, and the base machine learning and imputation methods that we adapted in this research already existed. Partners at IBM Research took on a lion's share of the math and coding. It was an incredibly successful interdisciplinary collaboration - the kind we need more of if we want to find new solutions to complex problems."