November 8, 2025
Upscaling effects on infectious disease emergence risk emphasize the need for local planning in primary prevention within biodiversity hotspots

Overall workflow

We developed a workflow to estimate epidemic risk from biodiversity loss and land use change. First, we defined the study area, selecting provinces across Java and sourcing key spatial datasets, including forest cover, human population, and administrative boundaries. Next, we identified forest patches and edge areas, applying the species–area relationship (SAR) to estimate microbial diversity for each patch using z-values (e.g., 0.2 or 0.28). We then calculated the estimated risk for novel infectious disease emergence (eRIDE) as the product of patch biodiversity and patch perimeter, generating spatial hazard maps that reflect microbial diversity and fragmentation. To estimate the Population at Risk (PAR), eRIDE values were combined with population density at each pixel and aggregated to both administrative regions and population-based Voronoi polygons. To assess epidemic potential, we applied gravity models incorporating PAR and inter-region distances, comparing outputs from province centroids, Voronoi-based models, and Gaussian decay surfaces. The effects of spatial resolution were evaluated by repeating eRIDE and PAR estimation at resolutions from 100 m to 5 km, with model fit assessed using AICc. Finally, we linked risk estimates to forest management types (e.g., intact, regenerating, agroforestry) and used bivariate mapping and density plots to explore associations between land use practices and emerging disease risk. Further details for each step are below.

Theoretical framework

The theoretical framework applied here estimates richness values from the species–area relationship (SAR) to connect biodiversity, forest habitat and fragmentation with the risk of exposure to novel infectious diseases2. Under this model, habitat division inherently leads to increased exposure to potential sources of harm (hazard)40 at the landscape-level. The hazard is posed by the entire range of disease-causing microbial diversity within a habitat, rather than specific traits of individual disease-causing agents. The framework is based on assumptions that generalise disease behaviour allowing it to be modelled in a mechanistic fashion2. We provide a summary of biodiversity and edge amount in a biodiverse region of the world as a starting point for quantifying the amount of human-nature interface per area. We evaluate data by first estimating patch biodiversity and the risk of emerging infectious disease emergence.

Biodiversity model

The number of mammalian viruses for zoonotic infections proportionately increases with mammalian species richness (Fig. S2). Based on this assumption, we build a microbial diversity model that is used to estimate the risk for novel infectious disease emergence (eRIDE). Increasing contact due to human encroachment and landscape fragmentation with corresponding species diversity decline are likely to act antagonistically to affect hazards from novel pathogens. We calculate eRIDE2 using the measures of biological diversity derived from landscape fragmentation and decline rates based on species-area relationship from literature. From every patch, a measure of alpha microbial diversity is measured. eRIDE is a ‘bottom-up’ estimate based on all biodiversity from forest patch and edge metrics and species-area rules. The species-area relationship model here uses a power-law relationship. We derived data from species-habitat-area relationships expected for microbial communities and vertebrates (z-value = 0.23 ± 0.13, approximated to 0.20, and z-value = 0.28, Table S2), considering forests as habitats in a binary forest-non-forest space. Rate of species decline was set as 0.2 based on island biogeographic empirical estimates at a conservative lower level. Because z varies according to matrix type (as the intervening matrix becomes more permeable, the z-value decreases), we expect z-values in land to be smaller as the terrestrial matrix is not expected to be completely impermeable to terrestrial wildlife as the ocean. Because of that, we use edges and moving windows to evaluate spillover risk. Regardless, we also run our estimates for z-value = 0.28 and compare them with z = 0.20 in order to check the sensitivity of the trends observed at optimal spatial resolution values for eRIDE. The variation encompasses empirically derived values for vertebrate species assemblages across different habitat patches present in different types of landscapes and island systems (Table S3, extracted from Matthews et al.41).

The land use layer was upscaled from the native resolution of 30 m to 100 m values in spatial resolution to match the population data through nearest neighbor resampling. After that, we repeated our workflow for progressively coarser spatial scales.

Within our system, all approximations of total microbial species diversity can be relative measures on an arbitrary scale. We assume that our habitat patches are not linked following division, but our estimates incorporate the dense tree cover areas as forest habitat. The overall risk coming from any pathogen is based on the biodiversity model and interactions with forest edges. We assume that microbial species-rich areas inland vary as a function of patch area and empirical number of species of birds and mammals, whereas patches considered as habitats are classified as forest from land cover data.

Habitat fragmentation was based on forest cover extracted from GLAD GlobeCover for 201922 using landsat images (Fig. S3). The mosaic images were processed at 0.00025 dd resolution (~ 30 m). All forest land cover classes (landclass codes from 50 through 116) were considered as habitat. We adapted the original algorithms from Wilkinson et al. (2018)2 to R 4.5.026 and GRASS GIS 8.442 through custom codes based on the lsmetrics package workflow43.

Estimated risk of infectious disease emergence

The risk of infection emergence from forests into the expanding human population (R), was considered as the product of the relative number of potential disease-causing agents, which we assume to scale linearly with forest patch (i) microbial diversity (Si)44, and the area over which the population first comes into uniform contact with this habitat, which we assume is represented by the perimeter of the habitat fragment (Pi). So, hazard is proportional to patch biodiversity, whereas we have that:

And total risk R is

$$R = \sum\limits_{i}^{{}} {R_{i} }$$

We assume eRIDE correlates with a possible hazard. Our gravity models take into account that the estimated hazard reflects a higher risk of human exposure to newly emerging pathogens. We assume that disease spread between nearby areas is proportional to the product of their population densities, making routes with high population density more likely pathways for spread2. Human population at risk (PAR) is defined as the product of the pixel eRIDE index and the human population at that location. That is the basis for the flow calculation for the target subregions applying three versions of a gravity model2,45. The epidemic potential map from eRIDE was built from gravity to terrestrial areas of Java. Then we calculate population at risk estimates and epidemic potential across the following administrative regions: Bali Province, Banten Province, Jakarta Raya Special District, Jawa Barat (West Java) Province, Jawa Tenga (Central Java) Province, Jawa Timur (East Java) Province, and Yogyakarta Special region. We used WorldPop population data46 and administrative boundaries rnaturalearth data47. We used the human population at ~ 100 m resolution as the total number of people per grid-cell based on the 2020 population census48 to infer the effects from a potential outbreak through direct transmission using gravity models and a pixel-based cumulative potential surface for PAR49, representing the given risk of novel emerging zoonoses. The estimated PAR was defined as the product of the eRIDE value and the population at a given location.

Modelling the impact of upscaling

After calculating eRIDE and PAR, we developed models to understand how upscaling working resolution influences statistics for eRIDE and PAR. In previous work, our eRIDE metric was a considerably better predictor of Ebola virus disease (EVD) emergence in Africa compared to other habitat fragmentation metrics, which were individually poor predictors for a wide range of scales (from 300 m up to 60 km)2. Within 5 km of outbreaks, eRIDE was 10–12-fold higher than background values and 7–eightfold increase in areas within 5–60 km of known EVD outbreak index cases. Our main question here was to understand how much resolution matters in terms of information loss (variation, SD) and generality (averages maintained) for analyses using this eRIDE modelling framework. Resolution changes are viewed in terms of computational cost (number of pixels to process) and represent the upscaling effect. Predictor data for costs was extracted for the following resolutions (N = 18): 100, 150, 200, 250, 300, 350, 400, 500, 600, 700, 800, 900, 1000, 2000, 2500, 3000, 4000, and 5000 m. Information loss —the degree to which information at a coarser scale deviates from the observed values at a finer scale of spatial measurement—was modelled as the difference in SD values from the reference resolution (100 m) for a given metric. Model plausibility was compared through the corrected Akaike Information Criteria. A no-effect, intercept-only model was competed with all concurrent models including general linear, piecewise regression50, and non-linear additive models with limited basis dimension for the smooth term (k) up to four. Models with ΔAICc were considered the most plausible models. Weight of evidence was inspected to determine the model that stands out as the most appropriate representation of the underlying processes.

Epidemic risk models

We apply gravity models based on human population at risk for our study region using two area division types and discuss epidemic potential. To estimate the potential for epidemics from an emerging disease, we modelled disease spread across target provinces of Indonesia (Admin-1 states provinces): Banten, Jakarta Raya, West Java, Central Java, Yogyakarta special region, East Java, and Bali. Our models assume that the risk flow follows an inverse-linear relationship with the distance between two population centres, aligning with the principle that proximity facilitates interaction flow45. The denominator exponent and scaling constant were set to one. We assume the potential for spread between any two areas was proportional to the product of the population, so that epidemics were likely to travel along paths of high population density. We applied three methods for the calculation of epidemic risk. First, we calculate a gravity model using the provinces of Java. This method ignores islands in terms of their spatial information but takes into account the PAR for the entire province, including the PAR in islands for a given province. It considers the centroid of each province as the base for calculation of distances in the gravity model. In the case of Java, most provinces are composed of small islands around their respective majoritarian area in Java main island. In order to population and add anisotropy here in increasing complexity, we calculate risk using two more methods.

The second method models high-level population-Voronoi-tessellation areas forced upon identified provinces and then calculates a gravity model based majority occupied in these areas (Fig. S5). When constructing a gravity model, particularly for epidemic risk assessment, using Voronoi polygons based on high population values provides a more realistic representation of population distribution compared to simply relying on province boundaries and their centroids. While centroids represent the geometric center of a province, they fail to capture the true spatial variation in population density, especially when the distribution is uneven across the region. Voronoi polygons, derived from high-density population centers, offer a better approximation of how populations are spatially distributed, or the spatial domains of a region based on where most people are present (here, using the 95th population quantile, N = 100). Although not perfectly accurate, these polygons capture the influence of densely populated centers, which is crucial in gravity models that rely on spatial interactions, such as distance and neighborhood. In addition to adding more realism to the analysis, Voronoi polygons remain computationally efficient, unlike pixel-based gravity models, which are more data-intensive. This approach strikes a balance between improving accuracy and maintaining manageable computational demands, making it suitable for large-scale applications. Moreover, by forcing the Voronoi coverage to align with province boundaries (i.e., each Voronoi polygon is assigned to the province where the majority of its area is located), each polygon retains the administrative boundary information (Fig. S5). This feature makes Voronoi polygons not only ideal for large-scale epidemic risk assessments but also useful for more local spatial planning tasks that require adherence to administrative limits. Province-level governance actions—such as public health interventions, infrastructure planning, or resource allocation—can be aligned with these high-population domains for each region, ensuring that political limits are respected while containing meaningful information on population centers, given and received risk.

Lastly, we used a cumulative surface based on pixel-level contiguity and 100 km window for Gaussian decay of estimates as a proxy for given risk. This distance surpasses the average commuting length in the largest Indonesian metropolitan areas (~ 22 km), allowing for the reach of rural areas and neighbouring provinces considering the distances covered by motorized vehicles51. To assess the potential of each source area to contribute to an epidemic, we rank the total estimated flow for every area and their contributions to risk in all other regions of Java. So the received epidemic risk is later calculated for each province considering the province source of given risk for both all vector-based calculations applied. No outbreak area was used to validate epidemic potential estimates, because current observations of emerging disease data is proven to be heavily determined by detection10.

The relationship between risk and forest management/primary prevention targets

Finally, we extracted values for populations at risk to highlight regions of interest that are under varying forest management regimes, which can be viewed as where primary prevention actions— such as pathogen spillover surveillance, conservation actions, deforestation reduction and sustainable practices—may take place. We evaluate no management/mostly intact, low-management/regenerating and managed/ agroforestry areas and their land cover intensity across Java island at different levels of PAR. We believe this analysis helps understand how land management practices may increase infectious disease emergence risk. The source for our forest management layer was Lesiv et al.52 with reference data at 100 m (Table S4). All spatial data was warped to the Pseudo-Mercator (EPSG:3857) projected coordinate system and the World Geodetic System 1984 datum. We build bivariate maps using the quantiles for management cover and PAR using the provinces and population-informed Voronoi areas as spatial units. We finally inspect data density distribution of land cover proportions to unravel the association patterns between forest management and high levels of PAR divided as quantiles.

link

Leave a Reply

Your email address will not be published. Required fields are marked *