Mosquito Forecasting: How It Works

The process of forecasting mosquito activity is complex but grounded in scientific research and real-world data. Our service builds upon established models, incorporating real-time data along with environmental, economic, and geographic factors to provide accurate and reliable forecasts. Let's walk through the steps of our forecasting process with an example.

[Figure: Convergence of the Random Forest algorithm. 65 factors are considered in this example to predict mosquito activity.]

Model Sources

Regional GDP: Regional GDP data was sourced from economic databases to understand the impact of socio-economic conditions on mosquito activity.

Standing Water Sources: Standing water sources were identified using satellite imagery and geographical databases. These sources were classified by type and size to quantify potential mosquito breeding sites.

Data Preprocessing: The collected data was normalized, cleaned, and preprocessed to ensure consistency and accuracy. Feature engineering was applied to create new features, such as the humidity index and average rainfall over the past week.
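The normalization and rolling-average steps described above can be sketched as follows. This is a minimal illustration, not the service's actual pipeline: the function names, the choice of min-max scaling, and the rainfall values are all assumptions made for the example.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def trailing_mean(values, window=7):
    """Mean of the current value and up to window - 1 preceding values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical daily rainfall in millimetres (illustrative values only).
rainfall_mm = [0.0, 12.0, 3.0, 0.0, 0.0, 25.0, 8.0, 0.0, 1.0]

# Engineered feature: average rainfall over the past week.
rain_7d_avg = trailing_mean(rainfall_mm, window=7)

# Normalized copy of the raw series for use alongside other features.
rain_scaled = min_max_normalize(rainfall_mm)
```

For the first six days the trailing window is shorter than a week, so the average covers only the days available so far.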

Data Analysis

Temperature Model: We use a sinusoidal function to account for seasonal variations:

$$T(t) = T_{\text{mean}} + A_T \cdot \sin\left(\frac{2\pi(t - \phi_T)}{365}\right)$$
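As a rough sketch, the sinusoidal temperature model can be evaluated directly. The parameter values below (mean, amplitude, phase shift) are hypothetical placeholders chosen for illustration, not fitted values from the service.

```python
import math


def seasonal_temperature(day, t_mean, amplitude, phase_shift):
    """T(t) = T_mean + A_T * sin(2*pi*(t - phi_T) / 365), with t in days."""
    return t_mean + amplitude * math.sin(2 * math.pi * (day - phase_shift) / 365)


# Hypothetical parameters (assumptions for this example only).
T_MEAN = 16.0   # annual mean temperature, degrees Celsius
A_T = 11.0      # seasonal amplitude, degrees Celsius
PHI_T = 110.0   # phase shift in days

# The sine term peaks a quarter of a year after the phase shift.
peak_day = PHI_T + 365 / 4
peak_temperature = seasonal_temperature(peak_day, T_MEAN, A_T, PHI_T)
```

With these placeholder values the modeled temperature equals the mean on day `PHI_T` and reaches `T_MEAN + A_T` on `peak_day`.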

Economic Factors

GDP Model: We use a modified exponential growth function with cyclical components:

$$GDP(t) = GDP_0 \cdot e^{rt} \cdot \left(1 + A_{\sin} \cdot \sin\left(\frac{2\pi t}{T_{\text{cycle}}}\right) + A_{\cos} \cdot \cos\left(\frac{2\pi t}{T_{\text{cycle}}}\right)\right)$$

Where $GDP_0$ is the initial GDP value (¥6.8 trillion for Kyoto as of 2022).
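The GDP function can be sketched in a few lines. Only the initial value of 6.8 (trillion yen) comes from the text; the growth rate, cycle amplitudes, cycle length, and the choice of years as the time unit are assumptions made for this example.

```python
import math


def gdp_model(t, gdp0, r, a_sin, a_cos, t_cycle):
    """Exponential growth modulated by a sine/cosine cycle of period t_cycle."""
    angle = 2 * math.pi * t / t_cycle
    cycle = 1 + a_sin * math.sin(angle) + a_cos * math.cos(angle)
    return gdp0 * math.exp(r * t) * cycle


GDP0 = 6.8      # trillion yen, from the text (Kyoto, 2022)
R = 0.01        # hypothetical growth rate per year
A_SIN = 0.02    # hypothetical sine amplitude
A_COS = 0.01    # hypothetical cosine amplitude
T_CYCLE = 8.0   # hypothetical cycle length in years

gdp_now = gdp_model(0.0, GDP0, R, A_SIN, A_COS, T_CYCLE)
gdp_after_cycle = gdp_model(T_CYCLE, GDP0, R, A_SIN, A_COS, T_CYCLE)
```

One property of the formula as written: when the cosine amplitude is nonzero, the value at t = 0 is GDP_0 · (1 + A_cos) rather than GDP_0 itself, so GDP_0 acts as a baseline scale rather than the exact starting value.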

Disease Outbreak Factors

Historical Outbreak Impact: Modeled using a decay function with seasonal variations:

$$H(t) = \sum_{i=1}^{n} H_i \cdot e^{-\delta(t - t_i)} \cdot \left(1 + \varepsilon \cdot \sin\left(\frac{2\pi(t - t_i)}{365}\right)\right)$$

Where $H_i$ is the severity of outbreak $i$, and $t_i$ is the time of outbreak $i$.
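A minimal sketch of the outbreak-impact sum follows. The decay rate, the seasonal amplitude, and the outbreak list are hypothetical, and skipping outbreaks that lie in the future relative to t is an assumption of this example, since the formula does not state how they are handled.

```python
import math


def outbreak_impact(t, outbreaks, delta, epsilon):
    """Sum of decayed, seasonally modulated outbreak severities at day t.

    outbreaks is a list of (severity, day_of_outbreak) pairs.
    """
    total = 0.0
    for severity, t_i in outbreaks:
        elapsed = t - t_i
        if elapsed < 0:
            continue  # assumption: outbreaks after day t contribute nothing
        seasonal = 1 + epsilon * math.sin(2 * math.pi * elapsed / 365)
        total += severity * math.exp(-delta * elapsed) * seasonal
    return total


# Hypothetical outbreak history: (severity, day) pairs.
history = [(2.0, 0.0), (1.0, 200.0)]
impact_day_365 = outbreak_impact(365.0, history, delta=0.01, epsilon=0.3)
```

Each outbreak contributes its full severity on the day it occurs and fades exponentially afterwards, with the sine term adding a yearly ripple on top of the decay.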

Composite Grading System

The final mosquito activity forecast grade is calculated as:

$$F = w_E(\alpha_T G_T + \alpha_H G_H + \alpha_R G_R) + w_G(\beta_E G_E + \beta_W G_W) + w_C(\gamma_{GDP} G_{GDP} + \gamma_P G_P) + w_D(\delta_H G_H + \delta_V G_V)$$

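Where $w$ are the weights for each factor category, $G$ are the individual factor grades, and the Greek coefficients ($\alpha$, $\beta$, $\gamma$, $\delta$) weight the individual factors within each category.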
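The composite grade is a weighted sum of weighted sums, which the sketch below computes from a nested dictionary. The keys mirror the subscripts in the formula; every weight, coefficient, and grade is a made-up value, and the 0 to 100 grade scale is an assumption.

```python
def composite_grade(categories):
    """Sum over categories of weight * sum(coefficient * grade)."""
    total = 0.0
    for category in categories.values():
        inner = sum(coeff * grade for coeff, grade in category["factors"].values())
        total += category["weight"] * inner
    return total


# Hypothetical inputs: each factor maps to (coefficient, grade).
example = {
    "E": {"weight": 0.4, "factors": {"T": (0.5, 80.0), "H": (0.3, 70.0), "R": (0.2, 60.0)}},
    "G": {"weight": 0.2, "factors": {"E": (0.4, 50.0), "W": (0.6, 90.0)}},
    "C": {"weight": 0.1, "factors": {"GDP": (0.5, 40.0), "P": (0.5, 60.0)}},
    "D": {"weight": 0.3, "factors": {"H": (0.7, 30.0), "V": (0.3, 50.0)}},
}

forecast_grade = composite_grade(example)
```

Nesting the factors under their category keeps the two `H` entries (one under `E`, one under `D`) separate, even though both are written $G_H$ in the formula.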

Kyoto-Specific Considerations

Post-Processing and Weight Adaptation

To further refine our predictions and dynamically adapt to changing conditions, we employ a Random Forest algorithm for post-processing and weight adjustment.

  • Random Forest Model:

    • We use an ensemble of decision trees to capture non-linear relationships and interactions between features.

    • Each tree in the forest is trained on a bootstrap sample of the data, with a random subset of features considered at each split.

  • Feature Importance:

    • We calculate feature importance to identify the most influential factors in our model:

$$I_j = \frac{1}{N_T} \sum_{T} \sum_{t \in T:\, v(s_t) = j} p(t)\, \Delta i(s_t)^2$$

Where $I_j$ is the importance of feature $j$, $N_T$ is the number of trees, $v(s_t)$ is the feature used in the split $s_t$ at node $t$, $p(t)$ is the proportion of samples reaching node $t$, and $\Delta i(s_t)^2$ is the decrease in impurity.

  • Weight Adaptation:

    • We adjust the weights in our composite grading system based on the feature importance:

$$w_k^{\text{new}} = w_k^{\text{old}} + \eta \cdot (I_k - \bar{I})$$

Where $w_k^{\text{new}}$ is the updated weight for factor $k$, $\eta$ is the learning rate, $I_k$ is the importance of factor $k$, and $\bar{I}$ is the mean importance across all factors.
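The feature-importance and weight-update formulas above can be sketched together. This is an illustration under stated assumptions: the forest is represented as plain lists of split records rather than trained trees, the formula is read literally so that Δi is squared, and all numbers are made up.

```python
def feature_importance(forest, n_features):
    """Average over trees of p(t) * delta_i**2, grouped by split feature.

    forest is a list of trees; each tree is a list of split nodes given as
    (feature_index, proportion_of_samples, delta_i) tuples.
    """
    totals = [0.0] * n_features
    for tree in forest:
        for feature, proportion, delta_i in tree:
            totals[feature] += proportion * delta_i ** 2
    return [total / len(forest) for total in totals]


def adapt_weights(weights, importances, learning_rate):
    """w_new = w_old + eta * (I_k - mean importance), applied per factor."""
    mean_importance = sum(importances) / len(importances)
    return [
        w + learning_rate * (imp - mean_importance)
        for w, imp in zip(weights, importances)
    ]


# Hypothetical two-tree forest over three features.
forest = [
    [(0, 1.0, 0.4), (1, 0.6, 0.2), (2, 0.4, 0.1)],
    [(1, 1.0, 0.3), (0, 0.5, 0.2)],
]
importances = feature_importance(forest, n_features=3)
new_weights = adapt_weights([0.5, 0.3, 0.2], importances, learning_rate=0.1)
```

Because the deviations from the mean importance sum to zero, this update leaves the total of the weights unchanged. It does not, however, keep an individual weight from going negative, so a real implementation would likely need clipping or a small learning rate.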

This adaptive approach allows our model to continuously improve its accuracy by learning from new data and adjusting the relative importance of different factors based on their observed impact on mosquito activity.


References