Authors: A. Carmona-Cabrero and R. Muñoz-Carpena, University of Florida, carpena@ufl.edu
Date: March 2024
Stochastic models (like our class example Gluttons n' Muttons agent-based model, ABM) are those that, for the same set of inputs, produce different outputs every time due to stochastic processes (those drawn from probability distributions, such as random walks, MCMC methods such as Metropolis-Hastings, etc.).

Agent-based models (ABMs) are promising tools for improving our understanding of complex natural-human systems and supporting decision-making processes. The ABM bottom-up approach is increasingly employed to recreate emergent behaviors that mimic real complex system dynamics. However, the knowledge and data available for building and testing an ABM and its parts are often scarce. Due to ABM output complexity and stochasticity, exhaustive analysis methods are required to increase ABM transparency and ensure that the ABM behavior mimics the real system. Global sensitivity analysis (GSA) is one of the most widely used model analysis methods, as it identifies the most important model inputs and their interactions and can be used to explore model behaviors that occur in certain regions of the parameter space. However, due to the stochastic nature of ABMs, applying GSA to them can result in misleading interpretations of input importance.

Here we present a recently proposed framework (Carmona-Cabrero et al., 2024) to analyze stochastic models with an extension of the original GSA for deterministic models. Details are provided in the class slides (Part II). We will use our class example (PreyPredator.jar), but this time we will manually control the number of replications for each input set (no replications, MCS=1 in the input.dat.ptr file). For this we will follow 4 steps, each with a separate Jupyter notebook:
IMPORTANT: to run the notebooks, make sure the required libraries are installed before calling them. For a HiPerGator OOD Jupyter session, make sure that the UFRC Python-3.x kernel is selected before running the notebooks.
We will use the Python library SALib (https://salib.readthedocs.io/en/latest/). A limitation of this library is that it only allows a few types of continuous input distributions: uniform (unif), log-uniform (logunif), triangular (triang), normal (norm), lognormal (lognorm), and truncated-normal (truncnorm). Despite this limitation, the library is fast, easy to use, and suitable for common GSUA applications. For a more complete sensitivity analysis library, use the R package sensitivity (https://cran.r-project.org/web/packages/sensitivity/sensitivity.pdf).
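Although we will only use uniform distributions here, for reference non-uniform distributions are declared through the 'dists' key of the SALib problem dictionary. A minimal sketch with hypothetical input names (note that for 'norm' the bounds entry holds the mean and standard deviation rather than lower/upper limits):

example_problem = {'num_vars': 2,
                   'names': ['x_unif', 'x_norm'],  # hypothetical inputs
                   'bounds': [[0.0, 1.0],   # unif: [lower, upper]
                              [5.0, 0.5]],  # norm: [mean, std]
                   'dists': ['unif', 'norm']}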
import numpy as np
import pandas as pd
from SALib.sample import saltelli
First, we define the problem of our model analysis, that is, the model input distributions. Here, the input distributions were selected based on model stability and output-behavior criteria.
input_names = ['MutMxSz', 'MutSSz', 'MutSpP', 'GlutMnSz', 'GlutSSz', 'GlutFzP', 'GlutFzA', 'GlutIniPop']
problem = {'num_vars': len(input_names),
'names': input_names,
'bounds': [[1,10], [0.1,5], [0.01,0.3], [0.1,2], [0.1,5], [0.01,0.2], [0.01,0.2], [3,9]]
}
Then, we generate the GSA sample matrix. For this, we need to define the GSA sampling intensity, l, and the order of the sensitivity indices we are estimating. Here SALib will estimate first-order ($S_{i}$) and total-order ($S_{Ti}$) sensitivity indices (calc_second_order=False); it can also estimate second-order indices (calc_second_order=True).
We obtain a GSA sample matrix, where each line is an independent sample of our input distributions:
l = 1024 # Sampling intensity
param_values = pd.DataFrame(saltelli.sample(problem, l, calc_second_order=False), columns=input_names) # calc_second_order=True for second-order interactions.
print("GSA sample matrix size: {}".format(param_values.shape))
param_values.head()
GSA sample matrix size: (10240, 8)
|   | MutMxSz | MutSSz | MutSpP | GlutMnSz | GlutSSz | GlutFzP | GlutFzA | GlutIniPop |
|---|---------|--------|--------|----------|---------|---------|---------|------------|
| 0 | 2.977539 | 0.573730 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 1 | 5.508789 | 0.573730 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 2 | 2.977539 | 0.439746 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 3 | 2.977539 | 0.573730 | 0.034639 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 4 | 2.977539 | 0.573730 | 0.160381 | 0.584277 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
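As a quick check, with calc_second_order=False the Saltelli scheme generates $l(k+2)$ samples, where $k$ is the number of inputs; here $1024 \times (8+2) = 10240$ rows, matching the size printed above:

k = problem['num_vars']  # number of inputs, k = 8
assert param_values.shape == (l * (k + 2), k)  # Saltelli: l(k+2) rows, k columns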
Let's also check the distribution of the input factors (uniform distributions with the ranges assigned above).
import matplotlib.pyplot as plt

# Plot a histogram of each input factor in the GSA sample matrix
for input_var in input_names:
    fig, ax1 = plt.subplots()
    ax1.set_title(input_var)
    ax1.hist(param_values[input_var])
    ax1.set_ylabel('Frequency')
    ax1.set_xlabel('Value')
For our HPC code, we will need to prepend the call to the shell function "sens_class" to every GSA sample. Hence, we add a column with the 'sens_class' string to the DataFrame:
sample_matrix = pd.concat([pd.DataFrame(['sens_class'] * param_values.shape[0]), param_values], axis=1)
sample_matrix.head()
|   | 0 | MutMxSz | MutSSz | MutSpP | GlutMnSz | GlutSSz | GlutFzP | GlutFzA | GlutIniPop |
|---|---|---------|--------|--------|----------|---------|---------|---------|------------|
| 0 | sens_class | 2.977539 | 0.573730 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 1 | sens_class | 5.508789 | 0.573730 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 2 | sens_class | 2.977539 | 0.439746 | 0.160381 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 3 | sens_class | 2.977539 | 0.573730 | 0.034639 | 1.385840 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
| 4 | sens_class | 2.977539 | 0.573730 | 0.160381 | 0.584277 | 1.47334 | 0.182373 | 0.018721 | 8.396484 |
Because our model is stochastic (random processes occur within it), its uncertainty has two components: deterministic (related to the model structure and input distributions) and stochastic (related to the random processes in the model). Hence, model realizations (model runs with the same inputs) are required to accurately estimate input importance, as they allow us to decompose both uncertainty components. In general, a relatively low number of model runs ($M$) is needed to decompose the deterministic uncertainty (e.g., $M = l(k+2)$, where $l$ is the sampling intensity given above and $k$ is the number of inputs, in our case $k=8$) and a higher number for the stochastic component (e.g., $M = n_{real} \cdot l(k+2)$).
Here, we will replicate the model runs (i.e., the GSA samples) 100 times (n_real=100), giving $M = 100 \times 1024 \times (8+2) = 1024000$ total model runs.
n_real = 100
sample_matrix_realizations = pd.DataFrame(np.repeat(sample_matrix.to_numpy(), repeats=n_real, axis=0))
print("GSA sample matrix size: {}".format(sample_matrix_realizations.shape))
GSA sample matrix size: (1024000, 9)
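As a quick sanity check (using the objects defined above), each of the 10240 GSA samples should now appear n_real consecutive times in the replicated matrix:

# Each row of sample_matrix is repeated n_real times (consecutively) by np.repeat
assert sample_matrix_realizations.shape[0] == n_real * sample_matrix.shape[0]
# The first n_real rows should all be copies of the first GSA sample
assert (sample_matrix_realizations.iloc[:n_real].nunique() == 1).all()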
Finally, we save our GSA sample matrix without header or index and with tabs as separators, so it is ready to be used by our HPC code.
sample_matrix_realizations.to_csv('wolf_gsa_matrix_{}real.txt'.format(n_real), header=False, index=False, sep='\t')
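Optionally, we can read the file back to confirm it was written as expected (a quick check, not part of the HPC workflow):

check = pd.read_csv('wolf_gsa_matrix_{}real.txt'.format(n_real), header=None, sep='\t')
print("File contents size: {}".format(check.shape))  # expect (1024000, 9)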