Notebook 3: Monte Carlo Filtering (MCF): Design of interventions

Authors: A. Carmona-Cabrero and R. Muñoz-Carpena, University of Florida, carpena@ufl.edu

Date: March 2024

We have explored how sensitive the outputs are to the inputs. Now we will employ Monte Carlo Filtering (MCF) to explore whether the behavioural output (defined by a desired threshold on the output) is favored by a narrower range of the important input factors we identified with GSA. This allows us to define management interventions objectively.
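As a minimal illustrative sketch (toy inputs and a toy output, not the Wolf Sheep Predation model), MCF simply splits a Monte Carlo sample into behavioural and non-behavioural runs using an output threshold and then compares the resulting input sub-samples:

In [ ]:
# Hedged toy sketch of the MCF idea (made-up data, for illustration only)
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({'x1': rng.uniform(0, 1, 1000),    # input that drives the output
                    'x2': rng.uniform(0, 1, 1000)})   # input with little effect
toy['y'] = toy['x1'] + 0.1 * rng.normal(size=1000)    # toy output
behav = toy[toy['y'] > 0.7]       # behavioural runs (output above the threshold)
nonbehav = toy[toy['y'] <= 0.7]   # non-behavioural runs
# The x1 sub-samples separate clearly, while the x2 sub-samples barely differ
print(behav[['x1', 'x2']].mean())
print(nonbehav[['x1', 'x2']].mean())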

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from SALib.sample import saltelli
from SALib.analyze import sobol
from ring_plot_funct import ring_plot
In [2]:
input_names = ['MutMxSz', 'MutSSz', 'MutSpP', 'GlutMnSz', 'GlutSSz', 'GlutFzP', 'GlutFzA', 'GlutIniPop']
problem = {'num_vars': len(input_names),
           'names': input_names,
           'bounds': [[1,10], [0.1,5], [0.01,0.3], [0.1,2], [0.1,5], [0.01,0.2], [0.01,0.2], [3,9]]
           }
n_real = 100  # Number of realizations (model runs with the same inputs)
output_names = ['AvgGlutPop', 'AvgGlutRad', 'AvgGlutRadGn', 'AvgMutPop', 'MaxMutPop', 'MinMutPop']  # Output names

data = pd.read_csv('wolf.out', header=None, sep=' ')  # Read the model outputs; the main output of interest (AvgMutPop) is in the 4th column
data.drop(6, axis=1, inplace=True)  # A trailing space creates an empty last column; drop it
data.columns = output_names
data.head()
Out[2]:
AvgGlutPop AvgGlutRad AvgGlutRadGn AvgMutPop MaxMutPop MinMutPop
0 9.0 11.286599 7.157074 44.0 170.0 0.0
1 9.0 9.756715 1.126875 41.0 120.0 0.0
2 9.0 10.265316 4.335474 35.0 97.0 0.0
3 9.0 8.084866 18.202648 58.0 127.0 0.0
4 9.0 7.882991 5.344454 33.0 171.0 0.0
In [3]:
# Read the GSA sample matrix that generated the outputs
inputs = pd.read_csv('wolf_gsa_matrix_100real.txt', sep='\t', header=None).drop([0], axis=1)  # The file includes a leading sens_class column, which we drop
inputs.columns = input_names
# Let's check everything looks good
inputs.head()
Out[3]:
MutMxSz MutSSz MutSpP GlutMnSz GlutSSz GlutFzP GlutFzA GlutIniPop
0 2.977539 0.57373 0.160381 1.38584 1.47334 0.182373 0.018721 8.396484
1 2.977539 0.57373 0.160381 1.38584 1.47334 0.182373 0.018721 8.396484
2 2.977539 0.57373 0.160381 1.38584 1.47334 0.182373 0.018721 8.396484
3 2.977539 0.57373 0.160381 1.38584 1.47334 0.182373 0.018721 8.396484
4 2.977539 0.57373 0.160381 1.38584 1.47334 0.182373 0.018721 8.396484

Let's define a threshold for the desired population of Muttons:

In [4]:
threshold = 70
In [5]:
data.drop(['MinMutPop', 'AvgGlutPop'], axis=1, inplace=True)  # Drop MinMutPop (always 0) and AvgGlutPop (not needed here)
output_names = output_names[1:-1]  # Remove 'AvgGlutPop' (first) and 'MinMutPop' (last) from the output names
data.head()
Out[5]:
AvgGlutRad AvgGlutRadGn AvgMutPop MaxMutPop
0 11.286599 7.157074 44.0 170.0
1 9.756715 1.126875 41.0 120.0
2 10.265316 4.335474 35.0 97.0
3 8.084866 18.202648 58.0 127.0
4 7.882991 5.344454 33.0 171.0

We can plot the distribution of the average sheep (Mutton) population and calculate how likely it is to exceed 70:

In [6]:
# Plot the output distributions

# Create a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# Plot the output distributions in the first subplot
for output in ['AvgMutPop']:
    data[output].hist(alpha=0.25, label=output, density=True, bins=50, ax=ax1)
    # Plot threshold line for behavioral distribution
    ax1.plot([threshold, threshold], [0, 0.006], '--k')
ax1.legend()
ax1.set_xlabel('Population')
ax1.set_ylabel('Density')
#ax1.set_xlim(0, 200)  # Set X-axis limit up to 200 units

# Plot cumulative probability in the second subplot
sorted_data = data['AvgMutPop'].sort_values()
cumulative_prob = (sorted_data.rank() - 1) / len(sorted_data)
ax2.plot(sorted_data, cumulative_prob)
# Plot threshold line
ax2.axvline(x=threshold, color='r', linestyle='--')
ax2.set_xlabel('Population')
ax2.set_ylabel('Cumulative Probability')
ax2.set_xlim(0, 200)  # Set X-axis limit up to 200 units

# Calculate also how likely it is that the sheep population would be over the threshold
print()
print('Probability of AvgMutPop > {}: '.format(threshold), (data['AvgMutPop'] > threshold).sum()/len(data))
Probability of AvgMutPop > 70:  0.161720703125

We can also see some statistics of our outputs:

In [7]:
data.describe()
Out[7]:
AvgGlutRad AvgGlutRadGn AvgMutPop MaxMutPop
count 1.024000e+06 1.024000e+06 1.024000e+06 1.024000e+06
mean 4.641985e+01 3.964457e+01 9.355988e+01 2.122230e+02
std 7.787010e+01 5.476022e+01 3.652645e+02 4.851996e+02
min 1.009277e-01 1.116429e-12 0.000000e+00 3.000000e+00
25% 3.589138e+00 2.996924e+00 1.000000e+00 2.400000e+01
50% 1.355338e+01 1.514644e+01 8.000000e+00 5.000000e+01
75% 4.856505e+01 5.431364e+01 3.700000e+01 1.410000e+02
max 5.172543e+02 3.131049e+02 7.877000e+03 8.266000e+03

Now, let's relate those output values to the input values that generated them.
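Since the two tables will simply be concatenated column-wise, it is worth a quick check (an optional addition, not in the original notebook) that they have the same number of rows, i.e. that each output row still matches the input row that produced it:

In [ ]:
# Hedged sanity check: outputs and the (repeated) input matrix must align row by row
assert len(data) == len(inputs), 'outputs and inputs have a different number of rows'
print(len(data), len(inputs))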

In [8]:
# Concatenate the columns of the outputs and the inputs
data_inputs = pd.concat([data, inputs], axis=1)
# Let's check everything looks good
data_inputs.tail()
Out[8]:
AvgGlutRad AvgGlutRadGn AvgMutPop MaxMutPop MutMxSz MutSSz MutSpP GlutMnSz GlutSSz GlutFzP GlutFzA GlutIniPop
1023995 5.129673 8.557813 86.0 96.0 4.449707 0.293799 0.254263 1.247607 4.035791 0.132368 0.191929 5.006836
1023996 6.359249 8.077404 52.0 111.0 4.449707 0.293799 0.254263 1.247607 4.035791 0.132368 0.191929 5.006836
1023997 7.688465 18.294091 38.0 114.0 4.449707 0.293799 0.254263 1.247607 4.035791 0.132368 0.191929 5.006836
1023998 7.205223 39.904821 43.0 97.0 4.449707 0.293799 0.254263 1.247607 4.035791 0.132368 0.191929 5.006836
1023999 9.166289 46.482640 40.0 114.0 4.449707 0.293799 0.254263 1.247607 4.035791 0.132368 0.191929 5.006836
In [9]:
# Set the condition for behavioral distribution
behavioral = data_inputs[data_inputs['AvgMutPop'] > threshold]
print(behavioral.head())
    AvgGlutRad  AvgGlutRadGn  AvgMutPop  MaxMutPop   MutMxSz   MutSSz  \
6     8.490367      0.026615       78.0      126.0  2.977539  0.57373   
13   10.786193      2.638279       80.0       95.0  2.977539  0.57373   
15    7.652617      0.800113      121.0      121.0  2.977539  0.57373   
17   10.407952      0.298364       82.0      135.0  2.977539  0.57373   
29    8.359962      3.549761      116.0      117.0  2.977539  0.57373   

      MutSpP  GlutMnSz  GlutSSz   GlutFzP   GlutFzA  GlutIniPop  
6   0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
13  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
15  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
17  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
29  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
In [10]:
# Set the condition for nonbehavioral distribution
nonbehavioral = data_inputs[data_inputs['AvgMutPop'] <= threshold]
print(nonbehavioral.head())
   AvgGlutRad  AvgGlutRadGn  AvgMutPop  MaxMutPop   MutMxSz   MutSSz  \
0   11.286599      7.157074       44.0      170.0  2.977539  0.57373   
1    9.756715      1.126875       41.0      120.0  2.977539  0.57373   
2   10.265316      4.335474       35.0       97.0  2.977539  0.57373   
3    8.084866     18.202648       58.0      127.0  2.977539  0.57373   
4    7.882991      5.344454       33.0      171.0  2.977539  0.57373   

     MutSpP  GlutMnSz  GlutSSz   GlutFzP   GlutFzA  GlutIniPop  
0  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
1  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
2  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
3  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  
4  0.160381   1.38584  1.47334  0.182373  0.018721    8.396484  

Now we can map the outputs back to the input values that generated them, i.e., identify which input values are more likely to produce behavioural outputs (AvgMutPop > 70). A quick way of doing this is to plot the distributions of the input values that led to behavioural outputs and compare them with the non-behavioural ones.

In [11]:
for input_var in input_names:
    plt.figure(input_var)
    #plt.title(input_var)
    behavioral[input_var].hist(alpha=0.25, label='behavioral', density=True, bins=50)
    nonbehavioral[input_var].hist(alpha=0.25, label='nonbehavioral', density=True, bins=50)
    plt.legend()
    plt.ylabel('Density')
    plt.xlabel(input_var)

Before continuing, we also need to test whether the behavioural and non-behavioural input sub-distributions are statistically different. An intervention on an important input is only warranted if its two sub-distributions differ. We test this with the two-sample Kolmogorov-Smirnov (KS) test. Note that the input values (x-axis) are normalized only to plot the CDFs on a common scale; the KS test itself is applied to the original values. For this test, H0: the distributions are the same, and we reject H0 (support the alternative) when the p-value < α for a desired significance level α.

In [12]:
# Sort and normalize the behavioural and non-behavioural input data for plotting their CDFs
for input_var in input_names:
    b_sorted = np.sort(behavioral[input_var])/max(behavioral[input_var])
    nonb_sorted = np.sort(nonbehavioral[input_var])/max(nonbehavioral[input_var])

    # Calculate the cumulative probabilities
    prob1 = np.arange(1, len(b_sorted) + 1) / len(b_sorted)
    prob2 = np.arange(1, len(nonb_sorted) + 1) / len(nonb_sorted)

    # Plot CDF for behavioural inputs
    plt.plot(b_sorted, prob1, label='behavioural')

    # Plot CDF for non-behavioural inputs
    plt.plot(nonb_sorted, prob2, label='non-behavioural')

    # Add labels and legend
    plt.xlabel(f'{input_var} Normalized')
    plt.ylabel('Cumulative Probability')
    plt.legend()

    # Show plot
    plt.grid(True)
    plt.show()

    # Two-sample Kolmogorov-Smirnov test on the original (unnormalized) input values
    from scipy.stats import ks_2samp
    ks_test = ks_2samp(behavioral[input_var], nonbehavioral[input_var],
                       alternative='two-sided')
    print('Maximum discrepancy between the distributions is {:4f}, p-value = {:4f} (H0: the distributions are the same)'.format(ks_test[0], ks_test[1]))
Maximum discrepancy between the distributions is 0.192565, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.220262, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.191908, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.038349, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.070009, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.030574, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.031733, p-value = 0.000000 (H0: the distributions are the same)
Maximum discrepancy between the distributions is 0.154237, p-value = 0.000000 (H0: the distributions are the same)
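The printed results above appear in the same order as input_names but are not labelled. As a convenience (a hedged addition, not part of the original analysis), the KS statistics can be collected into a small labelled table sorted by the maximum discrepancy D, which makes it clear that MutSSz, MutMxSz, MutSpP and GlutIniPop show the largest discrepancies:

In [ ]:
# Hedged sketch: tabulate the KS results per input, sorted by the D statistic
from scipy.stats import ks_2samp

rows = []
for name in input_names:
    ks = ks_2samp(behavioral[name], nonbehavioral[name], alternative='two-sided')
    rows.append((name, ks[0], ks[1]))
ks_summary = pd.DataFrame(rows, columns=['input', 'D', 'p_value']).sort_values('D', ascending=False)
print(ks_summary)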

Using the results above (histograms and KS tests), we can define management interventions that could increase the probability of AvgMutPop > 70. That is, an intervention should target input factors that meet the following conditions:
a) They can be managed, i.e. an intervention on them is possible (some factors cannot be managed!).
b) They are important according to the previous GSA (Notebook 2).
c) Their behavioural input distributions can be separated from the non-behavioural ones (histograms above).
d) The two sub-distributions are statistically different (KS test).

These interventions could affect one or more of the important inputs initially identified by GSA. Based on the criteria given above (a-d), some possible interventions would be:

  1. Reduce MutSSz
  2. Increase MutSpP
  3. Reduce GlutSSz
  4. Reduce GlutIniPop

Notice that some of the interventions may not be feasible. For example, legislation may not permit controlling the Glutton population.

Here, we will intervene on the important input factor MutSpP. In our model, this intervention is captured by changing the distribution of MutSpP. From the behavioural/non-behavioural density plots above, we can see that there is a higher density of behavioural outputs when MutSpP > 0.156. In this case the intervention would be centered on maintaining this important input above that value, for example by changing the "breed" of the sheep to a more fertile one, or by improving sheep nutrition with supplements or better grass. We will then evaluate this intervention by changing the distribution of MutSpP from the original U[0.01, 0.3] to U[0.156, 0.3] in the next code cell.
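Before doing so, as a hedged aside (not part of the original workflow), one simple way to approximate the 0.156 crossover from the behavioural/non-behavioural histograms is to compare the two normalized densities bin by bin:

In [ ]:
# Hedged sketch: approximate the MutSpP value above which the behavioural density
# exceeds the non-behavioural density (the plots above suggest ~0.156)
bins = np.linspace(0.01, 0.3, 51)
b_dens, _ = np.histogram(behavioral['MutSpP'], bins=bins, density=True)
nb_dens, _ = np.histogram(nonbehavioral['MutSpP'], bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
crossover = centers[np.argmax(b_dens > nb_dens)]  # first bin where the behavioural density is higher
print('Approximate crossover at MutSpP ~ {:.3f}'.format(crossover))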

In [13]:
# Define the GSA input distributions
input_names = ['MutMxSz', 'MutSSz', 'MutSpP', 'GlutMnSz', 'GlutSSz', 'GlutFzP', 'GlutFzA', 'GlutIniPop']
problem2 = {'num_vars': len(input_names),
           'names': input_names,
           'bounds': [[1,10], [0.1,5], [0.156,0.3], [0.1,2], [0.1,5], [0.01,0.2], [0.01,0.2], [3,9]]
           }
l = 1024  # Base sample size; saltelli.sample generates l*(D+2) rows when calc_second_order=False
param_values2 = pd.DataFrame(saltelli.sample(problem2, l, calc_second_order=False), columns=input_names)  # calc_second_order=True would also sample for second-order interactions
print("GSA sample matrix size: {}".format(param_values2.shape))
param_values2.head()
GSA sample matrix size: (10240, 8)
Out[13]:
MutMxSz MutSSz MutSpP GlutMnSz GlutSSz GlutFzP GlutFzA GlutIniPop
0 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
1 7.270996 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
2 1.013184 3.389795 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
3 1.013184 1.944678 0.274336 1.024951 2.829932 0.170405 0.055923 6.524414
4 1.013184 1.944678 0.220477 1.850635 2.829932 0.170405 0.055923 6.524414
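A quick optional check (not in the original notebook) confirms that the intervention bound on MutSpP was applied in the new sample:

In [ ]:
# Hedged sanity check: MutSpP should now lie within the intervention range [0.156, 0.3]
assert param_values2['MutSpP'].between(0.156, 0.3).all()
print(param_values2['MutSpP'].min(), param_values2['MutSpP'].max())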

And now let's add the sens_class column to the sample matrix so that it can be run on the HPC.

In [14]:
sample_matrix = pd.concat([pd.DataFrame(['sens_class']*param_values2.shape[0]), param_values2], axis=1)
sample_matrix.head()
n_real = 100  # We will only run 100 realizations this time because we do not intend to decompose the variance
sample_matrix_realizations = pd.DataFrame(np.repeat(sample_matrix.to_numpy(), repeats=n_real, axis=0))
sample_matrix_realizations.head()
Out[14]:
0 1 2 3 4 5 6 7 8
0 sens_class 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
1 sens_class 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
2 sens_class 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
3 sens_class 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414
4 sens_class 1.013184 1.944678 0.220477 1.024951 2.829932 0.170405 0.055923 6.524414

Let's save the new sampling matrix to run the model with the intervention:

In [15]:
# Save GSA sample matrix
sample_matrix_realizations.to_csv('wolf_mcf_matrix_{}real.txt'.format(n_real), header=False, index=False, sep='\t')

Now, we can run new simulations using the new GSA sample matrix to evaluate how effective our intervention would be.
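Once those runs are complete, the effect of the intervention can be quantified by recomputing the exceedance probability and comparing it with the baseline value of about 0.162 obtained above. A minimal sketch, assuming the intervention outputs are written to a file named 'wolf_mcf.out' (a hypothetical name) with the same column layout as 'wolf.out':

In [ ]:
# Hedged sketch: compare pre- and post-intervention exceedance probabilities
# 'wolf_mcf.out' is a hypothetical file name; replace it with the actual output file
new_data = pd.read_csv('wolf_mcf.out', header=None, sep=' ')
new_data.drop(6, axis=1, inplace=True)  # drop the empty last column, as for 'wolf.out'
new_data.columns = ['AvgGlutPop', 'AvgGlutRad', 'AvgGlutRadGn',
                    'AvgMutPop', 'MaxMutPop', 'MinMutPop']
p_new = (new_data['AvgMutPop'] > threshold).mean()
print('Probability of AvgMutPop > {} after the intervention: {:.3f}'.format(threshold, p_new))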