Model execution#

This notebook analyses the shared models against our best practices for enabling others to use and execute a simulation model. In summary, these practices are defined as:

  1. The authors provide a readme or other obvious instruction file for users to consult;

  2. The authors provide step by step instructions to run the DES model;

  3. Models are shared with either informal or formal software dependency management;

  4. Models are shared with details of model and/or code testing;

  5. The model or model code is downloadable to enable local execution;

  6. The model is shared in a manner that enables execution online without the need to install locally.
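
For reference, each of the six criteria maps onto one or two binary fields in the audit dataset (the fields are described in full in the data section below). The mapping below, written as a Python dict, is our own illustrative summary rather than part of the dataset:

# illustrative mapping from best practice criteria to audit fields;
# the field names are defined under 'Data used in analysis'.
CRITERIA_FIELDS = {
    1: ['readme'],
    2: ['steps_run'],
    3: ['formal_dep_mgt', 'informal_dep_mgt'],
    4: ['evidence_testing'],
    5: ['downloadable'],
    6: ['interactive_online'],
}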

Notebook aims#

The notebook analyses the following questions related to best practice:

  1. What proportion of the shared model artefacts have a readme or equivalent file?

  2. What proportion of artefacts have step by step instructions to use them?

  3. What proportion of models have formal and informal dependency management included?

  4. What proportion of models are shared with evidence that they have been tested?

Given the findings, we also report on the following exploratory questions:

  1. How are models that were developed in a VIM, and cannot be downloaded, shared?

  2. Which coding languages provided formal dependency management?

Data used in analysis#

The dataset is a subset of the main review, limited to studies that shared a model. The type of model shared is coded as Visual Interactive Modelling (VIM) based (e.g. AnyLogic, Simul8, Arena) versus CODE based (e.g. MATLAB, Python, SimPy, Java, R Simmer).

The following fields are analysed in this notebook.

  • model_format - VIM or CODE

  • readme - is there an obvious file (or files) that a user would consult first? (0/1)

  • steps_run - are there step by step instructions to run the model? (0/1)

  • formal_dep_mgt - has the model been shared with formal software dependency management? (0/1)

  • informal_dep_mgt - have any informal methods of dependency management been shared? E.g. a list of software requirements. (0/1)

  • evidence_testing - do the model and artefacts in the repository contain any evidence that they have been tested? (0/1)

  • downloadable - can the model and artefacts be downloaded and executed locally? (0/1)

  • interactive_online - can the model and its artefacts be executed online without local installation? (0/1)

  • model_archive - name of the archive used to share the model, if any

  • model_repo - name of the model repository used, if any

  • model_journal_supp - what is stored in the journal supplementary material, if any

  • model_personal_org - name of the personal or organisational website used, if any

  • model_platform - name of the cloud platform used (e.g. Binder or AnyLogic Cloud), if any
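
As a quick illustration, the binary fields above can be sanity checked before analysis. The snippet below is a minimal sketch (the helper name check_binary_coding is ours, not part of the preprocessing module); it assumes a loaded dataframe and flags any field containing values other than 0, 1 or missing.

import pandas as pd

BINARY_FIELDS = ['readme', 'steps_run', 'formal_dep_mgt', 'informal_dep_mgt',
                 'evidence_testing', 'downloadable', 'interactive_online']

def check_binary_coding(df, fields=BINARY_FIELDS):
    '''Return the names of fields that contain values other than 0/1/NaN.'''
    bad_fields = []
    for field in fields:
        observed = set(pd.unique(df[field].dropna()))
        if not observed.issubset({0, 1}):
            bad_fields.append(field)
    return bad_fields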

1. Imports#

1.1. Standard#

import pandas as pd
import numpy as np

1.2. Preprocessing#

from preprocessing import load_clean_bpa, drop_columns
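
The preprocessing module lives in the review's repository and is not listed in this notebook. A minimal sketch of what these helpers might look like is given below, assuming the zip archive contains a single CSV that pandas can read directly; the actual implementations may differ.

# hypothetical sketch of the preprocessing helpers - see
# preprocessing.py in the repository for the real implementations.
import pandas as pd

def load_clean_bpa(file_name):
    '''Read the zipped best practice audit and apply basic cleaning.'''
    # pandas infers zip compression from the file extension
    df = pd.read_csv(file_name)
    # ...further cleaning, e.g. casting audit fields to category...
    return df

def drop_columns(df, to_drop):
    '''Return a copy of the dataframe without the listed columns.'''
    return df.drop(columns=to_drop)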

2. Constants#

FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/bp_audit.zip'

3. Analysis functions#

A number of simple functions to conduct the analysis and format output.

def balance_of_model_format(df):
    '''
    Returns the counts of VIM versus CODE models.
    
    Params:
    -------
    df: pd.DataFrame
        Subset of the best practice dataset to analyse
        
    Returns:
    -------
    (labels: np.ndarray, counts: np.ndarray)
    '''
    unique_elements, counts_elements = np.unique(df['model_format'], 
                                                 return_counts=True)
    return unique_elements, counts_elements


def category_frequencies_by_model_format(df, cols):
    '''
    Calculate the frequencies of 0/1s for VIM versus CODE models.
    Return the results concatenated in a single pandas dataframe.
    
    Params:
    -------
    df: pd.DataFrame
        Dataframe containing the subset of the best practice audit
        to summarise.
    
    cols: list
        Names of the binary audit fields to summarise.
    
    Returns:
    -------
    pd.DataFrame
    '''
    # key to select fields where category is 1.
    key = [('CODE', 1), ('VIM', 1)]

    summary = pd.DataFrame()

    # operation needs to be done separately on each criterion then combined.
    for col in cols:
        # group by model format and get frequencies of 1/0
        results = df.groupby('model_format')[col].value_counts(dropna=False)
        # concat to single dataframe
        summary = pd.concat([summary, results.loc[key]], axis=1)
        
    # drop multi-index, transpose and relabel
    summary = summary.reset_index()
    summary = summary.T
    summary = summary.drop(['level_0', 'level_1'])
    summary.columns = ['CODE', 'VIM']
    
    # add percentages
    # get total number of CODE and VIM based models.
    _, (n_code, n_vim) = balance_of_model_format(df)
    per_cols = ['CODE_%', 'VIM_%']
    summary[per_cols[0]] = (summary['CODE'] / n_code * 100).map('{:,.1f}'.format)
    summary[per_cols[1]] = (summary['VIM'] / n_vim * 100).map('{:,.1f}'.format)
    
    return summary

4. Load and inspect dataset#

The cleaned dataset contains 27 fields, listed below.

clean = load_clean_bpa(FILE_NAME)
clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 27 columns):
 #   Column                        Non-Null Count  Dtype   
---  ------                        --------------  -----   
 0   model_format                  47 non-null     category
 1   key                           47 non-null     object  
 2   item_type                     47 non-null     category
 3   pub_yr                        47 non-null     int64   
 4   author                        47 non-null     object  
 5   doi                           46 non-null     object  
 6   reporting_guidelines_mention  47 non-null     category
 7   covid                         47 non-null     category
 8   sim_software                  47 non-null     object  
 9   foss_sim                      47 non-null     category
 10  model_archive                 5 non-null      object  
 11  model_repo                    21 non-null     object  
 12  model_journal_supp            10 non-null     object  
 13  model_personal_org            6 non-null      object  
 14  model_platform                11 non-null     object  
 15  github_url                    21 non-null     object  
 16  model_has_doi                 47 non-null     category
 17  orcid                         46 non-null     category
 18  license                       47 non-null     object  
 19  readme                        47 non-null     category
 20  link_to_paper                 37 non-null     category
 21  steps_run                     47 non-null     category
 22  formal_dep_mgt                47 non-null     category
 23  informal_dep_mgt              47 non-null     category
 24  evidence_testing              25 non-null     category
 25  downloadable                  47 non-null     category
 26  interactive_online            47 non-null     category
dtypes: category(15), int64(1), object(11)
memory usage: 7.1+ KB
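
As a quick check, the helper from section 3 confirms the raw split of model formats. Given 47 studies in total and the downloadable row in section 5.1 (31 CODE models at 100.0%), this corresponds to 31 CODE and 16 VIM models.

# balance of CODE versus VIM models (expected: CODE = 31, VIM = 16)
labels, counts = balance_of_model_format(clean)
dict(zip(labels, counts))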

5. Results#

5.1. Overview split by CODE and VIM models#

cols = ['readme', 'steps_run', 'formal_dep_mgt', 'informal_dep_mgt', 
        'evidence_testing', 'downloadable', 'interactive_online']

category_frequencies_by_model_format(clean, cols)
                    CODE  VIM CODE_% VIM_%
readme                21    7   67.7  43.8
steps_run             13    3   41.9  18.8
formal_dep_mgt         7    0   22.6   0.0
informal_dep_mgt      7    8   22.6  50.0
evidence_testing       3    0    9.7   0.0
downloadable          31   11  100.0  68.8
interactive_online     4    6   12.9  37.5

5.2. How are models that were developed in a VIM, and cannot be downloaded, shared?#

vim_non_downloads = clean[(clean['model_format'] == 'VIM') 
                          & (clean['downloadable'] == 0)]

# number of VIM models that cannot be downloaded.
vim_non_downloads.shape[0]
5
selected_columns = ['model_archive', 'model_repo', 'model_journal_supp',
                    'model_personal_org', 'model_platform']

vim_non_downloads[selected_columns]
   model_archive model_repo model_journal_supp model_personal_org  model_platform
31           NaN        NaN                NaN                NaN  AnyLogic Cloud
33           NaN        NaN                NaN                NaN  AnyLogic Cloud
37           NaN        NaN                NaN                NaN  AnyLogic Cloud
38           NaN        NaN                NaN                NaN  AnyLogic Cloud
41           NaN        NaN                NaN                NaN  AnyLogic Cloud

All of the models that cannot be downloaded and used locally were shared via AnyLogic Cloud.

5.3. Which coding languages provided formal dependency management?#

code_formal_dep_mgt = clean[(clean['model_format'] == 'CODE') 
                            & (clean['formal_dep_mgt'] == 1)]

# number of models with formal dependency management
code_formal_dep_mgt.shape[0]
7
code_formal_dep_mgt['sim_software']
2     R Simmer
6     R Simmer
7            R
12           R
17       SimPy
21       SimPy
27       SimPy
Name: sim_software, dtype: object
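
For a compact language summary, value_counts can be applied to the same subset. This is a usage sketch only; it simply tallies the sim_software codes printed above (SimPy: 3, R: 2, R Simmer: 2).

# frequency of each simulation language among the seven models
# shared with formal dependency management
code_formal_dep_mgt['sim_software'].value_counts()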