Model archiving

This notebook summarises how the DES models in the review were shared. It uses the Janssen et al. (2020) methodology to classify model archiving into the following categories: open science archives, online code repositories, journal supplementary material, personal or organisational websites, and online platforms. Each category is further split by whether the model was developed with a code-based tool or with Visual Interactive Modelling (VIM) software. A VIM model is typically shared as a single file.

The notebook answers the following research question:

What methods, tools, and resources did authors use to share their computer models and code?

Data used in analysis

The dataset is a subset of the main review, limited to models that were shared. The type of model shared is coded as either Visual Interactive Modelling (VIM) based (e.g. AnyLogic, Simul8, Arena) or CODE (e.g. MATLAB, Python, SimPy, Java, R simmer).

The data can be found here: https://raw.githubusercontent.com/TomMonks/des_sharing_lit_review/main/data/bp_audit.zip

1. Imports

1.1 Standard

import pandas as pd
import numpy as np

1.2 Preprocessing

from preprocessing import load_clean_bpa, drop_columns

2. Constants

FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/bp_audit.zip'

3. Analysis functions

A number of simple functions to conduct the analysis and format output.

def get_counts(df, column):
    '''
    For a specified column, return a DataFrame indexed by the unique
    methods in that column, with a single column n giving the number
    of instances of each method in the dataset. Missing values are
    ignored and rows are sorted by n in descending order.

    Params:
    ------
    df: pd.DataFrame
        The pandas dataframe containing the cohort of interest

    column: str
        The column containing the values to count.

    Returns:
    -------
    pd.DataFrame

    '''
    method = df[~df[column].isna()][column]
    unique_elements, counts_elements = np.unique(method, return_counts=True)
    unique_elements, counts_elements = pd.DataFrame(unique_elements), \
                                        pd.DataFrame(counts_elements)
    results = pd.concat([unique_elements, counts_elements], axis=1)
    results.columns = ['method', 'n']
    return results.set_index('method').sort_values('n', ascending=False)

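def get_model_format_summary(model_format, df_code, df_vim, category):
    '''
    Count the methods in the archiving column model_format separately
    for CODE (df_code) and VIM (df_vim) models. Returns a DataFrame
    indexed by (category, method) with integer CODE and VIM columns.
    '''
    code = get_counts(df_code, model_format)
    vim = get_counts(df_vim, model_format)
    # align on method (outer join); a method missing from one format -> 0
    comb = pd.concat([code, vim], axis=1)
    comb.columns = ['CODE', 'VIM']
    comb = comb.fillna(0).astype('int')
    comb['category'] = category
    comb = comb.reset_index()
    return comb.set_index(['category', 'method'])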

def multiple_archive_methods(df, jansson_method):
    '''
    Identify models (rows) that are shared via more than one archiving
    method, for example Zenodo + GitHub, and return a list of every
    archiving method used by those models.

    Params:
    ------
    df: pd.DataFrame
        The pandas dataframe containing the cohort of interest

    jansson_method: list
        A list of Janssen et al. method fields. Assumes the first field is
        model_format, which is excluded from the analysis.

    Returns:
    -------
    list

    '''
    jansson = df[jansson_method[1:]].fillna(0)
    # all non zeros to 1 (via bool -> int)
    jansson = jansson.astype(bool).astype(int)
    # keep only the rows (models) that use more than one archiving method
    multiple_archived = df[jansson.sum(axis=1) > 1][jansson_method]

    # loop through columns and get uniques
    results = []
    for col in jansson_method[1:]:
        results += get_counts(multiple_archived, col).index.tolist()

    return results
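The core test inside multiple_archive_methods is a row-wise count of the archive fields that hold a value. A minimal sketch of that test on a small hypothetical DataFrame follows (the rows and values are invented for illustration, not taken from the review data). Here notna() stands in for the fillna(0) / astype(bool) pair used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per shared model, one column per
# Janssen et al. category (NaN = not shared via that route)
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'VIM'],
    'model_archive': ['Zenodo', np.nan, np.nan],
    'model_repo': ['GitHub', 'GitHub', np.nan],
    'model_platform': [np.nan, np.nan, 'AnyLogic Cloud'],
})

fields = ['model_archive', 'model_repo', 'model_platform']

# 1 if a method was recorded in that field, else 0
used = toy[fields].notna().astype(int)

# models shared through more than one route
multi = toy[used.sum(axis=1) > 1]

# unique methods appearing in any multi-route model
values = multi[fields].to_numpy().ravel()
flagged = sorted({v for v in values if pd.notna(v)})
print(flagged)
```

Note that GitHub is flagged even though the second toy model uses GitHub on its own: a method is flagged if it was combined with another route for at least one model.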

4. Load and inspect dataset

The dataframe clean contains the full dataset used in the best practice audit.

clean = load_clean_bpa(FILE_NAME)

Split the data into CODE and VIM dataframes to help build the main summary table.

jansson_method = ['model_format', 'model_archive', 'model_repo', 'model_journal_supp',
                  'model_personal_org', 'model_platform']

df_code = clean[jansson_method]
df_code = df_code[df_code['model_format'] == 'CODE']

df_vim = clean[jansson_method]
df_vim = df_vim[df_vim['model_format'] == 'VIM']

5. Results

The main aim of the results section is to summarise all archiving methods in a single table, in the same style as Janssen et al. (2020). The table is built up category by category, and the per-category tables are then combined.

5.1 Overall numeric summary

clean[jansson_method].groupby(by='model_format').count().T

model_format	CODE	VIM
model_archive	2	3
model_repo	20	1
model_journal_supp	6	4
model_personal_org	4	2
model_platform	5	6
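In the summary above, groupby(...).count() counts non-missing entries, so each cell is the number of models of that format with a method recorded in that category. A model shared through two routes is counted once in each, so the column totals can exceed the number of models. A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the review dataset)
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'VIM'],
    'model_archive': ['Zenodo', np.nan, 'Zenodo'],
    'model_repo': ['GitHub', 'GitHub', np.nan],
})

# count non-missing values per column within each format, then transpose
# so that categories are rows and formats are columns
summary = toy.groupby(by='model_format').count().T
print(summary)
```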

5.2 Open science archives

ARCHIVE = 'model_archive'
archive_results = get_counts(clean[jansson_method], ARCHIVE)
archive_results

	n
method
Zenodo	2
Institutional	1
Mendeley	1
Research Square	1
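get_counts drops missing values, counts each unique method with np.unique and sorts by frequency. For a single column, pandas value_counts gives the same counts. A minimal sketch on hypothetical values (the order of methods with equal counts may differ from get_counts):

```python
import numpy as np
import pandas as pd

# Hypothetical column of archive methods (NaN = no archive used)
archive = pd.Series(['Zenodo', np.nan, 'Zenodo', 'Mendeley', np.nan],
                    name='model_archive')

# value_counts() drops NaN by default and sorts by frequency (descending);
# rename the index and column to match the get_counts layout
counts = archive.value_counts().rename_axis('method').to_frame('n')
print(counts)
```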

archive_comb = get_model_format_summary('model_archive', df_code, df_vim, 
                                        'Archive')
archive_comb

		CODE	VIM
category	method
Archive	Institutional	1	0
	Zenodo	1	1
	Mendeley	0	1
	Research Square	0	1
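get_model_format_summary relies on pd.concat(axis=1) aligning the CODE and VIM counts on their shared method index. A method that appears for only one format gets NaN in the other column, which is then set to 0. A minimal sketch with hypothetical toy counts:

```python
import pandas as pd

# Hypothetical per-format counts, indexed by method (as get_counts returns)
code = pd.DataFrame({'n': [1, 1]},
                    index=pd.Index(['Institutional', 'Zenodo'], name='method'))
vim = pd.DataFrame({'n': [1, 1]},
                   index=pd.Index(['Zenodo', 'Mendeley'], name='method'))

# axis=1 concatenation takes the union of the method indexes (outer join)
comb = pd.concat([code, vim], axis=1)
comb.columns = ['CODE', 'VIM']

# methods missing from one format become NaN; replace with 0 and cast to int
comb = comb.fillna(0).astype('int')
print(comb)
```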

5.3 Model repositories

repo_results = get_counts(clean[jansson_method], 'model_repo')                            
repo_results

	n
method
GitHub	20
GitLab	1

repo_comb = get_model_format_summary('model_repo', df_code, df_vim, 
                                     'Repository')
repo_comb

		CODE	VIM
category	method
Repository	GitHub	19	1
Repository	GitLab	1	0

5.4 Format of models stored in journal supplementary material

supp_results = get_counts(clean[jansson_method], 'model_journal_supp')                            
supp_results

	n
method
File	5
Word doc	3
PDF	1
r script	1

supp_comb = get_model_format_summary('model_journal_supp', df_code, df_vim, 
                                     'Journal')
supp_comb

		CODE	VIM
category	method
Journal	Word doc	3	0
	File	1	4
	PDF	1	0
	r script	1	0

5.5 Personal and organisational websites

org_results = get_counts(clean[jansson_method], 'model_personal_org')                            
org_results

	n
method
Organisational website	4
Google Drive	2

org_comb = get_model_format_summary('model_personal_org', df_code, df_vim,
                                    'Personal or Organisational')
org_comb

		CODE	VIM
category	method
Personal or Organisational	Organisational website	4	0
Personal or Organisational	Google Drive	0	2

5.6 Platform

platform_results = get_counts(clean[jansson_method], 'model_platform')                            
platform_results

	n
method
AnyLogic Cloud	6
CRAN	2
BinderHub	1
Google Colab	1
R Shiney	1

platform_comb = get_model_format_summary('model_platform', df_code, df_vim,
                                         'Platform')
platform_comb

		CODE	VIM
category	method
Platform	CRAN	2	0
	BinderHub	1	0
	Google Colab	1	0
	R Shiney	1	0
	AnyLogic Cloud	0	6

5.7 Overall summary table

jansson_table = pd.concat([archive_comb, repo_comb, 
                           supp_comb, org_comb, platform_comb])
jansson_table

		CODE	VIM
category	method
Archive	Institutional	1	0
	Zenodo	1	1
	Mendeley	0	1
	Research Square	0	1
Repository	GitHub	19	1
Repository	GitLab	1	0
Journal	Word doc	3	0
	File	1	4
	PDF	1	0
	r script	1	0
Personal or Organisational	Organisational website	4	0
Personal or Organisational	Google Drive	0	2
Platform	CRAN	2	0
	BinderHub	1	0
	Google Colab	1	0
	R Shiney	1	0
	AnyLogic Cloud	0	6
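As an alternative sketch, not the approach used in this notebook, the same counts can be produced in one step by reshaping the per-category columns to long format and cross-tabulating against model_format. The toy data below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical wide data: one row per model, one column per category
toy = pd.DataFrame({
    'model_format': ['CODE', 'CODE', 'VIM'],
    'model_archive': ['Zenodo', np.nan, 'Zenodo'],
    'model_repo': ['GitHub', 'GitHub', np.nan],
})

# long format: one row per (model, category, method); drop unused routes
long = toy.melt(id_vars='model_format', var_name='category',
                value_name='method').dropna(subset=['method'])

# cross-tabulate (category, method) against format; absent pairs are 0
table = pd.crosstab([long['category'], long['method']], long['model_format'])
print(table)
```

The trade-off is presentation. The category labels are the raw column names rather than 'Archive', 'Repository' and so on, and the rows come out sorted by category and method rather than by frequency. The per-category functions above give more control over both.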

5.8 Modify the Janssen table to flag method combinations

The table below makes one small change: any archiving method that was used in combination with another method (for at least one model) is flagged with an asterisk.
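Because method is an index level, the flagging step moves the index into columns, applies DataFrame.replace with a nested {column: {old: new}} mapping, and restores the MultiIndex. A minimal sketch on a hypothetical two-row table:

```python
import pandas as pd

# Hypothetical combined table with a (category, method) MultiIndex
table = pd.DataFrame(
    {'CODE': [1, 0], 'VIM': [1, 1]},
    index=pd.MultiIndex.from_tuples(
        [('Archive', 'Zenodo'), ('Archive', 'Mendeley')],
        names=['category', 'method']))

flagged = ['Zenodo']  # hypothetical: methods used alongside another route

# nested dict: only values in the 'method' column are replaced
recode = {'method': {m: f'{m}*' for m in flagged}}
table = table.reset_index().replace(recode).set_index(['category', 'method'])
print(table)
```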

multi_methods = multiple_archive_methods(clean, jansson_method)

# map each method used in combination to the same name with an asterisk
recode = {'method':{}}
for method in multi_methods:
    recode['method'][method] = f'{method}*'

# method is an index level, so move it into a column before replacing
jansson_table = jansson_table.reset_index()
jansson_table = jansson_table.replace(recode)
jansson_table = jansson_table.set_index(['category', 'method'])
jansson_table

		CODE	VIM
category	method
Archive	Institutional*	1	0
	Zenodo*	1	1
	Mendeley	0	1
	Research Square	0	1
Repository	GitHub*	19	1
Repository	GitLab*	1	0
Journal	Word doc	3	0
	File	1	4
	PDF	1	0
	r script	1	0
Personal or Organisational	Organisational website*	4	0
Personal or Organisational	Google Drive	0	2
Platform	CRAN	2	0
	BinderHub*	1	0
	Google Colab	1	0
	R Shiney*	1	0
	AnyLogic Cloud	0	6

6. Output table as LaTeX

print(jansson_table.style.to_latex(hrules=True,
                                   label="Table:4",
                                   caption="Janssen et al. classification of model archiving"))

\begin{table}
\caption{Janssen et al. classification of model archiving}
\label{Table:4}
\begin{tabular}{llrr}
\toprule
 &  & CODE & VIM \\
category & method &  &  \\
\midrule
\multirow[c]{4}{*}{Archive} & Institutional* & 1 & 0 \\
 & Zenodo* & 1 & 1 \\
 & Mendeley & 0 & 1 \\
 & Research Square & 0 & 1 \\
\multirow[c]{2}{*}{Repository} & GitHub* & 19 & 1 \\
 & GitLab* & 1 & 0 \\
\multirow[c]{4}{*}{Journal} & Word doc & 3 & 0 \\
 & File & 1 & 4 \\
 & PDF & 1 & 0 \\
 & r script & 1 & 0 \\
\multirow[c]{2}{*}{Personal or Organisational} & Organisational website* & 4 & 0 \\
 & Google Drive & 0 & 2 \\
\multirow[c]{5}{*}{Platform} & CRAN & 2 & 0 \\
 & BinderHub* & 1 & 0 \\
 & Google Colab & 1 & 0 \\
 & R Shiney* & 1 & 0 \\
 & AnyLogic Cloud & 0 & 6 \\
\bottomrule
\end{tabular}
\end{table}

Model and code sharing practices in healthcare discrete-event simulation - a systematic review
