Simulation software

The results in this notebook do not directly answer any of our primary research questions, but they provide supporting context for RQ2:

  1. What methods, tools, and resources did authors use to share their computer models and code?

The results also show that 10.5% of the studies (60 of 574) do not report the simulation software used.
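As a quick arithmetic check, the share of studies with unreported software can be recomputed from the counts in the summary table in section 5.1. The variable names below are illustrative and are not part of the notebook; the two numbers are read directly from that table.

```python
# Counts read from the summary table in section 5.1
unknown_count = 60   # studies labelled 'Unknown'
total_count = 574    # sum of the 'count' column

# Percentage of studies that do not report their simulation software
unknown_pct = round(unknown_count / total_count * 100, 1)
print(unknown_pct)
```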

1. Imports

1.1. Standard Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# set up plot style as ggplot
plt.style.use('ggplot')

1.2 Imports from preprocessing module

# function for loading full dataset
from preprocessing import load_clean_dataset

2. Constants

FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/share_sim_data_extract.zip'

RG_LABEL = 'reporting_guidelines_mention'
NONE = 'None'
WIDTH = 0.5

3. Functions

3.1. Functions to create summary statistics

A single function generates the software counts reported in the results.

  • software_count - counts how often each simulation software package appears and groups packages whose count is at or below a threshold under 'Other'.

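def software_count(column, threshold=2):
    """
    Return a count of studies by simulation software.

    Software used in `threshold` or fewer studies is grouped
    under the label 'Other'.

    Params:
    -------
    column: pandas Series
        Simulation software reported by each study.

    threshold: int, optional (default=2)
        Software with a count less than or equal to this value
        is relabelled as 'Other'.

    Returns:
    -------
    pd.DataFrame
    """
    # frequency of each software label
    counts = column.value_counts().to_frame().reset_index()
    counts.columns = ['software', 'count']
    # total of the infrequent rows, taken before they are relabelled
    summarised = counts[counts['count'] <= threshold].sum()
    # relabel infrequent software as 'Other' and merge those rows
    counts.loc[counts['count'] <= threshold, 'software'] = 'Other'
    counts = counts.groupby('software').sum()
    counts.loc['Other'] = summarised

    return counts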
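The grouping rule can be illustrated on a small toy Series. The sketch below is a compact alternative formulation, not the notebook's own function: the name software_count_compact and the toy data are hypothetical. It relabels infrequent packages before counting, so its row order (sorted by count) differs from the groupby-based version above.

```python
import pandas as pd


def software_count_compact(column, threshold=2):
    """Hypothetical compact variant: group rare labels under 'Other'."""
    counts = column.value_counts()
    # labels used in `threshold` or fewer studies
    rare = counts[counts <= threshold].index
    # keep frequent labels, replace rare ones with 'Other'
    relabelled = column.where(~column.isin(rare), 'Other')
    return (relabelled.value_counts()
            .rename_axis('software')
            .to_frame('count'))


# Toy data only: 4 x Arena, 3 x Simul8, 2 x SimPy, 1 x Salabim
toy = pd.Series(['Arena'] * 4 + ['Simul8'] * 3 + ['SimPy'] * 2 + ['Salabim'])
result = software_count_compact(toy, threshold=2)
print(result)
```

With threshold=2, SimPy and Salabim fall at or below the threshold, so they are merged into a single 'Other' row.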

4. Read in data

clean = load_clean_dataset(FILE_NAME)

5. Results

5.1 Overall summary table

software_counts = software_count(clean['sim_software'], threshold=2)
# raw string so that '\%' is passed through to LaTeX without an
# invalid escape sequence warning
pct_col = r'n(\%)'
# percentage of all studies, rounded to 1 decimal place
software_counts[pct_col] = \
    (software_counts['count'] / software_counts['count'].sum() * 100).round(1)
software_counts = software_counts.sort_values('count', ascending=False)
software_counts
                  count  n(\%)
software
Arena               124   21.6
AnyLogic             78   13.6
Unknown              60   10.5
Simul8               51    8.9
Other                41    7.1
R                    35    6.1
FlexSim              23    4.0
Excel                21    3.7
Simio                21    3.7
MATLAB               19    3.3
SimPy                18    3.1
R Simmer             15    2.6
TreeAge              14    2.4
Python               10    1.7
ExtendSim             7    1.2
C++                   6    1.0
Salabim               5    0.9
MedModel              5    0.9
Flexsim               5    0.9
ProModel              4    0.7
Plant Simulation      3    0.5
WITNESS               3    0.5
anyLogistix           3    0.5
iGrafx                3    0.5
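The table lists 'FlexSim' (23) and 'Flexsim' (5) as separate rows, which suggests the raw labels differ only in capitalisation. The sketch below shows one possible way to merge such labels before counting. It assumes the two spellings refer to the same package, which the notebook does not confirm; the CANONICAL mapping and the toy labels are hypothetical.

```python
import pandas as pd

# Toy labels only; the real values are in clean['sim_software']
labels = pd.Series(['FlexSim', 'Flexsim', 'FlexSim', 'Arena', 'arena'])

# Hypothetical mapping from a lower-cased label to a preferred spelling
CANONICAL = {'flexsim': 'FlexSim', 'arena': 'Arena'}

# Map via the lower-cased label; labels without a mapping keep
# their original spelling
normalised = labels.str.lower().map(CANONICAL).fillna(labels)
merged_counts = normalised.value_counts()
print(merged_counts)
```

Applying a step like this before software_count would change the counts and percentages in the table, so it should only be used after checking the underlying records.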

6. Output to LaTeX

# format floats to 1 decimal place so the LaTeX table matches the
# rounded percentages shown above
print(software_counts.style
      .format(precision=1)
      .to_latex(hrules=True,
                label="DES Software",
                caption="Software used in DES healthcare studies"))
\begin{table}
\caption{Software used in DES healthcare studies}
\label{DES Software}
\begin{tabular}{lrr}
\toprule
 & count & n(\%) \\
software &  &  \\
\midrule
Arena & 124 & 21.6 \\
AnyLogic & 78 & 13.6 \\
Unknown & 60 & 10.5 \\
Simul8 & 51 & 8.9 \\
Other & 41 & 7.1 \\
R & 35 & 6.1 \\
FlexSim & 23 & 4.0 \\
Excel & 21 & 3.7 \\
Simio & 21 & 3.7 \\
MATLAB & 19 & 3.3 \\
SimPy & 18 & 3.1 \\
R Simmer & 15 & 2.6 \\
TreeAge & 14 & 2.4 \\
Python & 10 & 1.7 \\
ExtendSim & 7 & 1.2 \\
C++ & 6 & 1.0 \\
Salabim & 5 & 0.9 \\
MedModel & 5 & 0.9 \\
Flexsim & 5 & 0.9 \\
ProModel & 4 & 0.7 \\
Plant Simulation & 3 & 0.5 \\
WITNESS & 3 & 0.5 \\
anyLogistix & 3 & 0.5 \\
iGrafx & 3 & 0.5 \\
\bottomrule
\end{tabular}
\end{table}