Use of reporting guidelines#

Primary research questions:#

The results presented in this notebook address the following question:

  1. What proportion of studies make use of a reporting guideline?

1. Imports#

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# imports from the local preprocessing module
from preprocessing import load_clean_dataset

2. Constants#

FILE_NAME = 'https://raw.githubusercontent.com/TomMonks/' \
    + 'des_sharing_lit_review/main/data/share_sim_data_extract.zip'

RG_LABEL = 'reporting_guidelines_mention'
NONE = 'None'

3. Functions#

def reporting_guideline_summary(df_clean, exclude_none=True):
    '''
    For studies included, summarise reporting guidelines.
    Returned as name; n; % of included table
    
    Params:
    ------
    df_clean: pd.DataFrame
        All papers
        
    exclude_none: bool, optional (default=True)
        Excludes the row for "None" i.e. no reporting guideline mention
        
    Returns:
    -------
    pd.DataFrame
    '''
    # restrict to included studies only
    included = df_clean[df_clean['study_included'] == 1]
    
    # exclude or include 'None'
    if exclude_none:
        report_guidelines = included[included[RG_LABEL] != NONE]
    else:
        report_guidelines = included
        
    # frequency + percentage (denominator is all included studies)
    counts = report_guidelines.groupby([RG_LABEL])['key'] \
        .count().sort_values(ascending=False)
    percentages = counts / len(included)
    
    # summary table
    summary = pd.concat([counts, (percentages * 100).round(1)], axis=1)
    summary.columns = ['n', '% of included']
    if exclude_none:
        # a categorical column can keep an empty 'None' row after filtering
        summary = summary.drop(NONE, axis=0, errors='ignore')
    return summary.sort_values(by=['n'], ascending=False)


def guidelines_by_subset(df_clean, field, column_label):
    '''
    Summarise reporting guidelines for the papers where `field` == 1.
    '''
    subset = df_clean[df_clean[field] == 1]
    summary = reporting_guideline_summary(subset)
    summary.columns = [column_label, '% of included']
    return summary
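
The counting logic above can be illustrated on a tiny, made-up dataset. This is only a sketch: the six rows of `toy` are hypothetical, and `value_counts` is used in place of the `groupby` call for brevity. The column names mirror those used in the functions above. The point it shows is that the percentage denominator is every included study, not only the studies that mention a guideline.

```python
import pandas as pd

RG_LABEL = 'reporting_guidelines_mention'

# hypothetical toy data: six papers, five of which are included
toy = pd.DataFrame({
    'key': ['a', 'b', 'c', 'd', 'e', 'f'],
    'study_included': [1, 1, 1, 1, 1, 0],
    RG_LABEL: ['STRESS', 'None', 'ISPOR', 'STRESS', 'None', 'ISPOR'],
})

# restrict to included studies, then drop those with no guideline mention
included = toy[toy['study_included'] == 1]
with_guideline = included[included[RG_LABEL] != 'None']

# counts per guideline; percentages use all included studies as denominator
counts = with_guideline[RG_LABEL].value_counts()
toy_summary = pd.DataFrame({
    'n': counts,
    '% of included': (counts / len(included) * 100).round(1),
})
print(toy_summary)
```

Here STRESS appears in 2 of the 5 included papers (40.0%) and ISPOR in 1 of 5 (20.0%); the excluded paper `f` does not contribute to either the counts or the denominator.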
    

4. Read in data#

clean = load_clean_dataset(FILE_NAME)
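
The functions in this notebook rely on a handful of columns being present in the cleaned data (`key`, `study_included`, `reporting_guidelines_mention` and, for the subset analysis, `covid`). A quick check along the following lines may help catch a changed extract early. The helper `missing_columns` and the `demo` frame are hypothetical and are not part of the `preprocessing` module.

```python
import pandas as pd

# columns referenced by the analysis code in this notebook
REQUIRED_COLUMNS = ['key', 'study_included',
                    'reporting_guidelines_mention', 'covid']


def missing_columns(df, required=REQUIRED_COLUMNS):
    '''Return the required column names that are absent from df.'''
    return [col for col in required if col not in df.columns]


# hypothetical stand-in for the cleaned dataset
demo = pd.DataFrame({'key': ['a'], 'study_included': [1]})
missing = missing_columns(demo)
print(missing)
```

With the real data, the same call would be `missing_columns(clean)`, and an empty list would indicate that all expected columns are available.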

5. Results#

5.1 Create a high level summary of the reporting guidelines used.#

# overall
overall_summary = reporting_guideline_summary(clean)
overall_summary
                               n  % of included
reporting_guidelines_mention
ISPOR                         37            6.6
STRESS                        22            3.9
CHEERS                         8            1.4
SQUIRE                         2            0.4
ODD                            1            0.2
Sanders et al.                 1            0.2
Zhang et al.                   1            0.2

The most frequently used guidelines were those from ISPOR (37 studies; 6.6% of included studies), typically in papers publishing DES models used in a cost-effectiveness study.

# covid only?
guidelines_by_subset(clean, 'covid', 'Covid')
                              Covid  % of included
reporting_guidelines_mention
STRESS                            9           13.0
CHEERS                            1            1.4
ISPOR                             0            0.0
ODD                               0            0.0
SQUIRE                            0            0.0
Sanders et al.                    0            0.0
Zhang et al.                      0            0.0

5.2 What proportion overall made use of any reporting guideline?#

n_reporting = overall_summary['n'].sum() 
total_included = len(clean[clean['study_included'] == 1])
per_reporting = (n_reporting / total_included) * 100

# '\\%' prints a literal '\%' without relying on an invalid escape sequence
txt = f'A total of {n_reporting} ({per_reporting:.1f}\\%) studies used models' \
        + ' published in articles that mentioned a known simulation' \
        + ' reporting guideline or checklist.'
    
print(txt)
A total of 72 (12.8\%) studies used models published in articles that mentioned a known simulation reporting guideline or checklist.
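
As an arithmetic cross-check, the per-guideline figures from the overall summary in section 5.1 can be added up by hand. The sketch below simply copies the printed values into dictionaries (the variable names are illustrative). The counts sum to the reported 72, while the rounded per-row percentages sum to 12.9 rather than 12.8, a difference that comes from rounding each row to one decimal place before adding.

```python
# values copied from the overall summary table in section 5.1
counts = {'ISPOR': 37, 'STRESS': 22, 'CHEERS': 8, 'SQUIRE': 2,
          'ODD': 1, 'Sanders et al.': 1, 'Zhang et al.': 1}
percentages = {'ISPOR': 6.6, 'STRESS': 3.9, 'CHEERS': 1.4, 'SQUIRE': 0.4,
               'ODD': 0.2, 'Sanders et al.': 0.2, 'Zhang et al.': 0.2}

# total number of studies mentioning any guideline
total_n = sum(counts.values())

# sum of the already-rounded row percentages
sum_of_rounded = round(sum(percentages.values()), 1)

print(total_n, sum_of_rounded)
```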