Participants should leave this workshop with:
Our aim is to provide a practical introduction to working with hospital data to explore clinical questions. Over the course of the workshops we will step through a simplified version of a study that explores the relationship between arterial catheters and mortality in patients with respiratory failure, using the MIMIC dataset.
In this first workshop we will prepare all of the software necessary to complete the course. Most importantly you will be asked to:
MIMIC-III is a clinical database developed by the MIT Laboratory for Computational Physiology, in collaboration with Beth Israel Deaconess Medical Center. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).
The MIMIC-III website
MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors:
We will begin by installing a demo version of MIMIC-III, which includes data for 100 patients and excludes the noteevents table.
The following sections will guide you through the major steps in setting up your environment. We have also prepared operating system specific guides, which we recommend you open in a separate tab.
MIMIC-III is a relational database, so it requires a relational database system to be installed. In this workshop we will be using PostgreSQL. You may already have PostgreSQL installed. If not, please follow the instructions on the PostgreSQL website for download and installation: https://www.postgresql.org/download/
While it is possible and often desirable to explore PostgreSQL databases using command line tools, we will be using a graphical tool in the workshops for simplicity. Please download and install PgAdmin3:
Exploring MIMIC using PgAdmin3
We will install a demo version of the MIMIC-III dataset available on PhysioNet. The demo includes data for 100 patients and excludes the noteevents table. To download the data you will need to complete the following steps:
Once the data has been downloaded, open PgAdmin3 and follow the steps below to load the data into your PostgreSQL database:
We will be analysing data with R, a popular open source language. To install R, follow the instructions on the following page: https://cran.rstudio.com/
RStudio is a graphical interface for R. To install RStudio, follow the instructions at: https://www.rstudio.com/
Part of the reason that R has become so popular is the vast number of packages that have been written to support research. We will be using the following packages during the course:
You should install these packages by opening up RStudio and running the following command in the console for each of the packages:
install.packages("PACKAGENAME")
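Because the command above has to be repeated for each package, a small helper can save some typing. The following is a minimal sketch rather than part of the workshop materials: the function name `missing_packages` is hypothetical, and `"PACKAGENAME"` is still a placeholder that you should replace with the package names used in the course.

```r
# Return the subset of `pkgs` that is not yet installed in this R library.
# `missing_packages` is a hypothetical helper name, not part of any package.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Replace the placeholder with the package names used in the course.
course_packages <- c("PACKAGENAME")

# Install only what is missing (requires an internet connection).
# The interactive() guard means this runs from the RStudio console
# but not from a non-interactive script.
to_install <- missing_packages(course_packages)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Running the block a second time should install nothing, because packages that are already present are filtered out first.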
Plain text editors are often helpful during data analysis, for example when writing code and when reviewing data. You may already have a favourite text editor. If so, stick with that. If not, try one of the following:
Atom, a plain text editor.
Git is a powerful and increasingly popular version control system. We’ll learn more about it during the course. To install Git, follow the instructions at: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git (NB: Windows users should install “Git BASH”)
GitHub is a website that interfaces with Git. It is useful for backing up and sharing code, and working collaboratively on research projects. If you don’t already have an account, create one at: https://github.com/
We encourage everyone to make an effort to keep their work reproducible and highly recommend reading “Good enough practices in scientific computing” by Greg Wilson et al. At the very least you’ll be making life easier for your future self.
If you’ve got to this point and all of the software above has been installed, then you are ready for the workshops! If you get stuck and need help, there are several facilitators on hand to help out.
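As a quick sanity check, you could ask R whether the command-line tools installed above are visible on your system. This is a rough sketch under stated assumptions: the executable names `psql` (PostgreSQL client) and `git` are assumed, and a `FALSE` result does not necessarily mean the software is missing, since some installers (particularly on Windows) do not add their tools to the search path.

```r
# Command-line tools this check looks for. The executable names are
# assumptions: PostgreSQL's client is usually `psql`, Git's is `git`.
tools <- c("psql", "git")

# Sys.which() returns the full path of each executable, or "" if not found.
tool_paths <- Sys.which(tools)
tools_found <- setNames(nzchar(tool_paths), tools)

# TRUE means the tool was found on the search path.
print(tools_found)

# Also report which version of R is running.
cat("R version:", R.version.string, "\n")
```

If a tool shows `FALSE` but you know it is installed, ask a facilitator before reinstalling anything.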
MIMIC-III, although de-identified, still contains detailed information regarding the clinical care of patients, so it must be treated with appropriate care and respect. If you have not already done so, your homework will be to request access to the full MIMIC-III dataset by following the instructions at: http://mimic.physionet.org/gettingstarted/access/.
We will be working through a simplified version of a study that explores the relationship between arterial catheters and mortality in patients with respiratory failure, so please familiarise yourself with this paper.
Once your application to access MIMIC-III has been approved, you will be granted permission to view the ‘MIMIC-III Clinical Database’ project page on PhysioNet: http://mimic.physionet.org/gettingstarted/dbsetup/. MIMIC-III is provided as a collection of comma-separated (CSV) files, which can be imported into a database system such as Postgres.
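Before importing the CSV files into a database, it can be useful to peek at the first few rows of a file from R to confirm that the download worked. The snippet below is a minimal sketch: `preview_csv` is a hypothetical helper name, and the file path in the usage comment is an assumption that depends on where you saved the files.

```r
# Read only the first `n` rows of a CSV file so that large tables open quickly.
# `preview_csv` is a hypothetical helper name, not part of any package.
preview_csv <- function(path, n = 5) {
  read.csv(path, nrows = n, stringsAsFactors = FALSE)
}

# Example usage. The path is an assumption: adjust it to wherever you
# saved the downloaded files.
# patients <- preview_csv("PATIENTS.csv")
# str(patients)
```

Reading only a handful of rows keeps this fast even for the largest tables, which may be too big to load into memory in full.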
Tutorials for installing MIMIC in a local Postgres database are provided on the MIMIC-III website for Mac OSX, Unix, and MS Windows systems. For more detail, see the following links: