Data Analysis with Python

The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.

Arthritis Inflammation

We are studying inflammation in patients who have been given a new treatment for arthritis.

There are 60 patients, who had their inflammation levels recorded for 40 days. We want to analyze these recordings to study the effect of the new arthritis treatment.

To see how the treatment is affecting the patients in general, we would like to:

  1. Calculate the average inflammation per day across all patients.
  2. Plot the result to discuss and share with colleagues.
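The first step above can be sketched with nothing but the standard library. This is a minimal illustration, not the lesson's own code: the `readings` table is made up (3 patients by 4 days), and the names `readings` and `daily_average` are hypothetical.

```python
from statistics import mean

# Made-up readings for illustration only: 3 patients (rows) x 4 days (columns).
readings = [
    [0, 1, 2, 3],
    [0, 2, 4, 1],
    [0, 3, 3, 2],
]

# zip(*readings) regroups the values by column, so each tuple holds
# every patient's reading for one day.
daily_average = [mean(day) for day in zip(*readings)]

print(daily_average)
```

The result has one entry per day rather than one per patient, which is the shape we would want to hand to a plotting library for the second step.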

[Figure: a three-step flowchart. Inflammation data records for patients move to the Analysis step, where a heat map of the provided data is generated, and then to the Conclusion step, which asks: How does the medication affect patients?]

Data Format

The data sets are stored in comma-separated values (CSV) format: each row holds the recordings for a single patient, and each column represents a successive day.

The first three rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

Each number represents the number of inflammation bouts that a particular patient experienced on a given day.

For example, the value “6” at row 3, column 7 of the data set above means that the third patient experienced inflammation six times on the seventh day of the clinical study.
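To make the row and column reading concrete, here is a small sketch that parses the three sample rows shown above with Python's standard `csv` module and looks up that value. The rows are pasted inline so no file name needs to be assumed; the variable names (`sample`, `rows`, `patient`, `day`) are illustrative only. Note that Python counts from zero, so row 3, column 7 becomes index `[2][6]`.

```python
import csv
import io

# The three sample rows from the text, pasted inline instead of read from disk.
sample = """\
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
"""

# csv.reader yields each line as a list of strings; convert them to integers.
rows = [[int(v) for v in record] for record in csv.reader(io.StringIO(sample))]

# Row 3, column 7 in the text's 1-based counting is index [2][6] in Python.
patient, day = 3, 7
value = rows[patient - 1][day - 1]

print(len(rows), "patients,", len(rows[0]), "days")
print("patient", patient, "day", day, "->", value)
```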

In order to analyze this data and report to our colleagues, we’ll have to learn a little bit about programming.

Prerequisites

You will need to have worked through the material presented in the Python Fundamentals course.

In addition, you will need to obtain the lesson materials:

  1. Download the data and the code.
  2. Create a folder called swc-python on your Desktop.
  3. Move downloaded files into this newly created folder.
  4. Unzip the files.

You should now see two new folders called data and code in your swc-python directory on your Desktop.

The commands in this lesson pertain to Python 3.

Getting Started

To get started, follow the directions on the “Setup” page to download data and install a Python interpreter.

Schedule

       Setup                                   Download files required for the lesson
00:00  1. Reading in Tabular Data              How can I load tabular data files in Python?
00:50  2. Producing Plots                      How can I produce plots in Python?
01:40  3. Analyzing Data from Multiple Files   How can I do the same operations on many different files?
02:05  4. Filtering Data                       How can I filter out bad data?
02:25  5. Tidying up the Analysis              How can I refactor the analysis code?
03:05  6. Command-Line Programs                How can I write Python programs that will work like Unix command-line tools?
03:35  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.