Second Annual Data Science Bowl – Part 2

In Part 1 of this blog series, I described how to fix the brightness and contrast of the MRI images. In this blog we finish cleaning up the input data.

Other than brightness and contrast, we need to fix up the following problems:

  • different image sizes

  • different image rotations – portrait versus landscape

  • different pixel spacing

  • different short axis slice spacing

  • image sets from duplicate locations

  • sax sets aren’t in the same order as their locations

  • some sax image sets have multiple locations

There are approximately a dozen different image sizes, include rotated images. Not all of the image sizes scale to the same 4:3 aspect ratio that is the most common across the training set. Some of the machine learning algorithms I used later need fixed dimension images, so I compromised and decided to use a standard sizing of 256 x 192 pixels portrait aspect ratio. This meant that I wasn’t always scaling the x-axis and the y-axis by the same amount, and occasionally I was even upscaling an image.

Note that I used matrix transpose to rotate the image 90 degrees. I could also have used the rotate function in EBImage. Either way could work, but I felt that matrix transpose would be a faster operation.

The DICOM images contain information about image slice location, pixel spacing (how far apart are adjacent pixel centroids), and slice spacing (how far apart are adjacent sax slices).

I’m not a medical expert. So when there are images from duplicate locations, I don’t know which image sets are the best to use. Therefore I just assumed that the medical specialist repeated the MRI scans until the image quality was adequate. This meant that I chose the image set with the latest time stamps (from the header information inside the DICOM files, not the file system date).

Then I just

  1. searched for DICOM images in all of the subfolders for each patient

  2. filtered to use only sax slices, ignoring those images that did not come from a folder that contained the substring “sax”

  3. read the DICOM header information from each image to find its location and time

  4. searched for the high time stamp for each location and kept that image

  5. wrote out a new folder structure where sax_01, sax_02, … were the sax slice image sets for that patient, order by their location along the long axis of the heart

The R script to do this isn’t too difficult. I’ve pasted it below:

I stored all of the DICOM header information in a table. Some of this will be required later, when calculating the volume of the left ventricle chamber.

The next step, which will be described in my next blog, is to design a convolutional neural network that will automatically find the left ventricle in an image.

Like this:

Like Loading…

Related