CORPORATE TECHNOLOGY
Information & Communications, Neural Computation

Machine Learning Methods on functional MRI Data
Siemens AG, Corporate Technology, Dept. of Neural Computation (2003-2005)
Team: Dr. Janaina Mourao-Miranda, Dr. Martin Stetter
In cooperation with: Dr. Arun Bokde, Ludwig Maximilians University

Aim
Develop machine learning algorithms to train classifiers that detect differences in brain activity between two cognitive states or between groups of subjects (e.g. task 1 vs. task 2, or patients vs. healthy controls).
f : single fMRI scan -> cognitive state or group membership

Supervised Learning
Input (e.g. brain scans): X1, X2, X3, ...
Output (e.g. patients vs. controls): y1, y2, y3, ...
Training examples: (X1, y1), (X2, y2), ..., (Xn, yn)
Learning/training: the learning methodology generates a function (hypothesis) f such that f(Xi) -> yi.
Test: for a test example Xi, the prediction is f(Xi) = yi.

Examples of neuroimaging (brain scan) techniques
Computed Tomography (CT), Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), structural Magnetic Resonance Imaging (MRI), and functional Magnetic Resonance Imaging (fMRI).
Among these modalities, MRI/fMRI has become widely used because of its low invasiveness, lack of radiation exposure, and relatively wide availability.

MRI vs. fMRI
MRI studies brain anatomy. Functional MRI (fMRI) studies brain function.
Source: Jody Culham's fMRI for Dummies web site

Examples of brain scans
MRI: one image, high resolution (1 mm).
fMRI: many images (e.g. one every 2 sec for 5 min), low resolution (~3 mm, but can be better).

fMRI: What does it measure?
Source: Arthurs & Boniface, 2002, Trends in Neurosciences
When neurons fire in response to a sensory or cognitive process, a sequence of events follows that results in an increase in local cerebral metabolism.
An increase in neural activity (and metabolism) causes an increased demand for oxygen. To compensate for this demand, the vascular system increases the amount of oxygenated haemoglobin relative to deoxygenated haemoglobin.
fMRI measures changes in the Blood Oxygen Level Dependent (BOLD) signal caused by changes in neural activity.

fMRI Setup
[Figure: fMRI setup.]

fMRI: relative measure
During a standard fMRI experiment, hundreds of volumes (scans), each comprising brain activation at thousands of locations (voxels), are acquired.
[Figure: each brain scan is a 3D matrix of voxels; along the time axis, scans acquired during task 1 are followed by scans acquired during task 2, and so on.]
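As a rough sketch of how such a run can be held in memory, the snippet below stores the scans as a 4D array (x, y, z, time) and unfolds each 3D volume into one row of a scans-by-voxels matrix, keeping only the voxels inside a mask. The array sizes, the random data, the toy mask, the block labels, and the names `run`, `mask`, and `to_feature_matrix` are illustrative assumptions, not values from the study.

```python
import numpy as np

def to_feature_matrix(run, mask):
    """Unfold a 4D run (x, y, z, time) into a (time, n_voxels_in_mask) matrix."""
    if run.shape[:3] != mask.shape:
        raise ValueError("mask must match the spatial shape of the run")
    # Boolean indexing over the three spatial axes gives (n_voxels_in_mask, time).
    return run[mask].T

rng = np.random.default_rng(0)
run = rng.normal(size=(8, 8, 4, 42))        # hypothetical: 42 scans of 8 x 8 x 4 voxels
mask = np.zeros((8, 8, 4), dtype=bool)
mask[2:6, 2:6, 1:3] = True                  # hypothetical "inside the brain" region

X = to_feature_matrix(run, mask)            # one row per scan, one column per voxel
labels = np.array(([0] * 7 + [1] * 7) * 3)  # hypothetical blocks of 7 scans per task
print(X.shape, labels.shape)
```

Each row of `X` is then one scan, ready to be paired with the task label of that scan.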

fMRI Data Analysis
The most popular method is the General Linear Model (GLM; Friston et al., 1995), in which a regression is performed on the signal value at each voxel to determine whether that voxel's activity is related to a stimulus or cognitive state.
Typical question: which areas are related to a given stimulus or cognitive state?
Programs: SPM (FIL-UCL), AFNI (NIMH-NIH)
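A minimal sketch of such a single-voxel regression, assuming ordinary least squares with one boxcar task regressor plus a constant term, and a t statistic on the task coefficient. The synthetic time series, block length, noise level, and all names (`x1`, `beta_hat`, `t_stat`) are illustrative assumptions. This is not the SPM or AFNI implementation; those packages typically also account for the haemodynamic response and temporal autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: alternating blocks of 7 scans (rest, task, rest, ...).
n_scans = 42
x1 = np.tile(np.r_[np.zeros(7), np.ones(7)], 3)   # task regressor (boxcar)
x0 = np.ones(n_scans)                              # constant (baseline) regressor
X = np.column_stack([x1, x0])                      # design matrix, shape (42, 2)

# Synthetic voxel time series: task effect 2.0, baseline 100.0, unit noise.
y = 2.0 * x1 + 100.0 * x0 + rng.normal(scale=1.0, size=n_scans)

# Least-squares estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance and standard error of each coefficient.
residuals = y - X @ beta_hat
dof = n_scans - X.shape[1]
sigma2 = residuals @ residuals / dof
std_beta = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# t statistic for the null hypothesis "task coefficient = 0".
t_stat = beta_hat[0] / std_beta[0]
print(beta_hat, t_stat)
```

In a whole-brain analysis the same fit would be repeated independently for every voxel, giving one t value per voxel.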

[Figure: fMRI data as slices over time; the time series of a single voxel (voxel intensity, i.e. BOLD signal, vs. time).]

Regression model for a single-voxel time series:
$Y = X\beta + \varepsilon$, i.e. intensity $= \beta_1 x_1 + \beta_0 x_0 + \text{noise}$
Least squares parameter estimate: $\hat{\beta} = (X^T X)^{-1} X^T Y$
Null hypothesis: $\beta_1 = 0$, tested with $t = \hat{\beta}_1 / \mathrm{Std}(\hat{\beta}_1)$

Multivariate pattern recognition methods
In these applications the fMRI scans are treated as spatial patterns, and machine learning methods are used to identify statistical properties of the data that discriminate between brain states (e.g. task 1 vs. task 2) or groups of subjects (e.g. patients and controls).
Each fMRI volume is treated as a vector in an extremely high-dimensional space (~200,000 voxels, or dimensions, after masking).

fMRI data as input to a classifier
fMRI volume -> feature vector (dimension = number of voxels)

Machine Learning Approach on fMRI data
ML training. Input:

Volumes from task 1 and volumes from task 2. Output: a map of the regions that discriminate between task 1 and task 2.
ML test. Input: a new example. Output: prediction (task 1 or task 2).

Binary classification can be viewed as the task of finding a hyperplane
Hyperplane H: $w \cdot X_i + b = 0$
$w \cdot X_i + b > 0$: task 1
$w \cdot X_i + b < 0$: task 2
where:
$X_i$ is an example (volume)
$w$ is the learned weight vector
$b$ is the offset
[Figure: machine learning (training) on fMRI scans from task 1 and task 2 (volumes at times t1 to t4), plotted on axes voxel 1 and voxel 2 with the hyperplane H and its normal vector w; test on an fMRI scan (volume) from a new subject.]

First Approach: Fisher Linear Discriminant (FLD)

[Figure: volumes from the two classes on axes voxel 1 and voxel 2, with class means m1 and m2, weight vector w, and threshold thr; projections onto the learned weight vector for FLD without correction and FLD with correction.]

Fisher Linear Discriminant (FLD)
1. Compute the mean vector of each class (i = 1, 2: task or group):
   $m_i = \frac{1}{N_i} \sum_{t=1}^{N_i} x_i^t$
2. Find the (normalized) weight vector between the two means:
   $w = \frac{m_2 - m_1}{\lVert m_2 - m_1 \rVert}$
3. Correct the weight vector by the within-class covariance:
   $w = \frac{S_w^{-1}(m_2 - m_1)}{\lVert S_w^{-1}(m_2 - m_1) \rVert}$, with $S_w = \mathrm{cov}(\text{class}_1) + \mathrm{cov}(\text{class}_2)$
4. Project each volume onto the weight vector: $y_i^t = w \cdot x_i^t$
5. Choose the threshold $thr = (m_1 + m_2)/2$, taken on the projection axis (the midpoint of the projected class means), and classify.
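The five FLD steps can be sketched as follows on low-dimensional synthetic data. The two-voxel Gaussian data, the small ridge term added to keep the within-class covariance invertible, and the names `fit_fld` and `predict_fld` are assumptions made for illustration; with real fMRI volumes the covariance would not be invertible without a prior dimensionality reduction.

```python
import numpy as np

def fit_fld(X1, X2, ridge=1e-6):
    """Fit a two-class Fisher Linear Discriminant; rows of X1 and X2 are volumes."""
    # Step 1: class means.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Step 3: within-class covariance S_w = cov(class 1) + cov(class 2).
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    # Small ridge term (an assumption, not in the slides) keeps S_w invertible.
    Sw = Sw + ridge * np.eye(Sw.shape[0])
    # Steps 2 and 3: corrected, normalized weight vector.
    w = np.linalg.solve(Sw, m2 - m1)
    w = w / np.linalg.norm(w)
    # Step 5: threshold halfway between the projected class means.
    thr = 0.5 * (m1 @ w + m2 @ w)
    return w, thr

def predict_fld(X, w, thr):
    """Step 4: project onto w; projections above the threshold go to class 2."""
    return np.where(X @ w > thr, 2, 1)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])     # correlated "voxels"
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X2 = rng.multivariate_normal([1.5, 0.0], cov, size=100)

w, thr = fit_fld(X1, X2)
acc = np.mean(np.r_[predict_fld(X1, w, thr) == 1, predict_fld(X2, w, thr) == 2])
print(w, thr, acc)
```

Because the two synthetic voxels are correlated, the corrected weight vector is not parallel to the line joining the class means, which is the effect of the covariance correction in step 3.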

Optimal Hyperplane
Which of the linear separators is optimal?
A classifier that does very well on the training data might not generalize well to unseen examples.
The SVM selects, from the many possible solutions, the most robust one (large margin classifier).
[Figure: several linear separators on axes voxel 1 and voxel 2.]

Largest Margin Classifier
Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin (the distance from the hyperplane to the closest points).
Given a training set with 6 examples, suppose that all test points are generated by adding bounded noise (r) to the training examples. If the optimal hyperplane has margin > r, it will correctly separate the test points.
Finding the optimal hyperplane is a quadratic optimization problem with linear constraints. Formally: determine w and b that minimize the functional
$\lVert w \rVert^2 / 2$
subject to the constraints
$y_i[(w \cdot X_i) + b] \ge 1, \quad i = 1, \dots, n$

Second Approach: Support Vector Machine (SVM)
The solution has the form:
$w = \sum_i \alpha_i y_i X_i$
$b = y_i - w \cdot X_i$ for any $X_i$ such that $\alpha_i \ne 0$ (for these examples the constraint holds with equality)
The examples $X_i$ for which $\alpha_i > 0$ are called the support vectors.
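A small worked check of this solution form, on a hypothetical four-point training set whose multipliers were worked out by hand rather than by a quadratic programming solver. It rebuilds w and b from the multipliers, verifies the constraints, and illustrates the bounded-noise argument: test points within distance r < margin of a training point stay on the correct side. All data and names are illustrative assumptions.

```python
import numpy as np

# Toy training set (hypothetical): four volumes with two "voxels" each.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Multipliers worked out by hand for this toy set (not produced by a solver):
# only the two points closest to the boundary have alpha > 0.
alpha = np.array([0.25, 0.0, 0.25, 0.0])

# Solution form: w = sum_i alpha_i y_i X_i, b = y_i - w.X_i on a support vector.
w = (alpha * y) @ X
support = np.flatnonzero(alpha > 0)
b = y[support[0]] - w @ X[support[0]]

# Optimality checks: constraints hold everywhere and are tight on support vectors.
functional_margin = y * (X @ w + b)
assert np.all(functional_margin >= 1 - 1e-12)
assert np.allclose(functional_margin[support], 1.0)
assert np.isclose(alpha @ y, 0.0)

# Geometric margin: distance from the hyperplane to the closest training points.
margin = 1.0 / np.linalg.norm(w)

# Test points = training points plus noise of norm r < margin.
rng = np.random.default_rng(0)
r = 0.9 * margin
noise = rng.normal(size=X.shape)
noise = r * noise / np.linalg.norm(noise, axis=1, keepdims=True)
predictions = np.sign((X + noise) @ w + b)
print(w, b, margin, np.array_equal(predictions, y))
```

In practice the multipliers come from solving the quadratic program; the check above only shows how w, b, the support vectors, and the margin relate once they are known.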

[Figure: optimal hyperplane with normal vector w, margin, and support vectors.]
Data: $\{X_i, y_i\}, \; i = 1, \dots, N$
Observations: $X_i \in \mathbb{R}^d$
Labels: $y_i \in \{-1, +1\}$

How to interpret the weight vector w (Discriminating Volume)?
[Figure: task 1 and task 2 example volumes with two voxels each, plotted on axes voxel 1 and voxel 2 together with the hyperplane H and the weight vector w.]
Weight vector (Discriminating Volume): W = [0.45 0.89]
The value of each voxel in the weight vector indicates the importance of that voxel in discriminating between the two classes or brain states.

General Procedure
Pre-processing: realignment, normalization, smoothing
Split data: training and test
Dimensionality reduction (e.g. PCA) and/or feature selection (e.g. ROI)
ML training and test
Outputs:
1. Accuracy
2. Discriminating maps (weight vector)
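The procedure above can be sketched end to end on synthetic data. The sketch assumes pre-processing is already done, performs PCA through a singular value decomposition, uses a plain weight vector between the class means as a stand-in for the FLD or SVM classifier, and maps the weight vector back to voxel space to obtain the discriminating map. The data sizes, the number of components, the train/test split, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_voxels, n_components = 40, 200, 10

# Synthetic "volumes": the positive class has extra signal in the first 20 voxels.
X = rng.normal(size=(2 * n_per_class, n_voxels))
y = np.r_[np.full(n_per_class, -1), np.full(n_per_class, 1)]
X[y == 1, :20] += 1.5

# Split data: even rows for training, odd rows for test.
train, test = np.arange(0, len(y), 2), np.arange(1, len(y), 2)

# Dimensionality reduction: PCA fitted on the training volumes only.
mean = X[train].mean(axis=0)
_, _, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
components = Vt[:n_components]                  # (n_components, n_voxels)
Z_train = (X[train] - mean) @ components.T
Z_test = (X[test] - mean) @ components.T

# ML training: weight vector between the class means in PCA space.
m_neg = Z_train[y[train] == -1].mean(axis=0)
m_pos = Z_train[y[train] == 1].mean(axis=0)
w_pca = (m_pos - m_neg) / np.linalg.norm(m_pos - m_neg)
thr = 0.5 * (m_neg + m_pos) @ w_pca

# Output 1: accuracy on the held-out volumes.
pred = np.where(Z_test @ w_pca > thr, 1, -1)
accuracy = np.mean(pred == y[test])

# Output 2: discriminating map, i.e. the weight vector expressed in voxel space.
w_voxels = components.T @ w_pca                 # (n_voxels,)
signal_weight = np.abs(w_voxels[:20]).mean()
other_weight = np.abs(w_voxels[20:]).mean()
print(accuracy, signal_weight, other_weight)
```

Fitting the PCA on the training volumes only keeps the test volumes out of every training step, so the reported accuracy is not inflated by information from the test set.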

Application 1
We used fMRI data from 16 healthy subjects and 16 MCI (Mild Cognitive Impairment) patients, acquired during two different experiments:
Face Matching Experiment
Location Matching Experiment

Experiment Design I: Face Matching
Time (scans or volumes): instruction, control task, face matching task
Control task: press button when image appears
Face matching task: press button when faces are identical

Experiment Design II: Location Matching
Time (scans or volumes): instruction, control task, location matching task
Control task: press button when image appears
Location matching task: press button when the locations of the abstract images are different

Data Description

Number of subjects: 16
First experiment: face matching task (3 blocks of 7 scans) vs. control task (3 blocks of 7 scans)
Second experiment: location matching task (3 blocks of 7 scans) vs. control task (3 blocks of 7 scans)

Pre-processing procedures
Time shift correction, motion correction, normalization to standard space (MNI template).
Correction for baseline and low-frequency components.
Mask to select voxels inside the brain.

Leave-one-out test
Machine learning (training): 15 subjects
Test: 1 subject
This procedure was repeated 16 times and the results were averaged.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Error rate: the ratio of the number of misclassified data units to the total number of data units.

Training phase:
21 volumes x 15 subjects = 315 volumes of task 1
21 volumes x 15 subjects = 315 volumes of task 2
-> Machine Learning -> volume with the most discriminative regions

Test phase:
fMRI volumes from a new subject: 21 volumes of task 1, 21 volumes of task 2
-> Machine Learning -> task 1 or task 2
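A sketch of this leave-one-subject-out test with the three reported measures, run on synthetic data. The classifier is a simple weight vector between the class means, used here only as a stand-in for FLD or SVM; the 50-voxel random data and all names are illustrative assumptions. Counts are pooled over the 16 folds, which equals averaging the per-fold rates because every fold has the same number of volumes per class.

```python
import numpy as np

def fit_mean_difference(X, y):
    """Stand-in linear classifier: weight vector between the class means."""
    m_neg, m_pos = X[y == -1].mean(axis=0), X[y == 1].mean(axis=0)
    w = m_pos - m_neg
    thr = 0.5 * (m_neg + m_pos) @ w
    return w, thr

def leave_one_subject_out(X, y, subject):
    """Train on all subjects but one, test on the held-out one, for each subject."""
    tp = tn = fp = fn = 0
    for s in np.unique(subject):
        train, test = subject != s, subject == s
        w, thr = fit_mean_difference(X[train], y[train])
        pred = np.where(X[test] @ w > thr, 1, -1)
        tp += np.sum((pred == 1) & (y[test] == 1))
        tn += np.sum((pred == -1) & (y[test] == -1))
        fp += np.sum((pred == 1) & (y[test] == -1))
        fn += np.sum((pred == -1) & (y[test] == 1))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "error_rate": (fp + fn) / (tp + tn + fp + fn),
    }

# Synthetic stand-in data: 16 subjects, 21 volumes per task, 50 "voxels".
rng = np.random.default_rng(0)
n_subjects, n_vols, n_voxels = 16, 21, 50
subject = np.repeat(np.arange(n_subjects), 2 * n_vols)
y = np.tile(np.r_[np.full(n_vols, -1), np.full(n_vols, 1)], n_subjects)
X = rng.normal(size=(len(y), n_voxels))
X[y == 1, :5] += 1.0                      # hypothetical task effect in 5 voxels

scores = leave_one_subject_out(X, y, subject)
print(scores)
```

With equally many volumes per class, the error rate equals 1 - (sensitivity + specificity)/2, which gives a quick consistency check on any set of reported values.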

Projection of the volume onto the weight vector: $X \cdot w$

[Figure: Healthy Subjects, FLD, test on individual volumes. Control task (negative, x) and Face task (positive, o). PCA & FLD learned weight vector: Face task vs. Control task.]
[Figure: Healthy Subjects, FLD, test on individual volumes. Control task (negative, x) and Location task (positive, o). PCA & FLD learned weight vector: Location task vs. Control task.]
[Figure: Healthy Subjects, FLD, test on individual volumes. Location task (negative, x) and Face task (positive, o). PCA & FLD learned weight vector: Face task vs. Location task.]
[Figure: Healthy Subjects, SVM, test on individual volumes. Control task (negative, x) and Face task (positive, o). PCA & SVM learned weight vector: Face task vs. Control task.]
[Figure: Healthy Subjects, SVM, test on individual volumes. Control task (negative, x) and Location task (positive, o). PCA & SVM learned weight vector: Location task vs. Control task.]
[Figure: Healthy Subjects, SVM, test on individual volumes. Location task (negative, x) and Face task (positive, o). PCA & SVM learned weight vector: Face task vs. Location task.]

[Figure: Patients vs. Healthy Subjects, FLD, test on individual volumes. Face task: healthy subject (negative, x) and patient (positive, o). Location task: healthy subject (negative, x) and patient (positive, o).]
[Figure: Patients vs. Healthy Subjects, SVM, test on individual volumes. Face task: healthy subject (negative, x) and patient (positive, o). Location task: healthy subject (negative, x) and patient (positive, o).]

Healthy Subjects
                        error rate      sensitivity     specificity
Test                    FLD     SVM     FLD     SVM     FLD     SVM
Face vs. Control        0.26    0.13    0.72    0.85    0.76    0.90
Location vs. Control    0.18    0.17    0.71    0.78    0.93    0.88
Face vs. Location       0.47    0.27    0.45    0.76    0.62    0.70

Patients
                        error rate      sensitivity     specificity
Test                    FLD     SVM     FLD     SVM     FLD     SVM
Face vs. Control        0.34    0.25    0.45    0.71    0.87    0.79
Location vs. Control    0.47    0.27    0.45    0.76    0.62    0.70
Face vs. Location       0.35    0.30    0.70    0.72    0.60    0.68

Healthy Subjects vs. Patients
                        error rate      sensitivity     specificity
Test                    FLD     SVM     FLD     SVM     FLD     SVM
Face task               0.44    0.50    0.55    0.44    0.57    0.57
Location task           0.50    0.52    0.46    0.36    0.54    0.60

Conclusions
The classifiers were able to distinguish between tasks for both groups:
Face matching task vs. Control task
Location matching task vs. Control task
Face matching task vs. Location matching task
The classifiers were not able to distinguish between the groups:
Location matching task: Healthy vs. Patients
Face matching task: Healthy vs. Patients
Which method is better?
Using 5 subjects, the results are similar for both classifiers.
Using 16 subjects, the SVM gave better results than the FLD.