
GitHub Repository: adasegroup/NEUROML2022
Path: blob/main/seminar3/morphometry_classification_2022.ipynb
Kernel: Python 3 (ipykernel)


MRI classification on morphometry data

1. Introduction

In this notebook we will run a conventional morphometry analysis to search for gender-related morphometric signs.

Our goal is to build a classifier that separates men's and women's brains, to explore the influence of gender on brain structure, and to find gender-specific biomarkers.

We will use the data from https://db.humanconnectome.org/data/projects/HCP_1200.

By proceeding with this notebook you confirm that you have personal access to the data and that you agree to the data terms and conditions.

from google.colab import drive
drive.mount('/content/drive')

!!! Use "Add a shortcut" to add a shortcut to the data repository to your own Google Drive, from here: https://drive.google.com/drive/folders/1GCIXnK6ly5l_LADanpLmvtZ6YbqPUamQ?usp=sharing

After adding the shortcut, change this data dir to your own file location in Google Drive and give the path to the seminars/anat folder:

data_dir = '/content/drive/My Drive/Skoltech Neuroimaging/NeuroML2020/data/seminars/anat/'

Importing the data

Importing the unrestricted_hcp_freesurfer.csv dataset from https://db.humanconnectome.org/data/projects/HCP_1200.

import pandas as pd
data = pd.read_csv(data_dir + 'unrestricted_hcp_freesurfer.csv')
data.head()

How to get this morphometry data in-house?

  1. Acquire T1-weighted MRI, or T1 together with T2.

  2. Store the patient data in BIDS format or convert it to *.nii.

  3. Run FreeSurfer 6.0, either through fsdocker or as a standalone installation.

  4. Wait 5-11 hours per patient on 1 CPU.

  5. Collect the FreeSurfer stats output and convert it to table format.
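As a rough illustration of step 5, the sketch below parses stats text into one table row per subject. It assumes the `#`-commented, whitespace-separated layout with a `# ColHeaders` line used by files such as aseg.stats; the sample text, its values, the subject ID and the helper names are all made up for the example. FreeSurfer also ships its own converters (asegstats2table, aparcstats2table), which are the safer choice for real data.

```python
import io
import pandas as pd

# Hypothetical excerpt in the assumed stats layout (all values are made up).
SAMPLE_STATS = """\
# Title Segmentation Statistics
# ColHeaders Index SegId NVoxels Volume_mm3 StructName normMean
  1   4   7000   7100.5  Left-Lateral-Ventricle  35.1
  2  10   7500   7600.2  Left-Thalamus           85.3
"""

def read_stats_table(text):
    """Parse '#'-commented, whitespace-separated stats text into a DataFrame."""
    columns = None
    rows = []
    for line in io.StringIO(text):
        line = line.strip()
        if line.startswith("# ColHeaders"):
            columns = line.split()[2:]      # drop '#' and 'ColHeaders'
        elif line and not line.startswith("#"):
            rows.append(line.split())       # one structure per line
    return pd.DataFrame(rows, columns=columns)

def subject_row(text, subject_id, value_col="Volume_mm3"):
    """Return one measure per structure as a Series named after the subject."""
    table = read_stats_table(text)
    values = table.set_index("StructName")[value_col].astype(float)
    values.name = subject_id
    return values

# Stack one row per subject to get a table like the one loaded above.
morphometry = pd.DataFrame([subject_row(SAMPLE_STATS, "sub-01")])
print(morphometry)
```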

The tutorial and data for table creation as well as data visualisation can be found here: https://github.com/kondratevakate/your-brain-mri-visualization

Defining the train and test data

It is a set of brain morphometry measures of healthy young adults. On these data we have two classification problems to solve:

  • men/women classification.

  • age above 30 classification.
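For the second problem you need a binary age target. The sketch below is only one possible way to build it: it assumes age is stored as range strings such as "26-30" or "36+" in a column called `Age`. Both the column name and the exact string format are assumptions, so check them against `data.head()` first. The toy frame stands in for the real table.

```python
import pandas as pd

# Toy stand-in for the real table; the 'Age' column and its format are assumptions.
toy = pd.DataFrame({"Age": ["22-25", "26-30", "31-35", "36+"]})

def lower_bound(age_range):
    """Return the lower bound of an age range string like '31-35' or '36+'."""
    return int(age_range.replace("+", "").split("-")[0])

# 1 if the whole range lies above 30, otherwise 0.
y_age = (toy["Age"].map(lower_bound) > 30).astype(int)
print(y_age.tolist())
```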

Choose your X (train_data) and y (train_targets) as pandas.DataFrame() or numpy.array():

X = data[data.columns[3:]]
y = data[data.columns[1]]
X.shape, y.shape
y.value_counts()

Let's change str values to binary classes. The easy but BAD way is:

y[y == 'F'] = 1
y[y == 'M'] = 0
y = y.astype(int)
y.value_counts()
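Why is this bad? The assignments write into `y` in place, and `y` was taken from `data`, so pandas may warn about setting values on a copy and the original column can end up modified. A safer option, sketched here on a toy series standing in for `y`, is to map the labels into a new object and check that nothing was left unmapped:

```python
import pandas as pd

# Toy stand-in for y; the real labels come from the table loaded above.
labels = pd.Series(["F", "M", "M", "F"])

# map() returns a new Series and leaves the original untouched.
y_encoded = labels.map({"F": 1, "M": 0})

# Any label missing from the mapping becomes NaN, so check before casting.
assert y_encoded.notna().all(), "unexpected label found"
y_encoded = y_encoded.astype(int)
print(y_encoded.value_counts())
```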

Let's do classical data analysis

Statistics:

  1. Pick a test for comparing two groups (Gaussian/parametric or not?)

  2. Is it a paired test or not?

  3. Choose a p-value threshold and address multiple comparisons.
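One possible answer to the first two questions, sketched on synthetic data: men and women are different subjects, so the groups are independent and an unpaired test applies. The example runs Welch's t-test (parametric) and the Mann-Whitney U test (non-parametric) on every feature. The feature names and the simulated group shift are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 100                                   # subjects per group
groups = np.repeat([0, 1], n)             # 0 = first group, 1 = second group

# Synthetic features: one with a real group difference, one pure noise.
features = pd.DataFrame({
    "shifted": np.concatenate([rng.normal(0, 1, n), rng.normal(1.5, 1, n)]),
    "noise": rng.normal(0, 1, 2 * n),
})

def compare_groups(x, g):
    """Unpaired two-group comparison of one feature with two different tests."""
    a, b = x[g == 0], x[g == 1]
    t_p = stats.ttest_ind(a, b, equal_var=False).pvalue               # Welch's t-test
    u_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue    # rank-based
    return pd.Series({"welch_t_p": t_p, "mannwhitney_p": u_p})

# One column of p-values per feature; transpose to get one row per feature.
results = features.apply(lambda col: compare_groups(col.to_numpy(), groups)).T
print(results)
```

These p-values are still uncorrected: with hundreds of morphometry features, question 3 matters.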

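from sklearn.feature_selection import SelectKBest
# Fit the selector once (default score function: ANOVA F-test, f_classif)
selector = SelectKBest(k=10).fit(X, y)
X_new = selector.transform(X)
# Names of the 10 selected features
X.columns[selector.get_support()]
# Uncorrected p-values of the selected features
selector.pvalues_[selector.get_support()]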

What do you know about multiple comparisons?

[Figure: graph]

[Figure: Bonferroni correction]
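To make the question concrete, here is a minimal sketch of two common corrections written in plain NumPy: Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate). The raw p-values are made up for the example; in practice you would pass the per-feature p-values computed earlier.

```python
import numpy as np

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

raw = [0.001, 0.01, 0.03, 0.04, 0.5]   # made-up p-values
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

With these numbers and a 0.05 threshold, Bonferroni keeps two features and Benjamini-Hochberg keeps four, which shows how much more conservative Bonferroni is.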

Now we are doing ML!

Define a model with a built-in grid search: create a LogisticRegressionCV object, which cross-validates a linear classifier over a grid of regularization strengths:

from sklearn.linear_model import LogisticRegressionCV
# will run for 1-2 minutes
lr_cv = LogisticRegressionCV(max_iter = 1000, random_state = 42, n_jobs = -1)
lr_cv.fit(X,y)
lr_cv.scores_
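The raw `scores_` output is hard to read, so here is a sketch of how it could be summarised, run on synthetic data rather than the HCP table. It assumes `scores_` is a dict keyed by class label whose values have shape (n_folds, n_Cs), which is how binary problems have been stored in the scikit-learn releases this was written against; newer releases may change that layout, so check your version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for (X, y).
X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)

# 5 candidate C values, 5 cross-validation folds.
model = LogisticRegressionCV(Cs=5, cv=5, max_iter=1000).fit(X_toy, y_toy)

# Assumed layout: dict keyed by the positive class, array of (n_folds, n_Cs).
fold_scores = model.scores_[model.classes_[1]]
mean_per_C = fold_scores.mean(axis=0)     # mean CV accuracy for each C
best_C = np.ravel(model.C_)[0]            # C chosen by the search

for C, score in zip(model.Cs_, mean_per_C):
    print(f"C={C:.4g}: mean CV accuracy {score:.3f}")
print("selected C:", best_C)
```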

What else?

  1. Hyperparameter search and model optimisation. Comparison of model performance with statistical testing.

  2. Model interpretation

  3. Biomarker stability

  4. Statistical validation of biomarkers

How can we explore the morphometry biomarkers found here?

lr_cv.coef_
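Raw coefficients are only comparable between features measured on the same scale, and volumes, areas and thicknesses are not. One simple option, sketched below on synthetic data, is to standardise the features inside a pipeline and then rank them by absolute coefficient. The feature names, the scaling factor and the rule that generates the labels are all made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 4))

# Labels depend strongly on feature 0, weakly (and negatively) on feature 1.
y_toy = (2 * Z[:, 0] - Z[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Put feature 1 on a much larger scale, as a volume would be next to a thickness.
X_toy = pd.DataFrame(Z * [1.0, 1000.0, 1.0, 1.0],
                     columns=["feat_0", "feat_1", "feat_2", "feat_3"])

# Standardise first so that coefficient magnitudes are comparable.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_toy, y_toy)

coefs = pd.Series(pipe[-1].coef_.ravel(), index=X_toy.columns)
ranking = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranking)
```

A large coefficient is a candidate biomarker, not a confirmed one: correlated features share weight, so the ranking should be checked for stability across resamples.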

How do we compare the performance of two classification models?

A useful paper with recommendations on model comparison: https://arxiv.org/pdf/1806.08295.pdf
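A minimal sketch of one common approach, on synthetic data: score both models on exactly the same cross-validation splits, then look at the paired differences. The two models and the Wilcoxon signed-rank test are illustrative choices here, not necessarily what the paper recommends. Fold scores share training data and are therefore not independent, so treat the p-value as optimistic.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (X, y).
X_toy, y_toy = make_classification(n_samples=300, n_features=10,
                                   n_informative=4, random_state=0)

# A fixed random_state gives both models identical splits (5 folds x 4 repeats).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=4, random_state=0)

scores_a = cross_val_score(LogisticRegression(max_iter=1000), X_toy, y_toy, cv=cv)
scores_b = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                           X_toy, y_toy, cv=cv)

# Paired comparison: each difference comes from the same train/test split.
diff = scores_a - scores_b
stat, p_value = stats.wilcoxon(scores_a, scores_b)

print(f"mean accuracy A: {scores_a.mean():.3f}, B: {scores_b.mean():.3f}")
print(f"mean difference: {diff.mean():.3f}, Wilcoxon p-value: {p_value:.4f}")
```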