
GitHub Repository: adasegroup/NEUROML2022
Path: blob/main/seminar1/hw1-baseline.ipynb
Kernel: Python 3

HW1 - baseline

In this notebook we process the homework data to predict motion versus rest from EEG.

# For Colab only
!pip install mne
!wget https://raw.githubusercontent.com/adasegroup/NEUROML2020/seminar1/seminar1/train.csv
!wget https://raw.githubusercontent.com/adasegroup/NEUROML2020/seminar1/seminar1/test.csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from mne.time_frequency import psd_array_multitaper

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.manifold import TSNE

%matplotlib inline
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
df_train.head()
ch_names = df_train.columns[3:]
epochs = df_train['epoch'].unique()
epochs
array([ 0, 2, 6, 8, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 58, 60, 61, 62, 64, 65, 66, 67, 68, 69, 74, 77, 79, 80, 81, 86, 87, 88, 89, 90, 91, 93, 95, 96, 97, 99, 101, 102, 105, 107, 109, 110, 111, 113, 115, 117, 118, 126, 127, 128, 129, 131, 134, 135, 136, 137, 138, 139, 141, 142, 143, 144, 145, 147, 151, 152, 154, 155, 156, 157, 158, 159, 160, 162, 164, 166, 167, 169, 171, 172, 173, 174, 175, 176, 177, 181, 182, 184, 185, 187, 192, 193, 194, 196, 197, 200, 201, 202, 204, 205, 210, 212, 216, 217, 221, 222, 223, 225, 226, 227, 228, 230, 231, 233, 234, 235, 237, 239, 240, 241, 244, 245, 246, 248, 250, 253, 254, 255, 261, 262, 263, 265, 268, 269, 270, 276, 277, 279, 281, 283, 285, 287, 290, 292, 293, 294, 297, 298])
def get_target(df):
    return df.drop_duplicates('epoch')[['epoch', 'condition']].reset_index(drop=True)
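To make the behaviour of `get_target` concrete, here is a minimal sketch on a toy long-format table. The column names `epoch`, `time`, `condition`, and `F4` mirror the homework data, but the values (including the label `2`) are made up for illustration.

```python
import pandas as pd


def get_target(df):
    # One row per epoch: keep the first sample of each epoch and its label.
    return df.drop_duplicates('epoch')[['epoch', 'condition']].reset_index(drop=True)


# Toy data (made-up values): two epochs with three time samples each.
toy = pd.DataFrame({
    'epoch':     [0, 0, 0, 2, 2, 2],
    'time':      [0, 1, 2, 0, 1, 2],
    'condition': [1, 1, 1, 2, 2, 2],
    'F4':        [0.1, 0.3, -0.2, 1.0, 0.8, 0.9],
})

target = get_target(toy)
print(target)
```

The result has one row per epoch, which is what allows it to be merged with per-epoch features on the `epoch` column.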

Idea for feature engineering

df_train[df_train['condition'] == 1].groupby('time')['F4'].mean().plot()
[Figure: mean F4 signal over time for epochs with condition == 1]
df_train[df_train['condition'] != 1].groupby('time')['F4'].mean().plot()
[Figure: mean F4 signal over time for epochs with condition != 1]
def calc_features(df):
    feats = []
    for epoch_idx, epoch_df in df.groupby('epoch'):
        epoch_df = epoch_df[ch_names]

        # Multitaper PSD per channel; 160 is the sampling frequency (Hz) passed to MNE.
        psds, freqs = psd_array_multitaper(epoch_df.T.values, 160, verbose=False)
        total_power = psds.sum(axis=1)
        idx_from = np.where(freqs > 13)[0][0]
        idx_to = np.where(freqs > 25)[0][0]
        # Relative power in the 13-25 Hz band; computed here but not added to the features.
        b_pwr = psds[:, idx_from:idx_to].sum(axis=1) / total_power

        d = {}
        d['epoch'] = epoch_idx

        # Per channel: count the samples from index 40 onward whose value exceeds 5.
        for ch in ch_names:
            s = epoch_df.iloc[40:][ch]
            val = (s > 5).sum()
            d[ch.lower() + '_p300'] = val

        feats.append(d)
    feats_df = pd.DataFrame(feats)
    return feats_df
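`calc_features` computes a relative band power `b_pwr` for the 13-25 Hz range but only returns the `_p300` counts. The sketch below shows one way such a relative band power could be turned into per-channel features. It is an illustration under assumptions, not part of the baseline: it uses a synthetic PSD instead of `psd_array_multitaper` so that it runs without MNE, and the helper `relative_band_power`, the `_beta` suffix, and the channel name `F3` are hypothetical.

```python
import numpy as np


def relative_band_power(psds, freqs, f_low, f_high):
    """Share of total power that falls in the band (f_low, f_high], per channel.

    psds: array of shape (n_channels, n_freqs); freqs: array of shape (n_freqs,).
    """
    band = (freqs > f_low) & (freqs <= f_high)
    return psds[:, band].sum(axis=1) / psds.sum(axis=1)


# Synthetic PSD (made-up values): 2 channels, frequencies 0..80 Hz in 1 Hz steps.
freqs = np.arange(0.0, 81.0, 1.0)
psds = np.ones((2, freqs.size))
beta = (freqs > 13) & (freqs <= 25)
psds[1, beta] = 10.0  # the second channel gets extra power in the band

rel = relative_band_power(psds, freqs, 13, 25)

# Hypothetical feature names; 'F3' and 'F4' stand in for the real channel list.
features = {ch.lower() + '_beta': float(p) for ch, p in zip(['F3', 'F4'], rel)}
print(features)
```

The boolean mask selects the same frequency bins as the `idx_from:idx_to` slice in `calc_features`, namely those above 13 Hz and up to 25 Hz.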

Common ML workflow

X = get_target(df_train)
X = X.merge(calc_features(df_train), on='epoch')
y = X['condition'].apply(lambda x: 0 if x == 1 else 1)
del X['epoch']
del X['condition']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
model = LogisticRegression(C=1)
model.fit(X_train_sc, y_train)
LogisticRegression(C=1)
y_pred_train = model.predict_proba(X_train_sc)[:, 1]
roc_auc_score(y_train, y_pred_train)
0.7982700892857142
y_pred = model.predict_proba(X_test_sc)[:, 1]
roc_auc_score(y_test, y_pred)
0.7171945701357466
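The ROC AUC values above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal NumPy sketch of that pairwise definition follows, on made-up labels and scores. The helper name `pairwise_auc` is hypothetical, and this is not a description of how `roc_auc_score` is implemented internally.

```python
import numpy as np


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(y_true, y_score)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.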

Visualize t-SNE

X_embedded = TSNE(n_components=2).fit_transform(X)
plt.scatter(X_embedded[np.where(y == 0), 0], X_embedded[np.where(y == 0), 1])
plt.scatter(X_embedded[np.where(y == 1), 0], X_embedded[np.where(y == 1), 1])
[Figure: 2-D t-SNE embedding of the features, with the two classes drawn as separate scatter series]

Build submission

scaler = StandardScaler()
X_sc = scaler.fit_transform(X)
model.fit(X_sc, y)
LogisticRegression(C=1)
X_test = calc_features(df_test)
submission = X_test[['epoch']].copy()
del X_test['epoch']
X_test_sc = scaler.transform(X_test)
y_pred = model.predict_proba(X_test_sc)[:, 1]
submission['Predicted'] = y_pred
submission['Id'] = submission['epoch']
del submission['epoch']
submission.to_csv('baseline_submission.csv', index=False)