I built a Decision Tree classifier that predicts which drug a new patient is likely to respond to, based on the collected patient data. The same approach could also be turned around to predict a patient's condition from the drug that worked for them.
In [3]:
Importing all necessary libraries
In [4]:
In [5]:
Out[5]:
Pre-processing
I declared the variables as follows:
X = feature matrix, y = response vector
In [11]:
Out[11]:
array([[23, 'F', 'HIGH', 'HIGH', 25.355],
[47, 'M', 'LOW', 'HIGH', 13.093],
[47, 'M', 'LOW', 'HIGH', 10.114],
[28, 'F', 'NORMAL', 'HIGH', 7.798],
[61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)
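A minimal sketch of how X and y might be declared with pandas. The DataFrame here is a stand-in built from the five rows displayed above, not the full dataset, and every column name except `Drug` (which appears in the notebook output) is an assumption.

```python
import pandas as pd

# Stand-in for the loaded dataset: only the five rows shown in the output.
# Column names other than "Drug" are assumed for illustration.
df = pd.DataFrame({
    "Age": [23, 47, 47, 28, 61],
    "Sex": ["F", "M", "M", "F", "F"],
    "BP": ["HIGH", "LOW", "LOW", "NORMAL", "LOW"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "HIGH"],
    "Na_to_K": [25.355, 13.093, 10.114, 7.798, 18.043],
    "Drug": ["drugY", "drugC", "drugC", "drugX", "drugY"],
})

feature_columns = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]

# Feature matrix: one row per patient, mixed types, so dtype is object.
X = df[feature_columns].values

# Response vector: the drug each patient responded to.
y = df["Drug"]

print(X[0:5])
print(y[0:5])
```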
I also encoded the categorical columns as integer labels, because scikit-learn's Decision Tree cannot work with string values directly.
In [12]:
Out[12]:
array([[23, 0, 0, 0, 25.355],
[47, 1, 1, 0, 13.093],
[47, 1, 1, 0, 10.114],
[28, 0, 2, 0, 7.798],
[61, 0, 1, 0, 18.043]], dtype=object)
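One way to get this encoding is scikit-learn's `LabelEncoder`, which assigns integers to categories in sorted order. This is a sketch under the assumption that columns 1 to 3 hold the categorical values shown earlier; the category lists passed to `fit` are also assumed, since only some of the values appear in the displayed rows.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# The five rows displayed before encoding.
X = np.array([
    [23, "F", "HIGH", "HIGH", 25.355],
    [47, "M", "LOW", "HIGH", 13.093],
    [47, "M", "LOW", "HIGH", 10.114],
    [28, "F", "NORMAL", "HIGH", 7.798],
    [61, "F", "LOW", "HIGH", 18.043],
], dtype=object)

# Column index -> possible categories (assumed lists).
# LabelEncoder sorts them, so F=0, M=1 and HIGH=0, LOW=1, NORMAL=2.
categories = {
    1: ["F", "M"],
    2: ["LOW", "NORMAL", "HIGH"],
    3: ["NORMAL", "HIGH"],
}

for column, values in categories.items():
    encoder = LabelEncoder().fit(values)
    X[:, column] = encoder.transform(X[:, column])

print(X[0:5])
```

Note that the integers impose an arbitrary order on the categories (for example NORMAL ends up above LOW), which a tree can usually work around with extra splits.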
In [13]:
Out[13]:
0 drugY
1 drugC
2 drugC
3 drugX
4 drugY
Name: Drug, dtype: object
With the variables established, I set up the Decision Tree to determine which drug would be the best option for a given patient.
In [14]:
In [15]:
In [18]:
Out[18]:
Shape of X training set (140, 5) & Size of Y training set (140,)
Shape of X testing set (60, 5) & Size of Y testing set (60,)
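The shapes above are consistent with a 70/30 split of 200 rows. A sketch of such a split with `train_test_split`, using placeholder arrays of the same shape instead of the real data; the `test_size=0.3` and `random_state=3` values are assumptions, since the original cell is not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shape implied by the output (140 + 60 = 200 rows).
X = np.zeros((200, 5))
y = np.zeros(200)

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    X, y, test_size=0.3, random_state=3
)

print("Shape of X training set", X_trainset.shape,
      "& Size of Y training set", y_trainset.shape)
print("Shape of X testing set", X_testset.shape,
      "& Size of Y testing set", y_testset.shape)
```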
In [19]:
Out[19]:
In [20]:
Out[20]:
In [21]:
In [22]:
Out[22]:
['drugY' 'drugX' 'drugX' 'drugX' 'drugX']
40 drugY
51 drugX
139 drugX
197 drugX
170 drugX
Name: Drug, dtype: object
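A sketch of the fit-and-predict step with `DecisionTreeClassifier`. It trains on the five encoded rows shown earlier rather than the real 140-row training set, and the `criterion="entropy"` and `random_state=0` settings are assumptions, not necessarily the values used in this notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The five encoded rows and their labels, used here as a tiny training set.
X_train = np.array([
    [23, 0, 0, 0, 25.355],
    [47, 1, 1, 0, 13.093],
    [47, 1, 1, 0, 10.114],
    [28, 0, 2, 0, 7.798],
    [61, 0, 1, 0, 18.043],
])
y_train = np.array(["drugY", "drugC", "drugC", "drugX", "drugY"])

# Assumed hyperparameters: entropy as the split criterion, fixed seed.
drug_tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
drug_tree.fit(X_train, y_train)

# Predicting on the training rows only shows the API; a real evaluation
# should use the held-out test set.
pred_tree = drug_tree.predict(X_train)
print(pred_tree)
```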
In [23]:
Out[23]:
DecisionTrees's Accuracy: 0.9833333333333333
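An accuracy of 0.9833 on 60 test rows corresponds to 59 correct predictions (59 / 60). A sketch of the calculation with `accuracy_score`, applied only to the five predicted and actual values displayed above, since the full test set is not shown here.

```python
from sklearn.metrics import accuracy_score

# The five predictions and the matching actual values from the output above.
predicted = ["drugY", "drugX", "drugX", "drugX", "drugX"]
actual = ["drugY", "drugX", "drugX", "drugX", "drugX"]

# Accuracy is the fraction of predictions that match the true label.
sample_accuracy = accuracy_score(actual, predicted)
print("Accuracy on the five displayed rows:", sample_accuracy)

# The reported score over all 60 test rows equals 59 correct out of 60.
full_test_accuracy = 59 / 60
print("DecisionTree's Accuracy:", full_test_accuracy)
```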
In [26]:
Out[26]: