I built a Decision Tree classifier that predicts which drug a new patient is likely to respond to, based on previously collected patient data. The same data could also be used in reverse, to predict a patient's condition from the drug that worked for them.
In [3]:
Importing all necessary libraries
In [4]:
In [5]:
Out[5]:
Pre-processing
I declared the variables as follows:
X = feature matrix, y = response vector
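The declaration of X and y might look roughly like the sketch below. It rebuilds only the first five rows shown in the outputs, and the feature column names (`Age`, `Sex`, `BP`, `Cholesterol`, `Na_to_K`) are assumptions; only the `Drug` column name is confirmed by the notebook output.

```python
import pandas as pd

# First five rows, copied from the notebook outputs.
# The feature column names are assumed; only "Drug" appears in the output.
df = pd.DataFrame({
    "Age": [23, 47, 47, 28, 61],
    "Sex": ["F", "M", "M", "F", "F"],
    "BP": ["HIGH", "LOW", "LOW", "NORMAL", "LOW"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "HIGH"],
    "Na_to_K": [25.355, 13.093, 10.114, 7.798, 18.043],
    "Drug": ["drugY", "drugC", "drugC", "drugX", "drugY"],
})

feature_cols = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
X = df[feature_cols].values  # feature matrix (object dtype, mixed types)
y = df["Drug"]               # response vector

print(X[0:5])
print(y[0:5])
```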
In [11]:
Out[11]:
array([[23, 'F', 'HIGH', 'HIGH', 25.355],
       [47, 'M', 'LOW', 'HIGH', 13.093],
       [47, 'M', 'LOW', 'HIGH', 10.114],
       [28, 'F', 'NORMAL', 'HIGH', 7.798],
       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)
I also label-encoded the categorical columns so they can still be used to build the Decision Tree, because scikit-learn's decision trees require numeric input.
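One way to do this encoding is with scikit-learn's `LabelEncoder`, which assigns integer codes to the sorted class names. The sketch below applies it to the five rows shown earlier; encoding columns 1 to 3 in a loop is an assumption about how the notebook did it, not a copy of the original cell.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# The five feature rows shown in the earlier output.
X = np.array([
    [23, "F", "HIGH", "HIGH", 25.355],
    [47, "M", "LOW", "HIGH", 13.093],
    [47, "M", "LOW", "HIGH", 10.114],
    [28, "F", "NORMAL", "HIGH", 7.798],
    [61, "F", "LOW", "HIGH", 18.043],
], dtype=object)

# Columns 1, 2 and 3 hold the categorical values.
for col in (1, 2, 3):
    encoder = LabelEncoder()
    # Classes are sorted alphabetically, so F -> 0, M -> 1 and
    # HIGH -> 0, LOW -> 1, NORMAL -> 2.
    X[:, col] = encoder.fit_transform(X[:, col].astype(str))

print(X)
```

Because the codes follow alphabetical order, they carry no clinical meaning; a tree can still split on them, but the ordering HIGH < LOW < NORMAL is arbitrary.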
In [12]:
Out[12]:
array([[23, 0, 0, 0, 25.355],
       [47, 1, 1, 0, 13.093],
       [47, 1, 1, 0, 10.114],
       [28, 0, 2, 0, 7.798],
       [61, 0, 1, 0, 18.043]], dtype=object)
In [13]:
Out[13]:
0    drugY
1    drugC
2    drugC
3    drugX
4    drugY
Name: Drug, dtype: object
With the variables established, I split the data into training and testing sets and set up the Decision Tree to determine which drug would be the best option.
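The split and the model setup could be sketched as follows. The data here is a synthetic stand-in with the same shape as the encoded feature matrix (200 rows, 5 columns), not the real patient data. The values `test_size=0.3`, `random_state=3`, `criterion="entropy"` and `max_depth=4` are assumptions; a 30% test split is simply what yields 140 training and 60 testing rows from 200.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the encoded feature matrix (NOT the real data).
X = np.column_stack([
    rng.integers(15, 75, size=200),   # age-like column
    rng.integers(0, 2, size=200),     # binary category
    rng.integers(0, 3, size=200),     # three-level category
    rng.integers(0, 2, size=200),     # binary category
    rng.uniform(6.0, 38.0, size=200), # continuous measurement
])
# Hypothetical labelling rule, only so the tree has something to learn.
y = np.where(X[:, 4] > 15, "drugY", "drugX")

# Hold out 30% of the rows for testing (assumed split ratio and seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=3
)
print("Training set:", X_train.shape, y_train.shape)
print("Testing set:", X_test.shape, y_test.shape)

# Assumed hyperparameters; the notebook's actual settings are not shown.
drug_tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
drug_tree.fit(X_train, y_train)
```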
In [14]:
In [15]:
In [18]:
Out[18]:
Shape of X training set (140, 5) &  Size of Y training set (140,)
Shape of X testing set (60, 5) &  Size of Y testing set (60,)
In [19]:
Out[19]:
In [20]:
Out[20]:
In [21]:
In [22]:
Out[22]:
['drugY' 'drugX' 'drugX' 'drugX' 'drugX']
40     drugY
51     drugX
139    drugX
197    drugX
170    drugX
Name: Drug, dtype: object
In [23]:
Out[23]:
DecisionTrees's Accuracy:  0.9833333333333333
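To make the accuracy figure concrete, the sketch below scores the five predictions and true labels printed in the earlier output using `metrics.accuracy_score`, and checks the result against a manual count. Using only these five rows is an illustration; the notebook's reported score covers all 60 test rows.

```python
from sklearn import metrics

# First five predictions and true labels, copied from the earlier output.
predicted = ["drugY", "drugX", "drugX", "drugX", "drugX"]
actual = ["drugY", "drugX", "drugX", "drugX", "drugX"]

# accuracy_score is the fraction of positions where the labels agree.
sample_accuracy = metrics.accuracy_score(actual, predicted)
manual_accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print("Accuracy on the five shown rows:", sample_accuracy, manual_accuracy)

# An accuracy of 0.9833... on 60 test rows corresponds to 59 correct predictions.
full_accuracy = 59 / 60
print("59 of 60 correct:", full_accuracy)
```

A single misclassified test row out of 60 is therefore enough to explain the reported score.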
In [26]:
Out[26]: