COSC 130 - Project 01
Kyle Anderson
The goal of this project is to fit a linear regression model that estimates the natural logarithm of a vehicle's average MPG from the vehicle's weight. The dataset provides the following for each vehicle model:
• The weight of the vehicle, measured in pounds.
• The average miles per gallon (MPG) for the model.
• The natural logarithm of average MPG for the model.
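The third value is a transformation of the second. As a minimal sketch, assuming the MPG values are held in a plain Python list (the values and the names `mpg` and `ln_mpg` below are hypothetical, not taken from the project data), the natural logarithm can be computed and inverted as follows:

```python
import math

# Hypothetical MPG values, for illustration only (not from the project data).
mpg = [18.0, 15.0, 36.0]

# Natural logarithm of each MPG value.
ln_mpg = [math.log(m) for m in mpg]

# Applying exp undoes the transformation, up to floating-point rounding.
recovered = [math.exp(v) for v in ln_mpg]
```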
Part 1: Importing and Viewing the Data
The first tasks will be to import and view the data.
Each list contains all 398 values, as required.
We will now display the first 10 vehicles in the list.
We will now create two scatter plots: one of MPG against weight, and one of log-MPG against weight.
Observations
Notice that the relationship between MPG and weight in the first scatter plot seems to have a slight bend or curve, whereas the relationship between log-MPG and weight appears to be mostly linear. Since we will be constructing a linear model, we will use log-MPG as the response variable in our model.
Part 2: Splitting the Data
We will now be splitting the data into training and test sets.
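One simple way to split the data is by slicing the lists at a fixed index. The sketch below assumes the data is stored in parallel lists and that the first `n_train` observations form the training set; the split point, the values, and all names are assumptions made for illustration, since the actual split size is not stated here.

```python
# Hypothetical data, for illustration only.
weight = [3504, 3693, 3436, 3433, 3449, 4341]
ln_mpg = [2.89, 2.71, 2.89, 2.77, 2.83, 2.71]

n_train = 4  # assumed split point, not the value used in the project

# The first n_train observations form the training set; the rest form the test set.
x_train, x_test = weight[:n_train], weight[n_train:]
y_train, y_test = ln_mpg[:n_train], ln_mpg[n_train:]
```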
We will now create scatter plots of the training and test sets.
Part 3: Descriptive Statistics
We will start by calculating the mean of the 𝑋 values (which represent weight), and the mean of the 𝑌 values (which represent log-MPG).
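As a minimal sketch of this step, assuming the training values are stored in plain Python lists (the values and the names `x_train`, `y_train`, `mean_x`, and `mean_y` are hypothetical):

```python
# Hypothetical training values, for illustration only.
x_train = [2000.0, 2500.0, 3000.0, 3500.0]  # weight in pounds
y_train = [3.4, 3.2, 3.0, 2.8]              # log-MPG

n_train = len(x_train)

# The mean is the sum of the values divided by the number of values.
mean_x = sum(x_train) / n_train
mean_y = sum(y_train) / n_train
```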
We will now calculate 𝑆𝑥𝑥 and 𝑆𝑦𝑦.
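These quantities are the sums of squared deviations from the mean. A minimal sketch on hypothetical values (the data and variable names are assumptions for illustration):

```python
# Hypothetical training values, for illustration only.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.0, 4.0, 6.0, 8.0]

mean_x = sum(x_train) / len(x_train)
mean_y = sum(y_train) / len(y_train)

# Sum of squared deviations of each value from its mean.
sxx = sum((x - mean_x) ** 2 for x in x_train)
syy = sum((y - mean_y) ** 2 for y in y_train)
```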
We will now calculate the variance of the training values of 𝑋 and 𝑌.
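The variance can be obtained from the sum of squared deviations. The sketch below assumes the sample variance (dividing by n - 1) is intended; if the population variance is wanted, the divisor would be n instead. The data and names are hypothetical, and `statistics.variance` from the standard library is used only as a cross-check.

```python
import statistics

# Hypothetical training values, for illustration only.
x_train = [1.0, 2.0, 3.0, 4.0]

n = len(x_train)
mean_x = sum(x_train) / n
sxx = sum((x - mean_x) ** 2 for x in x_train)

# Sample variance: Sxx divided by (n - 1). This divisor is an assumption.
var_x = sxx / (n - 1)

# Cross-check against the standard library's sample variance.
check = statistics.variance(x_train)
```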
Part 4: Linear Regression Model
We will calculate 𝑆𝑋𝑌, which we will then use to find the coefficients for our linear regression model.
We will now calculate the coefficients of our model.
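For simple least-squares regression, the slope is 𝑆𝑋𝑌 divided by 𝑆𝑥𝑥, and the intercept is the mean of 𝑌 minus the slope times the mean of 𝑋. A minimal sketch on hypothetical data that lies exactly on the line y = 2x + 1 (the values and the names `beta_0` and `beta_1` are assumptions for illustration):

```python
# Hypothetical training values lying exactly on y = 2x + 1.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

mean_x = sum(x_train) / len(x_train)
mean_y = sum(y_train) / len(y_train)

# Sum of products of paired deviations, and sum of squared deviations of x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
sxx = sum((x - mean_x) ** 2 for x in x_train)

beta_1 = sxy / sxx                # slope
beta_0 = mean_y - beta_1 * mean_x  # intercept
```

Because the sample points fall exactly on a line, the computed coefficients recover its slope and intercept.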
Part 5: Training Score
We will now calculate the training r-squared score. We will start by calculating the estimated response values for the training set.
We will now calculate the residuals for the training set.
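The two steps above can be sketched as follows. The coefficients and data are hypothetical, and the residual is taken here as the true value minus the estimated value, which is the usual convention but is an assumption about this project:

```python
# Hypothetical coefficients and training values, for illustration only.
beta_0, beta_1 = 1.0, 2.0
x_train = [1.0, 2.0, 3.0]
y_train = [3.5, 4.5, 7.0]

# Estimated response for each observation.
pred_y_train = [beta_0 + beta_1 * x for x in x_train]

# Residual: true value minus estimated value (assumed sign convention).
error_y_train = [y - p for y, p in zip(y_train, pred_y_train)]
```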
We will be displaying the values mentioned above.
We will now calculate the sum of squared errors score for the training set.
We will now calculate the r-squared score for the training set.
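The r-squared score compares the sum of squared errors to 𝑆𝑦𝑦: it equals 1 minus their ratio. A minimal sketch on hypothetical values (the data and names are assumptions for illustration):

```python
# Hypothetical true and estimated training responses, for illustration only.
y_train = [3.5, 4.5, 7.0]
pred_y_train = [3.0, 5.0, 7.0]

# Sum of squared errors: squared residuals added together.
sse_train = sum((y - p) ** 2 for y, p in zip(y_train, pred_y_train))

# Total variation of the true responses around their mean.
mean_y = sum(y_train) / len(y_train)
syy = sum((y - mean_y) ** 2 for y in y_train)

r2_train = 1 - sse_train / syy
```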
Part 6: Test Score
We will now calculate the test r-squared score. We will start by calculating the estimated response values for the test set.
We will now calculate the residuals for the test set.
We will be displaying the values mentioned above.
We will now calculate the sum of squared errors score for the test set.
We will now calculate the value of 𝑆𝑌𝑌 on the test set, and will then use that and the test sum of squared errors to calculate the test r-squared score.
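Note that 𝑆𝑌𝑌 here is measured around the mean of the test responses, not the training mean. One way to sketch this is a small helper function; `r_squared` is a hypothetical name introduced for illustration, and the sample values are not from the project data.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SSE / SYY, with SYY measured around the mean of y_true."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    mean_y = sum(y_true) / len(y_true)
    syy = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - sse / syy

# Hypothetical test responses, for illustration only.
y_test = [1.0, 2.0, 3.0]

perfect = r_squared(y_test, [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = r_squared(y_test, [2.0, 2.0, 2.0])  # always predict the test mean
```

A model that always predicts the test mean scores 0, so a test score near 1 indicates that the model explains most of the variation in the held-out data.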
We will now create a plot to visualize the errors for the observations in the test set.
Part 7: Transforming Test Predictions
We will be calculating estimates for the average MPG for observations in our test set.
We will now calculate the error in each estimate for the average MPG.
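Since the model predicts log-MPG, applying the exponential function to each prediction gives an estimate in MPG units. A minimal sketch, with hypothetical values and names, and with the error taken as true MPG minus estimated MPG (an assumed sign convention):

```python
import math

# Hypothetical predicted log-MPG values and true MPG values, for illustration only.
pred_y_test = [math.log(20.0), math.log(30.0)]
mpg_test = [18.0, 33.0]

# Undo the log transformation to get estimates in MPG units.
pred_mpg = [math.exp(p) for p in pred_y_test]

# Estimation error: true MPG minus estimated MPG (assumed sign convention).
error_mpg = [t - e for t, e in zip(mpg_test, pred_mpg)]
```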
We will now display the true MPG, the estimated MPG, and the estimation error for each of the first 10 observations in the test set.
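One way to lay out such a table is with fixed-width f-string formatting. The sketch below uses hypothetical values and column labels; slicing with `[:10]` limits the output to at most the first 10 observations.

```python
# Hypothetical values, for illustration only.
mpg_test = [18.0, 33.0, 25.5]
pred_mpg = [20.1, 30.4, 25.0]
error_mpg = [t - e for t, e in zip(mpg_test, pred_mpg)]

# Header row followed by one right-aligned row per observation.
rows = [f"{'True MPG':>10}{'Est. MPG':>10}{'Error':>10}"]
for t, e, err in list(zip(mpg_test, pred_mpg, error_mpg))[:10]:
    rows.append(f"{t:>10.1f}{e:>10.1f}{err:>10.1f}")

print("\n".join(rows))
```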