Synopsis
This report asks whether automatic or manual transmission gives better fuel economy (MPG) in the mtcars dataset, and quantifies the difference. The complication is that other variables also affect MPG. In our best linear regression model, weight and \(\frac{1}{4}\) mile time both influence MPG, so transmission type alone cannot determine which gives the better MPG.
Exploratory Data Analysis
We first look at the variable names in the dataset. For the exploratory analysis we are interested in the mpg and am (transmission) variables; the remaining variables are considered later, when fitting the regression models.
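As a minimal sketch of this first step, the variable names can be listed directly, since mtcars ships with base R. The working copy `mt` and the factor labels are assumptions made here for illustration; the labels are chosen to match the "Automatic"/"Manual" names that appear in the tables of this report.

```r
# mtcars is part of base R (package 'datasets'); no download is needed.
data(mtcars)
names(mtcars)   # lists the variables, including mpg and am

# Assumption: recode am (0 = automatic, 1 = manual) as a labelled factor
# on a working copy, leaving the original data frame untouched.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
table(mt$am)    # number of cars per transmission type
```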
A visual representation of the data helps us understand the effect of transmission type on MPG. Please see Figure 1 above. In the plot, cars with manual transmission tend to have higher MPG. If we aggregate the data by transmission type, the mean MPG is 17.15 for automatic transmission and 24.39 for manual transmission.
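The group means quoted above can be reproduced with `aggregate()`. This is a sketch rather than the report's original code; the working copy `mt` and its factor labels are assumptions.

```r
# Assumption: am recoded as a labelled factor on a working copy.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean mpg within each transmission type.
group_means <- aggregate(mpg ~ am, data = mt, FUN = mean)
group_means
```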
A Welch two-sample t-test is performed to quantify the uncertainty in this difference. The 95% confidence interval for the difference in mean mpg between the two am groups is [-11.28, -3.21]. The interval is negative because the means were subtracted as automatic minus manual, so it indicates lower MPG for automatic transmission. Because \(p = 0.0013736 < 0.05\), we reject the null hypothesis (\(H_0\)) that transmission type makes no difference to mean MPG. This agrees with the group means calculated earlier, where manual transmission has the higher MPG.
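A sketch of the test described here: `t.test()` with a formula uses the Welch (unequal variance) form by default, and with "Automatic" as the first factor level the difference is taken as automatic minus manual. The object names `mt` and `tt` are assumptions for illustration.

```r
# Assumption: am recoded as a labelled factor on a working copy.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (var.equal = FALSE is the default).
tt <- t.test(mpg ~ am, data = mt)
tt$conf.int   # 95% confidence interval for (automatic - manual)
tt$p.value    # p-value for H0: no difference in mean mpg
```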
Regression Model Fitting
We now fit linear regression models to quantify the effect of transmission type on MPG. Three models were tested, each with mpg as the outcome and a different set of predictors; the regression results are given in the table below.
Model1 <- lm(mpg ~ am, data=mtcars)
Model2 <- step(lm(mpg ~ ., data=mtcars))
Model3 <- lm(mpg ~ am:wt+am:qsec, data=mtcars)
The first model indicates that MPG is on average 7.24 higher for manual transmission than for automatic. The p-value for this coefficient is below 0.01, which again points to a real difference, but the model explains only about a third of the variability in MPG (adjusted \(R^{2}=\) 0.338). We therefore want to find out which other variables are affecting mpg. To do this, we run a stepwise selection algorithm by AIC, starting from a multivariate linear model with all variables as predictors. This returns an optimized model.
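The first model and the stepwise search can be sketched as follows. The factor recoding of am and the `trace = 0` argument (which only silences the step-by-step printout) are assumptions added here; the model formulas are those listed above.

```r
# Assumption: am recoded as a labelled factor on a working copy,
# which yields the coefficient name "amManual" used in the table.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Model 1: transmission type as the only predictor.
Model1 <- lm(mpg ~ am, data = mt)
coef(summary(Model1))   # estimates, standard errors, t and p values

# Model 2: backward stepwise selection by AIC from the full model.
Model2 <- step(lm(mpg ~ ., data = mt), trace = 0)
formula(Model2)         # the predictors that survive the selection
```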
The second model indicates that wt (weight, in units of 1000 lb) and qsec (¼ mile time) also have a significant effect on MPG. The adjusted \(R^{2}=\) 0.83 (the proportion of total variability explained by the model, adjusted for the number of predictors), and every predictor has a \(p\)-value \(<0.05\) (\(H_0\) rejected), as indicated in the table. We now want to see whether the model can be improved by letting wt and qsec interact with am.
This third and best model improves on the second: the adjusted \(R^{2}\) increases to 0.88, and all four interaction coefficients have \(p\)-values \(<0.01\), whereas in the second model the am coefficient was significant only at \(<0.05\).
Regression results. Dependent variable: mpg. Standard errors are in parentheses.

Predictor | (1) | (2) | (3)
---|---|---|---
wt | | -3.917^{***} (0.711) |
qsec | | 1.226^{***} (0.289) |
amManual | 7.245^{***} (1.764) | 2.936^{**} (1.411) |
amAutomatic:wt | | | -3.176^{***} (0.636)
amManual:wt | | | -6.099^{***} (0.969)
amAutomatic:qsec | | | 0.834^{***} (0.260)
amManual:qsec | | | 1.446^{***} (0.269)
Constant | 17.147^{***} (1.125) | 9.618 (6.960) | 13.969^{**} (5.776)
Observations | 32 | 32 | 32
R^{2} | 0.360 | 0.850 | 0.895
Adjusted R^{2} | 0.338 | 0.834 | 0.879
Residual Std. Error | 4.902 (df = 30) | 2.459 (df = 28) | 2.097 (df = 27)
F Statistic | 16.860^{***} (df = 1; 30) | 52.750^{***} (df = 3; 28) | 57.284^{***} (df = 4; 27)

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
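The fit statistics in the table can be checked with a short sketch like the one below. Here the second model is written out with the predictors reported in the table rather than re-running `step()`; the names `mt` and `adj_r2` are assumptions for illustration.

```r
# Assumption: am recoded as a labelled factor on a working copy.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

Model1 <- lm(mpg ~ am, data = mt)
Model2 <- lm(mpg ~ wt + qsec + am, data = mt)    # predictors kept by step()
Model3 <- lm(mpg ~ am:wt + am:qsec, data = mt)   # separate slopes per transmission

# Adjusted R^2 for each model, in the order (1), (2), (3).
adj_r2 <- sapply(list(Model1, Model2, Model3),
                 function(m) summary(m)$adj.r.squared)
round(adj_r2, 3)

# Coefficients of the interaction model, for comparison with column (3).
round(coef(Model3), 3)
```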
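The ANOVA table below compares the three models in sequence. The \(p\)-value is \(<0.05\) at each step, so we reject the null hypothesis (\(H_0\)) that the added terms do not improve the fit. Note that model 3 omits the am main effect, so models 2 and 3 are not strictly nested and the last row is best read as indicative. We also make residual diagnostic plots for our selected best model (3), shown in Figure 2 below. None of the plots shows a pattern that would be of concern, which supports the model's assumptions.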
Model | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F)
---|---|---|---|---|---|---
1 | 30 | 720.90 | | | |
2 | 28 | 169.29 | 2 | 551.61 | 62.74 | <0.0001
3 | 27 | 118.70 | 1 | 50.59 | 11.51 | 0.0022
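A sketch of how such a comparison and the diagnostic plots might be produced is given below. The 2 x 2 plot layout is an assumption, since the original figure code is not shown, and the second model is again written out explicitly.

```r
# Assumption: am recoded as a labelled factor on a working copy.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

Model1 <- lm(mpg ~ am, data = mt)
Model2 <- lm(mpg ~ wt + qsec + am, data = mt)
Model3 <- lm(mpg ~ am:wt + am:qsec, data = mt)

# Sequential F-tests on the reduction in residual sum of squares.
cmp <- anova(Model1, Model2, Model3)
cmp

# Standard lm diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(Model3)
par(op)
```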
Conclusions
We select the third multivariate model as our best, and the uncertainty in this conclusion can be quantified. The adjusted \(R^{2}=\) 0.88, the proportion of total variability explained by the model, is high. The \(p\)-value is \(<0.01\) for each interaction term, allowing us to reject the null hypothesis (\(H_0\)) for each. The coefficients quantify our results. For \(\frac{1}{4}\) mile time (qsec), each 1 s increase is associated with an increase of 0.834 MPG for automatic and 1.446 MPG for manual transmission, holding weight fixed. For weight (wt), each 1000 lb increase is associated with a decrease of 3.176 MPG for automatic and 6.099 MPG for manual transmission, holding \(\frac{1}{4}\) mile time fixed. We therefore conclude that, for this dataset, transmission type alone cannot determine which gives the better MPG: weight and \(\frac{1}{4}\) mile time also play a significant role.
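To illustrate why transmission alone does not settle the question, the sketch below compares the third model's predictions for a manual and an automatic car with the same weight and \(\frac{1}{4}\) mile time. From the table coefficients, the predicted difference is roughly \(-2.923 \times\) wt \(+\ 0.612 \times\) qsec, so its sign depends on both variables. The helper `manual_minus_automatic` and the example inputs are hypothetical, chosen only for illustration.

```r
# Assumption: am recoded as a labelled factor on a working copy.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

Model3 <- lm(mpg ~ am:wt + am:qsec, data = mt)

# Hypothetical helper: predicted mpg of a manual car minus that of an
# automatic car with the same weight (1000 lb) and quarter-mile time (s).
manual_minus_automatic <- function(wt, qsec) {
  new_cars <- data.frame(
    am   = factor(c("Automatic", "Manual"), levels = levels(mt$am)),
    wt   = wt,
    qsec = qsec
  )
  pred <- predict(Model3, newdata = new_cars)
  unname(pred[2] - pred[1])
}

manual_minus_automatic(wt = 2.0, qsec = 18)   # a light car
manual_minus_automatic(wt = 4.5, qsec = 18)   # a heavy car
```

Under this model the predicted advantage of manual transmission is positive for the light car and negative for the heavy one, which is consistent with the conclusion that weight and \(\frac{1}{4}\) mile time must be taken into account.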