Assignment 4
Due Saturday 11:59 pm (Week 10)
In this assignment, you will research the Decision Tree Regressor and other common
regression models such as Ridge Regression, Lasso Regression, and Logistic
Regression.
As in Assignment 3, you will need to research these regression models, but you do
not need to find their mathematical formulas. You only need to know what they are
and how and when to use them.
1. Research (20 points)
Answer the following questions to help you understand how these regression models
work. Write your answer to each question in your write-up.
• What is a Decision Tree Regressor?
• What is the difference between a Decision Tree Regressor and a Decision Tree
Classifier?
• What is feature importance in a Decision Tree Regressor?
• What is Ridge Regression?
• What is Lasso Regression?
• What is Logistic Regression?
2. Use the Boston housing data again (Assignment 2). Since we already did the EDA of
this dataset in Assignment 2, we can focus on applying each of the models discussed
above. (50 points; each model counts for 10 points.)
• For Linear Regression, Ridge Regression, Lasso Regression, and Logistic
Regression, find the correlations between the independent variables and the
dependent variable. Select the feature variables that correlate with the house
price. (To use the Logistic Regression model, you may have to separate the house
price into low, medium, and high categories.)
For the Decision Tree Regressor, use all features to predict the house price.
• Apply Linear Regression, Ridge Regression, Lasso Regression, Logistic Regression,
and the Decision Tree Regressor to the data. Your assignment should have at least 5
models.
• Compare the models' MSE, RMSE, and accuracy.
• Choose the model(s) that you think are appropriate and predict the house price.
• For the Decision Tree Regressor only, visualize the tree and plot the feature
importances. Identify which feature has the highest importance and which has the
second highest.
• Interpret the results.
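As a rough illustration of the steps in part 2, the Python sketch below selects
features by correlation, fits the five models, and collects MSE, RMSE, accuracy, and
feature importances. It is a sketch under assumptions, not a model solution: it
assumes NumPy, pandas, and scikit-learn are installed, and it uses a small synthetic
table in place of the Boston data. The column names RM, LSTAT, and MEDV mirror the
dataset, but the values and the NOISE column are made up. The 0.3 correlation cutoff,
the alpha values, and max_depth are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Boston data (NOT the real values).
# Replace `df` with the DataFrame you prepared in Assignment 2.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),
    "LSTAT": rng.uniform(2, 35, n),
    "NOISE": rng.normal(0, 1, n),  # hypothetical feature unrelated to price
})
df["MEDV"] = 5 * df["RM"] - 0.6 * df["LSTAT"] + rng.normal(0, 2, n)
target = "MEDV"

# Correlation of every feature with the price; keep those above an assumed cutoff.
corr = df.corr()[target].drop(target)
selected = corr[corr.abs() > 0.3].index.tolist()

X_sel = df[selected]               # correlated features for the linear models
X_all = df.drop(columns=target)    # all features for the decision tree
y = df[target]
train, test = train_test_split(np.arange(n), test_size=0.25, random_state=0)

# Four regressors that predict the price directly.
regressors = {
    "linear": (LinearRegression(), X_sel),
    "ridge": (Ridge(alpha=1.0), X_sel),
    "lasso": (Lasso(alpha=0.1), X_sel),
    "tree": (DecisionTreeRegressor(max_depth=4, random_state=0), X_all),
}
results = {}
for name, (model, X) in regressors.items():
    model.fit(X.iloc[train], y.iloc[train])
    mse = mean_squared_error(y.iloc[test], model.predict(X.iloc[test]))
    results[name] = {"MSE": mse, "RMSE": np.sqrt(mse)}

# Logistic Regression needs classes, so bin the price into three equal-sized groups.
price_class = pd.qcut(y, q=3, labels=["low", "medium", "high"]).astype(str)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_sel.iloc[train], price_class.iloc[train])
accuracy = accuracy_score(price_class.iloc[test], clf.predict(X_sel.iloc[test]))

# Feature importances of the fitted tree, highest first.
tree = regressors["tree"][0]
importance = pd.Series(tree.feature_importances_, index=X_all.columns)
importance = importance.sort_values(ascending=False)

print(pd.DataFrame(results).T)
print("logistic accuracy:", accuracy)
print(importance)
```

For the tree visualization, sklearn.tree.plot_tree(tree, feature_names=list(X_all.columns))
draws the fitted tree with matplotlib, and importance.plot.barh() gives a simple
feature-importance chart. Note that MSE and RMSE apply to the four regressors, while
accuracy applies to the Logistic Regression classifier, so the two kinds of scores are
not directly comparable.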
General Requirements for all your assignments.
You will need to write up your findings, interpretations, and results (30 points) for
this assignment. Use the Machine Learning Workflow from Week 6 as a guideline. It is
a good idea to include screenshots of your code, results, and graphs so that you can
explain your findings alongside them. (This also makes your paper easier for me to
follow.) A PDF file is required. There is no page limit, but try to keep your answers
straightforward.
Also submit the .py file that you used to complete the assignment. (It may duplicate,
or partly duplicate, the screenshots in your paper, but that is okay. I would like to
look over your code.)