Date: 2020-08-26 08:29

Assignment 1
Student name and ID
Due: 23 August 2020
The goal of the assignment is to model the sale price of a bulldozer at auction based on its usage, equipment
type, configuration and other details. The data are sourced from auction result postings and include information
on usage and equipment configuration.
This assignment is based on a prediction competition that took place in 2013 at kaggle.com/c/bluebook-for-bulldozers.
We will only use a subset of the variables used in the competition.
Data for 401125 machines sold at auction are available, including the following variables:
• ID: unique identifier of a particular sale of a machine at auction;
• Sale price: price of the sale in US dollars;
• Year made: year of manufacture of the machine;
• Machine hours: usage of the machine in hours at the time of sale; a missing value or 0 means no hours were reported for that sale;
• Sale date: year of sale;
• Product group: type of earth-moving equipment;
• Enclosure: whether the machine has a roll-over protective structure (ROPS) and air conditioning.
Your task is to build a regression model for the sale price using the other information available. Only use
variables where at least 90% of observations have non-missing values.
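A minimal sketch of how the 90% rule, together with the note that 0 machine hours means "not reported", might be applied in R. The column name `MachineHoursCurrentMeter` is an assumption borrowed from the Kaggle competition data; `bds.csv` may use a different name, so check `names(bds)` first. The helpers `recode_zero_hours()`, `prop_non_missing()` and `usable_columns()` are hypothetical.

```r
# Hypothetical helpers for the "at least 90% non-missing" rule.
# The default column name is an assumption (Kaggle's name); check names(bds).

# The brief says 0 machine hours means no hours were reported, so recode
# those zeros as NA before counting missing values.
recode_zero_hours <- function(df, col = "MachineHoursCurrentMeter") {
  zero <- !is.na(df[[col]]) & df[[col]] == 0
  df[[col]][zero] <- NA
  df
}

# Proportion of non-missing values in each column.
prop_non_missing <- function(df) {
  vapply(df, function(x) mean(!is.na(x)), numeric(1))
}

# Names of the columns whose non-missing proportion is at least `threshold`.
usable_columns <- function(df, threshold = 0.9) {
  p <- prop_non_missing(df)
  names(p)[p >= threshold]
}
```

Once the data have been read, `usable_columns(recode_zero_hours(bds))` would list the variables that may be used as predictors, and `round(prop_non_missing(bds), 3)` shows how close each one is to the cut-off.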
You may work in groups if you prefer, but each of you must submit an individual report.
To read in the data set (3 marks):
library(tidyverse)
# bds.csv is assumed to be in the working directory (the same folder as the Rmd file)
bds <- readr::read_csv("bds.csv")
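`read_csv()` guesses column types, so text columns come back as character vectors. A small sketch, assuming the categorical predictors are named `ProductGroup` and `Enclosure` (as in the Kaggle data), that converts them to factors so they are explicitly treated as categorical with a fixed set of levels. `read_bds()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: read bds.csv and turn the categorical predictors
# into factors. Names in factor_cols that are not in the file are skipped.
read_bds <- function(path = "bds.csv",
                     factor_cols = c("ProductGroup", "Enclosure")) {
  readr::read_csv(path) %>%
    mutate(across(any_of(factor_cols), as.factor))
}

# Usage (assuming bds.csv is in the working directory):
# bds <- read_bds()
# glimpse(bds)
```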
1. Use ggplot() to produce appropriate plots of sale price against each of the potential predictors. What
do you learn from them? (3 marks)
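One way to produce these plots is a helper that picks the plot type from the predictor's class: a scatterplot with a smoother for numeric predictors and boxplots for categorical ones. The log10 price axis and the low point transparency (there are about 400,000 sales, so points overlap heavily) are judgment calls, not requirements; the function name and the column names in the usage comment are assumptions.

```r
library(ggplot2)

# Hypothetical helper: plot SalePrice against one predictor, chosen by name.
plot_price_vs <- function(df, xvar) {
  p <- ggplot(df, aes(x = .data[[xvar]], y = SalePrice))
  if (is.numeric(df[[xvar]])) {
    # Numeric predictor: translucent points plus a smooth trend line.
    p <- p + geom_point(alpha = 0.05) + geom_smooth(se = FALSE)
  } else {
    # Categorical predictor: one boxplot per level.
    p <- p + geom_boxplot()
  }
  # Log scale for price, on the assumption that prices are right-skewed.
  p + scale_y_log10() + labs(x = xvar, y = "Sale price (US$, log scale)")
}

# Usage, with assumed column names:
# for (v in c("YearMade", "MachineHoursCurrentMeter", "saledate",
#             "ProductGroup", "Enclosure")) {
#   print(plot_price_vs(bds, v))
# }
```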
2. Use lm() to fit a regression model for SalePrice with the predictors as main effects. (1 mark)
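A minimal sketch of the main-effects fit. The predictor names are assumptions (the Kaggle column names); drop any that fail the 90% rule. Restricting to complete cases up front means every model fitted later uses the same rows, which `step()` and AIC comparisons rely on. `complete_rows()` and `main_effects_formula()` are hypothetical helpers.

```r
# Assumed predictor names; replace with those that pass the 90% rule.
predictors <- c("YearMade", "MachineHoursCurrentMeter", "saledate",
                "ProductGroup", "Enclosure")

# Keep only rows with no missing values in `vars`.
complete_rows <- function(df, vars) {
  df[complete.cases(df[, vars, drop = FALSE]), , drop = FALSE]
}

# Build the formula SalePrice ~ x1 + x2 + ... from a vector of names.
main_effects_formula <- function(preds, response = "SalePrice") {
  reformulate(preds, response = response)
}

# Usage, once bds has been read:
# bds_cc <- complete_rows(bds, c("SalePrice", predictors))
# fit_main <- lm(main_effects_formula(predictors), data = bds_cc)
# summary(fit_main)
```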
3. Find the best model you can (the one with the smallest AIC) by adding interactions, and discuss what it tells you
about bulldozer sale prices. (4 marks)
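A sketch of an automated AIC search with `step()`, starting from the main-effects model. The scope `~ .^2` allows every two-way interaction among the existing predictors; `step()` then adds or drops one term at a time (respecting marginality) while AIC keeps falling. This is one search strategy and is not guaranteed to find the lowest-AIC model, so checking a few candidate interactions by hand is still worthwhile. `step()` reports AIC via `extractAIC()`, which for `lm` differs from `AIC()` by a constant, so rankings of models fitted to the same rows are unaffected. `best_aic_model()` is a hypothetical helper.

```r
# Hypothetical helper: stepwise AIC search over main effects plus all
# two-way interactions of the predictors already in `fit`.
best_aic_model <- function(fit, trace = 0) {
  step(fit, scope = ~ .^2, direction = "both", trace = trace)
}

# Usage (fit_main: the main-effects lm from question 2, fitted to complete cases):
# fit_best <- best_aic_model(fit_main)
# formula(fit_best)
# AIC(fit_main, fit_best)
```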
4. Use the visreg package to visualize the terms in the model, and describe what you learn from these plots. (4 marks)
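A short sketch with the visreg package. Called on the whole model, `visreg()` draws one panel per predictor; by default each panel shows the fitted relationship with the other predictors held fixed (numeric ones at their median, factors at their most common level), together with partial residuals. For an interaction, the `by` argument draws the relationship for each level of a second variable. The helper `plot_interaction()` and the variable names in the usage comment are assumptions.

```r
library(visreg)

# Hypothetical helper: show how the effect of `xvar` changes across
# levels of `byvar`, overlaying the curves in a single panel.
plot_interaction <- function(fit, xvar, byvar, ...) {
  visreg(fit, xvar, by = byvar, overlay = TRUE, ...)
}

# Usage (fit_best: the final model from question 3), with assumed names:
# visreg(fit_best)                                        # every term in turn
# plot_interaction(fit_best, "YearMade", "ProductGroup")
```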
5. Produce suitable diagnostic plots to check the model fit, and identify unusual or influential observations.
Comment on the results. (3 marks)
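A sketch of both parts in base R. `plot()` on an `lm` object gives the four standard diagnostic panels; the hypothetical helper `flag_unusual()` lists observations whose standardized residual, leverage or Cook's distance exceeds common rule-of-thumb cut-offs (|r| > 3, h > 2p/n, D > 4/n). These cut-offs are conventions rather than formal tests, and with several hundred thousand sales the 4/n rule in particular will flag many points, so the table is more useful for ranking observations than for labelling them.

```r
# Four standard diagnostic panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage (with Cook's distance contours).
diagnostic_plots <- function(fit) {
  op <- par(mfrow = c(2, 2))
  on.exit(par(op))
  plot(fit)
}

# Hypothetical helper: rows exceeding rule-of-thumb cut-offs, largest
# Cook's distance first. `obs` holds the row names of the fitted data.
flag_unusual <- function(fit) {
  n <- nobs(fit)
  p <- fit$rank
  out <- data.frame(
    obs = names(residuals(fit)),
    std_resid = rstandard(fit),
    leverage = hatvalues(fit),
    cooks_d = cooks.distance(fit),
    row.names = NULL
  )
  # which() drops NA comparisons (e.g. from points with leverage 1).
  keep <- which(abs(out$std_resid) > 3 |
                out$leverage > 2 * p / n |
                out$cooks_d > 4 / n)
  flagged <- out[keep, ]
  flagged[order(flagged$cooks_d, decreasing = TRUE), ]
}

# Usage (fit_best: the final model):
# diagnostic_plots(fit_best)
# head(flag_unusual(fit_best), 10)
```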
You should submit a single Rmd file that is self-contained and compiles without error. You can assume that
the bds.csv file is in the same folder as the Rmd file. (2 marks)
The assignment has 20 marks in total.