Week 2 Application Assignment of Predictive Modeling and Analytics
Question 1
In this assignment, we continue working with the customer reward program data (review the Week 1 assignments if you haven’t completed them). The data is in the file
In this exercise, you will complete a predictive modeling task with a continuous target variable, based on the data in the shared file. First, remove all rows where either the Reward or the NumStores column takes the value 0. Also remove all rows where the rewards do not expire (ExpirationMonth = 999). [Hint: You can sort the relevant columns to quickly find the rows to delete.] How many rows are left after deleting these irrelevant rows, not counting the header row? What is the sum of the ExpirationMonth column over the remaining rows?
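A minimal sketch of the filtering step in Python, assuming the rows have already been read into dictionaries keyed by column name (for example with `csv.DictReader`; the file name is not given here) and that Reward, NumStores and ExpirationMonth hold numeric values. `filter_rows` and `summarize` are hypothetical helper names, and the toy rows are invented for illustration, not taken from the assignment data.

```python
# Sketch: drop rows where Reward or NumStores is 0, or where rewards never
# expire (ExpirationMonth == 999). Rows are dicts keyed by column name,
# e.g. as produced by csv.DictReader (assumed numeric values in these columns).

def filter_rows(rows):
    """Keep rows with nonzero Reward and NumStores and an expiring reward."""
    kept = []
    for row in rows:
        reward = float(row["Reward"])
        num_stores = float(row["NumStores"])
        expiration = float(row["ExpirationMonth"])
        if reward == 0 or num_stores == 0 or expiration == 999:
            continue  # irrelevant row: drop it
        kept.append(row)
    return kept


def summarize(rows):
    """Return (row count excluding the header, sum of ExpirationMonth)."""
    kept = filter_rows(rows)
    total = sum(float(r["ExpirationMonth"]) for r in kept)
    return len(kept), total


# Hypothetical toy rows for illustration only (not the assignment data).
toy = [
    {"Reward": "1", "NumStores": "100", "ExpirationMonth": "12"},
    {"Reward": "0", "NumStores": "50", "ExpirationMonth": "6"},
    {"Reward": "1", "NumStores": "0", "ExpirationMonth": "3"},
    {"Reward": "1", "NumStores": "20", "ExpirationMonth": "999"},
    {"Reward": "1", "NumStores": "80", "ExpirationMonth": "24"},
]
print(summarize(toy))  # (2, 36.0)
```

Sorting in a spreadsheet, as the hint suggests, reaches the same result; the sketch just makes the three deletion rules explicit.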
Question 2
Consider linear regression models with the ExpirationMonth column as the target variable. Among models with a single predictor variable, find the one with the highest R-squared. Consider the following candidate predictors: Salerank, X2013USSales, X2013WorldSales, NumStores, RewardSize, and ProfitMargin. Which variable did you choose?
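One way to compare the candidates is to fit each one-predictor model by least squares and compare their R-squared values. The sketch below assumes each column is already a list of numbers (after the filtering in Question 1); `simple_ols` and `best_single_predictor` are hypothetical names, and the two toy columns are made up so the example runs on its own. The same `simple_ols` call also returns the intercept and slope asked for in Questions 3 and 4.

```python
def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


def best_single_predictor(columns, target):
    """Return (name, r_squared) of the predictor with the highest R-squared."""
    scores = {name: simple_ols(x, target)[2] for name, x in columns.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy columns; in practice map "Salerank", "NumStores", ... to lists.
toy_columns = {"toy_linear": [1, 2, 3, 4], "toy_noisy": [2, 1, 4, 3]}
toy_target = [2, 4, 6, 8]
print(best_single_predictor(toy_columns, toy_target))  # ('toy_linear', 1.0)
```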
Question 3
What is the estimated intercept coefficient of the model?
Question 4
What is the estimated slope coefficient of the model?
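For reference, the least-squares intercept and slope of a one-predictor model $y = \beta_0 + \beta_1 x$, and its R-squared, follow the standard formulas below, where $\bar x$ and $\bar y$ are the sample means and $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ are the fitted values; any regression tool should report the same numbers.

```latex
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat y_i)^2}{\sum_{i=1}^{n}(y_i-\bar y)^2}
```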
Question 5
Data transformation can often improve model fit. Now consider log transformations for the model identified in the previous question. [Hint: Use the log function to create the transformed columns.] You can log-transform neither variable, only the target, only the predictor, or both, so you should have four different models.
Report the R-squared values of all four models.
What is the R-squared for Model 1?
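A sketch of the four log-transform variants, assuming natural logs and strictly positive values in both columns (the log is undefined otherwise). For a one-predictor model with an intercept, R-squared equals the squared correlation between the two columns, which is what `r_squared` computes; the log base does not change R-squared because it only rescales a column. The dictionary labels are descriptive only; which combination the assignment numbers as Model 1 through Model 4 is not stated here, and the toy values are invented.

```python
import math


def r_squared(x, y):
    """R-squared of the least-squares fit y = b0 + b1*x (squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def log_variants(x, y):
    """R-squared for the four combinations of log-transforming x and/or y.

    Assumes natural log and strictly positive values in x and y.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    return {
        "y ~ x": r_squared(x, y),
        "y ~ log(x)": r_squared(log_x, y),
        "log(y) ~ x": r_squared(x, log_y),
        "log(y) ~ log(x)": r_squared(log_x, log_y),
    }


# Hypothetical toy data (y = 2x), not the assignment data.
toy_x = [1, 2, 4, 8]
toy_y = [2, 4, 8, 16]
for label, r2 in log_variants(toy_x, toy_y).items():
    print(f"{label}: {r2:.4f}")
```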
Question 6
R-squared for Model 2 is (report your answer to 4 decimal places):
Question 7
R-squared for Model 3 is (report your answer to 4 decimal places):
Question 8
R-squared for Model 4 is (report your answer to 4 decimal places):
Question 9
Which model gives the best fit based on the R-squared value?
Question 10
Our analysis so far shows that variable transformation does not improve the model fit. Another way to improve model fit is to add more explanatory variables to the right-hand side of the regression equation. Again, consider the following set of predictor variables: Salerank, X2013USSales, X2013WorldSales, NumStores, RewardSize, and ProfitMargin. Add one more variable to the best model you identified in the previous question. Which variable will you add? [Hint: The correct additional variable gives the highest R-squared value.]
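A sketch of the search over one additional predictor, in plain Python with no libraries. It solves the two-predictor least-squares problem on mean-centered data (the 2-by-2 normal equations, via Cramer's rule) and uses the fact that, with an intercept, the explained sum of squares equals $b_1 S_{1y} + b_2 S_{2y}$. `r_squared_two` and `best_addition` are hypothetical names; `base` would hold the predictor chosen earlier, and the toy data is invented.

```python
def r_squared_two(x1, x2, y):
    """R-squared of the least-squares fit y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]          # centered predictors and target
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    syy = sum(c * c for c in dy)
    det = s11 * s22 - s12 * s12        # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy  # explained / total sum of squares


def best_addition(base, candidates, y):
    """Pick the candidate that, added to the base predictor, maximizes R-squared."""
    scores = {name: r_squared_two(base, x, y) for name, x in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: toy_y = base_x + 2 * toy_candidates["toy_exact"].
base_x = [1, 2, 3, 4, 5]
toy_candidates = {"toy_exact": [1, 0, 0, 1, 0], "toy_other": [2, 1, 2, 1, 2]}
toy_y = [3, 2, 3, 6, 5]
print(best_addition(base_x, toy_candidates, toy_y)[0])  # toy_exact
```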
Question 11
What is the R-squared for the model with the additional variable (report your answer to 4 decimal places)?
Question 12
One way to check whether a linear regression model explains a particular data point well is to look at its residual. Which retailer has the residual with the highest absolute value in the model from the previous question?
Question 13
Which retailer has the lowest residual in the model from the previous question?
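A sketch for Questions 12 and 13, assuming you already have the fitted values from the two-predictor model (from whatever tool you used to fit it). Residuals are taken as observed minus fitted, so the "lowest" residual is read here as the most negative one, meaning the retailer the model over-predicts most; that reading is an assumption. `residual_extremes` and the toy retailer names and values are hypothetical.

```python
def residual_extremes(names, observed, fitted):
    """Return (name with the largest |residual|, name with the lowest residual).

    Residual = observed - fitted, so the lowest residual is the most
    negative one (assumed reading of "lowest").
    """
    resid = {n: o - f for n, o, f in zip(names, observed, fitted)}
    largest_abs = max(resid, key=lambda k: abs(resid[k]))
    lowest = min(resid, key=resid.get)
    return largest_abs, lowest


# Hypothetical toy values: residuals are 6, -2 and -0.5.
toy_names = ["r1", "r2", "r3"]
toy_observed = [10, 5, 8]
toy_fitted = [4, 7, 8.5]
print(residual_extremes(toy_names, toy_observed, toy_fitted))  # ('r1', 'r2')
```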