Week 3 Application Assignment of Predictive Modeling and Analytics
Question 1
Let’s reconsider the customer reward program dataset. In this exercise, you will complete a predictive modeling task where the target variable is binary. Use the following data file for this exercise:
The dataset also contains a column IndustryType, which is created based on the column Industry in the raw data. Note that Industry has many categories. The analyst who prepared the data chose to combine some categories, which resulted in the column IndustryType. IndustryType has five categories: Department, Discount, Grocery, Restaurants, Specialty. You can create a set of dummy variables based on IndustryType in XLMiner by using the Transform functions.
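For readers working outside XLMiner, the dummy-variable step can be sketched in plain Python. This is only an illustration: the sample rows and the `Retailer` column are hypothetical, and the `IndustryType_<Category>` naming simply mirrors the variable names used in the questions.

```python
# Hypothetical sample rows; the real data file is not reproduced here.
rows = [
    {"Retailer": "A", "IndustryType": "Discount"},
    {"Retailer": "B", "IndustryType": "Grocery"},
    {"Retailer": "C", "IndustryType": "Specialty"},
]

# The five categories listed in the assignment text.
CATEGORIES = ["Department", "Discount", "Grocery", "Restaurants", "Specialty"]


def add_dummies(rows, column="IndustryType", categories=CATEGORIES):
    """Return copies of rows with one 0/1 indicator column per category."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        out.append(new_row)
    return out


encoded = add_dummies(rows)
print(encoded[0]["IndustryType_Discount"], encoded[0]["IndustryType_Grocery"])
```

Each row ends up with exactly one indicator set to 1. Only `IndustryType_Discount` and `IndustryType_Grocery` are used as predictors in this assignment, so the other three categories together act as the baseline group.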
Part I.
Consider logistic regression models with the Reward column as the target variable. Fit the model with two indicator variables, one indicating whether a retailer is a discount store (i.e., IndustryType is Discount), and the other indicating whether a retailer is a grocery store (i.e., IndustryType is Grocery). Report the coefficient estimates in the next three questions. [Hint: After you create the dummy variables, use them as Selected Variables (instead of Categorical Variables) in the first step of Logistic Regression.]
What is the estimated intercept coefficient?
Question 2
What is the estimated coefficient for IndustryType_Discount (round the answer to 4 decimal places)?
Question 3
What is the estimated coefficient for IndustryType_Grocery (round the answer to 4 decimal places)?
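As a way to sanity-check the three coefficients, note that a logistic regression with an intercept and two 0/1 indicators has only three distinct fitted probabilities. Its maximum-likelihood estimates can therefore be written directly in terms of the observed reward rate in each group. The sketch below assumes Reward is coded 0/1 and uses made-up records, not the assignment data; the function name is hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def indicator_logit_coefficients(records):
    """records: iterable of (industry_type, reward) with reward in {0, 1}.

    Returns (intercept, coef_discount, coef_grocery) for a logistic model
    with Discount and Grocery indicators; all other types form the baseline.
    Each group must contain both outcomes, otherwise the log-odds is undefined.
    """
    counts = {"Discount": [0, 0], "Grocery": [0, 0], "Other": [0, 0]}
    for industry, reward in records:
        group = industry if industry in ("Discount", "Grocery") else "Other"
        counts[group][0] += reward  # number of rewarded retailers
        counts[group][1] += 1       # group size
    rate = {g: pos / n for g, (pos, n) in counts.items()}
    intercept = logit(rate["Other"])
    return (
        intercept,
        logit(rate["Discount"]) - intercept,
        logit(rate["Grocery"]) - intercept,
    )


# Made-up records: 3 of 4 Discount, 1 of 4 Grocery, 2 of 4 others rewarded.
records = (
    [("Discount", 1)] * 3 + [("Discount", 0)]
    + [("Grocery", 1)] + [("Grocery", 0)] * 3
    + [("Specialty", 1), ("Department", 0), ("Restaurants", 1), ("Specialty", 0)]
)
b0, b_discount, b_grocery = indicator_logit_coefficients(records)
print(round(b0, 4), round(b_discount, 4), round(b_grocery, 4))
```

An iterative fit, such as the one XLMiner runs, should agree with these values up to its convergence tolerance, so a large discrepancy usually points to a setup mistake such as entering the dummies as Categorical Variables.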
Question 4
What is the number of true positives? (Specify a whole number.)
Question 5
What is the number of true negatives? (Specify a whole number.)
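The true positive and true negative counts come from comparing each predicted class with the actual Reward value. A minimal sketch of that bookkeeping follows, using made-up labels and probabilities. It assumes class 1 is predicted when the probability is at least the cutoff; how XLMiner treats a probability exactly equal to the cutoff is not stated in the assignment.

```python
def confusion_counts(actual, predicted_prob, cutoff=0.5):
    """Count TP, TN, FP, FN; class 1 is predicted when probability >= cutoff."""
    tp = tn = fp = fn = 0
    for y, p in zip(actual, predicted_prob):
        pred = 1 if p >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 0:
            tn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up example values, not taken from the assignment data.
actual = [1, 1, 0, 0, 1, 0]
probs = [0.8, 0.4, 0.6, 0.2, 0.55, 0.1]
counts = confusion_counts(actual, probs)
print(counts)
```

The four counts always sum to the number of observations, which is a quick check on any confusion matrix you read off the output sheet.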
Question 6
Part II.
Split the dataset into training and validation sets using a 60:40 split (set the seed for partitioning to 12345; this should be the default value if you have not changed it). [Hint: Note that there are two Partition buttons in the XLMiner ribbon. You should use Partition->Standard Partition in the Data Mining group.] Report the new coefficient estimates in the next three questions. Use the same two predictor variables as in Part I.
What is the estimated intercept coefficient (round the answer to 4 decimal places)?
Question 7
What is the estimated coefficient for IndustryType_Discount (round the answer to 4 decimal places)?
Question 8
What is the estimated coefficient for IndustryType_Grocery (round the answer to 4 decimal places)?
Question 9
How many observations are in the training set?
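The idea behind a seeded 60:40 partition can be illustrated as follows. This is a sketch only: Python's random number generator is not the one XLMiner uses, so seed 12345 will not reproduce XLMiner's row assignment here, and the rounding rule for the training-set size is an assumption.

```python
import random


def standard_partition(n_rows, train_fraction=0.6, seed=12345):
    """Randomly split row indices into training and validation sets.

    The same seed always yields the same split, which is what makes
    results reproducible across runs.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_fraction)  # assumed rounding rule
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Hypothetical dataset of 10 rows.
train_idx, valid_idx = standard_partition(10)
print(len(train_idx), len(valid_idx))
```

Because the coefficients in Part II are estimated on the training rows only, they will generally differ from the Part I estimates, which used every row.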
Question 10
What is the number of true positives on the validation data? (Specify a whole number.)
Question 11
What is the number of true negatives on the validation data? (Specify a whole number.)
Question 12
Part III.
By default, XLMiner uses the cutoff threshold 0.5. Repeat Part II with a cutoff threshold 0.3. What are the numbers of true positives and true negatives on the validation data?
Report the number of true positives:
Question 13
Report the number of true negatives:
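To see why the cutoff matters, the sketch below counts true positives and true negatives for the same predictions at cutoffs 0.5 and 0.3. The labels and probabilities are made up for illustration, and the rule "predict 1 when probability >= cutoff" is an assumption about tie handling.

```python
def tp_tn(actual, probs, cutoff):
    """Return (true positives, true negatives) at the given cutoff."""
    tp = sum(1 for y, p in zip(actual, probs) if y == 1 and p >= cutoff)
    tn = sum(1 for y, p in zip(actual, probs) if y == 0 and p < cutoff)
    return tp, tn


# Made-up validation labels and predicted probabilities.
actual = [1, 1, 0, 0, 1, 0, 0]
probs = [0.8, 0.4, 0.6, 0.2, 0.55, 0.1, 0.35]

at_default = tp_tn(actual, probs, cutoff=0.5)
at_lower = tp_tn(actual, probs, cutoff=0.3)
print(at_default, at_lower)
```

Lowering the cutoff can only move predictions from class 0 to class 1, so true positives never decrease and true negatives never increase. With only two indicator predictors the model produces just three distinct probabilities, so a cutoff change reclassifies an entire industry group at once or has no effect at all.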