FinalAssignment-Regressions1.pdf

By beginning this assignment, you affirm that you will not give or receive any unauthorized help, and that all work will be your own. You agree to abide by Seneca's Academic Integrity Policy, and you understand that any violation of academic integrity will be subject to the penalties outlined in the policy.

Problem 1 (35%) – File: MALL.XLS

A national chain of women’s clothing stores with locations in large shopping malls thinks that it can do a better job of planning renovations and expansions if it understands which variables affect sales. It plans a small pilot study of stores in 25 different mall locations. The data it collects consist of monthly sales, store size (sq. ft), number of linear feet of window display, number of competitors located in the mall, size of the mall (sq. ft), and distance to the nearest competitor (ft).

1. Define a multiple regression model for the data. (6 marks)

2. Interpret the values of the coefficients in the model. (15 marks)

3. Test whether the model as a whole is significant. At the 0.05 level of significance, what is your conclusion? (2 marks)

4. Use the model to predict monthly sales for each of the stores in the study. (6 marks)

5. Find and interpret the value of R² for this model. (2 marks)

6. Test the individual regression coefficients (i.e., check the result of the test statistics that SAS or Excel provides). At the 0.05 level of significance, what are your conclusions? (2 marks)

7. If you were going to drop just one variable from the model, which one would you choose? Why? (2 marks)
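
The quantities requested in Problem 1 (coefficients, fitted values, R², and the overall F statistic) can be sketched in a few lines. This is a minimal illustration only, assuming Python with NumPy rather than SAS or Excel; the function name fit_ols is hypothetical, and the data are randomly generated placeholders, not the values in MALL.XLS.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns the coefficients (intercept first), R^2, the overall
    F statistic, and the fitted values.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
    fitted = A @ beta
    sse = np.sum((y - fitted) ** 2)               # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)             # total variation
    r2 = 1 - sse / sst
    # Overall F test: k numerator and n - k - 1 denominator degrees of freedom
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, f_stat, fitted

# Synthetic placeholder data (assumption): 25 stores, 5 predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(25, 5))
y = 3.0 + X @ np.array([1.5, -2.0, 0.5, 0.0, 1.0]) + rng.normal(0, 1, 25)

beta, r2, f_stat, fitted = fit_ols(X, y)
print("coefficients:", beta)
print("R^2:", r2, "F:", f_stat)
```

The F statistic would be compared with the critical value of the F distribution with 5 and 19 degrees of freedom at the 0.05 level; the p-values for the individual coefficients are easiest to read directly from the SAS or Excel output.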

Problem 2 (35%) – File: Bank.xlsx

Community Bank would like to increase the number of customers who use payroll direct deposit. Management is considering a new sales campaign that will require each branch manager to call each customer who does not currently use payroll direct deposit. As an incentive to sign up for payroll direct deposit, each customer contacted will be offered free checking for two years. Because of the time and cost associated with the new campaign, management would like to focus its efforts on customers who have the highest probability of signing up for payroll direct deposit. Management believes that the average monthly balance in a customer’s checking account may be a useful predictor of whether the customer will sign up for payroll direct deposit. To investigate the relationship between these two variables, Community Bank tried the new campaign using a sample of 50 checking account customers who do not currently use payroll direct deposit. The sample data show the average monthly checking account balance (in hundreds of dollars) and whether the customer contacted signed up for payroll direct deposit (coded 1 if the customer signed up for payroll direct deposit and 0 if not).

1. For the Community Bank data, use SAS to formulate the estimated logistic regression equation. (5 marks)

2. Estimate the probability that customers with an average monthly balance of $1000 will sign up for direct payroll deposit. (5 marks)

3. Suppose Community Bank only wants to contact customers who have a 0.50 or higher probability of signing up for direct payroll deposit. What is the average monthly balance required to achieve this level of probability? (10 marks)

4. What is the estimated odds ratio? What is the interpretation? (15 marks)
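
Once SAS has produced an estimated logit of the form b0 + b1·x, the remaining parts of Problem 2 are arithmetic on that equation. The sketch below shows the calculations under stated assumptions: the function names are hypothetical, and the coefficients b0 and b1 are made-up placeholders, not estimates from Bank.xlsx. Because the balance is recorded in hundreds of dollars, a balance of $1000 corresponds to x = 10.

```python
import math

def logistic_probability(b0, b1, x):
    """Estimated P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def balance_for_probability(b0, b1, p):
    """Solve b0 + b1*x = ln(p / (1 - p)) for x."""
    return (math.log(p / (1.0 - p)) - b0) / b1

def odds_ratio(b1):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(b1)

# Hypothetical coefficients (assumption); replace with the SAS estimates.
b0, b1 = -2.0, 0.25

p_at_1000 = logistic_probability(b0, b1, 10)   # $1000 = 10 hundreds of dollars
x_half = balance_for_probability(b0, b1, 0.5)  # reduces to -b0 / b1 when p = 0.5
print("P(sign up | $1000):", p_at_1000)
print("balance for p = 0.50 (hundreds of dollars):", x_half)
print("odds ratio per $100:", odds_ratio(b1))
```

For p = 0.50 the log-odds are zero, so the required balance reduces to x = -b0/b1. The odds ratio exp(b1) applies to a one-unit change in x, which here means an increase of $100 in the average monthly balance.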

Problem 3 (30%) – File: Lakeland.xlsx

Over the past few years, the percentage of students who leave Lakeland College at the end of the first year has increased. Last year Lakeland started a voluntary one-week orientation program to help first-year students adjust to campus life. If Lakeland can show that the orientation program has a positive effect on retention, they will consider making the program a requirement for all first-year students. Lakeland’s administration also suspects that students with lower GPAs have a higher probability of leaving Lakeland at the end of the first year. To investigate the relation of these variables to retention, Lakeland selected a random sample of 100 students from last year’s entering class. The data are contained in the data set named Lakeland.

1. Write the logistic regression equation relating x to y. (5 marks)

2. For the Lakeland data, use SAS to compute the estimated logistic regression equation. (5 marks)

3. Use the estimated logit computed above to estimate the probability that students with a 2.5 grade point average who did not attend the orientation program will return to Lakeland for their sophomore year. What is the estimated probability for students with a 2.5 grade point average who attended the orientation program? (10 marks)

4. What is the estimated odds ratio for the orientation program? Interpret it. (5 marks)

5. Would you recommend making the orientation program a required activity? Why or why not? (5 marks)
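
With two predictors, the estimated logit for Problem 3 takes the form b0 + b1·GPA + b2·Program, where Program is assumed here to be coded 1 for students who attended orientation and 0 otherwise (check the coding in the Lakeland file). The sketch below shows how the two probabilities and the odds ratio follow from that logit. The function name and all three coefficients are hypothetical placeholders, not estimates from the Lakeland data.

```python
import math

def return_probability(b0, b1, b2, gpa, program):
    """Estimated P(return) from the logit b0 + b1*gpa + b2*program."""
    logit = b0 + b1 * gpa + b2 * program
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients (assumption); replace with the SAS estimates.
b0, b1, b2 = -6.0, 2.5, 1.5

p_no_program = return_probability(b0, b1, b2, gpa=2.5, program=0)
p_program = return_probability(b0, b1, b2, gpa=2.5, program=1)

# Odds ratio for the orientation program, holding GPA fixed
program_odds_ratio = math.exp(b2)

print("P(return | GPA 2.5, no orientation):", p_no_program)
print("P(return | GPA 2.5, orientation):", p_program)
print("odds ratio for orientation:", program_odds_ratio)
```

An odds ratio above 1 would indicate that, at any fixed GPA, the estimated odds of returning are higher for students who attended the program. Whether that supports making the program mandatory also depends on the significance of b2 in the SAS output and on the fact that attendance was voluntary.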