Statistics homework help.
- The homework has three parts.
- Part I – You must solve manually using a calculator.
- Part II – You will need to solve using MS Excel.
- Part III – You will need to solve using R.
- Your submission should be a single PDF file.
- For the Excel part, include a screenshot of the analysis you did in Excel, along with a short paragraph explaining how you conducted the analysis. If you have been asked follow-up questions based on your analysis, answer them as well.
- For the R part, submit a screenshot of your R code, including comments. If you have been asked follow-up questions based on your analysis, answer them as well.
- Submit your homework in sequential order.
Part 1: Questions to be solved manually using a calculator:
Q.1 The data in Tables 1 and 2 below were collected by Mr. Biden as part of his gaming business. Mr. Biden knows that the covariance between Customers and Profit for Game 1 is "H". To decide his strategy for next year, he wants to know the correlation and the covariance between Customers and Profit for Game 2, with the covariance expressed in terms of H. Please help Mr. Biden get the required information.
Table 1: Game 1 | | Table 2: Game 2 | |
Customers | Profit | Customers | Profit |
10 | 5 | 60 | 30 |
20 | 15 | 90 | 40 |
30 | 20 | 150 | 40 |
40 | 25 | 30 | 10 |
50 | 10 | 120 | 75 |
Q.2 Mr. Kumar is a computer operator. He is interested in determining the relationship between the Input and Output values generated by the computer he is working on. Use simple linear regression to help Mr. Kumar answer a few questions based on the data in the table below.
Input | Output |
1 | 4 |
2 | 7 |
3 | 10 |
4 | 13 |
5 | 16 |
6 | 19 |
- Develop a relationship between Input (independent variable) and Output (dependent variable) using simple linear regression.
- Predict the output values for input values 7 and 8.
- Calculate the Sum of Squared Errors (SSE).
- Calculate the Total Sum of Squares (SST).
- Find the r^{2} for the model developed in the first sub-question.
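The steps in Q.2 can be checked with a few lines of code. This is a minimal sketch assuming the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); the function name `fit_line` is illustrative.

```python
# Data copied from the Input/Output table in Q.2.
inputs = [1, 2, 3, 4, 5, 6]
outputs = [4, 7, 10, 13, 16, 19]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

b0, b1 = fit_line(inputs, outputs)

# Fitted values, error sums of squares, and coefficient of determination.
fitted = [b0 + b1 * x for x in inputs]
mean_out = sum(outputs) / len(outputs)
sse = sum((y - f) ** 2 for y, f in zip(outputs, fitted))
sst = sum((y - mean_out) ** 2 for y in outputs)
r_squared = 1 - sse / sst

# Predictions for the new input values 7 and 8.
predictions = {x: b0 + b1 * x for x in (7, 8)}

print(b0, b1, sse, sst, r_squared, predictions)
```

The data lie exactly on a straight line, so the SSE comes out as zero and r^{2} as 1; this is a useful sanity check on the manual work.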
Q.3
Consider the following set of points:
x_{1} | x_{2} | y |
1 | 1 | 9 |
2 | 0 | 15 |
0 | 1 | 2 |
- Find the least squares regression line ŷ = β_{0} + β_{1}x_{1} + β_{2}x_{2} from the given data points.
- Find the residual corresponding to each y.
- Find the Sum of Squared Errors (SSE).
- Find the Total Sum of Squares (SST).
- Estimate the value of y given (x_{1} = 2, x_{2} = 3).
- Find the R^{2} of the model.
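For Q.3, the coefficients can be checked by solving the normal equations (X^T X) b = X^T y. The sketch below does this with a small Gaussian elimination routine in plain Python; `solve_linear_system` is an illustrative helper, and solving by hand through substitution works equally well.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

# Data copied from the table in Q.3.
x1 = [1, 2, 0]
x2 = [1, 0, 1]
y = [9, 15, 2]

# Design matrix with a leading column of ones for the intercept.
design = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * t for row, t in zip(design, y)) for i in range(3)]
b0, b1, b2 = solve_linear_system(xtx, xty)

fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
residuals = [t - f for t, f in zip(y, fitted)]
mean_y = sum(y) / len(y)
sse = sum(e ** 2 for e in residuals)
sst = sum((t - mean_y) ** 2 for t in y)
r_squared = 1 - sse / sst
y_at_2_3 = b0 + b1 * 2 + b2 * 3

print(b0, b1, b2, residuals, sse, sst, r_squared, y_at_2_3)
```

With three data points and three coefficients the fit passes through every point, so the residuals are zero up to floating-point rounding.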
Q.4
Mr. King performed logistic regression for his manufacturing project. The actual observed odds ratio for his project is 3/2. However, his predicted odds ratio is 2/3. Please help Mr. King calculate the absolute difference between the observed probability and the estimated (predicted) probability.
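The conversion needed in Q.4 can be sketched as follows. It assumes that the "odds ratio" values in the question are odds of the form p / (1 - p), so that p = odds / (1 + odds); the function name `odds_to_probability` is illustrative.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Invert odds = p / (1 - p) to get p = odds / (1 + odds)."""
    return odds / (1 + odds)

# Fractions keep the arithmetic exact, as in a hand calculation.
observed_p = odds_to_probability(Fraction(3, 2))
predicted_p = odds_to_probability(Fraction(2, 3))
abs_diff = abs(observed_p - predicted_p)

print(observed_p, predicted_p, abs_diff)  # prints: 3/5 2/5 1/5
```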
Q.5
Using the information provided below, convert the problem into the logistic regression framework and solve it.
- Framework for logistic regression: ln(odds ratio) = β_{0} + β_{1} ∗ independent variable
Length | Number of Positive Cases | Number of Negative Cases |
4810 | 47 | 139 |
4520 | 177 | 241 |
4400 | 1087 | 1183 |
4370 | 187 | 175 |
4350 | 397 | 671 |
3780 | 40 | 14 |
3660 | 39 | 17 |
- Find the appropriate equation for the above data. (Hint: You will need to use the formulas from simple linear regression, but you must first calculate your Y values from the information provided, keeping the logistic regression framework in mind.)
- Find the estimated probability for length = 5000.
- Based on the estimated probability from the previous sub-question, estimate how many total cases one needs to gather to get 100 positive cases for length = 5000.
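One reading of the hint in Q.5 is to take Y = ln(positive cases / negative cases) for each row and fit an ordinary (unweighted) least-squares line of Y on Length. The sketch below follows that reading; the unweighted fit and the variable names are assumptions, not something the assignment prescribes.

```python
from math import exp, log

# Data copied from the table in Q.5.
lengths = [4810, 4520, 4400, 4370, 4350, 3780, 3660]
positives = [47, 177, 1087, 187, 397, 40, 39]
negatives = [139, 241, 1183, 175, 671, 14, 17]

# Y values: log-odds of a positive case in each row.
log_odds = [log(p / n) for p, n in zip(positives, negatives)]

# Simple linear regression of log-odds on length.
n_rows = len(lengths)
mean_x = sum(lengths) / n_rows
mean_y = sum(log_odds) / n_rows
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, log_odds))
sxx = sum((x - mean_x) ** 2 for x in lengths)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

def predicted_probability(length):
    """Convert the fitted log-odds at a given length back to a probability."""
    z = b0 + b1 * length
    return 1 / (1 + exp(-z))

p_5000 = predicted_probability(5000)
# Expected total cases so that p_5000 * total equals 100 positives.
total_cases_needed = 100 / p_5000

print(f"b0 = {b0:.4f}, b1 = {b1:.6f}, p(5000) = {p_5000:.4f}, total = {total_cases_needed:.1f}")
```

The last step uses the fact that the expected number of positive cases is the total number of cases multiplied by the estimated probability.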
Part 2: Questions related to Excel:
- For the dataset in the worksheet “Example 2”, perform simple linear regression using Excel with Tissue Concentration as the independent variable.
- Based on the results, write down the regression line equation.
- Based on the results, can the intercept have the value “-2”? Explain why or why not.
- What percentage of the total variation of “Math Score” is NOT explained by the regression?
- For the dataset in the worksheet “Example 3”, perform multiple linear regression using Excel with Average_fare as the dependent variable.
- Based on the results, write down the regression line equation (α = 0.05).
- Interpret the results: explain what happens to the dependent variable when the independent variables increase or decrease. (For this question: α = 0.1)
- Ms. Wang is a new intern at the Magical Analytics company. Her manager, Dr. Chen, gave her the Excel output below. The results were generated by running a simple linear regression on a dataset in Excel. A few missing values are represented by the letters A-F. Please answer the following question:
- Find the values of A, B, C, D, E and F.
Part 3: Questions related to R:
- For the given data vectors “SO2_Concentration” and “Recession_Rate”, fill in the necessary blanks in the R code to get the following:
- (a) Find the correlation between the two variables.
- (b) Find the covariance between the two variables.
- (c) Apply simple linear regression with “Recession_Rate” as the independent variable.
- (d) Summarize your simple linear regression.
- (e) Find the confidence intervals for the simple linear regression model.
- (f) Run ANOVA for the simple linear regression.
- (g) Based on the results, write down the equation of the regression line. (Terms that are statistically insignificant will have 0 as their beta value.)
- (h) Report the adjusted R^{2}.
- (i) Find the relationship between the multiple R^{2} and the correlation.
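In a simple linear regression with one predictor, the multiple R^{2} equals the square of the Pearson correlation between the two variables. The sketch below illustrates this in Python rather than R, using the Game 1 data from Table 1 in Part 1 as a stand-in, because the SO2_Concentration and Recession_Rate vectors are not reproduced here.

```python
from math import sqrt

# Stand-in data: Game 1 from Table 1 (not the SO2 / recession vectors).
customers = [10, 20, 30, 40, 50]
profit = [5, 15, 20, 25, 10]

n = len(customers)
mean_x = sum(customers) / n
mean_y = sum(profit) / n
sxx = sum((x - mean_x) ** 2 for x in customers)
syy = sum((y - mean_y) ** 2 for y in profit)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(customers, profit))

# Pearson correlation between the two variables.
r = sxy / sqrt(sxx * syy)

# R^2 from the fitted regression line: 1 - SSE / SST.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(customers, profit))
r_squared = 1 - sse / syy

print(r, r_squared, r ** 2)
```

In R, the same comparison would use the correlation from `cor()` and the "Multiple R-squared" line printed by `summary()` of the fitted `lm` model.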
What to submit:
R code and comments about R Code
Answers to questions g, h and i, with explanations
- For the data you are fetching from the website, fill in the necessary blanks in the R code to get the following:
- (a) Fit a multiple linear regression using y (cost) as the dependent variable and x1, x2, x3, x4 as independent variables.
- (b) Summarize your multiple linear regression.
- (c) Find the confidence intervals for the multiple linear regression.
- (d) Run ANOVA for the multiple linear regression.
- (e) Based on the results, write down the equation of the regression line. (Terms that are statistically insignificant will have 0 as their beta value.)
- (f) Derive the F statistic reported by the summary function using the details given in the ANOVA table.
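The F statistic printed by R's `summary()` for a linear model can be rebuilt from the rows of the `anova()` table: add up the sums of squares and degrees of freedom of the predictor rows, divide to get the regression mean square, and divide that by the residual mean square. The sketch below shows the arithmetic in Python; the numbers are hypothetical and are not taken from the assignment data.

```python
def f_statistic(term_ss, term_df, resid_ss, resid_df):
    """Overall F statistic from the rows of a sequential ANOVA table."""
    ms_regression = sum(term_ss) / sum(term_df)  # regression mean square
    ms_residual = resid_ss / resid_df            # residual mean square
    return ms_regression / ms_residual

# Hypothetical ANOVA rows for x1, x2, x3, x4 (made-up values for illustration).
term_ss = [120.0, 30.0, 10.0, 40.0]
term_df = [1, 1, 1, 1]

f_value = f_statistic(term_ss, term_df, resid_ss=50.0, resid_df=20)
print(f_value)  # (200 / 4) / (50 / 20) = 20.0
```

The numerator degrees of freedom equal the number of predictors, and the denominator degrees of freedom are the residual degrees of freedom from the table.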
What to submit:
R code and comments about R Code
Answers to questions e and f, with explanations
- Ms. Wang is a new intern at the Magical Analytics company. Her manager, Dr. Chen, gave her the R output below. The results were generated by running R code that performs multiple linear regression on a dataset. Please answer the following questions:
- What is the number of observations in this dataset?
- Calculate the F statistic for the multiple regression model. Specify the number of degrees of freedom for the numerator and the denominator.
- Which predictors would not contain 0 in their confidence interval? (α = 0.05)
- If the lower end of the confidence interval for the complaints variable is 0.28016866, find the confidence interval for the intercept.
Sources:
- http://users.stat.ufl.edu/~winner/data/lsd.dat
- http://users.stat.ufl.edu/~winner/data/airq402.dat
- http://users.stat.ufl.edu/~winner/data/tombstone.dat
- http://users.stat.ufl.edu/~winner/datasets.html
- http://users.stat.ufl.edu/~winner/data/atlschool.txt