Mechanical Engineering homework help

The goal of the project is to model and understand the socio-economic factors affecting cancer mortality.
The data were aggregated from a number of sources including the American Community Survey
(census.gov (http://census.gov)), clinicaltrials.gov (http://clinicaltrials.gov), and cancer.gov
(http://cancer.gov). The data dictionary is provided in the Appendix. We will attempt to predict cancer
mortality in different counties in the nation (TARGET_deathRate) and try to understand how different
socio-economic factors might influence health and mortality.
The data have been partitioned into two files: (1) CancerData.csv and (2) CancerHoldoutData.csv. Use
CancerData.csv for model training, parameter tuning (if any), etc. CancerHoldoutData.csv should be
used only for evaluating model performance. It should not be used in any way in the model development
process.
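As a minimal sketch of keeping the two files separate, the helper below reads a CSV into numeric columns using only the standard library. The function name `load_numeric_columns`, the choice to skip the `Geography` column, and the rule that blank or non-numeric cells become NaN are assumptions made for illustration; the assignment does not specify how the files encode missing values.

```python
import csv
import math

def load_numeric_columns(path, skip=("Geography",)):
    """Read a CSV into {column name: list of floats}.

    Assumption: blank or non-numeric cells mark missing values and become NaN.
    Columns listed in `skip` (here the county name) are left out.
    """
    columns = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, cell in row.items():
                if name in skip:
                    continue
                try:
                    value = float(cell)
                except (TypeError, ValueError):
                    value = math.nan
                columns.setdefault(name, []).append(value)
    return columns

# Intended usage (file names are from the assignment):
# train = load_numeric_columns("CancerData.csv")            # training and tuning
# holdout = load_numeric_columns("CancerHoldoutData.csv")   # load only for final evaluation
```

Loading the holdout file in a separate step, after all modeling decisions are fixed, makes it harder to use it by accident during development.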
Analyze the following. Note that the items need not be presented in a sequential order. You can address
them in any order. For example, missing data analysis can be integrated with regression analysis.
1. Exploratory Data Analysis 20 Points
• What variables look most promising for predicting cancer mortality from exploratory data
analysis? Why?
• Are there any outliers? Can they be detected and addressed? How does addressing outliers affect
model performance?
• Are there any missing values? Research and explore techniques to handle missing values. Note
that the approach to handle missing data might be different for different variables. Document
model performance improvement obtained by missing data handling.
• Is there any collinearity between variables? Can it be detected? Document how addressing
collinearity affects model performance.
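The checks listed above can be sketched with NumPy as follows. This is one possible approach, not a prescribed method: the 1.5 x IQR rule for outliers, median imputation for missing values, and the variance inflation factor (VIF) for collinearity are common choices picked here for illustration. The function names are hypothetical, and the demo arrays are synthetic stand-ins rather than values from CancerData.csv.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    with np.errstate(invalid="ignore"):
        return (values < low) | (values > high)

def impute_median(values):
    """Replace NaNs with the median of the observed values."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmedian(values)
    return filled

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features).

    Each column is regressed on the others (with an intercept);
    VIF = 1 / (1 - R^2). Large values suggest collinearity.
    """
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic demo: the second column nearly duplicates the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X_demo = np.column_stack([a, a + 0.05 * rng.normal(size=200), rng.normal(size=200)])
print(vif(X_demo))
print(iqr_outlier_mask([1, 2, 3, 4, 5, 100, np.nan]))
```

To document the effect on model performance, the same model can be fit before and after each step and the resulting MSE values compared.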
2. Linear Regression 25 Points
• Develop a linear regression model.
• What variables are significant? Insignificant? How does removing insignificant variables affect
model performance?
• Present and interpret model diagnostics. What insights did you obtain from the diagnostics to
improve the model?
• Include a few non-linear and interaction terms and evaluate how they affect model performance
and diagnostics.
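The effect of adding non-linear and interaction terms can be illustrated with a small least-squares sketch. Everything here is an assumption for illustration: the helper names (`fit_ols`, `predict`, `mse`, `add_terms`), the choice of which columns to square or multiply, and the synthetic data, which is generated so that a squared term and an interaction genuinely matter. Significance tests and diagnostic plots are not covered by this sketch.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def add_terms(X):
    """Append column0 squared and the column0 * column1 interaction."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, X[:, 0] ** 2, X[:, 0] * X[:, 1]])

# Synthetic data with a true quadratic and interaction effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (2 + 3 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2
     + 1.5 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=300))

# Fit on the first 200 rows, evaluate on the remaining 100.
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

mse_base = mse(y_te, predict(fit_ols(X_tr, y_tr), X_te))
mse_ext = mse(y_te, predict(fit_ols(add_terms(X_tr), y_tr), add_terms(X_te)))
print("linear terms only:", mse_base)
print("with squared + interaction terms:", mse_ext)
```

On real data the added terms may or may not help, which is why the comparison should be made on data not used for fitting.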
3. KNN
• Split the CancerData.csv data into 70% training and 30% testing.
• Develop a KNN model for predicting cancer mortality. Evaluate test MSE for at least 5 different
values of K and find the K that minimizes test MSE. 20 Points
• KNN is a non-linear technique, but it does not work well with high-dimensional data. Try to
identify important variables from the linear regression model and use only a subset of important
features in the KNN model. Document the impact on test performance. 20 Points
4. Feature Selection 10 Points
Write an “Executive Summary” section documenting your interpretation of the important features
impacting cancer mortality and how they influence cancer mortality.
5. Performance reporting on Holdout data 5 Points
Summarize and compare the model performance (MSE) of LR and KNN on the holdout dataset as a table.
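A plain-text comparison table could be produced along these lines. The helper names are hypothetical, and the toy target and prediction lists exist only to exercise the formatting; they are not results from either model.

```python
def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def format_mse_table(results):
    """results: dict mapping model name -> holdout MSE; returns a text table."""
    width = max([len("Model")] + [len(name) for name in results])
    lines = [f"{'Model':<{width}}  Holdout MSE",
             f"{'-' * width}  -----------"]
    for name, value in results.items():
        lines.append(f"{name:<{width}}  {value:>11.2f}")
    return "\n".join(lines)

# Toy values only, to show the layout.
y_holdout = [1.0, 2.0, 3.0]
toy_predictions = {
    "LR": [1.0, 2.0, 4.0],
    "KNN": [3.0, 2.0, 3.0],
}
table = format_mse_table({name: mse(y_holdout, pred)
                          for name, pred in toy_predictions.items()})
print(table)
```

In the actual report, each entry would come from the final LR and KNN models scored once on CancerHoldoutData.csv.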
Appendix: Data Dictionary
1. TARGET_deathRate: Dependent variable. Mean per capita (100,000) cancer mortalities
2. incidenceRate: Mean per capita (100,000) cancer diagnoses
3. medianIncome: Median income per county
4. povertyPercent: Percent of populace in poverty
5. MedianAge: Median age of county residents
6. MedianAgeMale: Median age of male county residents
7. MedianAgeFemale: Median age of female county residents
8. Geography: County name
9. AvgHouseholdSize: Mean household size of county
10. PercentMarried: Percent of county residents who are married
11. PctNoHS18_24: Percent of county residents ages 18-24 highest education attained: less than high
school
12. PctHS18_24: Percent of county residents ages 18-24 highest education attained: high school
diploma
13. PctSomeCol18_24: Percent of county residents ages 18-24 highest education attained: some
college
14. PctBachDeg18_24: Percent of county residents ages 18-24 highest education attained: bachelor’s
degree
15. PctPrivateCoverage: Percent of county residents with private health coverage
16. PctPublicCoverage: Percent of county residents with government-provided health coverage
17. PctPubliceCoverageAlone: Percent of county residents with government-provided health
coverage alone
18. PctWhite: Percent of county residents who identify as White
19. PctBlack: Percent of county residents who identify as Black
20. PctAsian: Percent of county residents who identify as Asian
21. PctOtherRace: Percent of county residents who identify in a category which is not White, Black,
or Asian
22. PctMarriedHouseholds: Percent of married households
