Unit 5: Using Linear Regression; Data Analysis
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Purpose of Analysis:
Explore:
Example: grease disposal in NYC
Goals:
• To get to know your data
• To understand the features of the data
Techniques:
• Summary statistics
o Central tendency
o Variation
o Correlation
o Association
• Data visualization
Consider:
Does the data have integrity? Any missing/erroneous data? Any early surprises?
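A minimal Python sketch of the Explore step, assuming a small made-up sample (the `rows` values and the `pearson` helper are illustrative and not course data): it checks for missing entries, then computes central tendency, variation, and correlation.

```python
import statistics

# Hypothetical sample of (x, y) pairs; None marks a missing value.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8)]

# Integrity check: separate rows with missing values from complete rows.
missing = [r for r in rows if None in r]
clean = [r for r in rows if None not in r]

xs = [x for x, _ in clean]
ys = [y for _, y in clean]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


summary = {
    "missing_rows": len(missing),            # any missing/erroneous data?
    "mean_y": statistics.mean(ys),           # central tendency
    "median_y": statistics.median(ys),       # central tendency
    "stdev_y": statistics.stdev(ys),         # variation (sample std. dev.)
    "correlation_xy": pearson(xs, ys),       # association between x and y
}
print(summary)
```

A large gap between the mean and the median, or a nonzero count of missing rows, is the kind of early surprise worth investigating before any modeling.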
Simplify:
Example – Acxiom: collects and collates data on U.S. consumers
Goals:
• Reduce complexity
Techniques:
• Data reduction
o Data clustering
o Factor analysis
• Data visualization
Explain & Predict:
Example: how health and wealth have improved globally, though the visualization makes the need for improvement in poverty rates obvious.
Goals:
• To understand drivers and forecast outcomes
• The heart of many analyses
Techniques:
• Neural networks
• Support vector machines
• Random forests
• Linear regression
Types of Data Analysis; Linear Regression:
• Linear regression is the tool for data analysis, empirical science, social science and business research.
• Linear regression offers both Explanation (relationships among variables and columns of data) and Prediction (forecasts, extrapolations, and interpolations).
o Example: Chart of top 30 MBA programs and starting salaries.
o Example: Wages at Triangle Construction Company.
Analytics involves the idea of a transformation. We want to tie analytics to decision-making.
Hypothesis Testing:
Assessing whether an observed difference is a fluke or real; also referred to as statistical significance. Is the relationship significant or a fluke?
• Assess hypothesis tests by their p-values. Example: Who’s taller? Men or women?
• The p-value measures how likely a data outcome at least as extreme as the one observed would be if chance alone were at work. Example: Flipping a coin.
• Small p-values support the claim that a real difference exists.
• A p-value of less than 5% indicates there is probably a real difference.
• Statistical (not practical) significance is the term used when we determine that a real difference exists.
• The p-value is important to the interpretation of a regression because a low value means the regression has explanatory power.
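As a rough illustration of explanation, prediction, and significance together, the Python sketch below fits a least-squares line to made-up experience and wage figures (not the Triangle Construction data) and estimates a p-value for the slope with a permutation test. The permutation test is a stand-in chosen because it needs only the standard library; regression software typically reports a t-test-based p-value instead. The names `fit_line` and `permutation_p_value` are hypothetical helpers.

```python
import random

# Hypothetical data: years of experience (x) and hourly wage (y).
years = [1, 2, 3, 4, 5, 6, 7, 8]
wages = [31, 34, 36, 41, 43, 46, 50, 52]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of shuffled datasets whose slope is at least as extreme
    (in absolute value) as the observed slope."""
    rng = random.Random(seed)
    observed = abs(fit_line(xs, ys)[0])
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(fit_line(xs, shuffled)[0]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (trials + 1)


# Explanation: the slope estimates the wage change per extra year.
slope, intercept = fit_line(years, wages)

# Prediction: extrapolate to 10 years of experience.
predicted = intercept + slope * 10

# Significance: is the slope a fluke or real?
p_value = permutation_p_value(years, wages)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"predicted wage at 10 years={predicted:.2f}")
print(f"permutation p-value={p_value:.4f}")
```

A p-value below 5% here would suggest that the upward slope is unlikely to be a fluke; whether a slope of that size matters in practice is a separate question.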