Vulcan - Leading with Analytics: Program Resources

Unit 5: Using Linear Regression; Data Analysis

Faculty: Dr. Adam Mersereau and Dr. Brad Staats

Purpose of Analysis:

Explore:
Example of grease disposal in NYC

Goals:
• To get to know your data
• To understand the features of the data

Consider:
▪ Does the data have integrity?
▪ Any missing/erroneous data?
▪ Any early surprises?

Techniques:
• Summary statistics
  o Central tendency
  o Variation
  o Correlation
  o Association
• Data visualization
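The Explore steps above (check for missing data, then summarize central tendency, variation, and correlation) can be sketched in a few lines of Python. This is only an illustrative sketch: the paired values `x` and `y` are made up for the example and are not the course's NYC data, and the helper `pearson` is a hypothetical name introduced here.

```python
import math
import statistics

# Made-up paired measurements (illustrative only); None marks a missing value.
records = [
    (120.0, 40),
    (95.0, 30),
    (None, 55),
    (210.0, 80),
    (150.0, 60),
    (60.0, 20),
]

# Integrity check: count rows with a missing value before summarizing.
missing = sum(1 for x, y in records if x is None or y is None)
complete = [(x, y) for x, y in records if x is not None and y is not None]

xs = [x for x, _ in complete]
ys = [y for _, y in complete]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


summary = {
    "missing_rows": missing,
    "mean": statistics.mean(xs),      # central tendency
    "median": statistics.median(xs),  # central tendency
    "stdev": statistics.stdev(xs),    # variation (sample standard deviation)
    "correlation": pearson(xs, ys),   # association between x and y
}

for name, value in summary.items():
    print(name, round(value, 3))
```

Dropping incomplete rows is only one way to handle missing data; whether it is appropriate depends on why the values are missing.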

Simplify:

Goals:
• Reduce complexity

Techniques:
• Data reduction
  o Data clustering
    ▪ Example: Acxiom collects and collates data on U.S. consumers
  o Factor analysis
• Data visualization

Explain & Predict:
Example of how health and wealth have improved globally; the visualization also makes clear that poverty rates still need to improve.

Goals:
• To understand drivers and forecast outcomes
• The heart of many analyses

Techniques:
• Neural networks
• Support vector machines
• Random forests
• Linear regression

Types of Data Analysis; Linear Regression:
• Linear regression is the core tool for data analysis in empirical science, social science, and business research.
• Linear regression offers both Explanation (relationships among variables, i.e., columns of data) and Prediction (forecasts, extrapolations, and interpolations).
  o Example: Chart of top 30 MBA programs and starting salaries.
  o Example: Wages at Triangle Construction Company.

Analytics involves the idea of a transformation: we want to tie analytics to decision-making.

Hypothesis Testing:
Assessing whether an observed difference is real or a fluke; a difference judged to be real is called statistically significant. Is the relationship significant or a fluke?
• Assess hypothesis tests by their p-values.
  o Example: Who’s taller, men or women?
• The p-value measures how likely it is that a difference at least as large as the one observed would arise by chance alone if no real difference existed.
  o Example: Flipping a coin.
• Small p-values support the claim that a real difference exists.
• A p-value of less than 5% is conventionally taken to indicate that there is probably a real difference.
• Statistical significance is the term used when we determine that a real difference exists; it does not by itself imply practical significance.
• The p-value is important to the interpretation of a regression because a low value indicates that the estimated relationship is unlikely to be a fluke, i.e., that the regression has explanatory power.
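As a rough illustration of how a regression and its p-value fit together, the sketch below fits a straight line by ordinary least squares and then estimates a p-value for the slope. Several things here are assumptions made for the example: the experience and wage numbers are invented (they are not the Triangle Construction data), the function names `fit_line` and `slope_p_value` are hypothetical, and the p-value comes from a simple permutation (shuffling) test rather than the t-test that regression software typically reports.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def slope_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Permutation p-value: share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    _, observed = fit_line(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # shuffling y breaks any real link between x and y
        _, slope = fit_line(xs, shuffled)
        if abs(slope) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Invented data: years of experience and hourly wage.
experience = [1, 2, 3, 5, 7, 8, 10, 12]
wage = [15.0, 16.5, 17.0, 20.5, 22.0, 24.5, 26.0, 29.5]

intercept, slope = fit_line(experience, wage)  # explanation: wage change per extra year
p_value = slope_p_value(experience, wage)      # is the slope a fluke?
predicted = intercept + slope * 6              # prediction: interpolated wage at 6 years

print("slope:", round(slope, 3))
print("p-value:", round(p_value, 4))
print("predicted wage at 6 years:", round(predicted, 2))
```

The slope serves the Explanation role, the predicted value serves the Prediction role, and a p-value below 5% would suggest that the relationship is probably not due to chance.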
