LEADING WITH ANALYTICS: PROGRAM RESOURCES FALL 2018
Unit 1: Why Analytics Matter
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Introduction: Brief history of analytics from 18,000 BCE.
• Key Fact: 90% of the world’s data has been created within the past two years.
• Data Analytics Today: Infographic on page 3
• Case Study: Moneyball by Michael Lewis exemplifies analytics through the 2002 Oakland Athletics. Baseball is a great setting for analytics because it produces lots of data with clear outcomes.
Definitions of Analytics:
• “The scientific process of transforming data into insight for better decisions.” – INFORMS
• “The discovery and communication of meaningful patterns in data.” – WIKIPEDIA
• “The broad use of data and quantitative analysis for decision-making within organizations.” – TOM DAVENPORT
Analytics involves the idea of a transformation. We want to tie analytics to decision-making. Analytics can turn data into value / action. “Business Intelligence” can include analytics, querying, dashboarding, and reporting.
Types of Analytics:
• Descriptive analytics allow us to understand what has happened in the past.
• Predictive analytics help us to try and predict the future.
• Prescriptive analytics are used to improve decision-making.
The analytical approach is not always the best fit, nor should it be used in every situation (for example, ordering at a restaurant).
Analytics can enable transformation, scalability, detailed understanding, and survivability.
• Case Study: U.S. Credit Card Business circa 1980s, infographic on page 4
o Data analytics changed the credit card industry through automation. Credit cards are less a banking product than an information business.
• Case Study: The Value of Detailed Customer Data, infographic on page 5
o Signet / Capital One transforms itself and the industry.
• Challenges with data: Asking the wrong question, getting the wrong data, failing to understand the data, conducting a poor analysis of data, and making incorrect decisions.
Key Takeaways:
• Analytics really do matter. Get quality data, visualize the data, analyze the data, and make decisions.
• After the Moneyball story was released, several baseball teams copied the Moneyball strategy / approach and also had great success.
• The goal is to combine analytics with other strategies. Don’t place too much emphasis on data.
• Even successful teams have to reassess their strategy to find the right balance between analytics and intuition.
• Understand where analytics has the most impact and where it has very little impact.
DATA ANALYTICS TODAY
50 million tweets per day
20 hours of video uploaded to YouTube every minute
72.9 products ordered off of Amazon per second
1.3 exabytes of data received by mobile Internet users
24 petabytes processed per day by Google
[Also pictured, without legible values: number of e-mails sent every second, data consumed by households each day, minutes spent on Facebook each month]
* Data compiled by IBM
THERE IS ENORMOUS VALUE IN DATA: DATA ANALYTICS is a priority for executives
[Infographic: share of executives who plan to invest in predictive analytics tools in the next 12-24 months, who plan to invest in more data analytics software in the next 12-24 months, and who invested in data analytics software in the last 12 months]
* Data compiled by www.wipro.com
WHERE CAOs INVEST THE NEXT 12-24 MONTHS
[Chart: categories include predictive analytics, business orientation + analysis, real-time analytics platforms, application performance management, and other; plotted values are 75%, 68.8%, 56.3%, 43.8%, 43.8%, 25%, 25%, 18.8%, 6.3%, and 6.3%]
WHAT WORRIES CAOs THE MOST
Will the budget slow progress? (11.1%)
Is talent acquisition slowing strategy?
Do privacy issues threaten progress?
Are data + analytics going to meet expectations?
What’s the best way to assure adoption + executive support? (22.2%)
What organizational models work best?
Top 3 Battles Faced by a CAO Every Day
Maintaining consistent, integrated customer interactions across multiple channels
Analytical systems to reveal key customer insights
Building an efficient data management framework
* Data compiled by www.chiefanalyticsofficerforum.com
US Credit Card Business circa 1980s
In the 1980s, credit card spending rose from 4% to 8%
The switch from store credit to general-purpose credit cards led to an increase in revolving debt: 40% in 1980, 70% by 1989
Credit cards were 3-5x more profitable than 7 other retail bank products
In 1987 nearly 4,000 banks issued credit cards
Top 4 Association cards: Citibank (15.3 billion), Chase (5.4 billion), Bank of America (5.2 billion), First Chicago (4.6 billion)
Top 2 Non-Association General Purpose Credit Card Issuers: Discover (5.9 billion), American Express (3.8 billion)
Mass-marketing mailings generated a 5%-10% response rate before 1988
Banks used simple metrics to evaluate creditworthiness (pictured: Credit Score & 780 FICO)
Every cardholder was charged 19.8% interest
THE VALUE OF DETAILED CUSTOMER DATA
UNDERSTANDING + PERSONALIZATION
“ We are not utilizing the power of statistical analysis on unavailable data for making decisions with long term consequences ”
“ Credit cards aren’t banking -- they’re information. ”
“ Banks are not utilizing IT to collect data ”
Richard Fairbank + Nigel Morris: MBAs from Stanford and London Business School (both at Mercer Management Consulting) who worked together on a project researching unprofitable operations at a bank.
WHY PERSONALIZE? Credit card companies in the 1980s charged everyone the same interest rate: 19.8%. Let’s think about why charging all credit card customers the same interest rate might be a problem.
For illustration, let’s imagine a simple world in which the average customer defaults on their credit card with a 15% probability. As a credit card issuer, we might choose a single interest rate that is fair for that level of risk.
Generic Applicant 15% default probability
Mountain Biker 10% default probability
Baseball Dad 20% default probability
But suppose in reality there are two types of people in this world: mountain bikers, who are low-risk customers with a 10% default probability, and baseball dads, who are high-risk customers with a 20% default probability.
IN THIS CASE, by charging a single interest rate we are undercharging the high-risk baseball dads & will lose money on these customers.
We are also overcharging the low-risk mountain bikers. Over time, they may leave us for competitors who offer them better & more accurate prices...
AND WE’LL BE STUCK WITH ALL THE BAD CUSTOMERS.
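The arithmetic behind this story can be sketched in a few lines of Python. The pricing model here is an assumption added for illustration and is not part of the program materials: a one-period loan where a defaulting customer costs the issuer the full dollar lent and a paying customer returns the interest rate. The function names `break_even_rate` and `expected_profit` are likewise hypothetical.

```python
def break_even_rate(default_probability: float) -> float:
    """Interest rate at which expected profit per dollar lent is zero.

    Assumed model (illustrative only): with probability p the customer
    defaults and the whole dollar is lost; otherwise the issuer earns r.
    Expected profit = (1 - p) * r - p, which is zero at r = p / (1 - p).
    """
    return default_probability / (1.0 - default_probability)


def expected_profit(rate: float, default_probability: float) -> float:
    """Expected profit per dollar lent under the same assumed model."""
    return (1.0 - default_probability) * rate - default_probability


# Default probabilities taken from the mountain biker / baseball dad example.
segments = {"generic applicant": 0.15, "mountain biker": 0.10, "baseball dad": 0.20}

# One rate for everyone, priced for the 15% "average" customer.
single_rate = break_even_rate(segments["generic applicant"])

for name, p in segments.items():
    print(
        f"{name:18s} fair rate {break_even_rate(p):6.1%}   "
        f"profit per dollar at the single rate {expected_profit(single_rate, p):+.3f}"
    )
```

Under these assumptions the single rate earns money on mountain bikers and loses the same amount on baseball dads, so the book only breaks even while both groups stay. Once the mountain bikers leave for a better price, only the loss-making segment remains.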
WHY SEGMENT? The point is that the better we understand our customers and their risk profiles, the better we will compete in this marketplace. Of course, the real world is much more complicated: a real credit card company has many variables on which to segment its customer base, so we can potentially CATEGORIZE OUR CUSTOMERS into lots of tiny buckets.
Single Female age 35-39, zip codes 27500-27599, income $50K-$70K, holds another Capital One card, annual charges between $5000 - $7500, rejected a previous offer in the last 6 months, & purchases frequently from Amazon.com
How can a company understand the behavior of its customers at such a granular level? Certainly it requires lots of data + sophisticated analytics.
To make matters worse, customer behavior changes over time, so Capital One needs to be constantly learning about its customers.
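As a rough illustration of what sorting customers into buckets means in practice, the sketch below groups records by two attributes and computes a default rate for each bucket. The records, the choice of attributes, and the function name `default_rate_by_segment` are all hypothetical; a real issuer would segment on many more variables.

```python
from collections import defaultdict

# Hypothetical customer records: (age band, income band, defaulted?)
customers = [
    ("35-39", "$50K-$70K", False),
    ("35-39", "$50K-$70K", False),
    ("35-39", "$50K-$70K", True),
    ("35-39", "$50K-$70K", False),
    ("25-29", "$30K-$50K", True),
    ("25-29", "$30K-$50K", False),
    ("25-29", "$30K-$50K", True),
    ("25-29", "$30K-$50K", False),
    ("35-39", "$70K-$90K", False),
    ("35-39", "$70K-$90K", False),
]


def default_rate_by_segment(records):
    """Return {segment: (default rate, number of customers)} for each bucket."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [defaults, customers]
    for age_band, income_band, defaulted in records:
        segment = (age_band, income_band)
        counts[segment][0] += int(defaulted)
        counts[segment][1] += 1
    return {seg: (d / n, n) for seg, (d, n) in counts.items()}


rates = default_rate_by_segment(customers)
for segment, (rate, size) in sorted(rates.items()):
    print(segment, f"default rate {rate:.0%} from {size} customers")
```

The bucket sizes are returned alongside the rates on purpose: the finer the segmentation, the fewer customers fall in each bucket and the noisier each estimate becomes, which is one reason granular segmentation needs lots of data.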
Unit 2: Asking Connecting Questions
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
The Successful Data Scientist:
• Definition and infographic on page 7
• Five dimensions of the so-called data scientist include being a:
o (1) business or domain expert, (2) statistics expert, (3) programming expert, (4) database technology expert, and (5) visualization and communications expert.
• The Data Science Team: No one person is an expert in all five dimensions. Put together a team that can cover all five dimensions (see the data science teams article on page 8).
Connecting questions equal “crunchy questions”* and should:
• Be “sticky” & aligned with strategic goals.
• Relate to key performance indicators.
• Be designed to be actionable and informational.
• Provide foresight rather than hindsight. – NICOLE LASKOWSKI (TechTarget)
• Be capable of forming the connective tissue between tactical and senior-level objectives in an organization. – JOHN LUCKER (Deloitte)
Using connecting questions is how an expert translates strategic goals into action.
* “Crunchy Questions” – from Deloitte
Model Building:
• Asking connecting questions is really an exercise in model building.
• By model, we mean a representation of how inputs are related to outputs. Models drive the conscious decisions that we make.
• There can be an explicit model like Moneyball, based on written statistics.
• Or there can be a mental model, a sense for what is good that is not written down.
• Exploring Blackjack: Stories like Bringing Down the House or 21 show players using data in order to win.
Connecting Questions Framework: “An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” – JOHN TUKEY
• Lots of moving parts: inputs, outputs, and relationships.
• Challenges with data: Asking the wrong question, getting the wrong data, failing to understand the data, conducting a poor analysis of data, and making incorrect decisions.
Key Takeaways:
• Netflix is a great example of using data analytics to forecast what a user may enjoy.
• Fictitious Casino Case Study: How can a casino make more money? It gives comps, food, and drinks to those who spend a lot of time gambling. How much more gambling is generated by the comps, food, and drinks? Are comps affecting the daily behavior of guests? Answering these questions helps create the best comp policy for the future.
• Top-Down approach: Consider what’s wrong, what the symptoms are, and how to solve it.
• Bottom-Up approach: Look at the data to understand the questions we should ask.
The Successful Data Scientist
Business or Domain Expert: Has an understanding of the problem and how the context impacts both the problem and the potential solution
Programming Expert: Has an ability to write code to acquire, process and analyze the data
Statistics Expert: Has an understanding of how to do the analysis - what the results say and what the limitations are
Database Technology Expert: Has the knowledge to store the data in an effective and efficient manner so it can be used now and in the future
Visualization & Communications Expert: Has an ability to represent the data, to understand it for oneself, and then to educate and persuade others
Next in Tech
The 5 Dimensions of the So-Called Data Scientist
by Anand Rao
March 5, 2014
What is “data science”? Is it really a new emerging discipline as some claim it to be; or is it the emperor in new clothes – data mining, statistics, business intelligence or analytics re-branded? Moreover, is it possible that one person can fulfil the role of a data scientist? Rather than answering this question directly, let’s review some of the skills required for someone to be a “data scientist.”

First and foremost, a “data scientist” is a business or domain expert: Someone who has to have the ability to articulate how information, insights, and analytics can help business leadership answer key questions – and even determine which questions need answering – and make appropriate decisions. The data scientist will need a thorough understanding of the business across the value chain (from marketing, sales, distribution, operations, pricing, products, finance, risk, etc.) to do this well.

Second, a “data scientist” is a statistics expert: Someone who has to have the ability to determine the most appropriate statistical techniques for addressing different classes of problems, apply the relevant techniques, and translate the results and generate insights in such a way that the businesses can understand the value. This will be predicated on a thorough understanding of statistical techniques (e.g., regression analysis, cluster analysis, and optimization techniques) and the tools and languages used to run the analysis (e.g., SAS or R).

Third, a “data scientist” is a programming expert: Someone who has the ability to determine the appropriate software packages or modules to run, the ability to modify them, and the ability to design and develop new computational techniques to solve business problems (e.g., machine learning, natural language processing, graph/social network analysis, neural nets, and simulation modelling). Invariably, the data scientist would have a computer science background and be comfortable designing and programming in a variety of languages including Java, Python, C++ or C#.

Fourth, a “data scientist” is a database technology expert: Someone who has a thorough understanding of external and internal data sources, how they are gathered, stored, and retrieved. This will enable the data scientist – and by extension, the business as a whole – to 1) extract, transform and load data stores; 2) retrieve data from external sources (through screen scraping and data transfer protocols); 3) use and manipulate large ‘big data’ data stores (like Hadoop, Hive, Mahout and an entire range of emerging Big Data technologies); and 4) use the disparate data sources to analyze the data and generate insights.

Finally, a “data scientist” is a visualization and communications expert: Someone who has a thorough understanding of visual art and design. This is important because it enables those who aren’t professional data analysts to interpret data. Accordingly, the data scientist should be able to 1) take statistical and computational analysis and turn it into graphs, charts, and animations; 2) create visualizations (e.g., motion charts, word maps) that clearly show insights from data and corresponding analytics; and 3) generate static and dynamic visualizations in a variety of visual media (e.g., reports, screens – from mobile screens to laptop/desktop screens to HD large visualization walls, interactive programs, and – perhaps soon – augmented reality glasses).

Last, but not least, a ‘data scientist’ should be able to engage with senior management, talk their language and translate the data-driven insights into decisions and actions.

Do any of the alternative phrases, such as “data mining”, “business intelligence”, “analytics”, “statistician” capture all of the five expertise areas? Do you have any “data scientists” who fit the description above in your organization? If not, where and how can you find them?
Unit 3: Data Acquisition, Quality and Strategy
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Types of Data Sources (Where Does Data Come from?):
• Internal Data: Proprietary data that is specific to an organization.
• Secondary Data: Data available at the industry level, or from a company that collects and resells data.
• Generated Data: Surveys, focus groups, experiments, and sensors. Estimated 50 billion sensor objects by 2020.
“Big Data” refers to data from sources such as sensors, the internet, text, and voice.
The Three V’s:
• Volume (Big) – Size of the datasets.
• Velocity (Fast) – Data is rapidly generated in real time.
• Variety (Ugly) – Less curated than traditional static datasets; includes unstructured data.
Case Study: Disney World Retail and Big Data - How It’s Used:
• Inventory tracking
• Customer tracking
• Associate empowerment
Big Data, Definition and Concerns: “Big Data” is a huge quantity of data being captured from different sources, without traditional data structure, at a fine level of granularity, and in real time.
• What technologies are required to store, analyze, and structure such a large amount of data?
• What ethical challenges do we have in terms of transparency and customer privacy?
• Are we sampling the right data?
• Are we inferring correctly?
• Are we creatively using Big Data to solve our problems?
Data Quality Concerns:
• 1 in 3 business leaders lack faith in the data given to them.
• Data quality concerns cost businesses $3 trillion annually.
• Data quality issues are demonstrated across industries.
• Anywhere from 2% - 25% of clinical pharmaceutical trial data is wrong.
Five Stages of Analytical Development:
• Analytically impaired
• Localized analytics
• Analytical aspirations
• Analytical companies
• Analytical competitors
Key Takeaways:
• Origin of data: Big data comes at a high volume, high velocity, and high variety.
• Ensure data quality: Put processes in place to make sure your data is correct and people believe it.
• Types of data to gather or invest in: Create a data strategy for your organization.
Unit 4: Visualizing Data
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Examples of Data Visualization:
• Food insecurity increases with the poverty rate in the U.S., but there are lots of differences between states. Data visualization can sometimes hinder our ability to understand what we’re trying to study.
• Hans Rosling & the BBC: A data visualization expert demonstrates changes in global public health over hundreds of years.
o Health and wealth have improved globally, but his visualization makes the need for improvement in poverty rates obvious.
Effective visualizations (like the example below) are critical to conveying information.
Four Question Framework for Visualization Process: Answer the following questions:
• What are you trying to accomplish? Explore, educate or persuade?
• Who is your audience? Who am I talking to and how much time do they have?
• What is the right visualization?
o Basic blocking chart and rules of visualization, how the eye “sees and reads”.
• What is the story you are telling?
o The right picture is worth a lot more than 1,000 words or 10,000 numbers.
Key Takeaways:
• Data visualization is an important tool for exploring data, and it can also provide core insights.
• Use the four question framework to leverage the power of data visualization.
Unit 5: Using Linear Regression; Data Analysis
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Purpose of Analysis:
Explore: Example of grease disposal in NYC
Goals:
• To get to know your data
• To understand the features of the data
Techniques:
• Summary statistics
o Central tendency
o Variation
o Correlation
o Association
• Data visualization
Consider: Does the data have integrity? Any missing/erroneous data? Any early surprises?
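A first exploratory pass along these lines might look like the sketch below. The two columns (`ad_spend`, `sales`) and their values are hypothetical and are not data from the program; the point is only to show a missing-value check followed by central tendency, variation, and correlation, using Python's standard library.

```python
import statistics
from math import sqrt

# Hypothetical observations; None marks a missing value.
ad_spend = [10.0, 12.0, None, 15.0, 11.0, 14.0]
sales = [100.0, 118.0, 105.0, 149.0, 112.0, 141.0]

# Integrity check: keep only rows where both values are present.
pairs = [(x, y) for x, y in zip(ad_spend, sales) if x is not None and y is not None]
missing_rows = len(ad_spend) - len(pairs)
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    spread_a = sqrt(sum((u - mean_a) ** 2 for u in a))
    spread_b = sqrt(sum((v - mean_b) ** 2 for v in b))
    return cov / (spread_a * spread_b)


correlation = pearson(xs, ys)

print("rows dropped for missing data:", missing_rows)
print("central tendency: mean", statistics.mean(xs), "median", statistics.median(xs))
print("variation: sample standard deviation", round(statistics.stdev(xs), 2))
print("correlation between the two columns:", round(correlation, 3))
```

Dropping incomplete rows is only one possible choice. Whether to drop, correct, or investigate missing and erroneous values depends on why they are missing, which is itself an "early surprise" worth chasing.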
Reduce: Example – Acxiom: collects and collates data on U.S. consumers
Goals:
• Reduce complexity
Techniques:
• Data reduction
o Data clustering
o Factor analysis
• Data visualization
Explain & Predict: Example of how health and wealth have improved globally, but a need for improvement in poverty rates is obvious in the visualization.
Goals:
• To understand drivers and forecast outcomes
• The heart of many analyses
Techniques:
• Linear regression
• Random forests
• Neural networks
• Support vector machines
Types of Data Analysis; Linear Regression:
• Linear regression is the go-to tool for data analysis in empirical science, social science, and business research.
• Linear regression offers both Explanation (relationships between variables, i.e., columns of data) and Prediction (forecasts, extrapolations, and interpolations).
o Example: Chart of top 30 MBA programs and starting salaries.
o Example: Wages at Triangle Construction Company.
Analytics involves the idea of a transformation. We want to tie analytics to decision-making.
Hypothesis Testing: Assessing whether an observed difference is real or a fluke; a difference judged to be real is called statistically significant.
• Assess hypothesis tests by their p-values. Example: Who’s taller? Men or women?
• The p-value measures how likely a result at least as extreme as the one observed would be if chance alone were at work. Example: Flipping a coin.
• Small p-values support the claim that a real difference exists.
• By convention, a p-value of less than 5% indicates there is probably a real difference.
• Statistical significance means we have determined that a real difference exists; it does not by itself mean the difference is large enough to matter in practice (practical significance).
• The p-value is important to the interpretation of a regression because a low value suggests that the relationship found by the regression is not a fluke, i.e., that the regression has explanatory power.
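To make the two ideas above concrete, here is a minimal sketch that (1) fits a straight line by ordinary least squares and uses it for explanation and prediction, and (2) computes an exact two-sided p-value for the coin-flipping example. The wage data are invented for illustration and are not the Triangle Construction figures; the function names `fit_line` and `coin_p_value` are hypothetical.

```python
from math import comb


def fit_line(xs, ys):
    """Ordinary least squares with one predictor.

    Returns (slope, intercept, r_squared) for the line y = intercept + slope * x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def coin_p_value(heads, flips):
    """Exact two-sided p-value for `heads` out of `flips`, assuming a fair coin.

    Adds up the probability of every outcome at least as far from flips / 2
    as the observed count.
    """
    expected = flips / 2
    distance = abs(heads - expected)
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - expected) >= distance
    )
    return extreme / 2 ** flips


# Hypothetical data: years of experience vs. hourly wage.
years = [1, 2, 3, 4, 5]
wage = [12, 15, 17, 20, 21]

slope, intercept, r_squared = fit_line(years, wage)
predicted_wage_at_6_years = intercept + slope * 6

print(f"Explanation: each extra year is associated with about {slope:.2f} more per hour")
print(f"Prediction: estimated wage at 6 years is {predicted_wage_at_6_years:.2f}")
print(f"Share of wage variation explained (R squared): {r_squared:.3f}")

p_value = coin_p_value(60, 100)
print(f"p-value for 60 heads in 100 flips of a fair coin: {p_value:.4f}")
```

In practice a statistics package would also report a p-value for the regression slope itself; the coin example is used here because its p-value can be computed exactly with nothing more than counting.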
Unit 6: Putting It All Together
Faculty: Dr. Adam Mersereau and Dr. Brad Staats
Making and Implementing Decisions:
• Using standard processes in organizations has been a key driver of both productivity and quality in operations.
• Lean production is found in a wide variety of settings, from manufacturing to law firms and healthcare.
• Unfortunately, people don’t always do what they should. Examples: food safety, medical errors, etc.
• What can we do to address process compliance issues?
o Example: Good hygiene practices in medical care to reduce infection rates.
First Steps: Example: Hand hygiene compliance rate in a hospital.
• Ask Connecting Questions – Ensure you ask the correct questions.
• Collect the data – What data are you collecting, and how? Do you have the necessary data?
• Analyze the data – Consider the volume, velocity and variety of data.
• Visualize the data – Are you trying to explore, educate or persuade?
Linear regression is the “go-to” tool.
Availability Bias:
• When you make assumptions based only on the information that comes readily to mind (such as remembering a plane crash) rather than on appropriate sources of data.
• You need relevant and representative information to avoid the availability bias.
Counterfactual Scenario: What would have happened to the treated group if they had not received the intervention? Example: Hand hygiene compliance rate in a hospital.
• Experimental group (receiving the intervention) vs. Control group (not receiving the intervention). Ideally, assign people to each group randomly and track their behavior over time.
• Monitor individual or unit-level performance. Example: How often is an individual washing their hands vs. how many times did the hand sanitizer get used for the entire hospital?
Key Takeaways:
• Look at all variables across all models. Use data to draw conclusions.
• If you aren’t using data analytics as an organization, you are falling behind.
• If you are using data analytics, is that enough? Are you analyzing the appropriate data?
• Turn data into value: “We are not in banking, we are in information.” – Richard Fairbank, Capital One Founder, Chairman and CEO.
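The experimental-versus-control comparison from this unit can be sketched as a small permutation test, one simple way to ask whether an observed difference is real or a fluke. The compliance rates below are hypothetical, and the permutation test is an illustrative choice rather than a method named in the program materials.

```python
from itertools import combinations
from statistics import mean

# Hypothetical hand hygiene compliance rates for ten hospital units.
treated = [0.82, 0.78, 0.85, 0.80, 0.88]  # units that received the intervention
control = [0.70, 0.74, 0.68, 0.75, 0.72]  # units that did not


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    If the intervention did nothing, the group labels are arbitrary, so we
    try every way of relabelling the units and count how often the gap in
    means is at least as large as the one actually observed.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen_set]
        relabelled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point noise in ties.
        if abs(mean(relabelled_a) - mean(relabelled_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


observed_difference = mean(treated) - mean(control)
p_value = permutation_p_value(treated, control)

print(f"observed difference in compliance: {observed_difference:.3f}")
print(f"permutation p-value: {p_value:.4f}")
```

The relabelling argument only holds if units were assigned to the two groups at random, which is why random assignment matters: without it, a gap between the groups could reflect how they were chosen rather than what the intervention did.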