32 posts categorized "analytics & machine learning"

05 May 2016

Counting to 10, science about science, and Explainista vs. Randomista.


1. SPOTLIGHT: Take a deep breath, everybody.
Great stuff this week reminding us that a finding doesn't necessarily answer a meaningful question. Let's revive the practice of counting to 10 before posting remarkable, data-driven insights... just in case.

This sums up everything that's right, and wrong, with data. In a recent discussion of some impressive accomplishments in sports analytics, prior success leads to this statement: “The bottom line is, if you have enough data, you can come pretty close to predicting almost anything,” says a data scientist. Hmmm. This sounds like someone who has yet to be punched in the face by reality. Thanks to Mara Averick (@dataandme).

Sitting doesn't typically kill people. On the KDNuggets blog, William Schmarzo remarks on the critical thinking part of the equation. Consider the kerfuffle over evidence that people who sit most of the day are 54% more likely to die of heart attacks. Even brief contemplation raises questions about other variables, such as exercise, diet, or age.

Basic stats - Common sense = Dangerous conclusions viewed as fact

P-hacking is resilient. On Data Colada, Uri Simonsohn explains why P-Hacked Hypotheses are Deceivingly Robust. Direct, or conceptual, replications are needed now more than ever.

2. Science about science.
The world needs more meta-research. What's the best way to fund research? How can research impact be optimized, and how can that impact be measured? These are the questions being addressed at the International School on Research Impact Assessment, founded by RAND Europe, King's College London, and others. Registration is open for the autumn session, September 19-23 in Melbourne.

Evidence map by Bernadette Wright

3. Three Ways of Getting to Evidence-Based Policy.
In the Stanford Social Innovation Review, Bernadette Wright (@MeaningflEvdenc) does a nice job of describing three ideologies for gathering evidence to inform policy.

  1. Randomista: Views randomized experiments and quasi-experimental research designs as the only reliable evidence for choosing programs.
  2. Explainista: Believes useful evidence needs to provide trustworthy data and strong explanation. This often means synthesizing existing information from reliable sources.
  3. Mapista: Creates a knowledge map of a policy, program, or issue. Visualizes the understanding developed in each study, where studies agree, and where each adds new understanding.

21 April 2016

Baseball decisions, actuaries, and streaming analytics.

Cutters from Breaking Away movie

1. SPOTLIGHT: What new analytics are fueling baseball decisions?
Tracy Altman spoke at Nerd Nite SF about recent developments in baseball analytics. Highlights from her talk:

- Data science and baseball analytics are following similar trajectories. There's more and more data, but people struggle to find predictive value. Oftentimes, executives are less familiar with technical details, so analysts must communicate findings and recommendations so they're palatable to decision makers. The role of analysts, and the challenges they face, are described beautifully by Adam Guttridge and David Ogren of NEIFI.

- 'Inside baseball' is full of outsiders with fresh ideas. Bill James is the obvious/glorious example - and Billy Beane (Moneyball) applied great outsider thinking. Analytics experts joining front offices today are also outsiders, but valued because they understand prediction; the same goes for anyone seeking to transform a corporate culture to evidence-based decision making.

Tracy Altman @ Nerd Nite SF
- Defensive shifts may number 30,000 this season, up from 2,300 five years ago (John Dewan prediction). On-the-spot decisions are powered by popup iPad spray charts with shift recommendations for each opposing batter. And defensive stats are finally becoming a reality.

- Statcast creates fantastic descriptive stats for TV viewers; potential value for team management is TBD. Fielder fly-ball stats are new to baseball and sort of irresistible, especially the 'route efficiency' calculation.

- Graph databases, relatively new to the field, lend themselves well to analyzing relationships - and supplement what's available from a conventional row/column database. Learn more at FanGraphs.com. And topological maps (Ayasdi and Baseball Prospectus) are a powerful way to understand player similarity. High-dimensional data are grouped into nodes, which are connected when they share a common data point - this produces a topo map grouping players with high similarity.

2. Will AI replace insurance actuaries?
10+ years ago, a friend of Ugly Research joined a startup offering technology to assist actuaries making insurance policy decisions. It didn't go all that well - those were early days, and it was difficult for people to trust an 'assistant' who was essentially a black box model. Skip ahead to today, when #fintech competes in a world ready to accept AI solutions, whether they augment or replace highly paid human beings. In Could #InsurTech AI machines replace Insurance Actuaries?, the excellent @DailyFintech blog handicaps several tech startups leading this effort, including Atidot, Quantemplate, Analyze Re, FitSense, and Wunelli.

3. The blind leading the blind in risk communication.
On the BMJ blog, Glyn Elwyn contemplates the difficulty of shared health decision-making, given people's inadequacy at understanding and communicating risk. Thanks to BMJ_ClinicalEvidence (@BMJ_CE).

4. You may know more than you think.
Maybe it's okay to hear voices. Evidence suggests the crowd in your head can improve your decisions. Thanks to Andrew Munro (@AndrewPMunro).

5. 'True' streaming analytics apps.
Mike Gualtieri of Forrester (@mgualtieri) put together a nice list of apps that stream real-time analytics. Thanks to Mark van Rijmenam (@VanRijmenam).

14 April 2016

Analytics of presentations, Game of Thrones graph theory, and decision quality.


1. Edges, dragons, and imps.
Network analysis reveals that Tyrion is the true protagonist of Game of Thrones. Fans already knew, but it's cool that the graph confirms it. This Math Horizons article is a nice introduction to graph theory: edges, betweenness, and other concepts.
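For readers curious what betweenness actually measures, here is a minimal sketch in Python. It is not the article's analysis or data: the five-node network is a made-up toy, and the function is a plain implementation of Brandes' algorithm for unweighted, undirected graphs. A node scores high when many shortest paths between other nodes pass through it.

```python
from collections import deque

# Hypothetical toy network (not the article's data), stored as an adjacency list.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}

def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths to each node.
        order = []
        preds = {v: [] for v in graph}
        num_paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        num_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    num_paths[w] += num_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share of paths.
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair was counted from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

result = betweenness(graph)
for node, value in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(node, value)
```

In this toy network, node C sits on the most shortest paths, so it comes out on top. That is the same logic by which a well-connected character can emerge as the hub of a story.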

Decision quality_book

2. Teach your team to make high-quality decisions.
Few of us have the luxury of formally developing a decision-making methodology for ourselves and our teams. And business books about strategic decisions can seem out of touch. Here's a notable exception: Decision Quality: Value Creation from Better Business Decisions by Spetzler, Winter, and Meyer.

The authors are well-known decision analysis experts. The key takeaways are practical ideas for teaching your team to assess decision quality, even for small decisions. Lead a valuable cultural shift by encouraging people to fully understand why it's the decision process, not the outcome, that is under their control and should be judged. (Thanks to Eric McNulty.)

3. Analytics of 100,000 presentations.
Great project we hope to see more of. Big data analysis on 100,000 presentations looked at variables such as word choices, vocal cues, facial expressions, and gesture frequency. Then they drew conclusions about what makes a better speaker. Among the findings: Ums, ers, and other fillers aren't harmful midsentence, but between points they are. Words like "challenging" can tune in the audience if spoken with a distinct rate and volume. Thanks to Bob Hayes (@bobehayes).

4. Evidence-based policy decisions.
Paul Cairney works in the field of evidence-based policy making. His new book is The Politics of Evidence-Based Policy Making, where he seeks a middle ground between naive advocates of evidence-based policy and cynics who believe policy makers will always use evidence selectively.

07 April 2016

Better evidence for patients, and geeking out on baseball.

Health tech wearables

1. SPOTLIGHT: Redefining how patients get health evidence.

How can people truly understand evidence and the tradeoffs associated with health treatments? How can the medical community lead them through decision-making that's shared - but also evidence-based?

Hoping for cures, patients and their families anxiously Google medical research. Meanwhile, the quantified selves are gathering data at breakneck speed. These won't solve the problem. However, this month's entire Health Affairs issue (April 2016) focuses on consumer uses of evidence and highlights promising ideas.

  • Translating medical evidence. Lots of synthesis and many guidelines are targeted at healthcare professionals, not civilians. Knowledge translation has become an essential piece, although it doesn't always involve patients at early stages. The Boot Camp Translation process is changing that. The method enables leaders to engage patients and develop healthcare language that is accessible and understandable. Topics include colon cancer, asthma, and blood pressure management.
  • Truly patient-centered medicine. Patient engagement is a buzzword, but capturing patient-reported outcomes in the clinical environment is a real thing that might make a big difference. Danielle Lavallee led an investigation into how patients and providers can find more common ground for communicating.
  • Meaningful insight from wearables. These are early days, so it's probably not fair to take shots at the gizmos out there. It will be a beautiful thing when sensors and other devices can deliver more than alerts and reports - and make valuable recommendations in a consumable way. And of course these wearables can play a role in routine collection of patient-reported outcomes.


2. Roll your own analytics for fantasy baseball.
For some of us, it's that special time of year when we come to the realization that our favorite baseball team is likely going home early again this season. There's always fantasy baseball, and it's getting easier to geek out with analytics to improve your results.

3. AI engine emerges after 30 years.
No one ever said machine learning was easy. Cyc is an AI engine that reflects 30 years of building a knowledge base. Now its creator, Doug Lenat, says it's ready for prime time. Lucid is commercializing the technology. Personal assistants and healthcare applications are in the works.

Photo credit: fitbit one by Tatsuo Yamashita on Flickr.

30 March 2016

$15 minimum wage, evidence-based HR, and manmade earthquakes.


Photo by Fightfor15.org

1. SPOTLIGHT: Will $15 wages destroy California jobs?
California is moving toward a $15/hour minimum wage (slowly, stepping up through 2023). Will employers be forced to eliminate jobs under the added financial pressure? As with all things economic, it depends who you ask. Lots of numbers have been thrown around during the recent push for higher pay. Fightfor15.org says 6.5 million workers are getting raises in California, and that 2/3 of New Yorkers support a similar increase. But small businesses, restaurants in particular, are concerned they'll have to trim menus and staff - they can charge only so much for a sandwich.

Moody's Analytics economist Adam Ozimek says it's not just about food service or home healthcare. Writing on The Dismal Scientist Blog, he notes: "[I]n past work I showed that California has 600,000 manufacturing workers who currently make $15 an hour or less. The massive job losses in manufacturing over the last few decades has shown that it is an intensely globally competitive industry where uncompetitive wages are not sustainable."

It's not all so grim. Ozimek shows that early reports of steep job losses after Seattle's minimum-wage hike have been revised strongly upward. However, finding "the right comparison group is getting complicated."

Yellow Map Chance of Earthquake

2. Manmade events sharply increase earthquake risk.
Holy smokes. New USGS maps show north-central Oklahoma at high earthquake risk. The United States Geological Survey now includes potential ground-shaking hazards from both 'human-induced' and natural earthquakes, substantially changing their risk assessment for several areas. Oklahoma recorded 907 earthquakes last year at magnitude 3 or higher. Disposal of industrial wastewater has emerged as a substantial factor.

3. Evidence-based HR redefines leadership roles.
Applying evidence-based principles to talent management can boost strategic impact, but requires a different approach to leadership. The book Transformative HR: How Great Companies Use Evidence-Based Change for Sustainable Advantage (Jossey-Bass) describes practical uses of evidence to improve people management. John Boudreau and Ravin Jesuthasan suggest principles for evidence-based change, including logic-driven analytics. For instance, establishing appropriate metrics for each sphere of your business, rather than blanket adoption of measures like employee engagement and turnover.

4. Why we're not better at investing.
Gary Belsky does a great job of explaining why we think we're better investors than we are. By now our decision biases have been well-documented by behavioral economists. Plus we really hate to lose - yet we're overconfident, somehow thinking we can compete with Warren Buffett.

16 March 2016

Equity crowdfunding algorithms, decision-making competitions, and statistical wild geese.


1. CircleUp uses algorithm to evaluate consumer startups.
Recently we wrote about #fintech startups that are challenging traditional consumer lending models. CircleUp is doing something similar to connect investors with non-tech consumer startups (food, cosmetics, recreation). It's not yet a robo adviser for automated investing, but they do use machine learning to remove drudgery from the analysis of private companies. @CircleUp's classifier selects emerging startups based on revenue, margins, distribution channels, etc., then makes their findings available to investors. They've also launched a secondary market where shareholders can sell their stakes twice annually. The company has successfully raised Series C funding.

2. Student decision-making competition.
In the 2016 @SABR case competition, college and university students analyzed and presented a baseball operations decision — the type of decision a team’s GM and staff face over the course of a season. Contestants were required to construct and defend a 2016 bullpen from scratch for any National League team, focusing on that team's quality of starting pitching, defense, home ballpark, division opponents, and other factors. The Carnegie Mellon team from the Tepper School of Business won the graduate division.

3. For many, writing is an essential data science skill.
Matt Asay (@mjasay) reminds us data science breaks down into two categories, depending on whether it's intended for human or machine consumption. The human-oriented activity often requires straightforward steps rather than complex digital models; business communication skills are essential. Besides manipulating data, successful professionals must excel at writing paragraphs of explanation or making business recommendations.

Writing for data science

4. Chasing statistical wild geese.
The American Statistical Association has released a statement on p-values: context, process, and purpose. There's been a flurry of discussion. If you find this tl;dr, the bottom line = "P-values don't draw bad conclusions, people do". The ASA's supplemental info section presents alternative points of view - mostly exploring ways to improve research by supplementing p-values, using Bayesian methods, or simply applying them properly. Christie Aschwanden wrote on @FiveThirtyEight that "A common misconception among nonstatisticians is that p-values can tell you the probability that a result occurred by chance. This interpretation is dead wrong, but you see it again and again and again and again. The p-value only tells you something about the probability of seeing your results given a particular hypothetical explanation...." Hence ASA Principle No. 2: “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.” Nor can a p-value tell you the size of an effect, the strength of the evidence, or the importance of a result. The problem is the way p-values are used, explains Deborah Mayo (@learnfromerror): “failing to adjust them for cherry picking, multiple testing, post-data subgroups and other biasing selection effects”.
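Mayo's point about multiple testing is easy to see with a little arithmetic. The sketch below is our own illustration, not something from the ASA statement. It assumes every test is independent, every null hypothesis is actually true, and the usual 0.05 cutoff is used. Under those assumptions, the chance of at least one "significant" result among k tests is 1 - (1 - 0.05)^k.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of a true
    null hypothesis yields p < alpha (assumes independence; illustration only)."""
    return 1 - (1 - alpha) ** num_tests

# How quickly a spurious 'finding' becomes likely as the number of tests grows.
for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {chance_of_false_positive(k):.0%} chance of a false positive")
```

Run 20 unadjusted tests on pure noise and the odds of finding something to report are roughly 64%. That's the wild goose.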

Photo credit: Around the campfire by Jason Pratt.

10 March 2016

Analytics disillusionment, evidence-based presentation style, and network analysis.

Polinode New_Layout_Algorithm

1. Visualizing networks.
@Polinode builds innovative tools for network analysis. One nifty feature allows creation of column charts using a set of nodes. A recent post explains how to use calculated network metrics such as centrality or betweenness.

2. Analytics are disconnected from strategic decisions.
An extensive study suggests analytics sponsors are in the trough of disillusionment. The new MIT Sloan-SAS report, Beyond the hype: The hard work behind analytics success, finds that competitive advantage from analytics is declining. How can data do more to improve outcomes?

Analytics insights MIT-SAS report

The @mitsmr article notes several difficulties, including failure to drive strategic decisions with analytics. "Over the years, access to useful data has continued to increase, but the ability to apply analytical insights to strategy has declined." Dissemination of insights to executives and other decision makers is also a problem. The full report is available from SAS (@SASBestPractice).

3. Evidence shows graphics better than bullets.
There's new empirical evidence on communicating business strategy. 76 managers saw a presentation by the financial services branch of an auto manufacturer. Three types of visual support were displayed: bulleted list, visual metaphor, and temporal diagram. Each subject saw only one of the three formats. Those who saw a graphical representation paid significantly more attention to, agreed more with, and better recalled the strategy than did subjects who saw a (textually identical) bulleted list version. However, no significant difference was found regarding the *understanding* of the strategy. Also, presenters using graphical representations were more positively perceived than those who presented bulleted lists.

4. Linking customer experience with value.
McKinsey's Joel Maynes and Alex Rawson offer concrete advice on how to explicitly link customer experience initiatives to value. "Develop a hypothesis about customer outcomes that matter. Start by identifying the specific customer behavior and outcomes that underpin value in your industry. The next step is to link what customers say in satisfaction surveys with their behavior over time."

5. Never mind on that reproducibility study.
Slate explains how Psychologists Call Out the Study That Called Out the Field of Psychology. In a comment published by Science, reviewers conclude that "A paper from the Open Science Collaboration... attempting to replicate 100 published studies suggests that the reproducibility of psychological science is surprisingly low. We show that this article contains three statistical errors and provides no support for such a conclusion. Indeed, the data are consistent with the opposite conclusion, namely, that the reproducibility of psychological science is quite high." Evidently, OSC frequently used study populations that differed substantially from the original ones - and each replication attempt was done only once.

02 March 2016

NBA heat maps, FICO vs Facebook, and peer review.



1. Resistance is futile. You must watch Steph Curry.
The Golden State Warriors grow more irresistible every year, in large part because of Curry’s shooting. With sports data analytics from Basketball-Reference.com, these heat maps illustrate his shift to 3-pointers (and leave no doubt why Curry was called the Babyfaced Assassin; now of course he’s simply MVP).

2. Facebook vs FICO.
Fintech startups are exploring new business models, such as peer-to-peer lending (Lending Club). Another big idea is replacing traditional credit scores with rankings derived from social media profiles and other data: Just 3 months ago, Affirm and others were touted in Fortune’s Why Facebook Profiles are Replacing Credit Scores. But now the Wall Street Journal says those decisions are falling out of favor, in Facebook Isn’t So Good at Judging Your Credit After All. Turns out, regulations and data-sharing policies are interfering. Besides, executives with startups like ZestFinance find social-media lending “creepy”.

3. How to fix science journals.
Harvard Med School’s Jeffrey Flier wrote an excellent op-ed for the Wall Street Journal, How to Keep Bad Science from Getting into Print [paywall]. Key issues: anonymous peer reviewers, and lack of transparent post-publishing dialogue with authors (@PubPeer being a notable exception). Flier says we need a science about how to publish science. Amen to that.

4. Longing for civil, evidence-based discourse?
ProCon.org publishes balanced coverage of controversial issues, presenting side-by-side pros and cons supported by evidence. The nonprofit’s site is ideal for schoolteachers, or anyone wanting a quick glance at important findings.

04 February 2016

How Warby Parker created a data-driven culture.


4 pic Creating a Data Driven Organization 04feb16


1. SPOTLIGHT: Warby Parker data scientist on creating data-driven organizations. What does it take to become a data-driven organization? "Far more than having big data or a crack team of unicorn data scientists, it requires establishing an effective, deeply ingrained data culture," says Carl Anderson. In his recent O'Reilly book Creating a Data-Driven Organization, he explains how to build the analytics value chain required for valuable, predictive business models: From data collection and analysis to insights and leadership that drive concrete actions. Follow him @LeapingLlamas.

Practical advice, in a conversational style, is combined with references and examples from the management literature. The book is an excellent resource for real-world examples and highlights of current management research. The chapter on creating the right culture is a good reminder that leadership and transparency are must-haves.


Although the scope is quite ambitious, Anderson offers thoughtful organization, hitting the highlights without an overwhelmingly lengthy literature survey. Ugly Research is delighted to be mentioned in the decision-making chapter (page 196 in the hard copy, page 212 in the pdf download). As shown in the diagram, with PepperSlice we provide a way to present evidence to decision makers in the context of a specific 'action-outcome' prediction or particular decision step.

Devil's advocate point of view. Becoming 'data-driven' is context sensitive, no doubt. The author is Director of Data Science at Warby Parker, so unsurprisingly the emphasis is technologies that enable data-gathering for consumer marketing. While it does address several management and leadership issues, such as selling a data-driven idea internally, the book primarily addresses the perspective of someone two or three degrees of freedom from the data; a senior executive working with an old-style C-Suite would likely need to take additional steps to fill the gaps. The book isn't so much about how to make decisions, as about how to create an environment where decision makers are open to new ideas, and to testing those ideas with data-driven insights. Because without ideas and evidence, what's the point of a good decision process?

2. People management needs prescriptive analytics. There are three types of analytics: descriptive (showing what already happened), predictive (predicting what will happen), and prescriptive (delivering recommended actions to produce optimal results). For HR, this might mean answering "What is our staff retention? What retention is expected for 2016? And more importantly, what concrete steps will improve staff retention for this year?" While smart analytics power many of our interactions as consumers, it is still unusual to get specific business recommendations from enterprise applications. That is changing. Thanks @ISpeakAnalytics.

3. Algorithms need managers, too. Leave it to the machines, and they'll optimize on click-through rates 'til kingdom come - even if customer satisfaction takes a nose dive. That's why people must actively manage marketing algorithms, explain analytics experts in the latest Harvard Business Review.

4. Nonreligious children are more generous? Evidence shows religion doesn't make kids more generous or altruistic. The LA Times reports a series of experiments suggests that children who grow up in nonreligious homes are more generous and altruistic than those from observant families. Thanks @VivooshkaC.

5. Housing-based welfare strategies do not work, and will not work. So says evidence from LSE research, discussing failures of asset-based welfare.  

28 January 2016

Everyone's decision process, C-Suite judgment, and the Golden Gut.


1. SPOTLIGHT: MCDA, a decision process for everyone. 'Multiple criteria decision analysis' is a crummy name for a great concept (aren't all big decisions analyzed using multiple criteria?). MCDA means assessing alternatives while simultaneously considering several objectives. It's a useful way to look at difficult choices in healthcare, oil production, or real estate. But oftentimes, results of these analyses aren't communicated clearly, limiting their usefulness.

Fundamentally, MCDA means listing options, defining decision criteria, weighting those criteria, and then scoring each option. Some experts build complex economic models, but anyone can apply MCDA in effective, less rigorous ways.
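Those four steps can be sketched in a few lines of Python. This is a minimal weighted-sum version, just one of several ways to do MCDA, and everything in it is hypothetical: the options, the criteria, the weights, and the 1-to-5 scores are invented for illustration.

```python
# Hypothetical house-hunting example; all options, weights, and scores are made up.

# Steps 2 and 3: define decision criteria and weight them (weights sum to 1).
criteria_weights = {"location": 0.5, "size": 0.3, "budget": 0.2}

# Steps 1 and 4: list the options and score each one from 1 (poor) to 5 (excellent).
option_scores = {
    "city condo": {"location": 5, "size": 2, "budget": 3},
    "suburban house": {"location": 3, "size": 5, "budget": 4},
}

def weighted_score(scores, weights):
    """Weighted sum of one option's criterion scores."""
    return sum(weights[c] * scores[c] for c in weights)

def rank_options(options, weights):
    """Return (option, total) pairs sorted from best to worst."""
    totals = {name: weighted_score(s, weights) for name, s in options.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_options(option_scores, criteria_weights)
for name, total in ranking:
    print(f"{name}: {total:.2f}")
```

The totals land close together here, which is typical: the choice of weights, not the arithmetic, is where the real argument happens.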

You know those checklists at the end of every House Hunters episode where people weigh location and size against budget? That's essentially it: making important decisions, applying judgment, and juggling multiple goals (raise the kids in the city or the burbs?). Buyers start out by ranking priorities, but once they see their actual options, deciding on a house becomes substantially more complex.

MCDA guidance from ISPOR

As shown in the diagram (source: ISPOR), the analysis hinges on assigning relative weights to individual decision criteria. While this brings rationality and transparency to complex decisions, it also invites passionate discussions. Some might expect these techniques to remove human judgment from the process, but MCDA leaves it front and center.

Pros and cons. Let’s not kid ourselves: You have to optimize on something. MCDA is both beautiful and terrifying because it forces us to identify tradeoffs: Quality, short-term benefits, long-term results? Uncertain outcomes only complicate things further.

This method is a good way to bring interdisciplinary groups into a conversation. One of the downsides is that, upon seeing elaborate projections and models, people can become over-confident in the numbers. Uncertainty is never fully recognized or quantified. (Recall the Rumsfeldian unknown unknown.) Sensitivity analysis is essential, to illustrate which predicted outcomes are strongly influenced by small adjustments.
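A simple sensitivity check might look like the Python sketch below. It rests on invented numbers: two hypothetical options scored on two criteria, with the weight on 'quality' swept from 0 to 1 while 'cost' gets the remainder. The point is to see where the recommended option flips.

```python
# Hypothetical two-option comparison; every number here is invented for illustration.
scores = {
    "option A": {"quality": 5, "cost": 2},
    "option B": {"quality": 1, "cost": 4},
}

def best_option(quality_weight):
    """Pick the top option when quality gets this weight and cost gets the rest."""
    weights = {"quality": quality_weight, "cost": 1.0 - quality_weight}
    totals = {
        name: sum(weights[c] * s[c] for c in weights)
        for name, s in scores.items()
    }
    return max(totals, key=totals.get)

# Sweep the quality weight from 0.0 to 1.0 in steps of 0.1 and record the winner.
sweep = [(w / 10, best_option(w / 10)) for w in range(11)]
for weight, winner in sweep:
    print(f"quality weight {weight:.1f} -> {winner}")
```

With these made-up scores the winner changes somewhere between a quality weight of 0.3 and 0.4. If your group's debate over weights falls in that range, the model hasn't settled anything yet.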

MCDA is gaining traction in healthcare. The International Society For Pharmacoeconomics and Outcomes Research has developed new MCDA guidance, available in the latest issue of Value in Health (paywall). To put it mildly, it’s difficult to balance saving lives with saving money. To be sure, healthcare decision makers have always weighed medical, social, and economic factors: MCDA helps stakeholders bring concrete choices and transparency to the process of evaluating outcomes research - where controversy is always a possibility.

Resources to learn more. If you want to try MCDA, pick up one of the classic texts, such as Smart Choices: A Practical Guide to Making Better Decisions. Additionally, ISPOR's members offer useful insights into the pluses and minuses of this methodology - see, for example, Does the Future Belong to MCDA? The level of discourse over this guidance illustrates how challenging healthcare decisions have become.  

2. C-Suite judgment must blend with analytics. Paul Blase of PricewaterhouseCoopers hits the nail on the head, describing how a single analytics department can't be expected to capture the whole story of an enterprise. He explains better ways to involve both the C-Suite and the quants in crucial decision-making.

3. The Man with the Golden Gut. Netflix CEO Reed Hastings explains how and when intuition is more valuable than big data. Algorithms can make only some of the decisions.

4. Embedding analytics culture. How do you compare to the Red Sox? Since Moneyball, clubs have changed dramatically. Is it possible baseball organizations have embedded analytics processes more successfully than other business enterprises?
