Why many experimenters get confidence intervals wrong
Frequentist 90% confidence intervals, e.g. for the mean, cover the true mean 90% of the time. Yet, we cannot claim that a particular confidence interval has a 90% probability of covering the true mean. Put another way: The frequentist procedure generates confidence intervals that cover the... Read more 24 Jun 2024 - 10 minute read
We firmly believe you should measure the impact of all product changes, and running an experiment (or A/B test) is the most effective way to do so. However, it is not always possible to run an experiment. For example, you might not want to withhold an important new feature from some users. In such cases, an encouragement design experiment might... Read more 20 Feb 2024 - 13 minute read
Many different statistical regimes are used in hypothesis testing, and it can be easy to get lost in the array of choices. Although it’s easy to find zealots arguing that one approach is universally better (also known as the “statistics wars”), every statistical method has unique strengths and weaknesses. The wide range of nuance in data analysi... Read more 08 Jan 2024 - 13 minute read
There is growing hype around multi-armed bandit algorithms (bandits for short), and data teams increasingly wonder if they could replace experiments (A/B testing) as a more sophisticated way to make data-driven decisions. If you have not had an enthusiastic data scientist try to sell you on bandits, then surely you will have the pleasure soon. T... Read more 12 Dec 2023 - 12 minute read
The useful useless p-value
In experimentation, the p-value is the least understood and most misused concept, and the most common and most egregious error is to state that the p-value is the probability that the null hypothesis is true. I certainly relate, as I quite often catch myself moments before making this mistake. To better understand p-val... Read more 17 May 2023 - 12 minute read