In statistics, a hypothesis is an assumption we make about a population parameter: a fixed quantity that describes the population and can serve as a parameter of its distribution. Typical examples of parameters are the mean and the variance.
You might be wondering what this stuff has to do with you as an engineer, a salesman, a marketer or a customer support specialist. The truth is that these statistical tools are just a different approach to practices that you are already following in your work. Let’s see some examples.
- Marketing: How many e-mails do we need to send before someone signs up for the product?
- Product: How long does the onboarding process take for a new sign-up?
- Sales: How long does it take for a lead to turn into a paying customer?
- Customer support: How long does it take from the moment a new ticket is created until the first response?
All the above questions can easily be translated into “statistical” hypotheses or questions.
- What is the mean number of e-mails that we need to send before someone signs up to our product? We can define the population here as the recipients of our drip campaigns.
- What is the mean time between a user’s first login and any other event on the product? The population here is of course our customers.
- What is the mean time that it takes for a new lead to become a paying customer? The population here is the leads that we have in our CRM system.
- What is the mean first response time to new tickets? The population in this case is the tickets created so far.
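Each of these translated questions boils down to estimating a mean from a sample. A minimal sketch in Python, using made-up numbers purely for illustration:

```python
from statistics import mean

# Hypothetical samples: e-mails sent before each sign-up, and
# first-response times (in hours) for recent support tickets.
emails_before_signup = [3, 7, 4, 6, 5, 8, 2, 5]
first_response_hours = [1.5, 0.5, 2.0, 1.0, 3.0, 0.8]

# The sample means are our estimates of the population parameters.
print(mean(emails_before_signup))   # mean e-mails before sign-up
print(mean(first_response_hours))   # mean first response time
```

The sample mean is only an estimate of the population mean; hypothesis testing, discussed next, is how we judge whether an assumption about the population value is supported by such a sample.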
It is actually quite easy to translate between the everyday problems that anyone in a business seeks answers to, regardless of their position, and the language of statistics.
In statistics, what we can do is test the assumptions, or hypotheses, that we make about measurements like the ones described earlier. For example,
- The mean number of e-mails required to turn a lead into a customer is five.
The above assumption, or its negation, is a valid hypothesis that we can investigate with statistical tools. The way to do that is a statistical technique called Hypothesis Testing.
To be more precise, the goal of hypothesis testing is to determine whether there is enough evidence in a given data set to conclude that the assumption we made holds for the whole population. So, although for the e-mail assumption above there will be cases where a smaller or larger number of e-mails is required, such a test lets us be confident that the assumption is likely to hold for the population as a whole.
So, can we be confident that overall it will take our company around 5 e-mails to convert a new lead into a customer? To answer this question using statistical tools we need to follow the next four steps:
- State the hypothesis, as we did earlier with the number of e-mails.
- Formulate an analysis plan.
- Analyze the data we have available.
- Interpret the results.
For the formulation of the analysis plan the steps involved are:
- Select a significance level. This is a value between 0 and 1; a commonly used value is 0.05.
- Select a test method, which involves a test statistic and a sampling distribution. During the analysis of the available data, the test statistic is calculated and, together with it, a P-value.
A P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming that the hypothesis we made is true.
Finally, we interpret the results by comparing the P-value with the selected significance level. When the P-value is less than the significance level, the hypothesis is rejected.
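The four steps can be sketched as a one-sample t-test. The sample below is made up for illustration; the hypothesis is the one from the e-mail example, and we assume the SciPy library is available:

```python
from scipy import stats

# Step 1: state the hypothesis -- the population mean is 5 e-mails.
hypothesized_mean = 5

# Step 2: formulate the analysis plan -- significance level 0.05,
# one-sample t-test as the test method.
alpha = 0.05

# Hypothetical data: e-mails sent before each of 10 leads converted.
sample = [4, 6, 5, 7, 3, 5, 6, 4, 5, 6]

# Step 3: analyze the data -- compute the test statistic and P-value.
t_stat, p_value = stats.ttest_1samp(sample, hypothesized_mean)

# Step 4: interpret -- reject the hypothesis if P-value < alpha.
if p_value < alpha:
    print("Reject the hypothesis: the mean is unlikely to be 5.")
else:
    print("Fail to reject: the data are consistent with a mean of 5.")
```

With this particular sample the mean is 5.1, close to the hypothesized 5, so the P-value comes out well above 0.05 and we fail to reject the hypothesis.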
In summary, and without the technical details, what we are doing with hypothesis testing is the following:
- Make an initial assumption
- Collect data
- Based on the data, decide whether the assumption stands true or not
This is more or less what everyone does as part of their job, regardless of the tools used; here we just do it in a more formal and scientific way.
Using such a statistical methodology doesn’t guarantee that we will actually get an answer to our questions. More precisely, there are errors that can occur during the process.
- Type I error. Rejecting the hypothesis when it is actually true. The probability of committing such an error is the significance level.
- Type II error. Failing to reject the hypothesis when we should. The probability of not committing a Type II error is called the power of the test.
We can control Type I errors by adjusting the significance level accordingly, and there are tools for power analysis that tell us whether the data we have contain enough evidence to perform these tests. Power analysis is a very powerful tool, and we’ll discuss it more thoroughly in another post.
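One way to see the link between the significance level and Type I errors is a quick simulation: when the hypothesis really is true, a test at level 0.05 should wrongly reject it roughly 5% of the time. A rough sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
true_mean = 5      # here the hypothesis is actually true
rejections = 0
n_trials = 2000

for _ in range(n_trials):
    # Draw a sample from a population whose mean really is 5.
    sample = rng.normal(loc=true_mean, scale=2.0, size=30)
    _, p_value = stats.ttest_1samp(sample, true_mean)
    if p_value < alpha:    # a rejection here is a Type I error
        rejections += 1

# The observed Type I error rate should sit near alpha (0.05).
print(rejections / n_trials)
```

Lowering alpha makes Type I errors rarer, but at the cost of more Type II errors, which is exactly the trade-off that power analysis helps quantify.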
We have barely scratched the surface of the tools that statistics offers, but hopefully this post is a good starting point for investigating further how these tools can be applied to the problems that any professional has to deal with. There are many details to address to make these tools work properly, and we’ll go through them in future posts.