Natural Experiments
You are a marketer who spends a large share of your budget on SEM. You bid on your own brand keyword, even though there is little competition for it. You have started wondering whether you are wasting your budget bidding on brand traffic. Won't users who search for your brand keyword click on the organic result and reach your site anyway? After all, your result is the top result in all search engines.
You decide to switch off the brand campaigns for a week and observe whether SEO traffic (for brand keywords only) increases to balance out the dip in SEM.
The week you switched off brand campaigns, your SEO traffic for brand keywords increased and compensated for the drop in SEM traffic. Now, is that reason enough to conclude that it is better not to spend money on brand keywords?
Design of experiments
While you ponder over the above questions, let's move on and describe a good experimental design for establishing a causal relationship.
Theory recommends a randomised controlled experiment in which:
- There should be a control group and a treatment group
- Users should be allocated to one of these groups randomly
- The intervention should be applied only to the treatment group
- The experiment should be run for a duration that covers any cyclical trends
- The metrics measuring the impact of the intervention should be checked for statistically significant differences between the control and treatment groups (see the sketch below)
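As a concrete illustration of that last point, here is a minimal sketch of a significance check on a conversion metric, using statsmodels; all the counts are made up:

```python
# A minimal significance check for an A/B test: did the conversion rate
# differ between control and treatment? All counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

conversions = [180, 212]     # converted users: [control, treatment]
visitors = [10_000, 10_000]  # users allocated to each group

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# At the conventional 5% level, p < 0.05 suggests the lift is unlikely
# to be explained by chance alone.
```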
For folks in the software world who are used to running A/B tests, this is an obvious and easy-to-implement method. However, controlled experimentation is often extremely difficult to implement, or unethical.
Consider the study below (as described on Wikipedia):
An aim of a study by Angrist and Evans (1998) was to estimate the effect of family size on the labor market outcomes of the mother. For at least two reasons, the correlations between family size and various outcomes (e.g., earnings) do not inform us about how family size causally affects labor market outcomes. First, both labor market outcomes and family size may be affected by unobserved “third” variables (e.g., personal preferences). Second, labor market outcomes themselves may affect family size (called “reverse causality”). For example, a woman may defer having a child if she gets a raise at work.
Now, it is hard to imagine a scientist being able to dictate that a group of randomly assigned families have three children, and then studying the effect on labour market outcomes over a period of time.
The authors observed that two-child families with either two boys or two girls are substantially more likely to have a third child than two-child families with one boy and one girl.
The sex of the first two children, then, constitutes a kind of natural experiment: it is as if an experimenter had randomly assigned some families to have two children and others to have three. The authors were then able to credibly estimate the causal effect of having a third child on labor market outcomes. The sex mix of the first two children was coded as an instrumental variable for estimating the impact of having more than two children on labor market outcomes.
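To make the mechanics concrete, here is a minimal sketch of a two-stage least squares (2SLS) estimate on synthetic data that mimics this setup; all variable names, coefficients, and effect sizes are made up for illustration:

```python
# Two-stage least squares (2SLS) on synthetic data mimicking the
# Angrist-Evans setup. The instrument (same_sex) shifts family size
# but is as-if randomly assigned, so it breaks the confounding.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

same_sex = rng.integers(0, 2, n)  # instrument: first two kids share a sex
preference = rng.normal(size=n)   # unobserved confounder
# Treatment: same-sex families are more likely to have a third child.
has_third = (0.3 * same_sex + 0.5 * preference
             + rng.normal(size=n) > 0.5).astype(float)
# Outcome: the true causal effect of a third child is -5 weeks worked.
weeks_worked = 40 - 5 * has_third - 3 * preference + rng.normal(size=n)

# Naive OLS is biased because `preference` drives both variables.
naive = sm.OLS(weeks_worked, sm.add_constant(has_third)).fit()

# Stage 1: predict the treatment from the instrument alone.
stage1 = sm.OLS(has_third, sm.add_constant(same_sex)).fit()
# Stage 2: regress the outcome on the *predicted* treatment.
stage2 = sm.OLS(weeks_worked, sm.add_constant(stage1.fittedvalues)).fit()

print("naive estimate:", round(naive.params[1], 2))  # biased away from -5
print("2SLS estimate:", round(stage2.params[1], 2))  # close to -5
```

(In practice you would use a dedicated 2SLS routine so the standard errors come out right; the manual two-stage version above only gets the point estimate right.)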
When nature isn’t kind enough to run an experiment
There are cases where such random assignments do not happen naturally. For example, let’s say we want to measure the health impact of smoking.
In this case, the health impact could be due to smoking or due to another variable that predicts/causes smoking (economic status, not being health conscious, risk-taking, etc.).
(Another example: you launched a new feature for a B2B product, and your customers need to enable and use it to realise its impact. Your marketing team has a list of customers in their panel of early adopters, and they usually push adoption through that channel. How do you measure the impact of this feature?)
A technique used to remove selection biases from observational data is called propensity score matching.
- Here we predict a propensity score, which is the probability of being part of the treatment group (or, for this specific example, the probability of being a smoker) based on other variables (e.g. income, education)
- Then we compare the health outcomes of this group (smokers) with those of non-smokers who match them on the propensity score (a sketch follows below)
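Here is a minimal sketch of these two steps on synthetic data; all variable names and effect sizes are made up, and in the toy data the true effect of smoking on the health score is -5:

```python
# Propensity score matching: estimate P(smoker | covariates), then pair
# each smoker with the non-smoker nearest in propensity score and compare
# outcomes on the matched sample. All data below is synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 5_000
income = rng.normal(50, 15, n)
education = rng.normal(12, 3, n)
# Lower income and education make smoking more likely in this toy data.
p_smoke = 1 / (1 + np.exp(0.05 * (income - 50) + 0.2 * (education - 12)))
smoker = rng.binomial(1, p_smoke)
# Health depends on the same covariates AND on smoking (true effect: -5).
health = 70 + 0.2 * income + 1.0 * education - 5 * smoker + rng.normal(0, 5, n)
df = pd.DataFrame({"income": income, "education": education,
                   "smoker": smoker, "health": health})

# Step 1: predict the propensity score from the covariates.
model = LogisticRegression().fit(df[["income", "education"]], df["smoker"])
df["pscore"] = model.predict_proba(df[["income", "education"]])[:, 1]

treated = df[df["smoker"] == 1]
control = df[df["smoker"] == 0]

# Step 2: match each smoker to the non-smoker with the closest score,
# then compare mean outcomes on the matched pairs.
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = control.iloc[idx.ravel()]
effect = treated["health"].mean() - matched["health"].mean()
print("estimated effect of smoking:", round(effect, 2))
```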
Now, coming back to the original question: is the lift in SEO traffic compensating for the drop from the brand SEM campaigns?
- The ideal way to measure it would be to turn off the ad on the brand keyword for a random X% of traffic and measure the drop/lift in total clicks (including organic) to the site from this keyword. Google provides support for campaign experiments, but I am not sure if the above use case is covered.
- A less ideal method: instead of turning off the brand campaign for a day, you turn it off and on at an hourly frequency (or the shortest possible duration) and compare the traffic when it was on to when it was off (see the sketch below). If you are using APIs to programmatically control your campaigns, this should not be difficult.
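For the second approach, the comparison itself is straightforward once you have logged total hourly clicks (paid + organic) on the brand keyword along with the campaign state. A minimal sketch, where the file and column names are illustrative:

```python
# Compare total brand-keyword clicks between hours when the campaign was
# on and hours when it was off. File and column names are illustrative.
import pandas as pd
from scipy import stats

hourly = pd.read_csv("brand_keyword_hourly.csv")  # columns: campaign_on, total_clicks

on = hourly.loc[hourly["campaign_on"] == 1, "total_clicks"]
off = hourly.loc[hourly["campaign_on"] == 0, "total_clicks"]

# Welch's t-test: does total traffic actually drop when the ad is off?
t_stat, p_value = stats.ttest_ind(on, off, equal_var=False)
print(f"mean on = {on.mean():.1f}, mean off = {off.mean():.1f}, p = {p_value:.4f}")

# Similar means (and a large p-value) would suggest organic results are
# absorbing most of the clicks you are paying for on the brand keyword.
```

Since traffic varies strongly by hour of day, you would also want to alternate on/off within matched hours (or compare the same hour across days) rather than treating all hours as interchangeable.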
Similar articles written by me:
Vertical Strategy for Search Engines
On what goes into a product roadmap and the perils of a static perspective: Zooming out, drilling down and changing hats
Managing machine learning based products and model evaluation metrics: How to evaluate your model?
A brief note on causal inference for product managers
How to add machine learning to your product?
On tracking and measuring product KPIs: Success Metrics
On learning continuously: About Curiosity, Learning and Eigenvectors
Sources
I wrote this article as an introduction to these ideas. I was inspired by some of the scenarios explained by a few data scientists and economists in the links given below:
https://www.quora.com/When-should-A-B-testing-not-be-trusted-to-make-decisions/answer/Edwin-Chen-1
Wikipedia: https://en.wikipedia.org/wiki/Natural_experiment
https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/