RKG has long been interested in the question of online to offline spillover. We’ve also long been critical of the sloppy, ill-conceived tests that have misled many advertisers on this spillover in the past. We recently worked with a few of our retail-chain clients on more carefully designed studies to get a more responsible answer to this question.
The Conventional Wisdom
You hear numbers thrown around all the time about the online to offline spillover factor. I’ve heard that for every order tracked online there are 4, 6, 7 — I even heard 10 once — offline orders driven. All of these numbers are misleading and often are cited by people who have a vested interest in leading people to believe online advertising is even more valuable than it is.
It’s not that the numbers are made up — they just don’t mean what some folks imply that they mean. For instance, let’s say you’ve based your spillover estimates on customer survey data, in which x out of 10 people say they do research online before they shop in stores. We know from this survey that online activity impacts offline purchases — but let’s consider what we do not know.
The survey data do not suggest that paid online advertising specifically impacted their buying decision. Nor do we know if customers ultimately bought the brand they researched, or whether they bought it from the reseller of that brand on whose website the research was done.
This is not even to mention the fact that consumers are notoriously wrong in self-reported behavior assessments. My own survey of… well, everyone I’ve met in the last 10 years… would suggest that no one ever clicks on the paid search ads; Google’s market cap of ~$300 billion suggests otherwise.
Other “studies” have taken paid search specifically into account, but then failed to separate brand search from non-brand. This is a very important distinction. After all, people searching for store hours, directions, in-stock inventory, etc., in conjunction with your brand name are wonderful — the problem is you likely can’t buy more of those people through search because your brand ad is already at the top of the page.
Leaving aside the question of whether all that traffic would go to the organic listing absent the ad, the fact that you can only spend more on the non-brand stuff suggests that’s where the focus needs to be. Thus, failing to distinguish between brand and non-brand search traffic can give the false impression that spending more on non-brand search will generate the same generous offline spillover.
So, how do we measure the online to offline spillover effects in a way that allows us to take meaningful, ROI-positive action? We’ll talk about two different approaches and the strengths and weaknesses of each.
Approach #1: Indirect Control-Test Method
RKG partnered with Access 2 Insight to research this very question for a large brick-and-mortar chain with a particularly strong regional history and footprint. For each test, Access 2 Insight created geographic test groups counter-balanced by control geographies with similar trends and levels of historical sales.
Together, we determined the scale necessary to be able to read results in a reasonable period of time (3 weeks of aggressive spending followed by 4 weeks of return to normal levels for the test group; 7 weeks of normal spending for the control group), and the appropriate level of incremental investment to generate that volume.
We carefully constructed 3 different tests:
- A geo-targeted push across all keywords looking for a lift in overall store sales
- A push across specific categories to look for category sales lift within the test stores
- A combined category and geo-targeted push aimed at the area of the country where the client has its widest store footprint, its longest history and hence its greatest brand recognition
Statistically significant lift over the control was found in the third test, but not the first two. The lift in the third test was generated cost-effectively when factoring in both incremental online sales and the lift measured in store, and balanced against the change in ad spend.
The other two tests showed no statistically significant lift. This may suggest the power of brand: greater familiarity, comfort with the brand and convenience of locations may create the conditions necessary for material lift. Or the lift created in the first and second tests was simply too small to detect, given the size of the spillover sales relative to baseline offline sales.
The smaller the lift, the more data required to identify the lift and separate it from noise. In this test, we did not drop spend in the control groups to zero, so we only measured whether paid search spend above and beyond standard levels generated incremental in-store sales.
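To make the "smaller lift, more data" point concrete, here is a minimal sketch using the textbook two-sample z-test approximation for sample size. This is not the calculation used in the tests described above; the function name, the 20,000 standard deviation and the lift sizes are all hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def observations_per_group(baseline_sd, absolute_lift, alpha=0.05, power=0.80):
    """Approximate observations needed in EACH group (test and control)
    to detect `absolute_lift` with a two-sided, two-sample z-test.

    Textbook approximation: n = 2 * ((z_alpha + z_power) * sd / lift) ** 2
    """
    if baseline_sd <= 0 or absolute_lift <= 0:
        raise ValueError("baseline_sd and absolute_lift must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * baseline_sd / absolute_lift) ** 2)


# Hypothetical numbers: weekly store sales with a standard deviation of 20,000.
n_for_2000_lift = observations_per_group(baseline_sd=20_000, absolute_lift=2_000)
n_for_1000_lift = observations_per_group(baseline_sd=20_000, absolute_lift=1_000)
print(n_for_2000_lift, n_for_1000_lift)
```

Under this approximation, halving the lift you want to detect roughly quadruples the number of observations (for example, store-weeks) needed in each group.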
Strengths Of Methodology
Carefully set up and measured control group tests are the gold standard to determine incremental benefits of everything from drug efficacy to education to advertising value. Analysis of Covariance (ANCOVA) controls for extraneous variables impacting the test, giving marketers confidence in the findings. Tests should be repeatable and indeed should be repeated.
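As a rough illustration of what ANCOVA does here, the sketch below fits post-period sales against a test/control flag while adjusting for pre-period sales, using plain least squares. It is a simplified stand-in, not the analysis Access 2 Insight ran; the data are synthetic and noise-free so the known lift of 5 is recovered exactly, and a real analysis would also report standard errors and significance.

```python
def ancova_adjusted_lift(pre, post, treated):
    """Fit post = b0 + b1*treated + b2*pre by ordinary least squares and
    return b1, the test-vs-control difference adjusted for pre-period sales."""
    rows = [[1.0, float(t), float(x)] for t, x in zip(treated, pre)]
    k = 3
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, post))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    coefficients = [aug[i][k] / aug[i][i] for i in range(k)]
    return coefficients[1]


# Synthetic, noise-free data: three test geographies, three control geographies.
pre = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0]
treated = [1, 1, 1, 0, 0, 0]
post = [10 + 5 * t + 1.2 * x for t, x in zip(treated, pre)]

lift = ancova_adjusted_lift(pre, post, treated)
print(round(lift, 6))
```

The point of the covariate is that geographies with historically higher sales do not get mistaken for geographies that responded to the ad push.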
Weaknesses Of Methodology
Test design, execution, and analysis are not cheap, even before the media cost. The ad push must be large enough relative to offline marketing efforts to give a reasonable hope of reading the results, so the cost of failing to create lift is high. Even when results show a sales lift, the optimal spend increase cannot be determined without additional testing. The test should also be repeated for the sake of confidence and to measure changes in customer behavior, adding further to these costs.
The challenges of this approach are endemic and not a product of the analytics partner. Access 2 Insight was a terrific partner to our client and to RKG, and we look forward to working with them again down the road.
Approach #2: Direct Measurement
POS coupon tests have long been used to try to gauge spillover, but the fact of the coupon incentive, and the fact that we have no way to estimate with confidence the fraction of the total that the coupon users represent, means these tests have sent signals that can’t be meaningfully interpreted. However, better mechanisms for measuring this effect are in the works.
Many of our clients have CRM and Loyalty Program partners with extensive databases of users and the ability to match browsers to human beings whose offline buying behavior they can see. For a few clients, RKG is partnering with the CRM partner to directly track users who clicked on ads to future in-store purchases. Through cookie matching, we can share a unique click ID with the CRM system, which they can then match to their knowledge of future behavior (within a defined cookie window).
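The matching step can be sketched as a simple join on click ID with a time window. Everything below is hypothetical: the record layout, the field names, the dates and the 30-day window are assumptions for illustration, not the CRM partners' actual schema or settings.

```python
from datetime import date, timedelta

# Hypothetical ad-click log: one record per non-brand paid search click.
clicks = [
    {"click_id": "c1", "clicked_on": date(2024, 1, 1)},
    {"click_id": "c2", "clicked_on": date(2024, 1, 5)},
    {"click_id": "c3", "clicked_on": date(2024, 1, 7)},
]

# Hypothetical CRM export: in-store purchases already keyed to a click ID
# through the partner's cookie match.
offline_purchases = [
    {"click_id": "c1", "purchased_on": date(2024, 1, 10)},   # inside the window
    {"click_id": "c1", "purchased_on": date(2023, 12, 25)},  # before the click
    {"click_id": "c2", "purchased_on": date(2024, 3, 1)},    # after the window
    {"click_id": "c9", "purchased_on": date(2024, 1, 8)},    # unknown click ID
]


def attributed_offline_orders(clicks, purchases, window_days=30):
    """Return purchases made on or after a known click and within the window."""
    clicked_on = {c["click_id"]: c["clicked_on"] for c in clicks}
    window = timedelta(days=window_days)
    attributed = []
    for purchase in purchases:
        start = clicked_on.get(purchase["click_id"])
        if start is not None and start <= purchase["purchased_on"] <= start + window:
            attributed.append(purchase)
    return attributed


matched = attributed_offline_orders(clicks, offline_purchases)
print(len(matched))
```

Only purchases that follow a known click inside the window are counted, which is why the choice of window length matters when comparing results across advertisers.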
Challenge: The CRM partners can’t tie every user to their database of people, so we only see a sample. However, the match rate lets us extrapolate to a reasonable upper bound on the full offline impact.
For Retailer A, for every online order tied to a non-brand search ad, we see 0.083 offline orders.
The match rate for Retailer A is about 25%. At most (and we’ll talk about this shortly) we could say we’re therefore only seeing 1/4 of the offline spillover, so multiplying by 4 gets us a ratio of online to offline orders of 1:0.33 driven by non-brand paid search ads. Interestingly, because the average order value offline is significantly higher, from a revenue perspective, the ratio is 1:0.5 online to offline. Really significant and cool!
For Retailer B, for every online order tied to non-brand paid search, we measured 0.066 offline orders. In this case, the match rate is 16%, so we can extrapolate the most generous ratio at this point to be ~1:0.4. We don’t have revenue figures in this instance, and this is the first cut of data. In the case of Retailer A, the spillover numbers strengthened as the data matured, so it’s early to say where Retailer B’s data will level out.
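The extrapolation for both retailers is just the measured ratio divided by the match rate. A minimal sketch follows; the function name is ours, and it bakes in the generous assumption that unmatched users behave like matched ones.

```python
def extrapolated_offline_per_online(measured_ratio, match_rate):
    """Upper-bound offline orders per online order, ASSUMING unmatched users
    spill over at the same rate as the users the CRM partner could match."""
    if not 0 < match_rate <= 1:
        raise ValueError("match_rate must be in (0, 1]")
    return measured_ratio / match_rate


retailer_a = extrapolated_offline_per_online(0.083, 0.25)
retailer_b = extrapolated_offline_per_online(0.066, 0.16)
print(f"Retailer A: 1:{retailer_a:.2f}")  # 1:0.33
print(f"Retailer B: 1:{retailer_b:.2f}")  # 1:0.41
```

Because the division assumes the untracked majority spills over at the same rate, the result is best read as a ceiling rather than an estimate.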
Strengths Of Methodology
Direct tracking eliminates exogenous factors that can skew control tests. The tracking allows advertisers to learn without changing media buying behavior, greatly lowering the cost of tests. Analysis of the data is relatively easy, requiring no stats modeling, no special testing clusters, etc.
Weaknesses Of Methodology
The fact that a person interacted with a non-brand paid search ad prior to purchase suggests a causal connection that may not exist. Your best customers may shop at your store regularly regardless of whether they liked what they saw online after clicking through. Control tests are better suited to really understanding the lift created by advertising.
Also, the low match rates force us to extrapolate to the whole universe, and extrapolation is always dicey. Is it credible to argue that the untracked majority behave the same way as the loyalty program members?
It may be generous, but it isn’t necessarily crazy — the match rates capture the CRM’s whole database of users, not just this specific advertiser’s loyalty program, so it isn’t quite as biased a sample as it first appears. At the very least, it creates an upper boundary that may be useful for the advertiser in determining their online advertising ROI targets and paid search budgets.
Google has also rolled out a version of direct tracking described here (video), which we have not yet tested. In this system, Google passes a click ID (GCLID) to the advertiser’s website and the website then stores that click ID with some personal identifier (either lead form completion, or user login with email).
The advertiser creates the matching offline conversions tied back to that known user and uploads them to AdWords. It has some appeal, but the need for login information may make the findings hard to interpret, much as the POS coupon tests didn’t tell us much. What fraction of people who landed on the site didn’t log in but bought from the store anyway? Absent a reasonable guess, it’s hard to know how to extrapolate.
What do these early tests tell us that can be generalized?
No sane person — particularly no one who calls himself a marketing scientist — would try to draw broad conclusions from such limited data. There is every reason to believe that spillover rates will be hugely category/vertical dependent, and that regional strength of brand and proximity to stores will have a big impact. No doubt device will impact this effect as well.
These early results do suggest two hypotheses to me for further testing:
- Online advertising can drive foot traffic to stores. The ratio of non-brand actionable spillover is not 1:0 (online to offline).
- The early data from our clients suggest that the ratio might be more like 1:0.5 or 1:0.4, rather than the 1:6, 1:7 and 1:10 which others have cited in the past.
It is important for every advertiser to measure this as best they can, rather than to trust “benchmarks.” Hopefully, given the importance of the topic and with the holiday season approaching, these sketchy benchmarks might be helpful.