Data Analytics

This is an attempt to explain the arcane art of Data Analytics in reasonably plain English.

Introduction

Few non-statistically minded people understand data, modelling and analytics, but any digital marketer, be they a creative, planner, techie or account manager, would benefit from understanding the basic principles.

Treating the analytics department as a 'black box' or incomprehensible silo is extremely risky. Analysts often struggle to explain what it is they do and what they have created, and non-specialists are often simply not interested.

Anyone with a PC and software package can get the data to fit a model. But good data needs to be caressed and cared for. Analytics is just as much an art as a science and that is why you would benefit from joining in.

The internet is of course producing lots and lots more data. As a result (and due to the pressures of recession and procurement departments) measurability and accountability are being increasingly demanded by clients.

We are now working in an environment where we can continually optimise activity, observe behavioural activity and in many instances infer attitudinal dimensions from customers' online behaviour. This is such a rich area. It means we can link “think” to “do”. Behavioural targeting is an obvious application of this.

Analytics for agencies

There are basically four ways we use analytics.

  1. Campaign metrics and dashboards
  2. Web analytics
  3. Segmentation work
  4. CRM optimisation, propensity modelling and forecasting

The first two most of you will be familiar with. We will focus on the last two. But first we need to cover some basic principles which apply to nearly all data work but, more importantly, to how you think about the problem you are trying to solve.

A data model on paper

First, a bit of language … models can be written like this

Y = func [X1, …, Xn]

Y is the dependent variable, the thing you are trying to model. The Xs are the explanatory variables.

e.g. Sales of cans of lager is a function of [price, advertising, seasonal factors]

Lager sales are the Y; everything else is an X. Seasonal factors could include anything from Christmas to a World Cup tournament.

You should try writing one of these before you speak to analytics specialists. It may take you all of 10 minutes’ discussion with someone who knows the market; more of this down the page.
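A model written out like this translates almost directly into code. The sketch below is purely illustrative: the straight-line form, the function name `predicted_lager_sales` and every coefficient value are assumptions made up for the example, not figures from any real market.

```python
def predicted_lager_sales(price, advertising, seasonal_uplift,
                          base=1000.0, b_price=-200.0, b_ads=0.5):
    """Y = func[price, advertising, seasonal factors], assumed linear here.

    All coefficients are invented placeholders; in practice an analyst
    would estimate them from data.
    """
    return base + b_price * price + b_ads * advertising + seasonal_uplift


# Same price and advertising, with and without a seasonal event.
ordinary = predicted_lager_sales(price=1.5, advertising=400, seasonal_uplift=0)
world_cup = predicted_lager_sales(price=1.5, advertising=400, seasonal_uplift=300)
print(ordinary, world_cup)
```

The point is not the numbers but the shape: one Y on the left, a short list of Xs on the right.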

Some key principles of data modelling you should understand

Data will not tell you causality

Whilst we know people eat ice cream when it gets hot, a software program does not know this. If you just look at data correlation you could infer that if we eat lots of ice cream the weather changes. And of course you could build a reasonable model “showing” that ice cream brings the sun out.
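A small sketch of why the software cannot tell which way round the relationship runs: a correlation coefficient comes out the same whichever variable you list first. The temperature and sales figures are invented for illustration, and `pearson` is a hand-written helper rather than a library call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


temperature = [12, 15, 18, 21, 24, 27, 30]             # degC, invented
ice_cream_sales = [110, 150, 170, 230, 260, 300, 330]  # units, invented

sun_drives_sales = pearson(temperature, ice_cream_sales)
sales_drive_sun = pearson(ice_cream_sales, temperature)
print(round(sun_drives_sales, 3), round(sales_drive_sun, 3))
```

Both directions give the same strong correlation. Only knowledge of the market tells you which one makes sense.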

Data is rarely uncorrelated

Two sets of data will almost inevitably be correlated, either negatively or positively. Exactly zero correlation is an extremely rare event, a perfect storm. Of course in most cases the correlation is coincidence; it does not mean there is a relationship.

e.g. A 5-year-old child's shoe sizes will be highly correlated with UK house price inflation since 2002. This does not mean there is a relationship. This is a trite example but it is based on a very famous economics example linking cumulative rainfall to inflation (Google “Professor Tinbergen” to find out more).

A more realistic example: if you did not know better you could easily attribute a decline in sales of tinned soup to less shelf space in-store, rather than to the increase in fresh chilled soup products available. That gets the causality back to front: the shelf space shrank because sales were falling. Knowing your market is key.
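To see how easily two unrelated series correlate, here is a minimal sketch with two invented series that both simply drift upwards over seven years. It also shows one common sanity check, which is an addition rather than something prescribed above: compare the year-on-year changes instead of the raw levels. All figures and names are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mean_x) ** 2 for x in xs)
                      * sum((y - mean_y) ** 2 for y in ys))


def yearly_changes(series):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented data: a growing child's foot length and a house price index.
foot_length_cm = [17.0, 17.8, 18.5, 19.4, 20.0, 20.9, 21.5]
house_price_index = [100, 112, 130, 138, 152, 162, 166]

r_levels = pearson(foot_length_cm, house_price_index)
r_changes = pearson(yearly_changes(foot_length_cm),
                    yearly_changes(house_price_index))
print(round(r_levels, 2), round(r_changes, 2))
```

The levels look tightly linked only because both rise over time. Once the shared trend is stripped out, very little is left.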

Prediction is a key test of accuracy

If your model can predict results then it is a good model - with some caveats.

First, test it on a subset of data you have NOT used to build the model. If it serves as a good predictor then you have something that represents the market. In segmentation work if you can predict fairly accurately which segment someone is going to be in from one or two pieces of advance data, you have struck gold.
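A minimal sketch of that hold-out test, assuming a simple straight-line model of sales against advertising spend. The data is invented, `fit_line` is a hand-written least-squares helper, and holding out every third observation is just one arbitrary way to split the data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


ad_spend = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]          # invented
sales = [84, 107, 142, 165, 201, 233, 258, 295, 316, 349]     # invented

# Keep every third observation back; the model never sees these.
holdout_positions = {2, 5, 8}
pairs = list(zip(ad_spend, sales))
train = [p for i, p in enumerate(pairs) if i not in holdout_positions]
holdout = [p for i, p in enumerate(pairs) if i in holdout_positions]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Average size of the prediction error on the unseen observations.
holdout_mae = sum(abs(y - (intercept + slope * x)) for x, y in holdout) / len(holdout)
print(round(slope, 2), round(intercept, 1), round(holdout_mae, 1))
```

If the error on the held-back data is of a similar size to the error on the data used to build the model, the model is probably describing the market rather than memorising the sample.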

Second, it may be of little strategic value if it is a “pure time series”. (In statistics, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at Aswan.)

e.g. Credit card spend could be modelled as:

  • Monthly spend = a. [spend last month] + b. [spend month before]

This model will be very accurate; there will be a good fit. It may even be useful for spotting decline in usage and resultant re-activation programs but it tells you little about customer behaviour or even potential value.

A model like this would be much more informative:

  • Monthly spend = a. [income] + b. [number of credit cards] + c. [seasonal factor] + d. [av. cash withdrawal value]

With a model like this you could start to predict customer spend and potential value. It has a bit more “why” in it.

Observe your errors

The error is the difference between the actual value and the predicted value. Think of it as a tendency to either over or underestimate. If there is a pattern in your errors there is a trend, which means you have probably missed a factor. So go back to your conceptual model. Have another think, rather than just “force” the “data”. Most statistical tests utilise the error terms and measure “goodness of fit”.
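As a rough sketch of what a pattern in the errors looks like, the example below fits a trend-only model to invented weekly sales that also contain an event uplift the model knows nothing about. The data, the event weeks and the helper `fit_line` are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


weeks = list(range(1, 13))
event_weeks = {5, 6, 7}  # e.g. a tournament; invented
sales = [100 + 2 * w + (30 if w in event_weeks else 0) for w in weeks]

# The model only knows about the week number, not the event.
slope, intercept = fit_line(weeks, sales)
errors = [y - (intercept + slope * w) for w, y in zip(weeks, sales)]

in_event = [e for w, e in zip(weeks, errors) if w in event_weeks]
outside = [e for w, e in zip(weeks, errors) if w not in event_weeks]
mean_error_in_event = sum(in_event) / len(in_event)
mean_error_outside = sum(outside) / len(outside)
print(round(mean_error_in_event, 1), round(mean_error_outside, 1))
```

The model underestimates every event week and overestimates the rest. That is the signal to go back to the conceptual model and add the missing factor.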

Structural breaks change everything

Models don't last forever. Mortgage applications would be a topical example. The credit crunch, house price deflation and the banks’ changing policy will mean that older models have stopped working. (Any model of UK house prices will probably be breaking down round about … early 2008).
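One simple way to notice a break is to keep comparing an existing model's predictions with what actually happens. In the sketch below the "old" model, the actual values and the tolerance of 5 are all invented; it is only meant to show the errors jumping once the market changes.

```python
def old_model(period):
    """A model that worked before the break: steady growth (assumed)."""
    return 100 + 5 * period


# Invented actuals: steady growth for six periods, then a decline.
actual = [101, 104, 111, 114, 121, 124, 122, 115, 108, 100]
errors = [value - old_model(t) for t, value in enumerate(actual)]

tolerance = 5  # assumed threshold for "the model has stopped working"
first_break = next(
    (t for t, e in enumerate(errors) if abs(e) > tolerance), None
)
print(errors, first_break)
```

Before the break the errors are small and flip sign; afterwards they grow in one direction, which no amount of re-fitting the old relationship will fix.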

Make sure the model is usable

Many relationships are not linear, so to get a better fit an analyst might transform the data, perhaps expressing it in logarithms to reflect changes in the data rather than the actual values themselves. Whilst this could give a good fit, a modeller should have a view as to whether it is a representation of reality or just a nice statistical exercise. (This is also true of segmentation work. More of this later.)
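Taking logarithms is one common transformation of this kind: a series that grows by a constant percentage is a curve in raw numbers but a straight line in logs, so the slope describes the rate of change rather than the level. A minimal sketch with an invented series growing 50% per period (`fit_line` is a hand-written helper):

```python
from math import exp, log


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


periods = list(range(8))
customers = [10 * 1.5 ** t for t in periods]  # invented, grows 50% a period

# Fit a straight line to the logged values instead of the raw ones.
log_customers = [log(c) for c in customers]
slope, intercept = fit_line(periods, log_customers)

# The slope of the logged series converts back to a growth rate.
growth_per_period = exp(slope) - 1
print(round(growth_per_period, 3))
```

Whether a log scale is a fair picture of how the market behaves is still a judgement call for the modeller.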

Develop a conceptual model of the market before you do anything with the data. Do this with a data person if you like. It could be as simple as something like this:

  • Company spend on IT = Func [industry SIC, number of PCs in the office, number of employees].

Then look for that data or perhaps test the model on market data, or a piece of research data before you go through a massive data collection exercise. Your analytics friend will run a few tests, eyeball the data, look for some basic correlations between the data sets or perhaps lags of the data sets if you are building a time series model.

The biggest problem is inevitably getting hold of the data or suitable proxies (a proxy is a reasonable substitute for data you cannot get hold of).

Some other things to bear in mind

Simple models need only apply

The models you build are relatively simple. They are not Bank of England models attempting to forecast changes in GDP. An analyst will probably tell you that much of the explanatory value lies in one or two variables.

The statistical tests can be confused

There are a host of statistical tests to validate your analysis. Occasionally there are problems if:

Y = func [Xi] BUT Xi = func [Xj]

Give this some thought as you write out your conceptual model.

An economics example: when interest rates go down, the FTSE 100 tends to climb. They are well connected. Putting both sets of data straight into a model, without a little transformation, could confuse things and the statistical tests. As they say, if the problem persists consult your analyst.
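A quick check an analyst might run before modelling is to correlate the explanatory variables with each other. The sketch below does this for invented interest rate and FTSE 100 figures; the 0.8 cut-off is an assumed rule of thumb, not a standard, and `pearson` is a hand-written helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mean_x) ** 2 for x in xs)
                      * sum((y - mean_y) ** 2 for y in ys))


# Invented figures for two candidate explanatory variables.
interest_rate = [5.0, 4.5, 4.0, 3.5, 3.0, 2.0, 1.0, 0.5]
ftse_100 = [4000, 4150, 4300, 4500, 4600, 5000, 5400, 5500]

r = pearson(interest_rate, ftse_100)
too_closely_linked = abs(r) > 0.8  # assumed cut-off
print(round(r, 2), too_closely_linked)
```

When two Xs move together this closely the model cannot tell their effects apart, which is the cue to drop one, transform one, or ask your analyst.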

You can't observe the dependent variables

If you want to model happiness (and economists do, it's really interesting) you have to create the dependent variable, i.e. a “happiness score”.

Before you start segmentation work no one tells you what the segments are going to be. They don't appear in data sets; you have to create them. Segmentations based on cluster analysis allow you to specify the number of clusters to be produced.

You can then analyse a number of versions of the model to see which provides the “best fit”. How does it “know” it is a good fit?

If you imagine it as a scatter diagram, the software is effectively trying to draw a series of circles round the data so that as many observations as possible fall within the (few) circles and the distances between the centres of the circles are maximised.

Segmentation (cluster analysis) is very much art, test and play. You could argue it is the least robust tool in the statistician's armoury.
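To make the "circles" idea concrete, here is a bare-bones sketch of k-means, one common clustering method. The text above does not say which algorithm any particular package uses, so treat this as an illustration only. The customer data (visits per month, average basket value) is invented, and starting from the first k points is a simplifying assumption that real tools improve on.

```python
def nearest(point, centres):
    """Index of the centre closest to the point (squared distance)."""
    return min(
        range(len(centres)),
        key=lambda i: (point[0] - centres[i][0]) ** 2
                      + (point[1] - centres[i][1]) ** 2,
    )


def kmeans(points, k, iterations=10):
    centres = list(points[:k])  # naive starting centres (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Move each centre to the average of its members.
        centres = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [nearest(p, centres) for p in points]
    return centres, labels


# (visits per month, average basket value), invented customers
customers = [(1, 10), (8, 12), (4, 40), (2, 12), (9, 10),
             (5, 42), (1, 11), (8, 11), (4, 38)]
centres, labels = kmeans(customers, k=3)
print(labels)
```

You choose k; the software only finds the best grouping for that number. Re-running with different values of k, different starting points or rescaled variables can give different segments, which is why this is as much art as science.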

Keep it simple

If you only have one product and three basic messages there is not much point in building a 12 segment model. You can “train” the model to produce three to five segments.

“If it moves, measure it” is not a great mantra

Don't fall into the “we measure it because we can” trap. Web analytics is where there is a temptation to measure everything that moves, but what we really want to know about is visits, repeat visits, engagement, drop out, purchase and re-purchase, not hits, impressions, time on site etc.

Make the analytics work for you, not against you.

Engagement may be what you are really interested in. An engaged prospect could be defined as a repeat visitor who spends at least five minutes beyond the home page. Now that makes it much more interesting and usable.
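A definition like that is easy to turn into a rule you can run against visitor data. In the sketch below the `Visitor` fields and the sample visitors are hypothetical, "repeat" is assumed to mean two or more visits, and the five minutes is assumed to be total time beyond the home page.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    visitor_id: str
    visits: int
    minutes_beyond_home_page: float


def is_engaged(visitor, min_visits=2, min_minutes=5.0):
    """Repeat visitor with at least five minutes beyond the home page."""
    return (visitor.visits >= min_visits
            and visitor.minutes_beyond_home_page >= min_minutes)


visitors = [
    Visitor("a", visits=1, minutes_beyond_home_page=12.0),  # not a repeat visitor
    Visitor("b", visits=3, minutes_beyond_home_page=6.5),   # engaged
    Visitor("c", visits=4, minutes_beyond_home_page=0.5),   # never gets past home
]
engaged_ids = [v.visitor_id for v in visitors if is_engaged(v)]
print(engaged_ids)
```

The thresholds are the interesting part: agree them with the people who know the market, then measure against them.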

The art of analytics - transforming data

Numbers are not always what they seem.

e.g. A modelling program will assume that 20 degC is twice as hot as 10 degC, but is it? In terms of ice cream sales, you may find the model works better if you assume 15 degC is your base zero and temperature is relative to that so 20 degC becomes 5, 19 degC is 4 and so on. An analyst will need to play with the numbers, do a few transformations, experiment and generally just eyeball the data (that means look at it on a chart).
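The re-basing described above is a one-line transformation. A minimal sketch, with the function name and the default base of 15 degC taken from the example rather than from any standard:

```python
def rebase_temperature(deg_c, base=15):
    """Express temperature relative to an assumed 'base zero'."""
    return deg_c - base


rebased = [rebase_temperature(t) for t in (10, 19, 20)]
print(rebased)
```

On the raw scale 20 degC looks like "twice" 10 degC; on the re-based scale one is five degrees above the base and the other five below, which may be closer to how ice cream buyers behave. Whether 15 is the right base is something to find by experiment.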