Four key areas to excel in data and tech

Technology lies at the heart of any publisher’s transformation from an ad-centric model to reader revenue. Having worked with dozens of publishers over the past year, FT Strategies has identified four areas in which we think having the right philosophy is crucial: reader knowledge, modelling, testing and marketing.

1. Reader knowledge: join up everything you know about readers and their behaviour

Reader knowledge is crucial for digital subscriptions success. Customer-centric metrics (e.g. RFV) and predictive scoring (e.g. propensity to churn) require the ability to stitch together everything we know about a reader, from demographics to behavioural data. What does this mean in terms of your data and technology stack?

Every publisher generates and stores reader data in a multitude of tools and platforms. For example, each reader’s usage of the website and app – visit frequency, average time on site – will probably be tracked within an analytics service (e.g. Google Analytics). Meanwhile, their billing address and subscription tier may be stored in a separate database or CRM. Data about readers may also be stored in a marketing tool (e.g. interactions with a promotional email campaign). Comments readers make on articles might be stored in the website’s CMS. Readers will likely have been included in a cohort that was involved in A/B tests, and this information may live within a separate testing tool.

This fragmentation typically leads to challenges in building a “joined up” picture of individual readers – which can stand in the way of every other topic area in this article (modelling, testing and marketing). It is therefore critical that publishers are able to join up all of these records. This is not a trivial operation – each of these systems may store data in very different ways, and most are capturing new information in real-time.
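As a minimal sketch of the "joined up" record in code (all system names, field names and the shared reader ID are invented for illustration – in practice, reconciling IDs across systems is itself a hard problem):

```python
# Records about the same reader, scattered across systems.
# Field names and the shared "r123" ID are illustrative only.
analytics = {"r123": {"visits_30d": 14, "avg_time_on_site_s": 180}}
crm       = {"r123": {"tier": "standard", "country": "UK"}}
marketing = {"r123": {"opened_last_email": True}}

def unified_reader(reader_id, *sources):
    """Merge every system's record for one reader into a single profile."""
    profile = {"reader_id": reader_id}
    for source in sources:
        profile.update(source.get(reader_id, {}))
    return profile

profile = unified_reader("r123", analytics, crm, marketing)
# profile now combines behavioural, billing and marketing data in one place
```

The sketch assumes every system already shares one reader ID; real stacks usually need an identity-resolution step first.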

Any publisher that wants to build a true picture of its readers will therefore face a data engineering decision. At a high level, there is a choice between two philosophies:

 

  • The traditional approach, which has been used for decades: ETL (Extract, Transform, Load) into a structured Data Warehouse before performing any analysis
  • The newer approach, that offers increased speed and agility: ELT (Extract, Load, Transform) into a Data Lake, and add structure at the point of usage

An analogy for the difference might be found in how you manage your own email inbox. Are you the sort of person who files every email into different folders, spending time now to make things easier to find later? Or do you (like me) let emails pile up in one unstructured mega-inbox and rely on search when you need something? There are benefits to each approach, and in reality most organisations will use a mixture of both ETL/warehouses and ELT/lakes.
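To make the two philosophies concrete, here is a toy sketch (the event shape and field names are invented): the ETL path cleans each event into an agreed schema before it is stored, while the ELT path stores the raw event untouched and applies structure only when someone queries it.

```python
import json

raw_event = '{"user": "r123", "page": "/markets", "seconds": 42}'

# ETL: transform into an agreed schema *before* loading into the warehouse.
def etl_load(event_json, warehouse):
    e = json.loads(event_json)
    warehouse.append({"reader_id": e["user"], "time_on_page_s": e["seconds"]})

# ELT: load the raw event as-is; the lake imposes no schema up front.
def elt_load(event_json, lake):
    lake.append(event_json)

# ...structure is applied at the point of usage instead.
def elt_query_time_on_page(lake, reader_id):
    return sum(json.loads(e)["seconds"] for e in lake
               if json.loads(e)["user"] == reader_id)

warehouse, lake = [], []
etl_load(raw_event, warehouse)
elt_load(raw_event, lake)
```

Note the trade-off the analogy describes: the ETL path pays its cost at load time, the ELT path at query time.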

For most of the publishers I have worked with this year, a proper data infrastructure project along these lines is in progress (or at least in the planning stages). However, for many it is unlikely that this infrastructure will be live in the short term. Rather than lose valuable time waiting for Data Nirvana, we have typically recommended that publishers create a temporary view of the reader (using a tool such as Google’s BigQuery) and use this to experiment with reader-centric metrics and predictive models.
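As an illustration of the "temporary view" idea (using SQLite as a stand-in for BigQuery, with invented table and column names), the essence is a SQL view that joins the fragmented sources into one reader-centric record:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE analytics (reader_id TEXT, visits_30d INTEGER);
    CREATE TABLE crm       (reader_id TEXT, tier TEXT);
    INSERT INTO analytics VALUES ('r123', 14);
    INSERT INTO crm       VALUES ('r123', 'standard');

    -- The 'temporary view': one joined, reader-centric row per reader.
    CREATE VIEW reader_360 AS
    SELECT a.reader_id, a.visits_30d, c.tier
    FROM analytics a JOIN crm c USING (reader_id);
""")
row = con.execute("SELECT visits_30d, tier FROM reader_360").fetchone()
```

Because it is only a view, nothing is migrated or duplicated – which is what makes it a cheap way to experiment before the full infrastructure lands.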

2. Modelling: get the right data in place, and start simple

Advanced analytics and machine learning techniques are now within reach of any business, thanks to a growing range of off-the-shelf tools and open source software. One common application among publishers is ‘propensity’ algorithms. These statistical models allow publishers to predict the likelihood of specific readers to act in particular ways (e.g. to subscribe, to churn, etc). Best-in-class publishers calculate propensity scores in real time, and use them to personalise marketing, paywall tightness and preventative churn activity. Models are also typically used for content recommendation widgets.

The good news is that many publisher tools already have built-in predictive models. For example, Piano offers an LT[x] (Likelihood To [act]) propensity score that can be used to power a dynamic paywall. Of course, models such as this can only draw on the data that the tool knows about – hence the importance of building a ‘joined up’ picture of the reader. Equally, once you have calculated that a particular reader has a high likelihood to subscribe, that information needs to be accessible to other systems in order to act on it elsewhere (e.g. to send onsite messages or marketing comms).

Beyond getting a consistent view of each reader, publishers early in the journey would do well to start small. It is not necessary, for example, to spend heavily on tools or a vast data science team: simple regression models can be built within existing tools (even in Excel at very small scale), and machine learning methods are becoming commoditised (via tools such as BigML or Microsoft Azure). And once data science teams do mature, the FT has found that the best tool is whatever your data scientists are most comfortable using – typically open source programming languages such as Python and R, and the vast universe of libraries and frameworks they unlock.
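To show just how small a first propensity model can start, here is a sketch in plain Python. The features and coefficients are made up for illustration; a real model would fit the weights with logistic regression against historical subscribe/churn outcomes.

```python
import math

# Hand-set coefficients for illustration only; in practice these would be
# fitted on historical outcomes (did the reader go on to subscribe?).
WEIGHTS = {"visits_30d": 0.15, "opened_last_email": 0.8}
BIAS = -3.0

def propensity_to_subscribe(reader):
    """Logistic model: a score in (0, 1) from a reader's joined-up profile."""
    z = BIAS + sum(WEIGHTS[k] * float(reader.get(k, 0)) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

casual  = {"visits_30d": 2,  "opened_last_email": False}
engaged = {"visits_30d": 20, "opened_last_email": True}
# The engaged reader scores higher, and could be shown a tighter paywall
# or a tailored offer; the casual reader might get engagement nudges.
```

The point is not the model itself but the plumbing: each input feature comes from a different system, which is why the joined-up reader picture has to come first.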

3. Testing: widen the scope of what can be tested with a mix of in-house and off-the-shelf tools

Experimentation is vital for any business moving towards a customer-centric model. Every publisher I have worked with has some capacity to run A/B tests, but there are almost always limitations and challenges. For many, changes to design and content are easier to test than changes to core functionality. In some cases, certain parts of the publisher ecosystem may be easier to test (e.g. testing on the paywall may be straightforward, while the homepage proves more complex). Tracking campaign impact is typically challenging – particularly observing longer term impact on engagement, which requires an ability to track cohorts of users after a test has ended.
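One common building block for the cohort-tracking problem (a generic sketch, not any particular publisher's implementation) is deterministic bucketing: hash the reader ID together with the test name, so the same reader always lands in the same variant – and the cohort can be reconstructed long after the test ends, without storing an assignment table.

```python
import hashlib

def variant(reader_id, test_name, variants=("control", "treatment")):
    """Deterministically assign a reader to a test variant.

    Hashing (test_name + reader_id) makes the assignment stable across
    sessions and devices, and lets you re-derive the cohort later to
    measure longer-term engagement effects.
    """
    digest = hashlib.sha256(f"{test_name}:{reader_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Same reader, same test -> always the same variant.
assert variant("r123", "paywall_copy") == variant("r123", "paywall_copy")
```

Including the test name in the hash also means a reader's assignment in one experiment is independent of their assignment in any other.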

Some of the main limiting factors around performance and test coverage can be addressed with a hybrid approach (testing on both the client and server side). This requires investment in the right tools, and successful publishers tend to strike a careful balance between ‘buy’ and ‘build’. Tools such as AB Tasty and Optimizely offer performance benefits that in-house teams will find hard to match. However, these off-the-shelf products may not meet every need, and publishers including the FT also develop their own software to address their unique challenges.

4. Marketing: bring a joined up message to readers via modern marketing automation tools

When it comes to marketing, the goal is to bring a consistent message to each reader across all of the relevant channels. What you are trying to avoid, for example, is a reader seeing a banner on the website offering a specific discount, then receiving an email with a completely different offer. Another important capability is being able to take readers on a campaign journey (and customise it based on their actions), rather than being limited to one-off campaigns. Tracking progress throughout the funnel is key.
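A minimal sketch of the "consistent message" idea (offer names and the decision rule are invented for illustration): every channel asks one function for the reader's current offer, so the website banner and the email can never disagree.

```python
def current_offer(reader):
    """Single source of truth for each reader's offer; all channels call this."""
    if reader.get("propensity_to_subscribe", 0.0) > 0.6:
        return "trial_1_month"
    return "discount_50_off"

def banner_copy(reader):
    return f"Banner: claim your {current_offer(reader)}"

def email_copy(reader):
    return f"Email: don't miss your {current_offer(reader)}"

reader = {"propensity_to_subscribe": 0.7}
# Both channels surface the same offer, because both ask the same function.
```

In a real stack the "function" would be a service or a field on the unified reader profile, but the design principle is the same: channels render the offer, they don't choose it.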

Luckily, publishers are well-served by analytics software (e.g. GA360) and marketing automation platforms (e.g. Adobe Campaign). Assuming that these tools are affordable, the only real challenge that remains is the one that underlies each of these areas – creating a joined up picture of the reader. 


About the author

John Ridpath is a technology and learning consultant. Before working at the FT, he was Head of Product at Decoded. He is experienced in prototyping, product discovery, strategy implementation and subscriptions strategies.