How to Calculate Cohort Lifetime Value (LTV) with Excel or R

Eleni Markou

One of the most important SaaS metrics of your customer acquisition strategy is the Lifetime Value – or LTV– of a customer. Having a solid customer acquisition strategy is an essential component for any business as it is highly correlated with its viability.

Thus, being able to calculate an estimation of the average gross revenue that each customer is more probable to generate, can help a company determine how much money they can spend in acquiring customers and still generate profit.

Furthermore, in many cases, it is less costly, and thus preferable, to primarily concentrate on retaining the existing customers rather than focusing exclusively on generating new. In this case, the evaluation of the effectiveness of the adopted retention strategy again demands the calculation of customers’ lifetime value (LTV), i.e. the aforementioned average expected gross revenue per customer.

What is LTV?

A lot of ink has been spilled in developing various descriptions of the LTV, the majority of which ends up with mathematical formulas that are based on margin (m), retention rate (r) and discount rate (d) like the following (here):

However, this model appears to be not that realistic as it is based on a few quite restrictive assumptions:

  • Retention is assumed to be constant during the lifetime of a customer, i.e. the probability r of remaining retained remains the same across all months.
  • An infinite time horizon is assumed when calculating the present value of future cash flows.
  • The unit economics are supposed to be constant throughout lifetime which leads to a constant contribution margin.

Yet when dealing with an actual company, it easily becomes evident that none of the aforementioned conditions actually hold. Especially in early-stage businesses the size of the time periods across which you would like to calculate the LTV is month – or week – sized while at the same time the retention rate across them can vary significantly as the company’s products evolve quickly.

In order to overcome the limitations that the above restrictions create, apart from the imputed lifetime value based on some formula it is strongly advised to also evaluate the empirically realized cohort LTV.

In this post we are going to concentrate exactly on this: the calculation of LTV based on cohort analysis. When working with this type of analysis instead of examining each customer individually, we organize them in appropriate groups, known as cohorts, based on some common trait. Probably the most widely used trait is the date at which the customers started using the product.

In this case the insights one can get have to do with how the behaviour of customers belonging to different segments changes overtime. But before moving on to the actual computation, we would like to stress out the importance of having available all the right data and being able to understand them.

…the importance of having available all the right data and being able to understand them…

Regarding the first part, depending on how your company is set up you most probably need to dig into some ERP system, a customer data storage, a financial data warehouse or a subscription & payments software.

You may also have to request (or even beg if necessary) your IT team to enrich these data in order to include as many metadata as possible. For example, if dealing with an online order you can include the campaign id and the acquisition cost for each order so that to be able to compute segmented LTVs based on the campaign id.

Assuming you have collected all the data you need, you have to be able to understand them. At this point, it would be a strategic decision to make the finance team your best friends in order to make them teach you some of the financial intricacies you need to know. Finally armed with all the data and the relevant knowledge it is time to move on.

Compute LTV using Excel

So, in order to calculate the cohort lifetime value (LTV) in cohorts, the steps explained below must be taken.

Decide the right setting: Depending on the goal you are willing to achieve you have to decide the type and the size of the cohorts. In the special case of LTV, the type of cohort will most probably be a calendar. Regarding the size and range, you can choose whatever suits you best. Here we are going to assume monthly cohorts that cover a 12-month period so that the calculation of LTV will take place on a yearly basis.

Measure customers’ engagement and retention: For each cohort count the total number of new customers generated and monitor their retention during the subsequent months. This will lead you to the construction of a triangular matrix like the following:

The values contained in column ‘Month 0’ represent the number of new customers acquired during each calendar month while the rest of the columns, how many of them remained customers in the subsequent months.

Measure Net revenue: Following exactly the same methodology as we did with the number of customers, we are now going to compute the total net revenue. The result will be another triangular matrix, this time containing the total net revenue generated by the customers belonging to the specific cohort.

By dividing the above matrices element-wise we can get an approximation of the average net revenue generated by each customer in every cohort. The result is shown below :

You can then compute the average by column in order to find out “how much is the average net revenue generated by my customers during their n-th month of subscription”. Finally, by summing all the averages you get the expected lifetime value per customer.

It is worth noting that for a given cohort in any month the total net revenue generated at this cohort is divided by the total number of customers including those who may churn out during this period. This is a rational choice since the company had to pay to acquire all those customers by investing in marketing, advertisement etc and that expenditure does not disappear when a customer churns.

Compute LTV using R

However, this cohort analysis requires some manual work when done in Excel and can be become time-consuming and error-prone as the number of customers increases. For this reason, we thought that especially for those who already have coding skills, a script that automates the previously described procedure can be a real savior.

Luckily for you, we have already created an indicative script in R that calculates the lifetime value for a SaaS company using data from Chargebee that we have already exported to a PostgreSQL database using Blendo.

Import your Chargebee data into your data warehouse Sync your subscription and billing data from Chargebee to any data warehouse. Analytics-ready data with no hassle. Integrate Now

Among all the exported tables those we mostly need are the ‘subscriptions’ and the ‘lineitems’. The ‘subscriptions’ table contains detailed information for each customer including his current plan name and unit price, his subscription status (i.e. whether he is active or not), the subscription data as well as the cancellation date in case he has already churned.

On the other hand, the ‘lineitems’ table includes all the line items of the invoices issued along with information regarding the entity type (i.e. whether it refers to a recurring or adhoc charge), the amount charged, any possible discount as well as the date of issue.

In the following code snippet in R, we start by loading all the necessary libraries, connecting to the PostgreSQL  database, retrieving the data we need and renaming the columns of the table appropriately:

#==================================================
# Importing necessary libraries
#==================================================
library(lubridate)
library(dplyr)
library(reshape)
library(stringr)
library(readr)
library(RPostgreSQL)
#==================================================
# Connecting to database 
#==================================================
pg <- dbDriver("PostgreSQL")
con <- dbConnect(pg, user="", password="",
                 host="", port=5432, dbname="")
# Query the database
subscriptions_query <- "select subscription_customer_id,
             		date_trunc('month', subscription_activated_at)::date
                        from chrb_subscriptions
                        where subscription_activated_at is not null; "  
lineitems_query <- ' select chrb_invoice_line_items.invoice_id, 
                             chrb_invoice_line_items.entity_type, 
                             chrb_invoice_line_items.entity_id, 
                             to_timestamp(chrb_invoice_line_items.date_from)::date as date_from, 
                             chrb_invoice_line_items.amount/100, 
                             chrb_invoice_line_items.discount_amount/100, 
                             chrb_invoices.invoice_customer_id
                    from chrb_invoice_line_items
                    left join chrb_invoices
                    on chrb_invoice_line_items.invoice_id = chrb_invoices.invoice_id;'
subscriptions_red <- dbGetQuery(con, subscriptions_query)
lineitems_red <- dbGetQuery(con, lineitems_query)
# Rename tables' columns
colnames(lineitems_red) <- c("invoice_number", "entity_type", "entity_id", "date_from", "amount", "discount", "customer_id")
colnames(subscriptions_red) <- c("customer_id", "activated_at")

 

The two tables were then joined together so that for every line item we have available all the necessary information for the customer related to it. Additional columns were created regarding the month of subscription and the actual amount paid by subtracting any discount from the amount initially issued.

After finishing data preprocessing, the computation of the triangular matrices requires only basic grouping and aggregating operations. The final element-wise averaging and summation can be then easily performed .

# Create amount_paid column
lineitems_red$amount_paid <- lineitems_red$amount + lineitems_red$discount # discount is already inserted as negative
# Combine lineitems and subscriptions
full_data <-merge(x = lineitems_red, y = subscriptions_red, by = "customer_id", all.x = TRUE)
# Compute subscription month column 
full_data$month_of_subscription <- as.numeric(round(abs(full_data$date_from - full_data$activated_at)/(365.25/12)))+1

#==================================================
# ACTUAL LTV COMPUTATION
#==================================================
# Net Revenue triangular matrix
net_revenue <-full_data %>% 
  group_by(activated_at, month_of_subscription) %>%
  summarise(total_amount = sum(amount_paid))
# Customers triangular matrix
customers <-full_data %>% 
  group_by(activated_at, month_of_subscription, customer_id) %>%
  summarise() %>%
  group_by(activated_at, month_of_subscription) %>%
  summarise(cust = n())
# Create pivot tables
net_triu <- cast(net_revenue, activated_at~month_of_subscription)
cust_triu <- cast(customers,  activated_at~month_of_subscription)
# Create the revenue per customer by dividing element-wise the previous tables
revenue_per_customer<- cbind(net_triu[1],round(net_triu[-1]/cust_triu[-1],2))
# Compute final LTV value
LTV <- sum(colMeans(revenue_per_customer[,2:dim(revenue_per_customer)[2]], na.rm=TRUE),na.rm=TRUE)

Outro

To sum up, until now we have discussed about the importance of computing the customers’ lifetime value while accessing both the acquisition and retention strategy of a company. We have also introduced an alternative approach of computing this indicator according to which customers are segmented into cohorts based on the month of their initial subscription.

Finally, an attempt is also made to automate the whole process using an R script like the indicative one presented in this blog post in hope to change the perception about customer lifetime value in cohorts as being an obscure and unreachable approach.

Do you have other ways that you calculate LTV from your billing data like Chargebee or Stripe? Let me know in the comments below.

Ready to get started with calculating your LTV?

Import your Chargebee data into your data warehouse Sync your subscription and billing data from Chargebee to any data warehouse. Analytics-ready data with no hassle. Integrate Now

Help your sales and executive team take ownership of the insights of your payments' data that live in Chargebee.

Blendo is the easiest way to automate powerful data integrations.

Try Blendo free for 14 days. No credit card required.