“The holy grail of marketing is to proactively pounce upon every individual customer opportunity by predicting beforehand who will respond and to preemptively intervene each customer loss by predicting who will defect” – Dr. Eric Siegel, author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

Every retailer tries to maximize the revenue it earns from each customer. If you ever visited an old-fashioned corner store just a block away, you may remember how the shopkeeper connected with customers through small talk. The neighborhood storekeeper understood his customers well and relied on instinct to up-sell and cross-sell, increasing the amount each customer spent. Cut to the 21st century: modern e-commerce sites show highly targeted recommendations to each customer, which points to a high level of personalization. This is a clear sign that analytics is being used to increase the revenue from each purchase a customer makes.

Collecting data

Before any predictive model can be built, the first step is to collect data about the customers. Any model built to analyze customers needs two kinds of data: demographics and psychographics.

Demographics: Demographics are the quantifiable properties of a given population. Customers may be grouped by variables such as age, education, income, occupation, address (geographic location), gender, industry, employee count, years of experience in business, services offered, products of interest, or other criteria. We can get these details from the government (Census Bureau) and from other sources that publish the results of such studies.

Psychographics: Psychographics classify customers according to their attitudes, mindset, beliefs, values, and lifestyle choices, which in turn include recycling, hobbies, and other interests and perceptions. These variables provide insight into the nuances of the customer’s buying process and can be leveraged to increase sales and market share. The seller, retailer, or manufacturer who better understands these complexities is better positioned to leverage them to maximize revenue. However, getting this information can be tricky: customers’ preferences change over time, so it must be collected frequently.

For example, a manufacturer trying to decide which product to launch in a particular market will consider the characteristics and behavior of the customers in that market. The manufacturer needs to know both the demographic and the psychographic details of those customers. Demographic details would include where the customer buys an item, the channel through which they buy, and their potential to buy. Psychographic details for buying golf balls, for instance, would depend on price range, desire, and interests.

We can collect demographic details about the intended customers from the Census Bureau, other government sources, and research sources that track detailed customer information. Some of this data will already have been captured in the CRM system or even the ERP system. Psychographic information can be collected through surveys and other forms of secondary research aimed at the intended customers. Surveys conducted frequently by the marketing team are a good source of this information.

Building the Model

Once the customer information has been gathered, the best predictive models are built by incorporating both types of data: structured and unstructured. In most cases structured data is readily available from the data warehouse and the legacy systems, and customer information can easily be extracted from there. Unstructured data can be collected from search engines such as Google, Bing, and Yahoo. Automated tools can capture a customer’s most visited pages and clickstream data, which records the clicks made on each page and the most frequently searched products. Social media is another effective source of data; it is often where millennial customers heap praise on products and services or vent about them. Listening to this data and leveraging it helps make the model more robust and improves the ROI of the analytics project.

Generate a predictive model from the historical data and the unstructured data, and use it to predict future values. The first step is to estimate each customer’s potential through analytics. Customer lifetime value can be estimated by using the predictive models to identify the potential of each customer.
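As a rough illustration of what a customer lifetime value estimate can look like, here is a minimal sketch in Python. It does not use a predictive model; it assumes a common back-of-the-envelope heuristic (average order value × orders per year × expected years × margin). The CustomerHistory fields, the three-year horizon, and the 25% margin are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class CustomerHistory:
    """Hypothetical per-customer summary pulled from a CRM or data warehouse."""
    customer_id: str
    total_spend: float      # revenue from this customer over the observed period
    order_count: int        # number of orders in the observed period
    years_observed: float   # length of the observed period, in years


def estimate_clv(history: CustomerHistory, expected_years: float, margin: float) -> float:
    """Heuristic CLV: average order value * orders per year * expected years * margin."""
    if history.order_count == 0 or history.years_observed <= 0:
        return 0.0
    average_order_value = history.total_spend / history.order_count
    orders_per_year = history.order_count / history.years_observed
    return average_order_value * orders_per_year * expected_years * margin


# Illustrative records only; real inputs would come from the systems described above.
customers = [
    CustomerHistory("C001", total_spend=1200.0, order_count=24, years_observed=2.0),
    CustomerHistory("C002", total_spend=300.0, order_count=3, years_observed=1.5),
]

# Rank customers by estimated value, highest first.
ranked = sorted(
    customers,
    key=lambda c: estimate_clv(c, expected_years=3.0, margin=0.25),
    reverse=True,
)
for c in ranked:
    print(c.customer_id, round(estimate_clv(c, expected_years=3.0, margin=0.25), 2))
```

In practice the expected lifespan and purchase frequency would themselves be outputs of the predictive models rather than fixed inputs.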

A decision tree algorithm such as CART (Classification and Regression Trees) creates a model that predicts whether a value is true or false (classifying categorical values). It can be applied to identify the types of items a customer is likely to buy. The k-means clustering algorithm groups objects into clusters, and can be used to identify the categories best suited to each customer. To make the model more robust, the clustering algorithm can first be used to group the customers, and further predictive models can then be built on these groups. Cohort analysis can be performed on each cluster to further understand the needs of different customers.
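To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. The two features (spend and visit frequency, assumed to be already scaled to the 0 to 1 range) and the six sample customers are hypothetical; a production project would more likely rely on an established library than on a hand-written loop like this one.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k observed customers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each customer to the nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers as (scaled annual spend, scaled visit frequency).
customers = [
    (0.10, 0.20), (0.15, 0.10), (0.20, 0.25),   # low spend, infrequent visits
    (0.80, 0.90), (0.85, 0.75), (0.90, 0.95),   # high spend, frequent visits
]

centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

Each resulting cluster can then be treated as a customer group on which a separate predictive model, or a cohort analysis, is run.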


Test the model against samples of the collected data: the predictive model can be applied to additional samples of historical data whose outcomes are already known. This gives a good indication of how accurate the model is, and the model can then be trained further to improve its accuracy. Finally, apply the model to data whose outcome is unknown to obtain the predictions.
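The idea of checking a model against known historical samples can be sketched as follows. This is a deliberately tiny stand-in for a full decision tree: a single CART-style split (a "stump") chosen by Gini impurity, fitted on one sample and scored on a held-out one. The feature (visits last month), the label (whether the customer responded to an offer), and all the data points are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of True/False labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows):
    """Pick the threshold on one feature that minimizes weighted Gini impurity.

    rows: list of (feature_value, label) pairs.
    Returns (threshold, prediction_if_at_or_below, prediction_if_above).
    """
    values = sorted({x for x, _ in rows})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            # Predict the majority class on each side of the split.
            best = (score, threshold,
                    sum(left) * 2 >= len(left), sum(right) * 2 >= len(right))
    return best[1:]


def predict(stump, x):
    threshold, left_prediction, right_prediction = stump
    return left_prediction if x <= threshold else right_prediction


def accuracy(stump, rows):
    """Share of rows whose known outcome matches the model's prediction."""
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


# Hypothetical history: (visits last month, responded to offer).
training = [(1, False), (2, False), (3, False), (4, False),
            (6, True), (7, True), (8, True), (9, True)]
holdout = [(2, False), (5, False), (8, True), (3, True)]

stump = fit_stump(training)
print("threshold:", stump[0])
print("holdout accuracy:", accuracy(stump, holdout))
```

A holdout accuracy well below the training accuracy is a signal to retrain or revisit the features before applying the model to customers whose outcome is unknown.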

Actions to leverage the insights

Knowing customers’ spending and service habits, and their ability to pay for these items, helps companies make appropriate decisions. Information on spending patterns can be retrieved from an annual customer expenditure survey. With this information, we can identify the areas where customers spend and the trends in that spending. These spending areas in turn help us identify what customers are purchasing now and what they are likely to purchase in the near future. This also helps category management, as it gives a fair idea of the product categories each customer is interested in.
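As one way to turn survey rows into spending areas and trends, the sketch below totals spend per category and year, then computes the relative change between two years. The row layout, category names, years, and amounts are all hypothetical; an actual expenditure survey would have its own schema.

```python
from collections import defaultdict

# Hypothetical survey rows: (year, category, amount spent).
survey_rows = [
    (2022, "groceries", 4200.0),
    (2023, "groceries", 4350.0),
    (2022, "electronics", 900.0),
    (2023, "electronics", 1500.0),
    (2022, "apparel", 1100.0),
    (2023, "apparel", 950.0),
]


def spend_by_category(rows):
    """Total spend per category, broken down by year."""
    totals = defaultdict(dict)
    for year, category, amount in rows:
        totals[category][year] = totals[category].get(year, 0.0) + amount
    return totals


def growth(totals, start_year, end_year):
    """Relative change in spend per category between two years."""
    return {
        category: (by_year[end_year] - by_year[start_year]) / by_year[start_year]
        for category, by_year in totals.items()
        if by_year.get(start_year) and end_year in by_year
    }


totals = spend_by_category(survey_rows)
trend = growth(totals, start_year=2022, end_year=2023)

# List categories from fastest growing to fastest shrinking.
for category in sorted(trend, key=trend.get, reverse=True):
    print(f"{category}: {trend[category]:+.1%}")
```

Categories with rising spend are candidates for near-future offers, while the per-category totals feed directly into category management.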

Personalized product offerings to potential customers help ensure that the investment in building these prediction models delivers the desired returns. They also enable retailers to retain their valued customers and, in turn, maximize their revenues.

Latest posts by Vinutha Pinthepu (see all)