Chief Technology Officer
AWS, SQL, Docker
The customer prediction tools that Manifold created have significantly impacted our business. We are able to make quick and aggressive decisions with confidence and have a focus on quality instead of just quantity.
— August Flanagan, CTO
Babylist, a leading US baby registry site, wanted to leverage the data they collected—both to make better and faster business decisions, and to become more customer-centric when making those decisions.
Manifold accelerated development of predictive customer analytics tools that help optimize day-to-day decision-making by answering questions such as the following:
- Which of my customers are likely to churn?
- How likely are my customers to activate?
- How much revenue will each of my customers generate for the company? How much revenue will this month’s set of new users generate?
- What behaviors should I incentivize in my promotion? Is my promotion working?
Using a variety of AWS services, including ECR, RDS, and VPC, we built an infrastructure that supported those tools. Babylist’s Product team can now prioritize and build the right customer-focused features, while the Marketing team has a quantitative basis on which to predict the effectiveness of new promotions.
Babylist has already noticed better customer acquisition through rapid marketing optimization, as well as faster product improvement through rapid experimentation and validation.
To succeed in e-commerce, you must have a deep understanding of your site visitors so you can optimize product features and marketing strategies for your target customers. However, the several months it typically takes to run end-to-end marketing campaigns and analyze results for the next campaign is far too long in the current competitive landscape.
Babylist needed to validate marketing channels and product changes within a week. To address this, Manifold created a family of machine learning models that estimate the total revenue a customer will generate and classify whether a user will add items to their registry (adopt) or generate revenue (activate). We created a custom decision framework that traded absolute accuracy for time, allowing our client to make decisions based on estimates of customer lifetime value (CLV) and the probability of customer activation.
Intelligent Feature Engineering
After assessing and cleaning the data, we wrote complex SQL queries to tie together relevant information from multiple tables across the client’s database and to generate predictive features. Feature engineering included reconciling nuanced cases, such as how to account for both cash gifts (with varying dollar values) and physical goods. The most important features included the number and value of items added to the registry, as well as the number of unique sessions, where registry additions made more than 12 hours apart count as separate sessions. Drawing on our prior experience with RFM (recency, frequency, monetary) analysis, we aggregated time-varying features into relevant cumulative periods, creating features from daily activity as well as from activity over seven days and over multiple weeks.
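The session and cumulative-window features described above can be illustrated with a minimal sketch. It assumes a session boundary is a gap of more than 12 hours between consecutive registry additions, and the window lengths (1, 7, and 28 days), function names, and sample timestamps are hypothetical. The original features were built in SQL; Python is used here only for illustration.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=12)  # assumed reading of the 12-hour session rule


def count_sessions(add_times):
    """Count sessions: a new one starts when an add comes more than
    SESSION_GAP after the previous add."""
    times = sorted(add_times)
    if not times:
        return 0
    sessions = 1
    for prev, cur in zip(times, times[1:]):
        if cur - prev > SESSION_GAP:
            sessions += 1
    return sessions


def windowed_counts(signup, add_times, windows_days=(1, 7, 28)):
    """Cumulative number of adds within N days of signup, per window."""
    return {
        f"adds_{d}d": sum(
            1 for t in add_times if signup <= t < signup + timedelta(days=d)
        )
        for d in windows_days
    }


# Hypothetical activity for one user.
signup = datetime(2020, 1, 1, 9, 0)
adds = [
    datetime(2020, 1, 1, 9, 30),
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 2, 8, 0),   # 22 hours after the previous add: new session
    datetime(2020, 1, 5, 12, 0),  # more than 12 hours later again: new session
]
features = {"sessions": count_sessions(adds), **windowed_counts(signup, adds)}
print(features)
```

Counting from signup keeps each window cumulative, so the same user can be scored a day after signup and re-scored as more activity accumulates.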
Working with the client, we identified over 300 potentially predictive features, including attributes such as area code, signup platform, how customers added items to their registry, whether they entered a due date, and the number of independent sessions they had.
After detecting seasonal variation in the data due to consumer behavior around the holidays, as well as general trends in changing user behavior over several years, we decided to train the models on one full year of data. We were careful to exclude training data from anomalous months when Babylist was running a promotion, and to exclude features that had changed over time.
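As a rough sketch of the training-window choice described above, the function below lists the twelve calendar months preceding a prediction month and drops any months flagged as promotional. The function name, the `(year, month)` representation, and the example promotion month are assumptions made for illustration.

```python
def training_months(predict_month, promo_months, n_months=12):
    """Return the n_months calendar months before predict_month, oldest
    first, skipping months in which a promotion ran."""
    year, month = predict_month
    months = []
    for _ in range(n_months):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        if (year, month) not in promo_months:
            months.append((year, month))
    return list(reversed(months))


# Hypothetical example: predict March 2020, with a promotion in November 2019.
window = training_months((2020, 3), promo_months={(2019, 11)})
print(window)
```

Because the window always spans a full year before the prediction month, seasonal effects such as the holidays are represented even after promotional months are removed.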
After experimenting with various models including logistic regression, random forests, and gradient boosting, as well as optimizing hyperparameters, we selected gradient boosting as the most accurate model for the client’s dataset, as it yielded the highest ROC-AUC (receiver operating characteristic—area under the curve).
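To make the selection metric concrete, here is a small pure-Python sketch of ROC-AUC, computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels, scores, and model names are toy values invented for illustration, not results from the engagement.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy held-out labels and scores from two hypothetical candidate models.
labels = [1, 0, 1, 0, 1, 0]
candidate_scores = {
    "model_a": [0.9, 0.4, 0.6, 0.7, 0.8, 0.2],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.9, 0.3],
}
aucs = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise loop is quadratic and only suitable for small examples; a library implementation would normally be used on real data.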
We created different models for different use cases, and settled on a family of models that would allow the client to make marketing and feature decisions as soon as a day after user signup, but also provided higher-accuracy predictions for those users as more data was collected.
For the marketing department, we created monthly revenue forecasts based on features with 30 days’ worth of data, trained on a year’s worth of data prior to the month of prediction. Using this approach, we were able to create high-accuracy models (ROC-AUC = 0.92) and estimate monthly revenue to within 5% of actual revenue. Once trained, the models accurately predicted who would save items for purchase, who would eventually make purchases, and how much revenue those purchases would generate.
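The "within 5%" figure is a relative error on the monthly total. The sketch below assumes the monthly forecast is the sum of per-user revenue predictions for that month's new users, which the text does not state explicitly. All numbers are made up for illustration.

```python
def monthly_forecast(predicted_revenue_per_user):
    """Aggregate per-user revenue predictions into one monthly figure."""
    return sum(predicted_revenue_per_user)


def relative_error(forecast, actual):
    """Absolute forecast error as a fraction of actual revenue."""
    return abs(forecast - actual) / actual


# Hypothetical per-user predictions and actual revenue for one month.
predictions = [120.0, 0.0, 45.5, 300.0]
forecast = monthly_forecast(predictions)
error = relative_error(forecast, actual=480.0)
print(f"forecast={forecast:.2f}, relative error={error:.1%}")
```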
Manifold’s commitment to knowledge transfer ensures that our clients see value in our work well beyond the end of the engagement. As part of our training process, we worked closely with a Babylist data scientist to carry out experiments by running data, feature, and model evaluation in Jupyter notebooks. We walked through several examples, evaluating our models with five-fold cross-validation, and explaining feature importances, separation of posterior PDFs, picking operating points on the ROC curve, and the implications for TPR (true positive rate) and FPR (false positive rate).
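Picking an operating point on the ROC curve amounts to choosing a score threshold and accepting the TPR and FPR that come with it. The sketch below shows one possible rule, assumed here for illustration: among thresholds at observed scores, take the one with the highest TPR whose FPR stays under a budget. The data and the FPR budget are hypothetical.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def pick_operating_point(labels, scores, max_fpr):
    """Highest-TPR threshold with FPR <= max_fpr, or None if none qualifies."""
    best = None
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = rates_at_threshold(labels, scores, t)
        if fpr <= max_fpr and (best is None or tpr > best[1]):
            best = (t, tpr, fpr)
    return best


# Toy labels and scores; max_fpr is an assumed business constraint.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.35, 0.10]
point = pick_operating_point(labels, scores, max_fpr=0.25)
print(point)
```

Lowering the threshold further would catch the last positive but would raise the FPR well above the budget, which is the trade-off the walkthrough was meant to make visible.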
Among our deliverables was a private Docker repository created in a client-owned Elastic Container Registry (ECR) instance. Tagged production images were created for inference-time execution and periodically pushed to the ECR repository for ease of deployment. The image contained an out-of-the-box prediction engine that could be run with a single command and user-configurable parameters. The container-based deployment through ECR ensured that we could reliably deploy the engine to the client’s AWS production infrastructure after testing in Manifold’s QA environment. In addition to the prebuilt inference images, we delivered a GitHub repository that contained the prediction engine source code and the Dockerfiles used to build the development and production images.
Once the prediction engine image was pushed to ECR, we worked with the client’s operations team to determine an appropriate location where the inference containers could run within the production infrastructure. The team worked with the client to map out the topology of the production Virtual Private Cloud (VPC) and identified a private subnet, accessible via a bastion host, that was already being used for other batch jobs. As an initial approach, a new EC2 instance was launched in this subnet and associated with an existing Security Group that allowed inbound traffic to a production Postgres RDS instance. This was the client’s first use of ECR, so we helped create a custom IAM role, assigned to the EC2 instance, that granted read-only ECR access to the specific repository of interest. Figure 1 shows a high-level overview of the infrastructure for running the nightly prediction service.
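A read-only, single-repository ECR policy of the kind described above might look like the following sketch, which builds the policy document as a Python dictionary and prints it as JSON. The region, account ID, and repository name are placeholders, and the policy actually attached to the client's role is not shown in the source. `ecr:GetAuthorizationToken` cannot be scoped to a repository, so it is granted on `*`.

```python
import json


def ecr_pull_policy(region, account_id, repository):
    """Build an IAM policy document that allows pulling images from one
    ECR repository and nothing else."""
    repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/{repository}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRegistryLogin",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                "Sid": "AllowPullFromOneRepository",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repo_arn,
            },
        ],
    }


# Placeholder values, not the client's real account or repository.
policy = ecr_pull_policy("us-east-1", "123456789012", "prediction-engine")
print(json.dumps(policy, indent=2))
```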
With this infrastructure in place, the team wrote a nightly cron job that ran the inference image in ephemeral containers and executed the parameterized prediction script. The prediction engine consisted of twelve distinct models. The prediction script:
- Read new user data from a database
- Pulled versioned model artifacts from S3
- Generated various prediction scores for each user using each of the models
- Wrote the predictions back to the SQL database
The predictions were then available for downstream use by other analytics services to drive strategic marketing decisions.
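The four steps of the prediction script listed above can be sketched end to end as follows. This is an illustration under stated assumptions, not the delivered engine: an in-memory SQLite database stands in for the production Postgres instance, `load_models` stands in for pulling versioned artifacts from S3, and the table names, column names, and two toy scoring functions are invented.

```python
import sqlite3


def load_models():
    """Stand-in for pulling versioned model artifacts from S3.
    The scoring rules are toy placeholders, not trained models."""
    return {
        "activation_v1": lambda row: min(1.0, 0.1 * row["items_added"]),
        "revenue_v1": lambda row: 2.5 * row["items_value"],
    }


def run_nightly(conn):
    """Read new users, score each with every model, write scores back."""
    conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
    users = conn.execute(
        "SELECT user_id, items_added, items_value FROM new_users"
    ).fetchall()
    models = load_models()
    rows = [
        (u["user_id"], name, float(model(u)))
        for u in users
        for name, model in models.items()
    ]
    conn.executemany(
        "INSERT INTO predictions (user_id, model, score) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Hypothetical schema and data for a local dry run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE new_users (user_id INTEGER, items_added INTEGER, items_value REAL)"
)
conn.execute("CREATE TABLE predictions (user_id INTEGER, model TEXT, score REAL)")
conn.executemany(
    "INSERT INTO new_users VALUES (?, ?, ?)", [(1, 3, 40.0), (2, 12, 10.0)]
)
written = run_nightly(conn)
print(f"wrote {written} predictions")
```

Writing one row per user and model keeps the output table the same shape no matter how many models the engine contains.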
- Created a fully functional, out-of-the-box prediction engine that could be run with a single command
- Created ML models to estimate customer lifetime value (CLV) and classify whether a user would adopt or activate
- Created monthly revenue forecasts that estimated monthly revenue to within 5% of actual revenue
- Streamlined deployment of the new infrastructure with AWS tools