We trust Manifold and their experience not only to deliver on promised work but also to provide solid guidance on technology, product, and implementation process details to help us reduce the effort, cost, and friction of implementing our plans.
Abacus Insights enables healthcare plans and related organizations to deliver better care and service to members and patients. To accomplish its goals, Abacus developed a rigorous product roadmap, and contacted Manifold for help with the first step: leverage Amazon Web Services (AWS) and other technologies to develop a data integration platform.
The platform consists of connectors that collect and bring together healthcare data from disparate sources to reduce the cost, effort, and time Abacus' clients expend on exchanging healthcare information. The platform handles a variety of data, including:
- all administrative data, such as membership and claims
- clinical and other healthcare data coming from EMRs
- consumer data
- member engagement and other member services data
Abacus now has a platform that acts as a solid foundation for future connector development and, more broadly, for their overall product roadmap. Manifold helped design the platform so that connectors can be added for multiple data providers, and for multiple sources from each provider, even when requirements differ across sources.
This platform, along with several specific connectors, now runs as a multi-tenant deployment in Abacus' AWS environment. The available suite of AWS managed services makes it easy to add capabilities and address scalability challenges, which enables faster time to market as well as future growth.
Abacus' desired ETL framework had several requirements:
- Create a connector marketplace—the ability to build configurable connectors that can be reused across clients, resulting in lower cost of on-boarding new clients and reduced time to take a client live.
- Leverage the Abacus Domain Model—transform, cleanse, and aggregate incoming data sources into a comprehensive domain model to enable a breadth of use cases across diverse data sources, resulting in lower cost and faster time to market for reporting, advanced analytics, and machine learning use cases for Abacus' clients.
- Provide data management platform capabilities, including:
  - Data quality capabilities to ensure the data stored in the domain model is useful for consumption of current and future use cases.
  - Data lineage, data catalog, data governance, and other key functions to track changes, search/explore data, and govern usage.
  - Data distribution with late binding principles to deliver data for diverse analytic and operational consumption use cases.
  - Support for third-party tools that facilitate report generation, data science, advanced analytics, and machine learning use cases.
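To make the first two requirements concrete, here is a minimal sketch of a connector that is reused across clients by changing only its configuration. Everything in it is an assumption made for illustration: the `ConnectorConfig` class, the field names, and the tiny two-field "domain model" are hypothetical and are not taken from the Abacus platform.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConnectorConfig:
    """Hypothetical per-client configuration for one reusable connector."""
    client_id: str
    field_map: Dict[str, str]  # source column name -> domain-model field name


def to_domain_record(config: ConnectorConfig, source_row: Dict[str, str]) -> Dict[str, str]:
    """Map one source row onto the shared domain-model field names."""
    record = {
        domain_field: source_row[source_field].strip()  # basic cleansing: trim whitespace
        for source_field, domain_field in config.field_map.items()
    }
    record["client_id"] = config.client_id
    return record


# Two clients with different source layouts share the same connector code.
plan_a = ConnectorConfig("plan-a", {"MBR_ID": "member_id", "DOB": "birth_date"})
plan_b = ConnectorConfig("plan-b", {"memberNumber": "member_id", "birthDate": "birth_date"})

row_a = to_domain_record(plan_a, {"MBR_ID": " 123 ", "DOB": "1980-01-01"})
row_b = to_domain_record(plan_b, {"memberNumber": "456", "birthDate": "1975-05-05"})
```

In this sketch, onboarding a new client means writing a new configuration rather than new code, which is the cost reduction the connector marketplace requirement describes.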
Leveraging the Cloud
Manifold subscribes to a Lean mindset. The team focused on a robust solution that could be brought to market quickly while laying the foundation for additional capabilities in the future. They worked with the Abacus team to select smaller data sources for initial go-to-market, so that Abacus could rapidly deliver and validate the value of the solution. For this smaller-scale solution, the system primarily used Python on EC2.
To enable scaling out, each step of the ETL framework sourced from and loaded data into S3. This architecture allowed the team to change individual components of the framework and data pipelines as feature and scaling needs changed over time, while maintaining a relatively simple interface to S3. For example, as the data sources grow in size, the team can take a single-threaded transformation step and move it to Amazon EMR, while preserving the input and output interfaces.
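The idea of a fixed storage interface around a replaceable step can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: `ObjectStore`, `InMemoryStore`, `run_step`, and the object keys are hypothetical names, and the in-memory store stands in for S3 so the example runs without AWS credentials.

```python
from typing import Callable, Dict, Protocol


class ObjectStore(Protocol):
    """Minimal storage interface; an S3-backed class would expose the same two methods."""

    def get(self, key: str) -> bytes: ...

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for S3 (assumption for this sketch only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def run_step(store: ObjectStore, input_key: str, output_key: str,
             transform: Callable[[bytes], bytes]) -> None:
    """Read one object, transform it, and write the result back to the store."""
    store.put(output_key, transform(store.get(input_key)))


def uppercase_rows(raw: bytes) -> bytes:
    """Hypothetical single-threaded transformation."""
    return raw.decode("utf-8").upper().encode("utf-8")


store = InMemoryStore()
store.put("raw/members.csv", b"id,name\n1,ada\n")
run_step(store, "raw/members.csv", "clean/members.csv", uppercase_rows)
```

Because callers depend only on the input and output keys, `uppercase_rows` could later be replaced by a job that runs elsewhere and writes to the same output key, without changing the steps before or after it.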
AWS Secrets Manager stores the keys and passwords needed to access each data source.
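A connector might read its credentials along these lines. This is a hedged sketch: the secret name and the JSON layout of the secret are assumptions, and the fake client exists only so the example runs offline. In a real deployment, `client` would be `boto3.client("secretsmanager")`, whose `get_secret_value` call returns the secret text under the `SecretString` key.

```python
import json
from typing import Any, Dict


def load_credentials(client: Any, secret_id: str) -> Dict[str, str]:
    """Fetch a JSON-formatted secret and return it as a dict.

    `client` is expected to behave like boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Offline stand-in for the Secrets Manager client (assumption for this sketch)."""

    def get_secret_value(self, SecretId: str) -> Dict[str, str]:
        # Placeholder values only; real secrets never belong in source code.
        return {"SecretString": json.dumps({"username": "etl_user", "password": "example-only"})}


# "connectors/example-source" is a hypothetical secret name.
creds = load_credentials(FakeSecretsClient(), "connectors/example-source")
```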
For data delivery, the platform supports two methods of receiving data:
- A bulk data transfer using AWS Transfer for SFTP. This allows Abacus’ clients to deliver bulk data in a familiar pattern, while integrating directly with the S3 backend.
- For real-time streams, the Manifold team created an API that receives data and prepares it for batch processing. This method uses Amazon API Gateway, backed by AWS Lambda, as the API endpoint. The API writes into an Amazon Kinesis stream, which triggers the connectors at large batch intervals. This process smooths out load spikes during the day and delivers predictable scale to the connector framework.
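The streaming path described above can be sketched as a Lambda handler that forwards each API request body to a Kinesis stream. The stream name, the `source_id` field used as the partition key, and the 202 response are assumptions made for illustration. The fake client keeps the example runnable offline; in AWS it would be `boto3.client("kinesis")`, whose `put_record` call takes `StreamName`, `Data`, and `PartitionKey`.

```python
import json
from typing import Any, Callable, Dict, List

STREAM_NAME = "connector-ingest"  # hypothetical stream name


def make_handler(kinesis_client: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler bound to a Kinesis-like client."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # With API Gateway proxy integration, the request payload arrives as a string in "body".
        record = json.loads(event["body"])
        kinesis_client.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=str(record["source_id"]),  # assumed field; spreads records across shards
        )
        # The record is queued for later batch processing, so acknowledge without a result.
        return {"statusCode": 202, "body": json.dumps({"accepted": True})}

    return handler


class FakeKinesisClient:
    """Offline stand-in that records calls instead of sending them."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def put_record(self, StreamName: str, Data: bytes, PartitionKey: str) -> Dict[str, Any]:
        self.records.append({"StreamName": StreamName, "Data": Data, "PartitionKey": PartitionKey})
        return {}


fake = FakeKinesisClient()
handler = make_handler(fake)
response = handler({"body": json.dumps({"source_id": "clinic-7", "event": "visit"})}, None)
```

Passing the client in, rather than creating it inside the handler, is a design choice for this sketch: it lets the handler be exercised without network access.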
The overall system is orchestrated by Airflow, an open-source Apache project. With Airflow, the team could provide Abacus and their end users an easy-to-understand interface and include a breadth of capabilities out of the box. For example, it was possible to display the progress of each step in the connector pipeline and the overall progress of the pipeline without any additional code. The platform is deployed to Amazon EKS, which is used for repeatable deployments and for scaling up using EC2 Auto Scaling groups.
The analysis output from each step of the pipeline includes data quality, data lineage, performance, and debugging information. The output is written as JSON and can then be loaded into AWS Glue and Amazon Athena for querying. Abacus uses this capability to provide reports to their customers' data governance and operations teams.
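One way to shape such output is as JSON Lines, with one object per line, which is the layout Athena's JSON SerDes expect. The sketch below is an assumption about what a per-step report could contain; the `StepReport` fields are hypothetical and are not the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class StepReport:
    """Hypothetical per-step analysis record."""
    pipeline: str
    step: str
    rows_in: int
    rows_out: int
    rows_rejected: int
    duration_seconds: float


def to_json_lines(reports: Iterable[StepReport]) -> str:
    """Serialize reports as one JSON object per line."""
    return "\n".join(json.dumps(asdict(report), sort_keys=True) for report in reports) + "\n"


reports = [
    StepReport("claims", "cleanse", 1000, 990, 10, 4.2),
    StepReport("claims", "transform", 990, 990, 0, 7.5),
]
output = to_json_lines(reports)
```

Written to S3 under a common prefix, files like this could be registered as a table and queried, for example to list the steps that rejected rows.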
- Abacus’ new connector platform efficiently brings together healthcare data from disparate sources
- We chose smaller data sources for initial go-to-market, in order to rapidly deliver and validate the platform
- We used AWS managed services, including S3 and EMR, to address scalability