Challenge

A public health IT company sought Manifold's help to build a fully HIPAA-compliant cloud environment for a new data science initiative working with large amounts of protected health information (PHI) collected over several years.

Results

Within seven weeks, our client had a HIPAA-compliant data science environment in the cloud, built from scratch, that passed multiple rounds of security review.

We worked closely with several security teams, iterating quickly on their suggestions and feedback for the secure cloud environment. An advantage of building a new cloud environment from scratch is that we could build the secure enclave exactly to our client's desired specification. After security and design reviews were complete, our client was able to upload a hardware-encrypted drive of PHI to the environment and begin exploratory data analysis (EDA).

All of this was done in the client's infrastructure, meaning they now have a custom environment in-house and are well situated to create future value with data science initiatives.

Solution

We knew that our client needed this foundation in order to perform future data engineering and data science projects. Our engagements begin with a discovery phase that includes business discovery, data discovery, and infrastructure discovery. We usually spend time with an operations leader to decide where to work within their infrastructure.

In this case, we were asked to build an environment from scratch, given our deep experience building HIPAA-compliant cloud infrastructure. This way, the client was able to fully drive the process and end up with an infrastructure that met all of their own security requirements as well as the HIPAA security requirements. In addition, we designed the infrastructure specifically for data science and AI, which involve an important R&D cycle with rapid iterations, rather than for traditional business intelligence reporting.

Infrastructure as Code

We chose to create a new AWS sub-account using AWS Organizations. The separate account allowed us to work in an environment that is completely isolated from the client's existing AWS infrastructure. AWS Organizations allows for easy management and consolidated billing across all sub-accounts in the organization hierarchy.

A key benefit of creating a new account is that we were able to design the infrastructure from a blank canvas and build it exactly to a security specification approved by the client's security team. At Manifold we use Infrastructure as Code (IaC) wherever possible because of its many benefits, including traceability through code commits, peer review of changes and updates, remote state persistence, and streamlined disaster recovery. Almost every component was built using Terraform, following IaC best practices.

Networking Architecture

AWS provides a reference architecture for building infrastructure to support HIPAA-compliant web applications, but it is designed around a secure web application framework that did not suit our client's specific business need. We tailored the reference architecture to our client's use case without sacrificing any of the original security features. The resulting architecture was simpler in design and had a smaller resource footprint, which minimized costs.

Detailed Security Diagram

Here are some brief descriptions of the AWS services listed at the side of the diagram:

  • AWS Config Rules: Managed service that monitors your AWS resource configurations and alerts on undesired states.
  • CloudTrail: Monitoring and audit service that logs all IAM user account activity and API service calls made on the account.
  • CloudWatch Alarms: Service that aggregates server and account (including billing) logs for building automated notification systems.
  • S3 Lifecycle Policy: Bucket-level lifecycle configuration that automates migration of objects to secure cold storage. We chose not to use Glacier for cold storage unless we found that we had more data than expected and needed to reduce S3 storage costs.
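
As an illustration of the S3 Lifecycle Policy described above, here is a minimal Python sketch of what such a rule might look like. It only builds the dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` call accepts as its `LifecycleConfiguration` argument; it does not call AWS. The function name, the `raw/` prefix, and the 365-day threshold are hypothetical and are not taken from the project. The rule is created disabled by default to reflect the decision to hold off on Glacier until storage costs warrant it.

```python
import json


def build_lifecycle_configuration(prefix, days_until_cold, enabled=False):
    """Build an S3 lifecycle configuration with one Glacier transition rule.

    The rule stays "Disabled" unless enabled=True, so it can be defined
    up front and switched on later if storage costs grow.
    """
    rule_name = prefix.strip("/") or "all"
    return {
        "Rules": [
            {
                "ID": f"archive-{rule_name}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled" if enabled else "Disabled",
                "Transitions": [
                    {"Days": days_until_cold, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical values: objects under raw/ move to Glacier after a year.
    config = build_lifecycle_configuration("raw/", 365)
    print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `LifecycleConfiguration=config` alongside a `Bucket` name. In an IaC workflow like the one described here, the same rule would more likely be declared in Terraform.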

Network Security

We implemented several levels of network security, ranging from native AWS offerings to third-party appliances. They included:

  • A vendor appliance that provides a breadth of network security features such as SSL VPN, local traffic monitoring, and advanced application firewalling.
  • Route Tables: Private subnet route tables route VPC destinations internally. Public route tables similarly route VPC destinations locally, but route all other traffic to the internet gateway configured for the VPC.
  • Network Access Control Lists (NACLs) and Security Groups: We used Security Groups for instance-level network ingress and egress control on development and database instances. Custom NACLs were also used at the subnet level to explicitly define allowable inbound and outbound traffic.
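
The difference between the private and public route tables described above can be sketched in a few lines of Python. This is a simplified model of route selection, not the project's actual configuration: the `10.0.0.0/16` CIDR, the target labels, and the `resolve_target` helper are all assumptions made for illustration. It uses the most specific matching route, which is how VPC route tables choose among overlapping routes.

```python
import ipaddress

# Hypothetical VPC CIDR, chosen only for illustration.
VPC_CIDR = "10.0.0.0/16"

# Private subnets: only VPC-internal destinations are routable.
PRIVATE_ROUTES = {VPC_CIDR: "local"}

# Public subnets: VPC destinations stay local, everything else
# goes to the internet gateway.
PUBLIC_ROUTES = {VPC_CIDR: "local", "0.0.0.0/0": "internet-gateway"}


def resolve_target(route_table, destination_ip):
    """Return the target of the most specific matching route, or None."""
    address = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no matching route, so the traffic goes nowhere
    # The longest prefix (most specific route) wins.
    return max(matches)[1]


if __name__ == "__main__":
    tables = (("private", PRIVATE_ROUTES), ("public", PUBLIC_ROUTES))
    for name, table in tables:
        for ip in ("10.0.3.7", "93.184.216.34"):
            print(f"{name}: {ip} -> {resolve_target(table, ip)}")
```

In this model, an instance in a private subnet has no route to an external address at all, while Security Groups and NACLs add further filtering on top of whatever the route tables allow.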

In addition to the technical aspects, we were also comfortable with articulating our plan and goals to the executive team, given our previous experience in health IT.

Monitoring and Logging

The core of the monitoring and logging framework was AWS CloudWatch, which has two main components: Logs and Events. AWS services are tightly integrated with CloudWatch to ship service logs for aggregation and to publish Events as necessary. Because logs and events are aggregated in a single service, CloudWatch can coordinate triggering alarms and notifying the appropriate user groups via text or email for security events (or for any other custom alarm we want configured).
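
To make the alarm-and-notify flow more concrete, here is a hedged Python sketch that builds the keyword arguments for boto3's CloudWatch `put_metric_alarm` call without calling AWS. It assumes notifications are delivered through an SNS topic with text or email subscribers, which the write-up above does not state. The function name, metric name, namespace, and topic ARN are hypothetical placeholders.

```python
def build_security_alarm(metric_name, namespace, sns_topic_arn, threshold=1):
    """Build keyword arguments for a CloudWatch alarm that notifies via SNS.

    The alarm fires when the summed metric reaches the threshold within
    a single five-minute period.
    """
    return {
        "AlarmName": f"{metric_name}-alarm",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Quiet periods with no datapoints should not raise the alarm.
        "TreatMissingData": "notBreaching",
        # Subscribers of this topic receive the notification.
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    alarm = build_security_alarm(
        metric_name="UnauthorizedApiCalls",
        namespace="Security",
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:security-alerts",
    )
    for key, value in alarm.items():
        print(f"{key}: {value}")
```

With boto3 available, these arguments could be passed as `cloudwatch.put_metric_alarm(**alarm)`. The metric itself would typically come from a metric filter on aggregated logs, which is a separate piece of configuration not shown here.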

Key Takeaways

  • Designed specifically for data science and AI, rather than for more traditional BI reporting

  • Used a sub-account with AWS Organizations to allow for easy management and consolidated billing

  • Tailored an AWS reference architecture for our client's use case without sacrificing any security features