Pharma Giant Migrates R&D Analytics to AWS
- May 04, 2015
We recently had the opportunity to work with a pharmaceutical company that is breaking new ground when it comes to treatments for life-threatening ailments like cancer. Seeking to innovate across the organization — from R&D to IT — this company reached out to the DevOps consulting team at Flux7 to help it migrate its Cloudera Hadoop-based analytics systems to AWS. Specifically, the vision was to take all of its diverse data sets to the cloud, establishing a highly available and secure environment where the firm could conduct data modeling and data analysis while protecting sensitive data and ensuring GxP and HIPAA compliance.
This firm has many different data sets from multiple sources. The job of the R&D analytics group is to standardize the data across these sets and sources to ensure that the data is usable. One of the goals of the migration, therefore, was to create an integrated layer where incoming data would be standardized. The layer would then organize the data so that it could be easily accessed by different stakeholders and their different tools. To begin, we assessed the business and technical requirements and designed the architecture.
The DevOps consulting team applied the Flux7 Enterprise DevOps Framework, a model for marrying DevOps process improvement with digital transformation. We started by building a cloud landing zone architecture with a new AWS account for Analytics with Dev, Test, and Production deployed in separate VPCs, as well as one for services.
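To make the landing zone layout concrete, here is a minimal sketch of how one VPC per environment might be expressed as a CloudFormation template. The environment names follow the description above, but the address range, the /16 split, and the function name are our own assumptions for illustration, not the customer's actual values.

```python
import ipaddress
import json

# Hypothetical values: the article names the environments but not their CIDRs.
ENVIRONMENTS = ["Dev", "Test", "Production", "Services"]
PARENT_CIDR = "10.0.0.0/14"


def build_landing_zone_template(environments=ENVIRONMENTS, parent_cidr=PARENT_CIDR):
    """Return a CloudFormation template (as a dict) with one VPC per environment."""
    # Carve the parent range into /16 blocks so that no two VPCs overlap.
    blocks = ipaddress.ip_network(parent_cidr).subnets(new_prefix=16)
    resources = {}
    for env, block in zip(environments, blocks):
        resources[f"{env}Vpc"] = {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": str(block),
                "EnableDnsSupport": True,
                "EnableDnsHostnames": True,
                "Tags": [{"Key": "Environment", "Value": env}],
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    print(json.dumps(build_landing_zone_template(), indent=2))
```

Keeping each environment in its own non-overlapping block leaves room to peer or route between the VPCs later without readdressing.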
Next, we developed CloudFormation templates that deploy the underlying infrastructure for the Cloudera Hadoop cluster and, subsequently, the Cloudera Hadoop services. Further, using AWS CodeCommit, CodePipeline, and CodeBuild, the teams established the new AWS infrastructure pipeline: code is merged to the master branch, the CI/CD pipeline detects the new commit via polling and creates a change set, an admin approves the change set, and CloudFormation updates the infrastructure.
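The approval step in that flow can be sketched in a few lines of Python. This is an illustration of the change set pattern rather than the pipeline we built: cfn is assumed to be a boto3 CloudFormation client (for example boto3.client("cloudformation")), and the approve callback and function name are hypothetical stand-ins for the admin's manual approval in CodePipeline.

```python
def propose_and_apply(cfn, stack_name, template_body, change_set_name, approve):
    """Create a change set, show it to an approver, and execute it only if approved.

    cfn: assumed to be a boto3 CloudFormation client.
    approve: hypothetical callback that receives the proposed changes
             and returns True or False.
    """
    cfn.create_change_set(
        StackName=stack_name,
        TemplateBody=template_body,
        ChangeSetName=change_set_name,
        ChangeSetType="UPDATE",
    )
    # Block until CloudFormation has finished computing the proposed changes.
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    details = cfn.describe_change_set(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    # Reduce each proposed change to (action, logical resource id) for review.
    changes = [
        (c["ResourceChange"]["Action"], c["ResourceChange"]["LogicalResourceId"])
        for c in details.get("Changes", [])
    ]
    if not approve(changes):
        # Rejected: discard the change set and leave the stack untouched.
        cfn.delete_change_set(StackName=stack_name, ChangeSetName=change_set_name)
        return False
    cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
    return True
```

Nothing touches the running stack until execute_change_set is called, which is what lets the reviewer see exactly which resources would be added, modified, or removed before approving. A real pipeline would also need to handle a change set that contains no changes and pass any required capabilities, both omitted here for brevity.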
The services are made available to end users through Active Directory verification; once approved, they can access the data through a Cloudera Hadoop cluster. While demand for the system will not increase dramatically with the migration, having on-demand access to the data is critical. Therefore, the new architecture was built using Elastic Load Balancing (ELB) to ensure consistent application performance.
AWS Security and Compliance
Security is a top priority for this project given the extremely sensitive nature of the data involved. As a result, we took a layered, security-by-design approach, with security built in across the environment. For example, the solution includes the following security and compliance features, among others:
A GxP lock-in for Cloudera and Hadoop: we wrote the definition so that no one can change the process without auditor approval, and the structure stays locked until an auditor is brought back in to approve a change.
The use of AWS VPCs that span multiple Availability Zones (AZs) to ensure redundancy and fault tolerance and to provide for disaster recovery.
Amazon EC2 security group firewall rules, with automatically configured rules for AWS WAF forthcoming.
Logging for governance, compliance, and risk auditing with AWS CloudTrail.
Configuration control managed with AWS Config, which tracks and alerts on the compliance of configurations with a defined secure state, in this case the pharmaceutical company's external and internal compliance requirements.
Amazon Inspector, a security assessment service that helps improve security and compliance, which we used to create detailed lists of security findings prioritized by level of severity for quick and easy remediation.
Combined, these features help define and maintain a secure state for the pharma’s R&D analytics, keeping this sensitive data safe and compliant with GxP and HIPAA requirements.
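As an illustration of what "compliance with a defined secure state" can mean in practice, the sketch below checks a security group for inbound rules that are open to the whole internet. It shows only the kind of logic a configuration rule might encode; it is not the AWS Config API and not the rule set used on this project. The input follows the shape of an entry in the EC2 DescribeSecurityGroups response, while the function name and the allowed_public_ports policy are our own assumptions.

```python
def evaluate_security_group(group, allowed_public_ports=frozenset()):
    """Return (status, findings) for one security group description.

    status is "COMPLIANT" or "NON_COMPLIANT". allowed_public_ports is a
    hypothetical policy knob: ports that may be exposed to any address.
    """
    findings = []
    for perm in group.get("IpPermissions", []):
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        ) or any(
            r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", [])
        )
        if not open_to_world:
            continue
        # Protocol "-1" means all traffic and carries no port range.
        if perm.get("IpProtocol") == "-1":
            findings.append("all traffic open to any address")
            continue
        ports = range(perm["FromPort"], perm["ToPort"] + 1)
        if not all(p in allowed_public_ports for p in ports):
            findings.append(
                f"ports {perm['FromPort']}-{perm['ToPort']} open to any address"
            )
    status = "NON_COMPLIANT" if findings else "COMPLIANT"
    return status, findings
```

A check like this can run whenever a configuration changes, so drift from the secure state surfaces as an alert rather than waiting for the next audit.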
The security and compliance teams are happy with the security controls built into the new architecture. From HIPAA to GxP and internal security controls, the new AWS architecture applies important compliance and validation rules through monitoring, logging, config rules, firewalls and more. IS is assured that security is working as per the corporate policy.
Working together, the teams created a data model in which all incoming data is organized in a single layer with different data marts for entities such as patient data. This architecture allows team members to report on both predefined and ad hoc/exploratory analytics. Researchers are now seamlessly aggregating and processing data through a transformed architecture. This means that the R&D Analytics team is more agile than ever, delivering faster results to the business for greater future innovation.