E-science is central to many of the core processes in pharmaceutical R&D, from sequence analytics and target selection, high-throughput chemical screening, ligand docking and protein structure characterisation, predictive safety model construction and drug metabolism modelling, through to the molecular characterisation of disease states. To date, the scale of these analyses and the throughput of the assay technology have meant that internal infrastructures have sufficed. However, due to significant changes in the pharmaceutical R&D model and rapid advances in key assay technologies, the industry needs to consider alternative models in order to remain competitive.
The drivers influencing change in pharmaceutical requirements are:
1. The shift to a personalised medicine strategy in drug R&D: improvements in drug safety and efficacy achieved through individually tailored therapies, using advances in knowledge of the genetic factors and biological mechanisms of disease coupled with the individual patient's care and medical history. Many drug R&D programmes now have associated biomarker discovery and patient stratification studies, exploring patient- and animal-specific drug response, metabolism and safety. Typically such studies involve the molecular profiling (e.g. next-generation sequencing, transcriptomics) of individual clinical and in vivo samples, with projects capable of producing terabytes of primary data at relatively little cost (e.g. the $10k human genome, with $1k sequencing soon to be achieved). With such data volumes set to grow exponentially, and an increasing need to manage, integrate and analyse these data as part of the core R&D process, there is significant and growing demand for advanced HPC solutions.
2. Modelling and simulation: the use of predictive technologies to guide bench research and shorten cycle times at all stages of R&D. These include virtual high-throughput screening, predictive safety pharmacology models, predictive pharmacokinetic/pharmacodynamic modelling and systems biology approaches to modelling disease/drug interventions. As the scale and diversity of the data grow, this type of modelling will become increasingly important for representing complex systems and predicting their behaviour. Such large-scale modelling poses significant scaling challenges for existing infrastructures, requiring computational science research and possibly specialised heterogeneous hardware solutions.
3. The move from a tightly integrated, vertical R&D operating model to highly networked R&D organisations, partnering with numerous contract research organisations (CROs), academia, charities and biotechnology companies. Such partnerships enable drug R&D projects to harness external innovation and specialist skills and knowledge. However, this networked model requires secure e-collaboration environments allowing iterative, collaborative analyses over shared, often large-scale, heterogeneous data sets.
4. The explosion of publicly available, pharma R&D-relevant, massive, heterogeneous and complex biomedical and chemical data sets, such as the 1000 Genomes Project, the Cancer Genome Project, or the many transcriptomic, proteomic and metabolomic datasets. The existing policy of internalising, integrating and analysing such public data within pharmaceutical firewalls is no longer feasible, requiring instead access to public/private HPC facilities proximal to these emerging, R&D-relevant data sets.
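As a minimal illustration of the kind of predictive pharmacokinetic modelling referred to in driver 2 above, the sketch below simulates a one-compartment model with first-order oral absorption (the Bateman equation). All parameter names and values here are illustrative assumptions, not drawn from any real study; production PK/PD work uses far richer, population-level models.

```python
import math

def concentration(t, dose=100.0, V=10.0, ka=1.0, ke=0.1):
    """Plasma concentration at time t (hours) for a one-compartment model
    with first-order absorption: C(t) = (D*ka)/(V*(ka-ke)) * (e^-ke*t - e^-ka*t).
    dose: administered dose (mg); V: volume of distribution (L);
    ka: absorption rate constant (1/h); ke: elimination rate constant (1/h).
    All values are illustrative assumptions."""
    return (dose * ka) / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

# Simulate a 24-hour concentration-time profile at hourly intervals
# and find the time of peak concentration (on this hourly grid).
profile = [concentration(t) for t in range(25)]
tmax = max(range(25), key=lambda t: profile[t])
```

Even a toy model like this, scaled to whole-body physiology and run per patient across a stratified trial population, hints at why such simulation workloads outgrow internal infrastructures.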
The next generation of pharmaceutical e-infrastructure therefore needs not only to facilitate large-scale clinical and molecular analytics and modelling but also to allow secure collaborative working over both public and proprietary data sets. To this end, many pharmaceutical companies are exploring external HPC services (and research), including commercial cloud solutions, which allow flexible access to significant infrastructure as and when needed. However, such options raise further issues: for example, data security (for commercial reasons, but also, importantly, for the ethical reasons deriving from the need to protect the confidentiality of patient data), indemnity insurance, the practicalities of moving extremely large data sets over remote networks, and access to contextual knowledge (e.g. EnsEMBL, literature and proprietary knowledge) on remote clouds.