Over the last decade, genomics has become the backbone of drug discovery, allowing scientists to develop more targeted therapies and boosting the chances of successful clinical trials. In 2018 alone, over 40% of FDA-approved drugs had the capacity to be personalized to patients, largely based on genomics data. With that share having doubled over the past four years, the trend is unlikely to slow down anytime soon.
The ever-increasing use of genomics in the realm of drug discovery and personalized treatments can be traced back to two significant developments over the past decade: plunging sequencing costs and, consequently, an explosion of data.
As sequencing technologies continue to evolve and improve, the cost of sequencing a genome has plummeted. The first sequenced genome, part of the Human Genome Project, cost €2.4B and took around 13 years to complete. Fast forward to today, and you can get your genome sequenced in less than a day for under €900.
According to the Global Alliance for Genomics and Health, more than 100 million genomes will have been sequenced in a healthcare setting by 2025. Most of these genomes will be sequenced as part of large-scale genomic projects stemming from both big pharma and national population genomics initiatives. These efforts are already garnering immense quantities of data that are only likely to increase over time. With the right analysis and interpretation, this information could push precision medicine into a new golden age.
Are we ready to deal with enormous quantities of data?
Genomics is now considered a legitimate big data field: a single whole human genome sequence produces approximately 200 gigabytes of raw data. If we manage to sequence 100 million genomes by 2025, we will have accumulated over 20 billion gigabytes (20 exabytes) of raw data. This volume can be partially managed through data compression technologies, offered by companies such as Petagene, but compression alone doesn’t solve the whole problem.
What’s more, sequencing is futile unless each genome is thoroughly analyzed to extract meaningful scientific insights. Genomics data analysis typically generates an additional 100 gigabytes of data per genome for downstream analysis, and requires massive computing power supported by large computer clusters, a feat that is economically unfeasible for the majority of companies and institutions.
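A quick back-of-envelope calculation makes the scale concrete. The sketch below simply combines the figures quoted above (roughly 200 GB of raw data plus 100 GB of derived analysis data per genome, and 100 million genomes by 2025); the variable names are illustrative:

```python
# Back-of-envelope storage estimate using the article's figures.
# Assumptions (from the text): ~200 GB raw + ~100 GB derived data
# per genome, and 100 million genomes sequenced by 2025.

RAW_GB_PER_GENOME = 200
ANALYSIS_GB_PER_GENOME = 100
GENOMES = 100_000_000

total_gb = GENOMES * (RAW_GB_PER_GENOME + ANALYSIS_GB_PER_GENOME)
total_exabytes = total_gb / 1e9  # 1 exabyte = 1 billion gigabytes

print(f"Total data: {total_gb:,} GB ≈ {total_exabytes:.0f} exabytes")
# → Total data: 30,000,000,000 GB ≈ 30 exabytes
```

In other words, raw sequence plus downstream analysis output together approach 30 exabytes, which is why storage and compute strategy cannot be an afterthought.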
Researchers working with large genomics datasets have been searching for alternatives because relying solely on high-performance computing (HPC) clusters for data analysis is economically out of the question for many. Large servers require exorbitant upfront capital and incur significant maintenance overheads. Moreover, specialized hardware such as graphics processing units requires constant upgrades to remain performant.
Furthermore, as most HPC clusters differ in configuration, from technical specifications to installed software, reproducing genomics analyses across different infrastructures is far from trivial.
Cloud computing: a data solution for small companies
Cloud computing has emerged as a viable way to analyze large datasets quickly without having to worry about maintaining and upgrading servers. Simply put, cloud computing is a pay-as-you-go model that lets you rent computational power and storage, and it’s pervasive across many different sectors.
According to Univa, an industry leader in workload scheduling in the cloud and HPC, more than 90% of organizations requiring high-performance computing capacity have moved, or are looking into moving, to the cloud. Although this figure is not specific to the life sciences, Univa’s CEO Gary Tyreman suggests that pharmaceutical companies are ahead of the market in terms of adoption.
The cloud offers flexibility, an alluring characteristic for small life science companies that may not have the capital on hand to commit to large upfront IT infrastructure expenses: HPC costs can make or break a company. As a consequence, many opt to test their product in the cloud first, and if the numbers look profitable, they can then invest in an in-house HPC solution.
The inherent ‘elasticity’ of cloud resources enables companies to scale their computational resources to the amount of genomic data they need to analyze. Unlike with in-house HPC, there is no risk of money being wasted on idle computational resources.
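The pay-as-you-go trade-off can be sketched with a toy cost model. All figures below (hourly instance rate, server price, upkeep) are illustrative assumptions, not quoted prices from any provider:

```python
# Hypothetical cost comparison: on-demand cloud vs. an in-house server.
# Every number here is an illustrative assumption, not a real quote.

CLOUD_RATE_PER_HOUR = 3.0        # assumed cost of a large compute instance
SERVER_UPFRONT = 100_000.0       # assumed purchase price of an in-house server
SERVER_YEARLY_UPKEEP = 20_000.0  # assumed maintenance, power, and admin costs

def cloud_cost(hours_used: float) -> float:
    """Pay only for the compute hours actually consumed."""
    return hours_used * CLOUD_RATE_PER_HOUR

def in_house_cost(years: float) -> float:
    """Pay upfront plus yearly upkeep, regardless of utilization."""
    return SERVER_UPFRONT + years * SERVER_YEARLY_UPKEEP

# A small company running 2,000 compute-hours per year for 3 years:
print(cloud_cost(2_000 * 3))   # 18000.0
print(in_house_cost(3))        # 160000.0
```

Under these assumed numbers, a lightly utilized workload is far cheaper in the cloud; the calculus only flips once utilization is high and sustained, which is exactly the "test in the cloud first" pattern described above.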
Elasticity also extends to storage: data can be uploaded directly to the cloud and removed once the analyses are finished, with many protocols and best practices in place to ensure data protection. Cloud resources are allocated in virtualized slices called ‘instances’. Each instance’s hardware and software are pre-configured to the user’s specifications, ensuring reproducibility.
Why isn’t cloud computing more mainstream in genomics?
In the world of drug discovery, privacy and data safety are paramount. While cloud providers have developed protocols to keep data safe, some risks remain, for example when data is being transferred. Large pharmaceutical companies therefore prefer internal solutions to minimize these risks. Privacy remains the main obstacle to pharmaceutical companies fully embracing the cloud, whereas the cost of moving operations away from HPC is no longer a barrier. While some risk will always exist, the cloud enables seamless collaboration and reproducibility, both of which are essential for research and drug discovery.
Getting ready for the genomics revolution
It’s no secret that genomics is key to enabling personalized medicine and advancing drug discovery. We are now seeing a genomics revolution where we have an unprecedented amount of data ready to be analyzed.
The challenge now is: are we ready for it? Analyzing big data requires massive computational power, which effectively becomes an entry barrier for most small organizations. Cloud computing provides an alternative that scales analyses while facilitating reproducibility and collaboration.
While the cost and security limitations of cloud computing are preventing some companies from fully embracing it, these drawbacks are technical and are expected to be resolved within the next few years.
Many believe that the benefits of the cloud heavily outweigh its limitations. With major tech giants competing to offer the best cloud solutions, in a market projected to reach $340 billion by 2024, we can expect a drastic reduction in costs. And while some privacy concerns may persist, leading genomics organizations are developing new tools and technologies to protect genomic data.
Taken as a whole, it is likely that the cloud will be increasingly important in accelerating drug discovery and personalized medicine. According to Univa’s Tyreman, it will take around 10–15 years to see the accelerated transition from HPC to cloud, as large organizations are often conservative in embracing novel approaches.
“Distributed big data is the number one overwhelming challenge for life sciences today, the major obstacle impeding progress for precision medicine,” Chatzou Dunford concluded.
“The cloud and associated technologies are already powering intelligent data-driven insights, accelerating research, discovery and novel therapies. I have no doubt we are on the cusp of a genomics revolution.”