More than half a year into the COVID-19 pandemic, scientists and policymakers still have a host of unanswered questions about the disease and the SARS-CoV-2 virus strain that causes it. They’re turning to some of the world’s most powerful high-performance computers as they search for answers, targeting the novel coronavirus at scales ranging from the atomic to the national. Here is a cross-section of their work.
One area of intense focus for the HPC community is molecular modeling. Detailed models of SARS-CoV-2 and physics-based molecular dynamics simulations can provide insights into how the novel coronavirus works and what aspects of its molecular structure might present suitable targets for vaccines and treatments.
A team headed by University of California San Diego biophysicist Rommie Amaro is constructing an atom-by-atom model of the SARS-CoV-2 exterior, including its lipid envelope and the proteins on its surface. The completed model will comprise approximately 200 million atoms.
To handle the demanding work of creating the model and exploring how its 200 million atoms move and interact, the UCSD team is using the world’s most powerful academic supercomputer: a Dell EMC PowerEdge system with 8,008 nodes based on Intel Xeon Platinum processors. Dubbed Frontera, the system is funded by the National Science Foundation and installed at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin.
The Amaro team’s code is one of approximately 40 SARS-CoV-2 applications running on Frontera and other TACC supercomputers. It is both compute- and memory-intensive, and has been one of the largest consumers of cycles on Frontera during the coronavirus crisis. The work uses the Nanoscale Molecular Dynamics (NAMD) simulation software.
A research team led by Jean-Philip Piquemal, professor of theoretical chemistry at Sorbonne University in Paris, is taking a similar approach, performing all-atom molecular dynamics simulations of SARS-CoV-2 proteins. The team runs its simulations on the Joliot-Curie supercomputer, an Atos Bull Sequana X1000 system designed for the French Atomic Energy Commission and based on Intel Xeon Platinum processors. The supercomputer supports a range of French and European research projects. Its name honors Irène Joliot-Curie, a French biochemist and the daughter of Marie and Pierre Curie.
Converged AI and Molecular Dynamics
What are the most promising biological targets for a vaccine or treatment? A multifaceted collaboration led by computational biologists at the Department of Energy’s Argonne National Laboratory is using supercomputers to conduct an exhaustive search of millions of small molecules, looking for those that best match the potential receptors on the SARS-CoV-2 virus.
Argonne’s approach is both compute- and data-intensive. It runs millions of drug docking and molecular dynamics simulations, and uses deep learning to help examine and classify the results and determine which ones are most deserving of further investigation. The approach aims to provide greater accuracy and throughput than previous virtual screening techniques while helping lab scientists identify potential targets for new drug development. It also highlights a prominent trend in high-performance computing: the convergence of AI with HPC’s traditional simulation and modeling tasks.
The collaboration is nationwide in scope. Spearheaded by Argonne computational biologist Arvind Ramanathan, it uses DeepDriveMD, an AI-based molecular simulations toolkit designed for protein folding and developed by Ramanathan and other experts from Argonne and Rutgers University/Brookhaven National Laboratory. The work runs on TACC’s Frontera system, along with HPC platforms at the Argonne Leadership Computing Facility, Oak Ridge National Laboratory, and the San Diego Supercomputer Center.
Finding Clues in Genomics Analysis
Genomics processing is another field that is both crucial to understanding SARS-CoV-2 and highly demanding of HPC resources. Bioinformatics analysis of the virus’ genetic data can show how it may be mutating and evolving over time. Analyzing patients’ DNA may help explain why people who contract COVID-19 experience such diverse symptoms. The interplay between human and viral genetic information is crucial to developing diagnostics, vaccines, and immunotherapies.
Global genomics giant BGI, formerly known as BGI Genomics, developed the world’s first diagnostic test kits for COVID-19. The company’s researchers are also focusing on population-scale genomics for the disease, working with Lenovo and Intel and using a large HPC cluster to rapidly analyze the enormous datasets produced by BGI’s high-throughput sequencing equipment. Their goal is to pinpoint genetic traits that might predict or influence an individual’s ability to resist the disease or shape the course of the disease once a person is infected.
In the open source community, biochemists, molecular biologists, and other scientists rely on Galaxy, a leading resource for genomics and other data-intensive biomedical research. Galaxy is hosted at TACC and the Pittsburgh Supercomputing Center (PSC) and runs approximately half a million jobs each month, including many that analyze viral data sets. Researchers in the Galaxy community are also developing and sharing tools and best practices to aid researchers around the world who are sequencing and studying SARS-CoV-2 and other viruses.
PSC’s supercomputers include multiple generations of Intel Xeon Scalable processors. In addition to supporting Galaxy, PSC’s Intel-based infrastructure hosts the 2019nCoVR Novel Coronavirus Resource. This fast-growing dataset combines genomic and proteomic sequencing data on SARS-CoV-2 from sources such as the US National Center for Biotechnology Information (GenBank) and the China National Center for Bioinformation.
A Digital Twin for the United States
Agent-based epidemiological analysis, accelerated by data science and AI, provides a basis for tracking how COVID-19 is spreading, how well a community’s supply of ICU beds matches the projected demand, and more. Armed with this information, policymakers are in a better position to develop science-based strategies that can improve outcomes.
A team from the University of Virginia Biocomplexity Institute has used PSC’s 720-node Intel Xeon Scalable processor-based supercomputer to run detailed, overnight epidemiological simulations of COVID-19 in the United States. This project has constructed a digital twin model of the entire nation — a network model of 300 million nodes and between 8 and 15 billion connections. The simulations consumed more than 5 million core hours at PSC between mid-March and the beginning of May.
The institute’s nightly simulations run up-to-date models that attempt to provide information relevant to the day’s most pressing planning and policy questions. They ask “what if” questions aimed at exploring issues such as how various interventions might impact the course of the pandemic. AI and other data science techniques are used to analyze each night’s simulations and search out even subtle patterns in the massive volumes of data. Results are presented to federal agencies such as the Department of Defense, Department of Health and Human Services, and Centers for Disease Control and Prevention, and the Virginia Departments of Health and Emergency Management.
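To make the idea of a "what if" run concrete, here is a toy sketch in Python of an agent-based outbreak on a contact network, comparing a baseline against a hypothetical intervention that lowers the chance of transmission. It is not the Biocomplexity Institute's model: the random network generator, the simple susceptible-infectious-recovered rules, the function names, and every parameter value are assumptions chosen only for illustration, and the network is thousands of nodes rather than hundreds of millions.

```python
import random


def build_contact_network(n_people, contacts_per_person, rng):
    """Build a random undirected contact network as sorted adjacency lists."""
    neighbors = [set() for _ in range(n_people)]
    for person in range(n_people):
        for _ in range(contacts_per_person):
            other = rng.randrange(n_people)
            if other != person:
                neighbors[person].add(other)
                neighbors[other].add(person)
    return [sorted(contacts) for contacts in neighbors]


def simulate(neighbors, p_transmit, days_infectious, n_days, n_seed, rng):
    """Run a discrete-time S -> I -> R outbreak.

    Returns (daily infectious counts, total number of people ever infected).
    """
    n_people = len(neighbors)
    state = ["S"] * n_people
    days_left = [0] * n_people
    for person in rng.sample(range(n_people), n_seed):
        state[person] = "I"
        days_left[person] = days_infectious

    daily_infectious = []
    for _ in range(n_days):
        # Each infectious agent gets one transmission chance
        # per susceptible contact per day.
        newly_infected = set()
        for person in range(n_people):
            if state[person] != "I":
                continue
            for other in neighbors[person]:
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Advance the infectious clock, then apply the new infections.
        for person in range(n_people):
            if state[person] == "I":
                days_left[person] -= 1
                if days_left[person] == 0:
                    state[person] = "R"
        for person in sorted(newly_infected):
            state[person] = "I"
            days_left[person] = days_infectious
        daily_infectious.append(state.count("I"))

    total_ever_infected = n_people - state.count("S")
    return daily_infectious, total_ever_infected


def compare_scenarios(seed=42):
    """Compare a baseline with a hypothetical transmission-reducing intervention."""
    # Illustrative per-contact daily transmission probabilities (assumed values).
    scenarios = {"baseline": 0.05, "reduced_contact": 0.01}
    results = {}
    for name, p_transmit in scenarios.items():
        # Reusing the seed gives both scenarios the same network and seed cases.
        rng = random.Random(seed)
        network = build_contact_network(2000, 4, rng)
        daily, total = simulate(network, p_transmit, 5, 120, 10, rng)
        results[name] = {"peak_infectious": max(daily), "total_infected": total}
    return results


if __name__ == "__main__":
    for name, summary in compare_scenarios().items():
        print(name, summary)
```

Both scenarios start from the same network and the same initial cases, so any difference in peak or total infections comes from the changed transmission probability and the random draws that follow. A production model would run many replicates per scenario and report a distribution rather than a single trajectory.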
The team is led by Madhav Marathe, a distinguished professor and division director of the Network Systems Science and Advanced Computing Division at the Biocomplexity Institute. Marathe’s team includes more than 80 experts at UVA and other institutions.
Ensuring Access to Vital Resources
Reflecting HPC’s importance to pandemic-related research, leaders from government, academia, and industry have come together to make sure promising projects get the supercomputing resources they need.
The White House Office of Science and Technology Policy created the COVID-19 HPC Consortium, a public-private collaboration to coordinate access to supercomputing resources and expertise. Members include technology companies, cloud service providers, supercomputer leaders, DOE national laboratories, federal agencies such as NASA and the National Science Foundation, and universities.
Infrastructure available through the consortium includes supercomputers with approximately 70,000 CPU nodes based on Intel Xeon processors, along with thousands of GPUs and storage processors. Consortium members are also sharing expertise to help researchers take full advantage of the hardware and software available to them.
Across the Atlantic, PRACE, the Partnership for Advanced Computing in Europe, has established a fast-track process to review coronavirus-related project proposals from researchers at academic, research, and commercial organizations across Europe. PRACE is an international not-for-profit association that offers HPC resources to 26 European nations. The organization has awarded resources to more than two dozen SARS-CoV-2 projects, including Dr. Piquemal’s molecular dynamics research. The European Commission launched 18 SARS-CoV-2 research projects involving 151 research teams in March 2020.
Many HPC centers are offering supercomputing cycles for pandemic-related research and innovation projects. Eni, the Rome-based oil and gas leader, is making time available on its brand-new HPC5 supercomputer, a Dell EMC system based on Intel Xeon Gold processors. Eni’s system is the fastest supercomputer in Europe, the world’s most powerful industrial supercomputer, and the sixth most powerful HPC system on the June 2020 TOP500. Through the EC-funded EXSCALATE4CoV public-private consortium, Eni is working with Dompé, a biopharmaceutical company headquartered in Milan, and Cineca, a nonprofit research consortium, to accelerate the path to COVID-19 treatments. Eni is also contributing the molecular modeling skills and resources it normally uses for seismic research to the work of modeling the virus.
Germany’s Leibniz Supercomputing Centre has invited SARS-CoV-2 research teams to apply for cycles on its SuperMUC-NG platform. A Lenovo ThinkSystem platform based on Intel Xeon Platinum processors, SuperMUC-NG ranks 13th on the June 2020 TOP500.
In Saudi Arabia, the King Abdullah University of Science and Technology has offered select research teams access to its Shaheen II supercomputer. Shaheen II is a Cray XC40 system based on Intel Xeon processors.
BP has joined the US COVID-19 HPC Consortium and is providing cycles on its industrial-scale HPE Apollo supercomputer powered by Intel processors.
The Association for Computing Machinery (ACM) will award a $10,000 Gordon Bell Special Prize in 2020 and 2021 to recognize performance achievements related to COVID-19 and SARS-CoV-2 research. The awards will spotlight HPC work that furthers understanding of the science or makes significant advances in computational methods.
Looking Beyond COVID-19
COVID-19 has disrupted millions of lives and cut short hundreds of thousands. The scientific, economic, and societal challenges remain significant. But HPC offers some bright spots as we begin to look ahead.
Scientists are running programs with a scale and sophistication that their peers in earlier eras could only dream of — and upcoming exascale computers are poised to deliver further breakthroughs in performance. Researchers are advancing their algorithms and methodologies in ways that will accelerate work in other fields as well. The collaboration of government, industry, and universities provides a model for future all-hands-on-deck initiatives.