New supercomputer opens doors for researchers in Sweden
When it was installed in the summer of 2018, Tetralith was more than just the fastest of the six traditional supercomputers at the National Supercomputer Centre (NSC) at Linköping University. It was the most powerful supercomputer in the Nordic region.
But just three years later, it was necessary to complement Tetralith with a new system – one designed specifically to meet the requirements of fast-evolving artificial intelligence (AI) and machine learning (ML) algorithms. Tetralith wasn’t designed for machine learning – it didn’t have the parallel processing power needed to handle the increasingly large datasets used to train artificial intelligence algorithms.
To support research programmes in Sweden that rely on AI, the Knut and Alice Wallenberg Foundation donated €29.5m to have a bigger supercomputer built. Berzelius was delivered in 2021 and began operation in the summer. The supercomputer, which has more than twice the computing power of Tetralith, takes its name from the renowned scientist Jacob Berzelius, who came from Östergötland, the region of Sweden where the NSC is located.
Atos delivered and installed Berzelius, which includes 60 of Nvidia’s latest and most powerful servers – the DGX systems, with eight graphics processing units (GPUs) in each. Nvidia networks connect the servers with one another – and with 1.5PB (petabytes) of storage hardware. Atos also delivered its Codex AI Suite, an application toolset to support researchers. The entire system is housed in 17 racks, which when placed side by side extend to about 10 metres.
The system will be used for AI research – not only by the large programmes funded by the Knut and Alice Wallenberg Foundation, but also by other academic users who apply for time on the system. Most of the users will be in Sweden, but some will be researchers in other parts of the world who cooperate with Swedish scientists. The biggest areas of Swedish research that will use the system in the near future are autonomous systems and data-driven life sciences. Both involve a lot of machine learning on enormous datasets.
NSC intends to hire staff to help users – not so much core programmers, but rather people who can help users put together parts that already exist. There are a lot of software libraries for AI, and they need to be understood and used correctly. The researchers using the system typically either do their own programming, have it done by assistants, or simply adapt good open source projects to their needs.
“So far, around 50 projects have been granted time on the Berzelius,” says Niclas Andresson, technology manager at NSC. “The system is not yet fully utilised, but utilisation is rising. Some problems use a large part of the system. For instance, we had a hackathon on NLP [natural language processing], and that used the system quite well. Nvidia provided a toolbox for NLP that scales up to the big machine.”
In fact, one of the biggest challenges now is for researchers to scale the software they’ve been using to match the new computing power. Many of them have one or a small number of GPUs that they use on their desktop computers. But scaling their algorithms to a system with hundreds of GPUs is a challenge.
Now Swedish researchers have the opportunity to think big.
Autonomous systems
AI researchers in Sweden have been using supercomputer resources for several years. In the early days, they used systems based on CPUs. But in more recent years, as GPUs evolved out of the gaming industry and into supercomputing, their massively parallel structures have taken number crunching to a new level. The earlier GPUs were designed for image rendering, but now they are being tailored to other applications, such as machine learning, where they have already become essential tools for researchers.
“Without the availability of supercomputing resources for machine learning we couldn’t be successful in our experiments,” says Michael Felsberg, professor at the Computer Vision Laboratory at Linköping University. “Just having the supercomputer doesn’t solve our problems, but it’s an essential ingredient. Without the supercomputer, we couldn’t get anywhere. It would be like a chemist without a Petri dish, or a physicist without a clock.”
Michael Felsberg, Linköping University
Felsberg was part of the group that helped define the requirements for Berzelius. He is also part of the allocation committee that decides which projects get time on the cluster, how time is allocated, and how usage is counted.
He insists that it is not only necessary to have a big supercomputer, but that it must be the right kind of supercomputer. “We have enormous amounts of data – terabytes – and we need to process these thousands of times. In all the processing steps, we have a very coherent computational structure, which means we can use a single instruction and can process multiple data, and that is the typical scenario where GPUs are very strong,” says Felsberg.
“More important than the sheer number of calculations, it’s also necessary to look at the way the calculations are structured. Here too, modern GPUs do exactly what’s needed – they easily perform calculations of huge matrix products,” he says. “GPU-based systems were introduced in Sweden a few years ago, but in the beginning, they were relatively small, and it was difficult to gain access to them. Now we have what we need.”
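As a rough illustration of the single-instruction, multiple-data pattern Felsberg describes, the Python sketch below computes a small matrix product. Every output element comes from the same multiply-and-add sequence applied to different data, which is what lets a GPU compute the elements in parallel. The function name `matmul` and the example matrices are illustrative assumptions, not code from the researchers, and plain Python runs the loops one after another; the sketch only shows the structure of the computation.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Each output element c[i][j] is an independent dot product:
    # the same instructions, applied to a different row/column pair.
    # A GPU would assign these n * m dot products to parallel threads.
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Because no output element depends on any other, the work can be divided among as many cores as are available.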
Massive parallel processing and huge data transfers
“Our research does not require just a single run that lasts over a month. Instead, we might have as many as 100 runs, each lasting two days. During those two days, enormous memory bandwidth is used, and local filesystems are essential,” says Felsberg.
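To give a feel for the bandwidth such runs demand, the following back-of-the-envelope Python sketch estimates the average read rate for one run. It uses the figures Felsberg gives for these workloads (a terabyte-scale dataset read up to 1,000 times during a two-day run); the exact dataset size of 1TB and the function name `sustained_read_rate` are assumptions made for illustration.

```python
def sustained_read_rate(dataset_bytes, passes, run_seconds):
    """Average bytes per second needed to read the dataset `passes` times."""
    return dataset_bytes * passes / run_seconds


TB = 10**12                      # decimal terabyte, in bytes
dataset_bytes = 1 * TB           # assumed dataset size (illustrative)
passes = 1_000                   # reads of the dataset during one run
run_seconds = 2 * 24 * 60 * 60   # a two-day run

rate = sustained_read_rate(dataset_bytes, passes, run_seconds)
print(f"{rate / 10**9:.1f} GB/s sustained")  # 5.8 GB/s sustained
```

That is an average over the whole run for a single job, so peak demand, larger datasets or several concurrent runs would push the requirement higher.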
“When machine learning algorithms run on modern supercomputers with GPUs, a very high number of calculations are performed. But an enormous amount of data is also transferred. The bandwidth and throughput from the storage system to the computational node must be very high. Machine learning requires terabyte datasets and a given dataset needs to be read up to 1,000 times during one run, over a period of two days. So all the nodes and the memory must be on the same bus.
“Modern GPUs have thousands of cores,” adds Felsberg. “They all run in parallel on different data but with the same instruction. So that is the single-instruction, multiple-data concept. That’s what we have on each chip. And then you have sets of chips on the same boards and you have sets of boards in the same machine so that you have enormous resources on the same bus. And that’s what we need because we often split our machine learning onto several nodes.
“We use a large number of GPUs at the same time, and we share the data and the learning among all of these resources. This gives you a real speed-up. Just imagine if you ran this on a single chip – it would take over a month. But if you split it, a massively parallel architecture – let’s say, 128 chips – you get the result of the machine learning much, much faster, which means you can analyse the result and you see the outcome. Based on the outcome you run the next experiment,” he says.
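One common way to share data and learning across many GPUs is data-parallel training, sketched below as a toy Python simulation. This is an assumption about the general technique, not a description of the software Felsberg’s group runs on Berzelius. Each simulated worker computes a gradient on its own shard of the data, the gradients are averaged, and a single update is applied; the model (`y = w * x`), the dataset, and the names `gradient` and `data_parallel_step` are all illustrative.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One training step: each worker computes a gradient on its own shard,
    then the gradients are averaged and applied once."""
    grads = [gradient(w, shard) for shard in shards]  # parallel on real hardware
    return w - lr * sum(grads) / len(grads)


# Toy dataset generated from y = 3x, split into 4 equal shards ("workers").
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))  # 3.0
```

With equally sized shards, the averaged gradient matches the gradient over the full dataset, so the result is the same as training on one device. On real hardware the gradient exchange adds communication cost, which is one reason the interconnect between GPUs matters.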
“One other challenge is that the parameter spaces are so large that we cannot afford to cover the whole thing in our experiments. Instead, we have to do smarter search strategies in the parameter spaces and use heuristics to search what we need. This often requires that you know the outcome of the previous runs, which makes this like a chain of experiments rather than a set of experiments that you can run in parallel. Therefore, it’s very important that each run be as short as possible to squeeze out as many runs as possible, one after the other.”
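The chain-of-experiments idea can be sketched as a simple greedy search in Python, in which each new run is chosen from the outcome of earlier runs. This is only one possible heuristic, shown under stated assumptions: `run_experiment` is a hypothetical stand-in for a full training run, its objective (peaking at a learning rate of 0.1) is invented for illustration, and the search strategy is not taken from the researchers’ own work.

```python
def run_experiment(learning_rate):
    """Stand-in for a full training run; returns a validation score.
    Hypothetical objective that peaks at learning_rate = 0.1."""
    return -(learning_rate - 0.1) ** 2


def sequential_search(start, step, max_runs):
    """Greedy search: each new run is chosen from the outcome of earlier runs."""
    best, best_score = start, run_experiment(start)
    runs = 1
    while runs < max_runs and step > 1e-4:
        improved = False
        for candidate in (best + step, best - step):
            if runs >= max_runs:
                break
            score = run_experiment(candidate)
            runs += 1
            if score > best_score:
                best, best_score = candidate, score
                improved = True
                break
        if not improved:
            step /= 2  # no neighbour was better: refine the search
    return best, runs


best, runs = sequential_search(start=0.5, step=0.2, max_runs=30)
print(f"best learning rate ~ {best:.3f} after {runs} sequential runs")
```

Because each candidate depends on the previous result, the runs cannot all be launched at once, so the total time is roughly the number of runs multiplied by the duration of one run.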
“Now, with Berzelius in place, this is the first time in the 20 years I’ve been working on machine learning for computer vision that we really have sufficient resources in Sweden to do our experiments,” says Felsberg. “Before, the computer was always a bottleneck. Now, the bottleneck is somewhere else – a bug in the code, a flawed algorithm, or a problem with the dataset.”
The beginning of a new era in life sciences research
“We do research in structural biology,” says Bjorn Wallner, professor at Linköping University and head of the bioinformatics division. “That involves trying to find out how the different elements that make up a molecule are arranged in three-dimensional space. Once you understand that, you can develop drugs to target specific molecules and bind to them.”
Most of the time, research is coupled to a disease, because that’s when you can solve an immediate problem. But sometimes the bioinformatics division at Linköping also conducts pure research to try to get a better understanding of biological structures and their mechanisms.
The group uses AI to help make predictions about specific protein structures. DeepMind, a Google-owned company, has done work that has given rise to a revolution in structural biology – and it relies on supercomputers.
DeepMind developed AlphaFold, an AI algorithm it trained using very large datasets from biological experiments. The supervised training resulted in “weights”, or a neural network that can then be used to make predictions. AlphaFold is now open source and available to research organisations, such as Bjorn Wallner’s group at Linköping University.
Bjorn Wallner, Linköping University
There is still a huge amount of uncharted territory in structural biology. While AlphaFold offers a new way of finding the 3D structure of proteins, it’s only the tip of the iceberg – and digging deeper will also require supercomputing power. It’s one thing to understand a protein in isolation, or a protein in a static state. But it’s an entirely different thing to figure out how different proteins interact and what happens when they move.
Any given human cell contains around 20,000 proteins – and they interact. They are also flexible. One molecule shifting out, or another one binding a protein to something else, are all actions that regulate the machinery of the cell. Proteins are also manufactured in cells. Understanding the basic machinery is important and can lead to breakthroughs.
“Now we can use Berzelius to get a lot more throughput and break new ground in our research,” says Wallner. “The new supercomputer even offers us the potential to retrain the AlphaFold algorithm. Google has a lot of resources and can do a lot of big things, but now we can maybe compete a little bit.
“We have just started using the new supercomputer and need to adapt our algorithms to this huge machine to use it optimally. We need to develop new methods, new software, new libraries, new training data, so we can actually use the machine optimally,” he says.
“Researchers will expand on what DeepMind has done and train new models to make predictions. We can move into protein interactions, beyond just single proteins and on to how proteins interact and how they change.”