Increased Storage for CAI GPU Infrastructure

The CAI is continuously improving its compute infrastructure. After investing in additional computing power in recent years, it is now renewing its storage system. This will make running machine learning experiments considerably easier and faster.

Tuesday, 31 May 2022

The Centre for Artificial Intelligence (CAI) at the ZHAW specializes in the areas of Autonomous Learning Systems, Computer Vision, Perception and Cognition, Trustworthy AI, AI Engineering, and Natural Language Processing. All these areas utilize deep learning, a methodology that optimizes complex machine learning systems with enormous compute power and many examples in the form of large data sets. For these computationally intensive optimization problems, graphics processing units (GPUs) are typically used because they can process many calculations in parallel. In addition to GPUs, CPUs are needed to load the data from the file system, as well as memory (RAM) to temporarily store the large amounts of data on the system.
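The division of labor described above can be illustrated with a minimal Python sketch. It is not the CAI's actual training code: the function names, the file names, and the tiny demo data set are hypothetical, and a simple print statement stands in for the GPU training step. Worker threads on the CPU read small files from the file system, the loaded batches are held in RAM, and the consumer then processes each batch.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_sample(path: Path) -> bytes:
    """Read one small training sample from the file system (CPU and I/O work)."""
    return path.read_bytes()


def iter_batches(paths, batch_size=4, num_workers=4):
    """Yield batches of samples, reading the files of each batch in parallel.

    Each batch is held in RAM until the consumer takes it
    (in real training, the consumer would be the GPU step).
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for start in range(0, len(paths), batch_size):
            chunk = paths[start:start + batch_size]
            # pool.map preserves the order of the input paths.
            yield list(pool.map(load_sample, chunk))


def make_demo_dataset(root: Path, count: int = 10):
    """Create a hypothetical data set of many tiny files for the demo."""
    paths = []
    for i in range(count):
        path = root / f"sample_{i:04d}.txt"
        path.write_bytes(f"snippet {i}".encode())
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_paths = make_demo_dataset(Path(tmp))
        for batch in iter_batches(demo_paths):
            # Stand-in for the GPU training step.
            print(len(batch), "samples,", sum(len(s) for s in batch), "bytes")
```

Because every sample is a separate file, the time spent opening and reading files can dominate when the storage system is slow, which is why storage throughput matters for this kind of workload.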

For this purpose, the CAI, together with the InIT, maintains a state-of-the-art infrastructure that is constantly being renewed and optimized. Currently, staff researchers have access to several systems with a total of 124 GPUs, 2648 CPU cores and 21.5 TB of RAM. After a massive increase in compute power in recent years, the storage system has now been renewed. This improves the management of deep learning training data sets, which typically consist of millions of tiny files such as text snippets, images or short video sequences. To this end, a new storage system from DALCO offering 216 TB of HDD and 15 TB of SSD storage replaces the existing system. The new storage server is connected to the compute cluster via two redundant 100 Gbit/s network connections and thus allows very fast data access.

With this new storage server, the CAI is equipped to train deep learning models even more efficiently, which in turn accelerates research and industrial applications.
