IBM’s Almaden, California, research lab is in the process of building the largest data drive ever. We’re not just talking large scale; we’re talking 120 petabytes’ worth.
120 petabytes is the equivalent of 120 million gigabytes, or enough space to hold around 24 billion average-sized MP3s. IBM’s engineers developed a series of new hardware and software techniques to enable such a large hike in data-storage capacity. One challenge was finding a way to efficiently combine the thousands of hard drives the system is built from: IBM aligned the individual drives in horizontal drawers, making the drawers wide enough to contain them all. The giant data container is expected to store around one trillion files and should provide the space needed for more powerful simulations of complex systems, like those used to model weather and climate.
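For the curious, here’s a quick back-of-the-envelope check of those figures. The ~5 MB average MP3 size is our assumption, chosen because it makes the article’s numbers line up:

```python
# Sanity-check the capacity figures quoted above.
PETABYTE = 10**15          # bytes, decimal (SI) units, as storage vendors count
GIGABYTE = 10**9
AVG_MP3_BYTES = 5 * 10**6  # assumed ~5 MB per average MP3

capacity_bytes = 120 * PETABYTE

print(capacity_bytes // GIGABYTE)       # 120000000 -> 120 million gigabytes
print(capacity_bytes // AVG_MP3_BYTES)  # 24000000000 -> ~24 billion MP3s
```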
The new system also benefits from a file system known as GPFS, developed at IBM Almaden to give supercomputers faster access to data. It spreads individual files across multiple disks so that many parts of a file can be read or written at the same time. GPFS also enables a large system to keep track of its many files without laboriously scanning through every one. Last month, a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours.
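To give a feel for why striping speeds things up, here’s a minimal Python sketch of the general idea: a file is chopped into blocks that are dealt round-robin across many disks, so each disk can serve its share of the file in parallel. The block size, disk count, and round-robin layout here are illustrative assumptions, not GPFS internals:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DISKS = 8
BLOCK_SIZE = 4  # bytes; tiny, purely for demonstration

def stripe(data: bytes) -> list[list[bytes]]:
    """Split data into blocks; block i lands on disk i % NUM_DISKS."""
    disks: list[list[bytes]] = [[] for _ in range(NUM_DISKS)]
    for i in range(0, len(data), BLOCK_SIZE):
        disks[(i // BLOCK_SIZE) % NUM_DISKS].append(data[i:i + BLOCK_SIZE])
    return disks

def read_striped(disks: list[list[bytes]], num_blocks: int) -> bytes:
    """Reassemble the file, fetching every disk's blocks concurrently."""
    def read_disk(d: int) -> list[bytes]:
        # Stands in for an independent I/O channel per physical disk.
        return disks[d]
    with ThreadPoolExecutor(max_workers=NUM_DISKS) as pool:
        per_disk = list(pool.map(read_disk, range(NUM_DISKS)))
    # Block i is the (i // NUM_DISKS)-th block on disk i % NUM_DISKS.
    return b"".join(per_disk[i % NUM_DISKS][i // NUM_DISKS]
                    for i in range(num_blocks))

data = b"The quick brown fox jumps over the lazy dog."
disks = stripe(data)
num_blocks = -(-len(data) // BLOCK_SIZE)  # ceiling division
assert read_striped(disks, num_blocks) == data
```

Because no single disk holds the whole file, a read can pull from all the disks at once instead of waiting on one spindle.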
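And a quick bit of arithmetic on that record shows just how wide the gap is:

```python
# Compare the two indexing runs in files per minute.
new_rate = 10_000_000_000 / 43       # 10 billion files in 43 minutes
old_rate = 1_000_000_000 / (3 * 60)  # 1 billion files in 3 hours

print(f"{new_rate / old_rate:.0f}x")  # roughly 42x the old throughput
```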