Date: 2023-08-02

Enterprise efforts to scale AI infrastructure are gaining momentum, but the data infrastructure underneath them often receives less attention. VAST Data is taking a different approach: it focuses on the physics of how data is stored on solid-state drives (SSDs) and has unveiled a decentralized AI infrastructure that runs directly on a decentralized data store built from those drives.

Rather than moving all the data to a central location, VAST argues for training AI models on the data in place. This not only saves on transit and egress costs but also addresses data residency mandates that restrict cross-border data transfer.

VAST Data’s competitors in the storage infrastructure business include Dell EMC, IBM, and Pure Storage. With its new AI enhancements, VAST aims to compete in the larger market for decentralized cloud database services offered by companies like Snowflake and Databricks, as well as in the emerging market for data stores used to train large language models (LLMs).

Early customers of VAST include NVIDIA, which is using the technology to build data infrastructures for AI supercomputers, and Pixar, which has adopted the architecture for storing 3D assets and training algorithms to improve movie production processes.

VAST’s innovation lies in its approach to SSDs. Unlike traditional hard-drive-based storage, which organizes data into larger blocks, VAST breaks the data into smaller 2-bit blocks. Smaller blocks make it easier to identify duplicate data and compress it, yielding a four-fold higher compression ratio than larger block sizes.
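The effect of block size on duplicate detection can be illustrated with a small sketch. It uses plain exact-match deduplication over fixed-size blocks as a stand-in, since the article does not describe how VAST’s similarity reduction works internally. The function name `dedup_ratio`, the synthetic records, and the 64- and 128-byte block sizes are all assumptions chosen for the demonstration, not VAST’s actual parameters.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Return logical size / stored size after exact-match block dedup."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Keep one copy of each distinct block, keyed by its content hash.
    unique = {hashlib.sha256(block).digest(): len(block) for block in blocks}
    return len(data) / sum(unique.values())


# Synthetic data (hypothetical): 100 records, each a shared 64-byte header
# followed by a 64-byte payload that is unique to the record.
HEADER = b"H" * 64
records = b"".join(HEADER + i.to_bytes(4, "big") * 16 for i in range(100))

coarse = dedup_ratio(records, 128)  # each block = header + unique payload
fine = dedup_ratio(records, 64)     # header blocks repeat and are stored once

print(f"128-byte blocks: {coarse:.2f}x")  # 1.00x, no duplicates found
print(f" 64-byte blocks: {fine:.2f}x")    # 1.98x, shared header deduplicated
```

In this synthetic data set the larger blocks never repeat because every one contains a unique payload, while the smaller blocks expose the shared header. Real reduction ratios depend entirely on the data.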

While SSDs still cost more than traditional hard disk drives (HDDs), VAST’s similarity reduction technique enables enterprises to get SSD performance at cost parity with HDDs. This eliminates the need for multiple storage tiers to organize large data sets, as well as the trade-off between price and performance.
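The cost-parity argument comes down to simple arithmetic: the effective cost of a medium is its raw cost per terabyte divided by the data reduction ratio achieved on it. The sketch below works through that calculation. The prices are made-up placeholders rather than vendor figures, and the helper names are hypothetical.

```python
def effective_cost_per_tb(raw_cost_per_tb: float, reduction_ratio: float) -> float:
    """Cost per logical TB once data reduction is taken into account."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_tb / reduction_ratio


def break_even_ratio(ssd_cost_per_tb: float, hdd_cost_per_tb: float) -> float:
    """Reduction ratio at which SSD matches unreduced HDD cost per logical TB."""
    return ssd_cost_per_tb / hdd_cost_per_tb


# Placeholder prices, for illustration only (assumptions, not market data).
SSD_COST = 80.0  # currency units per raw TB
HDD_COST = 20.0  # currency units per raw TB

needed = break_even_ratio(SSD_COST, HDD_COST)
print(f"Reduction needed for parity: {needed:.1f}x")
print(f"SSD cost per logical TB at that ratio: "
      f"{effective_cost_per_tb(SSD_COST, needed):.2f}")
```

With these placeholder numbers, SSD reaches parity once data reduction offsets the raw price gap. Any reduction applied on the HDD side would raise the bar accordingly.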

The decentralized AI infrastructure provided by VAST also addresses the challenge of sharing data across distributed systems. Transactional databases, in particular, require ACID guarantees: atomicity, consistency, isolation, and durability. VAST’s approach allows for scalability and cost efficiency without sacrificing transaction reliability.
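For readers unfamiliar with what ACID guarantees look like in practice, the following minimal sketch demonstrates atomicity using SQLite from the Python standard library. It is a generic illustration and says nothing about how VAST implements transactions. The `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move funds between accounts; either both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails midway and is rolled back
except ValueError:
    pass

print(balances(conn))  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves no trace: the debit that had already been applied inside the transaction is undone, which is the atomicity property a system of record depends on.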

Using the NVMe over Fabrics (NVMe-oF) interface standard, VAST organizes commodity machines so that all CPUs can access the data on the SSDs concurrently, without the nodes having to coordinate constantly with one another. This enables a decentralized global namespace and lets enterprises build systems that run functions on any node with access to all the data.

VAST’s decentralized AI infrastructure promises to create a coherent ecosystem that connects the processes for managing transactions, analytics, and AI development. Data scientists can train algorithms on versioned, similarity-reduced replicas of live transaction and IoT data without impacting the system of record.
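The idea of training against a versioned copy while the system of record keeps changing can be sketched as follows. This is a toy in-memory model under stated assumptions: the `VersionedStore` class is hypothetical, it takes a full copy per snapshot for simplicity, and the "training" step is just an average. It is not a description of VAST’s snapshot mechanism.

```python
from types import MappingProxyType


class VersionedStore:
    """Toy key-value store with read-only, versioned snapshots."""

    def __init__(self) -> None:
        self._live: dict = {}
        self._version = 0
        self._snapshots: dict = {}

    def write(self, key: str, value: float) -> None:
        self._live[key] = value
        self._version += 1

    def read_live(self, key: str) -> float:
        return self._live[key]

    def snapshot(self) -> int:
        """Freeze the current state and return its version id."""
        # A full copy keeps the sketch short; a real system would avoid it.
        self._snapshots[self._version] = MappingProxyType(dict(self._live))
        return self._version

    def read_snapshot(self, version: int):
        return self._snapshots[version]


store = VersionedStore()
store.write("sensor-1", 10.0)
store.write("sensor-2", 20.0)

version = store.snapshot()        # training works against this frozen view
store.write("sensor-1", 99.0)     # live writes continue in the meantime

view = store.read_snapshot(version)
mean = sum(view.values()) / len(view)  # stand-in for a training step

print(version, mean, store.read_live("sensor-1"))  # 2 15.0 99.0
```

The snapshot is exposed through a read-only mapping, so the training side cannot modify the system of record, and later live writes do not change what the training run sees.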

In conclusion, VAST Data’s decentralized AI infrastructure, built on a decentralized data store, offers a novel way to scale AI infrastructure while optimizing data storage and management. The approach promises cost efficiency, data residency compliance, and improved performance for enterprises across industries.