Cerebras Systems Releases Seven New GPT Models Trained on CS-2 Wafer-Scale Systems

Cerebras-GPT Models Set Benchmark for Training Accuracy, Efficiency, and Openness

Cerebras Systems, the pioneer in artificial intelligence (AI) compute for generative AI, today announced it has trained and is releasing a series of seven GPT-based large language models (LLMs) for open use by the research community. This is the first time a company has used non-GPU-based AI systems to train LLMs of up to 13 billion parameters and shared the models, weights, and training recipe under the industry-standard Apache 2.0 license. All seven models were trained on the 16 CS-2 systems in the Cerebras Andromeda AI supercomputer.

Spearheaded by OpenAI’s ChatGPT, the rapid growth of LLMs has spurred a race to create more powerful, specialized AI chips. While many companies have promised alternatives to Nvidia® GPUs, none have demonstrated both the ability to train large-scale models and the willingness to open source the results with permissive licenses. In fact, competitive pressures have systematically reduced the willingness of companies to release their LLMs to the public, even with restrictive licenses (see GPT-4 for the most recent example). This concentrates ownership, limits ecosystem growth, and creates safety risks.

Cerebras’ release today directly addresses these issues. In a first among AI hardware companies, Cerebras researchers trained, on the Andromeda supercomputer, a series of seven GPT models with 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters. Typically a multi-month undertaking, this work was completed in a few weeks thanks to the incredible speed of the Cerebras CS-2 systems that make up Andromeda, and the ability of Cerebras’ weight streaming architecture to eliminate the pain of distributed compute. These results demonstrate that Cerebras’ systems can train the largest and most complex AI workloads today.

This is the first time a suite of GPT models, trained using state-of-the-art training efficiency techniques, has been made public. These models are trained to the highest accuracy for a given compute budget (i.e., they are compute-efficient, following the Chinchilla recipe), so they require less training time, lower training cost, and less energy than any existing public model.
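To make "highest accuracy for a given compute budget" concrete, the sketch below estimates a token count and training compute for each of the seven model sizes. It rests on two widely used rules of thumb that are assumptions here, not figures from this announcement: roughly 20 training tokens per parameter for a Chinchilla-style run, and roughly 6 FLOPs per parameter per token. The names `TOKENS_PER_PARAM`, `FLOPS_PER_PARAM_TOKEN`, and `training_budget` are illustrative.

```python
# Rule-of-thumb constants (assumptions, not taken from the announcement).
TOKENS_PER_PARAM = 20       # approximate Chinchilla-style tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6   # common estimate: compute ~= 6 * params * tokens

# Parameter counts of the seven released models, as listed in the announcement.
MODEL_PARAMS = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}


def training_budget(n_params):
    """Return (tokens, flops) for one run under the assumptions above."""
    tokens = TOKENS_PER_PARAM * n_params
    flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
    return tokens, flops


if __name__ == "__main__":
    for name, n_params in MODEL_PARAMS.items():
        tokens, flops = training_budget(n_params)
        print(f"{name:>5}: {tokens:.2e} tokens, {flops:.2e} FLOPs")
```

Under these assumptions, estimated compute grows with the square of the parameter count, because the token count scales with model size as well.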

“Few organizations are capable of training truly large-scale models. Even fewer have done so on dedicated AI hardware,” said Sean Lie, co-founder and Chief Software Architect at Cerebras. “Releasing seven fully trained GPT models into the open-source community shows just how efficient clusters of Cerebras CS-2 systems can be and how they can rapidly solve the largest scale AI problems – problems that typically require hundreds or thousands of GPUs. We are very excited to share these models and our learnings with the AI community.”

As the cost and complexity of LLM development grow, companies have stopped releasing their models to the public. To promote open research and access, Cerebras is releasing all seven models, the training methodology, and training weights to the research community under the permissive Apache 2.0 license. This release provides several benefits:

  • The training weights provide a highly accurate pre-trained model for fine-tuning. By applying a modest amount of custom data, anyone can create powerful, industry-specific applications with minimal work.
  • The models’ various sizes and their accompanying checkpoints allow AI researchers to create and test new optimizations and workflows that broadly benefit the community.
  • Because they are released under the industry-standard Apache 2.0 license, these models can be used for research or commercial ventures without royalties.

Building on top of the GPT architecture, the models released today make a number of technical contributions:

  • The derivation of a new scaling law based on an open dataset. Scaling laws are as fundamental to AI as Moore’s Law is to semiconductors. Specifically, they allow researchers to predict how a given training compute budget translates to model performance. Cerebras’ scaling law extends prior work done by OpenAI and DeepMind and is the first scaling law derived using an open dataset, making it reproducible by the AI community.
  • The demonstration of a simple, data-parallel-only approach to training. Traditional LLM training on GPUs requires a complex amalgam of pipeline, model, and data parallelism techniques. Cerebras’ weight streaming architecture is a data-parallel-only model that requires no code or model modification to scale to arbitrarily large models.
  • Cerebras-GPT is the first family of GPT models that are compute-efficient at every model size. Existing open GPT models are trained on a fixed number of data tokens. By applying the Chinchilla training recipe across every model size, Cerebras-GPT is a new high-accuracy baseline for broad use.
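As an illustration of how a scaling law turns a compute budget into a predicted loss, the sketch below fits a plain power law, loss = a · compute^b, by least squares in log-log space. The functional form, the helper names `fit_power_law` and `predict_loss`, and the data points are all assumptions made for illustration. The points are synthetic and are not Cerebras results, and the published scaling law may use a different form.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**b in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the line in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back from log space
    return a, b


def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** b


if __name__ == "__main__":
    # Synthetic, made-up points generated from loss = 50 * C**-0.05.
    compute = [1e18, 1e19, 1e20, 1e21]
    loss = [50.0 * c ** -0.05 for c in compute]
    a, b = fit_power_law(compute, loss)
    print(f"a = {a:.3f}, b = {b:.4f}")
    print(f"predicted loss at 1e22 FLOPs: {predict_loss(a, b, 1e22):.3f}")
```

Because the fit here uses openly specified inputs, anyone can rerun it and check the extrapolation, which is the sense in which a scaling law derived from an open dataset is reproducible.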

"Training large language models at scale is a technically challenging endeavor. With this release Cerebras joins the ranks of a handful of organizations to train and open source a model suite of this scale. We’ve worked to make this task easier with releases such as the Pile and the Eval Harness, and we are very excited to see Cerebras build on our work to produce a family of open models that will be useful to researchers around the world," said Stella Biderman, Executive Director at EleutherAI.

“By releasing seven GPT models, Cerebras not only demonstrates the power of its CS-2 systems and Andromeda supercomputer as being amongst the premier training platforms, but elevates Cerebras researchers to the upper echelon of AI practitioners,” said Karl Freund, founder and principal analyst, Cambrian AI. “There are a handful of companies in the world capable of deploying end-to-end AI training infrastructure and training the largest of LLMs to state-of-the-art accuracy. Cerebras must now be counted among them. Moreover, by releasing these models into the open-source community with the permissive Apache 2.0 license, Cerebras shows commitment to ensuring that AI remains an open technology that broadly benefits humanity.”

All seven Cerebras-GPT models are immediately available on Hugging Face and in the Cerebras Model Zoo on GitHub. The Andromeda AI supercomputer used to train these models is available on demand at https://www.cerebras.net/andromeda/.
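For readers who want to try a released checkpoint, here is a minimal sketch that uses the Hugging Face `transformers` library with PyTorch installed. The repository naming pattern `cerebras/Cerebras-GPT-<size>` is an assumption and should be checked against the Hugging Face listing. The helpers `model_id` and `generate` are hypothetical names introduced for this example.

```python
# The seven model sizes named in the announcement.
SIZES = ("111M", "256M", "590M", "1.3B", "2.7B", "6.7B", "13B")


def model_id(size):
    """Build a Hugging Face repository name; the naming pattern is an assumption."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"cerebras/Cerebras-GPT-{size}"


def generate(prompt, size="111M", max_new_tokens=30):
    """Download a checkpoint and generate a continuation (needs network access)."""
    # Imported here so model_id() works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = model_id(size)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Listing the assumed repository names requires no download.
    for size in SIZES:
        print(model_id(size))
```

Calling `generate("Generative AI is ")` would download the smallest checkpoint on first use. Starting with the 111M model keeps the download and memory footprint small before moving to larger sizes.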

For those interested in the technical details, Cerebras has published a technical blog post describing the seven models and the scaling laws they produce. A research paper will be released shortly.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer system, designed for the singular purpose of accelerating generative AI work. Our flagship product, the CS-2 system, powered by the world’s largest and fastest AI processor, makes training large models simple and easy by avoiding the complexity of distributed computing. Cerebras solutions are available in the cloud through the Cerebras AI Model Studio, or on premises.

Contacts