OpenAI rival Cohere launches language model API

Cohere, a startup that creates large language models to compete with those from OpenAI and AI21 Labs, today announced the general availability of its commercial platform for app and service development. Through an API, customers can access models that have been fine-tuned for a range of natural language applications, in some cases at a fraction of the price of competing offerings.

The pandemic has accelerated the world’s digital transformation and pushed companies to become more dependent on software to streamline their processes. As a result, the demand for natural language technology is now higher than ever – especially in the enterprise. According to a 2021 survey by John Snow Labs and Gradient Flow, 60% of technology executives indicated that their natural language processing (NLP) budgets grew by at least 10% compared to 2020, while a third – 33% – said their spending increased by more than 30%.

The global NLP market is expected to increase in value from $11.6 billion in 2020 to $35.1 billion in 2026.

“Language is essential to humanity and without a doubt its greatest invention – alongside the development of computers. Ironically, computers still lack the ability to fully understand language and have difficulty analyzing the syntax, semantics and context that all work together to give words meaning,” Cohere CEO Aidan Gomez told VentureBeat via email. “But the latest in NLP technology is continually improving our ability to communicate seamlessly with computers.”


Headquartered in Toronto, Canada, Cohere was founded in 2019 by a pedigreed team including Gomez, Ivan Zhang and Nick Frosst. Gomez, a former intern at Google Brain, co-authored the academic paper “Attention Is All You Need,” which introduced the world to a fundamental AI model architecture called the Transformer. (Among other high-profile systems, OpenAI’s GPT-3 and Codex are based on the Transformer architecture.) Zhang, together with Gomez, is a contributor to an open AI research collective involving computer scientists and engineers. As for Frosst, he worked, like Gomez, at Google Brain, publishing research on machine learning with Turing Award winner Geoffrey Hinton.

In a vote of confidence, even before the launch of its commercial service, Cohere raised $40 million from institutional venture capitalists as well as Hinton, Google Cloud AI chief scientist Fei-Fei Li, UC Berkeley AI lab co-director Pieter Abbeel and former Uber autonomous driving head Raquel Urtasun. “Very large language models now give computers a much better understanding of human communication. The team at Cohere is building technology that will make this revolution in natural language understanding much more accessible,” Hinton said in a statement to Fast Company in September.

Unlike some of its competitors, Cohere offers two types of English NLP models, generation and representation, in sizes that include Shrimp, Otter, Seal, Shark, and Orca. The generation models can perform tasks that involve generating text – for example, writing product descriptions or extracting document metadata. In contrast, the representation models are about understanding language, powering apps such as semantic search, chatbots and sentiment analysis.

Introduction to large language models with Cohere |  Cohere API documentation

Cohere already provides NLP capabilities to Ada, a company in the chatbot space. Ada uses a Cohere model to match customer chat queries with available support information.

“By being in both [the generation and representation space], Cohere has the flexibility that many enterprise customers need and can offer a range of model sizes that allow customers to choose the model that best suits their needs across the spectrum of latency and performance,” said Gomez. “[Use] cases across industries include the ability to more accurately track and categorize expenses, speed up data entry for medical providers, or use semantic search for lawsuits, insurance policies, and financial documents. Companies can easily generate product descriptions with minimal input, prepare and analyze legal contracts and analyze trends and sentiment to inform investment decisions.”

To keep its technology relatively affordable, Cohere charges for access on a per-character basis, based on the size of the model and the number of characters apps use (ranging from $0.001 to $0.012 per 10,000 characters for generation and $0.0100 to $0.019 per 10,000 characters for representation). Apps powered by Cohere’s generation models are charged on the number of input and output characters, while apps powered by the representation models are charged only on the number of input characters. Meanwhile, all fine-tuned models – i.e., models tailored to specific domains, industries or scenarios – are charged at twice the base model price.
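As a rough illustration of how these billing rules combine, the sketch below estimates a bill from character counts. It is not Cohere’s billing code: the `Usage` class and `estimate_cost` function are hypothetical names, and the per-10,000-character rates passed in are simply the endpoints of the ranges quoted above.

```python
from dataclasses import dataclass

CHARS_PER_UNIT = 10_000  # the article quotes prices per 10,000 characters


@dataclass(frozen=True)
class Usage:
    """Characters sent to and returned by a model (hypothetical helper type)."""
    input_chars: int
    output_chars: int = 0


def estimate_cost(usage: Usage, price_per_10k: float, model_kind: str,
                  fine_tuned: bool = False) -> float:
    """Estimate a bill in dollars under the rules described in the article.

    Generation models bill input plus output characters, representation
    models bill input characters only, and fine-tuned models cost twice
    the base price.
    """
    if model_kind == "generation":
        billable = usage.input_chars + usage.output_chars
    elif model_kind == "representation":
        billable = usage.input_chars
    else:
        raise ValueError(f"unknown model kind: {model_kind!r}")
    multiplier = 2.0 if fine_tuned else 1.0
    return billable / CHARS_PER_UNIT * price_per_10k * multiplier


if __name__ == "__main__":
    usage = Usage(input_chars=50_000, output_chars=50_000)
    # Top-of-range rates from the article; real rates depend on model size.
    print(f"generation:       ${estimate_cost(usage, 0.012, 'generation'):.4f}")
    print(f"representation:   ${estimate_cost(usage, 0.019, 'representation'):.4f}")
    print(f"fine-tuned (gen): ${estimate_cost(usage, 0.012, 'generation', fine_tuned=True):.4f}")
```

Under these assumptions, the same 100,000 characters of traffic are billed in full by a generation model, while a representation model bills only the 50,000 input characters.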

“The problem remains that the only companies capable of leveraging NLP technology require seemingly bottomless resources to access the technology for large language models – due to the fact that these models cost from tens to hundreds of millions of dollars to build,” Gomez said. “Cohere is easy to implement. With just three lines of code, companies can apply [our] full-stack engine to power all their NLP needs. The models themselves are … already pre-trained.”

To Gomez’s point, training and deploying large language models in production is no easy feat, even for companies with enormous resources. For example, Nvidia’s recently released Megatron 530B model was originally trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 80GB GPUs. Microsoft and Nvidia say they observed between 113 and 126 teraflops per GPU while training the Megatron 530B, which would put the training cost in the millions of dollars. (A teraflop rating measures the performance of hardware including GPUs.)
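To give a sense of the scale involved, the back-of-the-envelope sketch below multiplies out the figures quoted above. It assumes, optimistically, that every GPU sustains the reported per-GPU rate at the same time; the function name is illustrative, and the result is an aggregate throughput, not a cost estimate.

```python
SERVERS = 560                    # DGX A100 servers, per the article
GPUS_PER_SERVER = 8              # A100 80GB GPUs per server
TFLOPS_PER_GPU_RANGE = (113, 126)  # observed teraflops per GPU


def cluster_throughput_pflops(servers: int, gpus_per_server: int,
                              tflops_per_gpu: float) -> float:
    """Aggregate throughput in petaflops, assuming all GPUs run at the same rate."""
    total_gpus = servers * gpus_per_server
    return total_gpus * tflops_per_gpu / 1000  # 1 petaflop = 1,000 teraflops


if __name__ == "__main__":
    total_gpus = SERVERS * GPUS_PER_SERVER
    low, high = (cluster_throughput_pflops(SERVERS, GPUS_PER_SERVER, rate)
                 for rate in TFLOPS_PER_GPU_RANGE)
    print(f"{total_gpus} GPUs, roughly {low:.0f} to {high:.0f} petaflops in aggregate")
```

That works out to 4,480 GPUs and, under the stated assumption, an aggregate rate on the order of 500 petaflops.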

Inference – actually running the trained model – is another challenge. On two of its expensive DGX SuperPod systems, Nvidia claims that inference (e.g., autocompleting a sentence) with the Megatron 530B takes only half a second. But it can take over a minute on a CPU-based on-premises server. While cloud alternatives may be cheaper, they are not dramatically so – one estimate puts the cost of running GPT-3 on a single Amazon Web Services instance at a minimum of $87,000 per year.

Training the models

To build Cohere’s models, Gomez says the team scrapes the web and feeds billions of ebooks and web pages (from sources such as WordPress, Tumblr, Stack Exchange, Genius, BBC, Yahoo, and the New York Times) to the models so they learn to understand the meaning and intent of language. (The training dataset for the generation models amounts to 200GB after some filtering, while the dataset for the representation models, which was not filtered, totals 3TB.) Like all AI models, Cohere’s train by ingesting a set of examples to learn patterns among data points, such as grammatical and syntactic rules.

It is well established that models can amplify the biases in the data on which they were trained. In a paper, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 and similar models can generate text that could radicalize people into extremist ideologies. A team at Georgetown University has used GPT-3 to generate misinformation, including stories built around a false narrative, articles altered to push a false perspective, and tweets riffing on particular points of misinformation. Other studies, such as one published in April by researchers at Intel, MIT and the Canadian AI initiative CIFAR, have found high levels of stereotypical bias in some of the most popular open source models, including Google’s BERT and XLNet and Facebook’s RoBERTa.

Generation |  Cohere API documentation

Cohere, for its part, claims it is committed to safety and trains its models “to minimize bias and toxicity.” Customers must adhere to the company’s usage guidelines or risk having their access to the API revoked. And Cohere – which has an external advisory board in addition to an internal safety team – says it plans to monitor “evolving risks” with tools designed to identify harmful outputs.

But Cohere’s NLP models are not perfect. In its documentation, the company admits that the models can generate “obscenities, sexually explicit content, and messages that mischaracterize or stereotype groups of people based on problematic historical biases perpetuated by Internet communities.” For example, when fed prompts about people, professions, and political or religious ideologies, the API’s output can be toxic five to six times per 1,000 generations and discuss men twice as much as women, Cohere says. Meanwhile, the Otter model in particular tends to associate men and women with stereotypical “male” and “female” occupations (e.g., male scientist versus female housekeeper).

In response, Gomez says the Cohere team “makes a significant effort to filter out toxic content and bad text,” including running adversarial attacks and measuring models against safety research benchmarks. “[F]iltration is performed at the keyword and domain levels to minimize bias and toxicity,” he added. “[The team has made] meaningful progress that sets Cohere apart from other [companies developing] large language models … [W]e are confident of the impact it will have on the future of work during this transformative era.”

