Techitup Middle East

G42 Launches JAIS 70B and 20 other AI Models

20 models of different sizes, from 590M to 70B parameters, become available in the largest release of the JAIS family of AI tools

Inception, a G42 company that develops advanced AI models and applications and provides them as a service, today released JAIS 70B, the latest JAIS large language model (LLM). The 70-billion-parameter model is built for developers of Arabic natural-language processing (NLP) solutions and promises to accelerate the integration of generative AI services across industries, enhancing capabilities in areas such as customer service, content creation, and data analysis.

JAIS 70B delivers Arabic-English bilingual capabilities at an unprecedented size and scale for the open-source community. With 70 billion parameters, it is better able to handle complicated and nuanced tasks and to process complex datasets. JAIS 70B was developed using continuous training, a process of fine-tuning a pre-trained model, on 370 billion tokens, of which 330 billion were Arabic tokens, the largest Arabic dataset ever used to train an open-source foundation model.

In this release, the company also unveiled a comprehensive suite of JAIS foundation models and models fine-tuned for chat applications: 20 models across 8 sizes, ranging from 590M to 70B parameters, trained on up to 1.6T tokens of Arabic, English, and code data. In response to feedback from the Arabic NLP community, the release spans small, compute-efficient models for targeted applications, including the first Arabic-centric model small enough to run on a laptop, and larger models for enterprise precision.
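As a rough illustration of why the smallest model can fit on a laptop while the largest cannot, the sketch below estimates the memory needed for model weights alone. It assumes 2 bytes per parameter (16-bit weights), which is an assumption and not a figure from the release; activations, caches, and runtime overhead are ignored, and the function name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption,
    not a detail stated in the release.
    """
    return num_params * bytes_per_param / 1e9


# Smallest and largest parameter counts named in the release.
smallest = weight_memory_gb(590e6)  # 590M parameters
largest = weight_memory_gb(70e9)    # 70B parameters

print(f"590M parameters: ~{smallest:.2f} GB of weights")
print(f"70B parameters:  ~{largest:.0f} GB of weights")
```

Under that assumption the 590M model needs a little over 1 GB for its weights, while the 70B model needs roughly 140 GB, which is why the larger sizes are aimed at enterprise hardware.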

This suite of JAIS models accommodates a wide range of use cases and aims to accelerate innovation, development, and research for downstream applications serving the Arabic-speaking and bilingual community.

Inception released JAIS-13B and JAIS-13B-chat in August 2023 and subsequently launched the state-of-the-art Arabic-centric models JAIS-30B and JAIS-30B-chat. In benchmarks in both English and Arabic, JAIS 70B and JAIS 70B-chat have proven to perform even better than those previous models.

JAIS 70B retains, and in specific cases exceeds, the high-quality English-language processing capabilities of Llama2, while far outperforming that base model on Arabic output. To make Arabic text processing more efficient, the JAIS development team trained an expanded tokenizer based on the Llama2 tokenizer, doubling the model’s base vocabulary. According to Sengupta, the model “splits Arabic words less aggressively and makes training and inferencing cheaper” than the standard Llama2 model.
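The effect of a larger vocabulary on how a word is split can be sketched with a toy example. The code below is not the Llama2 or JAIS tokenizer: it uses a simple greedy longest-match split, and both vocabularies are invented for illustration. It only shows the general idea that a vocabulary containing more Arabic pieces yields fewer tokens per word, and fewer tokens means less work in training and inference.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right.

    Falls back to a single character when no longer piece matches.
    This is a toy stand-in for a learned subword tokenizer.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical vocabularies, invented for this illustration.
base_vocab = {"ال"}                # knows only the definite article
expanded_vocab = {"ال", "مكتبة"}   # also knows the stem "library"

word = "المكتبة"  # "the library"

base_pieces = greedy_tokenize(word, base_vocab)
expanded_pieces = greedy_tokenize(word, expanded_vocab)

print(len(base_pieces), base_pieces)
print(len(expanded_pieces), expanded_pieces)
```

With these toy vocabularies the word is split into six pieces under the base vocabulary and two under the expanded one.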
