A recent study has highlighted the growing importance of large Arabic language models in strengthening the Arabic language’s presence in digital spaces and enhancing its global competitiveness. The study found that Saudi Arabia ranked first among countries developing Arabic language models in 2025, according to the Saudi Press Agency (SPA).
Conducted by the Saudi Data and Artificial Intelligence Authority (SDAIA) in cooperation with the King Salman Global Academy for Arabic Language (KSGAAL), the study aims to support the development of an Arabic-language AI ecosystem and identify the requirements for building more advanced models capable of understanding Arabic and its diverse dialects, generating content, and executing instructions.
The study traced the evolution of Arabic language models from rule-based systems before 2000, through statistical and neural network models, to the current generation of large language models and generative AI applications that emerged between 2022 and 2025. That most recent period saw the launch of dozens of Arabic models, including conversational and generative systems designed to serve technological, educational, and knowledge-based use cases.
As of the first quarter of 2025, the study identified more than 53 Arabic language models, with Saudi Arabia leading in their development. It also noted growing international interest in models that support Arabic, while pointing out that investment in audio-visual and multimodal models remains limited despite their future significance. Text-based models accounted for 81% of the total, compared with 7% for multimodal models.
According to results from the BALSAM benchmark issued by KSGAAL, global language models generally outperformed Arabic-specific models across most linguistic skill categories. However, several Arabic models demonstrated promising strengths, showing a slight advantage in summarisation and comparable performance in creative writing and reading comprehension.
The study outlined a roadmap to achieve leadership in Arabic large language models, recommending the development of high-quality Arabic datasets covering multiple dialects and domains, models with varying sizes and capabilities, dedicated Arabic benchmarks, and broader adoption by public and private institutions as well as the wider community.
The findings form part of ongoing cooperation between SDAIA and KSGAAL and reflect Saudi Arabia’s commitment to integrating linguistic and cultural identity with technological advancement, reinforcing the Kingdom’s position as a regional hub for Arabic language AI development.