The Arabic LLM Edition

On governance, technology, and progress

Sep 12, 2023

Colin here. As AI matures, it will need to work well in more languages. The UAE has been an early mover in positioning itself to reap the benefits of these advances. The country is rushing to purchase the high-powered Nvidia chips needed for AI software and is taking things a step further by building its own models. The FT reports:

An artificial intelligence group with links to Abu Dhabi’s ruling family has launched what it described as the world’s highest-quality Arabic AI software, as the United Arab Emirates pushes ahead with efforts to lead the Gulf’s adoption of generative AI. The large language model known as Jais is an open-source, bilingual model available for use by the world’s 400mm-plus Arabic speakers, built on a trove of Arabic and English-language data.

The LLM uses modern standard Arabic, and is also being trained on other regional dialects via social posts and other methods.

Why is this interesting?

Current large language models (LLMs) like ChatGPT are great in English, which makes up the majority of their training corpus, but can have some limitations in other languages, particularly those less common on the web. More importantly, though, they’re naturally biased towards the Western world, since so much of the training data comes from those countries. Going a level deeper, that also means a focus on Western/Judeo-Christian values. Purposefully or not, American-trained LLMs won’t always sync up with the beliefs of the Arab world.

But there’s a deeper story here as well. As the role of AI and LLMs grows, the layer that sits on top (currently called alignment), which ensures that the model doesn’t go too far off script, offers quite a bit of editorial power to its creators. Try asking ChatGPT how to do something violent and it won’t give you an answer. Even a simple question about installing a light switch returns a heavy caveat: “If you're unfamiliar with electrical work or have any uncertainties, you should consult or hire a qualified electrician to complete the installation for you.” That almost certainly doesn’t come from the original training data, but from a layer on top that ensures ChatGPT doesn’t stray far from the pasture.

If you’re the UAE’s National Security Advisor, who, according to the FT, is one of the main drivers behind the new model, it makes sense that you would want a hand in this kind of decision. Just as governments have long recognized that changes to curriculum can help advance agendas, in a world of LLMs the ability to control what it means to be “aligned” carries great power. (CJN)

—

Thanks for reading,

Noah (NRB) & Colin (CJN)

—

Why is this interesting? is a daily email from Noah Brier & Colin Nagy (and friends!) about interesting things. If you’ve enjoyed this edition, please consider forwarding it to a friend. If you’re reading it for the first time, consider subscribing (it’s free!).
