To Make Language Models Work Better, Researchers Sidestep Language


Language isn’t always necessary. While it certainly helps in getting across certain ideas, some neuroscientists have argued that many forms of human thought and reasoning don’t require the medium of words and grammar. Sometimes, the argument goes, having to turn ideas into language actually slows down the thought process.

Now there’s intriguing evidence that certain artificial intelligence systems could also benefit from “thinking” independently of language.

When large language models (LLMs) process information, they do so in mathematical spaces, far from the world of words. That’s because LLMs are built using deep neural networks, which essentially transform one sequence of numbers into another — they’re effectively complicated math functions. Researchers call the numerical universe in which these calculations take place a latent space.
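To make the "latent space" idea concrete, here is a minimal sketch in Python, using toy dimensions and random weights rather than any real model's, of how a transformer layer is simply a function that turns one sequence of numbers into another:

```python
# A minimal sketch (not any particular model's code): words are mapped to
# vectors, and each "layer" is just a math function on those vectors.
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model = 1000, 64           # hypothetical sizes
embedding = rng.normal(size=(vocab_size, d_model))
W = rng.normal(size=(d_model, d_model))  # stand-in for a layer's weights

def layer(h: np.ndarray) -> np.ndarray:
    """One 'layer': a nonlinear map from latent vectors to latent vectors."""
    return np.tanh(h @ W)

token_ids = np.array([12, 407, 33])      # a short input sequence
h = embedding[token_ids]                 # words become points in latent space
h = layer(layer(h))                      # the 'thinking' happens on numbers
print(h.shape)                           # (3, 64): still vectors, not words
```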

But these models must often leave the latent space for the much more constrained one of individual words. This can be expensive, since it requires extra computational resources to convert the neural network’s latent representations of various concepts into words. This reliance on filtering concepts through the sieve of language can also result in a loss of information, just as digitizing a photograph inevitably means losing some of the definition in the original. “A lot of researchers are curious,” said Mike Knoop, co-creator of one of the leading benchmarks for testing abstract reasoning in AI models. “Can you do reasoning purely in latent space?”
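Here is a hedged sketch, again with made-up sizes and weights, of what that filtering step looks like: the model's rich latent vector is projected onto the vocabulary and collapsed to a single token id, and everything else in the vector is left behind.

```python
# An illustration of the "sieve of language": emitting a word means keeping
# one discrete token id and discarding the rest of the continuous state.
# Names and dimensions are illustrative, not from either paper.
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab_size = 64, 1000
W_out = rng.normal(size=(d_model, vocab_size))  # output projection ("unembedding")

h = rng.normal(size=d_model)        # rich latent state: 64 real numbers
logits = h @ W_out                  # scores over every word in the vocabulary
token_id = int(np.argmax(logits))   # collapse everything to one discrete choice

# Feeding the chosen word back in means re-embedding a single id; the nuance
# carried by the other dimensions of `h` never reaches the next step.
print(f"latent state carries {h.size} numbers; the emitted token is 1 id: {token_id}")
```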

Two recent papers suggest that the answer may be yes. In them, researchers introduce deep neural networks that allow language models to continue thinking in mathematical spaces before producing any text. While still fairly basic, these models are more efficient and reason better than their standard counterparts.

“It’s an exciting new research direction,” said Luke Zettlemoyer, a computer scientist and natural language processing expert at the University of Washington who wasn’t involved in either paper.

Getting Loopy

A team led by Tom Goldstein, of the University of Maryland, had also been working on the same goal. Last year, they designed and trained a transformer that not only learned to reason in latent space, but also figured out when to stop and switch back to language on its own. But this team came at the task from a different direction than Hao’s.

All modern LLMs have a fixed number of transformer layers. “It seems fundamentally limiting,” Goldstein said, since it means that problems that need extra computations — more passes through layers — don’t get them. This was particularly true for early LLMs, which had relatively few layers. Goldstein wanted to figure out a way to increase the number of layers in an LLM on demand.
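The rough sketch below illustrates the general idea of adding layers on demand: loop a single shared block in latent space and let a small learned gate decide when to stop and hand the result back to the language side. It is an illustration of that idea only, not the Maryland team's actual architecture; the gate, weights, and sizes here are invented for the example.

```python
# A rough sketch of "more layers on demand": reuse one shared block in a loop
# and stop when a halting gate fires, rather than running a fixed stack.
import numpy as np

rng = np.random.default_rng(2)
d_model = 64
W_block = rng.normal(size=(d_model, d_model), scale=0.1)  # shared "layer" weights
w_halt = rng.normal(size=d_model)                          # gate deciding when to stop

def shared_block(h):
    return np.tanh(h @ W_block) + h    # residual update in latent space

def reason_in_latent_space(h, max_loops=32, threshold=0.5):
    """Loop the same block until the halting gate fires or a budget runs out."""
    for step in range(max_loops):
        h = shared_block(h)
        p_stop = 1.0 / (1.0 + np.exp(-h @ w_halt))  # sigmoid halting probability
        if p_stop > threshold:
            break                                    # the model decides it has thought enough
    return h, step + 1

h0 = rng.normal(size=d_model)
h_final, loops_used = reason_in_latent_space(h0)
print(f"used {loops_used} passes through the shared block before emitting text")
```

Harder inputs can trigger more passes through the shared block, which is the "extra computation" that a fixed number of layers cannot provide.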

Click HERE to read the full article

The Department welcomes comments, suggestions and corrections.  Send email to editor [-at-] cs [dot] umd [dot] edu.