It may seem pedantic, but it'd be useful to begin by dissecting the expression ‘Large Language Model’ into its individual terms.
What is a model? A model is an informative representation of a phenomenon. This might be a physical phenomenon, where the model serves to overcome inconveniences of size. A map, for example, helps us navigate a territory by condensing the spatial information of many square kilometers into a handheld object. A model of a building or a car forms expectations ahead of its projected construction or manufacture. On the other hand, there are big models of small physical phenomena, like a ball-and-stick representation of a molecule used to study chemistry, or charts of bodily tissues and their cells. And in between there are 1:1 models, such as fashion models and mannequins, which give us an idea of how a piece of clothing would look worn on us.
But models don't have to be spatial. In medicine, for example, ‘animal model’ is a fancy term for a laboratory animal. Experiments are conducted on these animals to derive information about the human animal, under the presumption that the species in question shares key physiological features with ours.
And then there are entirely abstract, non-physical models, such as those used in the sciences and other theoretical disciplines. An economic model is a mathematical formulation that describes the relationships and interactions between economic units; it is used, for example, to project the development of an economy from its present state, or to predict consumer behaviour after a price change. A psychological model does the equivalent for human cognition and behaviour. Such models are derived from empirical observations of regularities in their respective domains.
In a manner, all sciences are instructions for modelling phenomena, meant to carefully improve on our intuitive modelling. Without necessarily being physicists, we all carry a model of the physical world in our heads. When we meet a brick, we automatically generate expectations, for example: how much force we would need to pick it up, how it would feel in our hand, what trajectory it would trace if we threw it this way or that, how it would affect the dynamics of a blanket on a windy day. We form all of these expectations even though it is the first time we have met that particular brick. Indeed, what looked like a piece of baked clay might turn out to be a piece of textured foam, spray-painted brown-red, contradicting all of our expectations and breaking our heart.
Though nobody called it that, ‘language modelling’ is what linguists have been busy with since they first emerged in the world. They took the phenomenon that is language and subjected it to analysis, deriving rules and categories. Physics predicts that the apple will fall to the ground when detached from the tree; linguistics ‘predicts’ that the past tense of to fall is fell (or that the past tense of to bly, not a real verb, would be blied), that verbs agree in number and gender with their subjects, that words take a certain order in a sentence, and so on.
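To show what such a linguistic ‘prediction’ looks like as an explicit rule, here is a toy sketch in Python; the function and its spelling rules are illustrative inventions covering only regular verbs, not any linguist's actual formalism:

```python
def past_tense(verb: str) -> str:
    """Toy linguistic model: predict the past tense of a regular
    English verb from its spelling alone. Irregulars such as
    'fall' -> 'fell' would need a lookup table, not a rule."""
    if verb.endswith("e"):                          # bake -> baked
        return verb + "d"
    if verb.endswith("y") and verb[-2] not in "aeiou":
        return verb[:-1] + "ied"                    # cry -> cried
    return verb + "ed"                              # walk -> walked

print(past_tense("bly"))  # 'blied', just as the rule predicts
```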
An LLM is such a language model. It might not seem obvious at first glance, and indeed there's an important caveat, which we will come back to in part 2. For now I'll just say that an LLM is different from the linguists' model, but a model of language it is nonetheless. Mind that agent-simulating systems such as the celebrated ChatGPT are more than a bare LLM (we'll return to the relation between the two at the end of part 1). LLMs complete texts, one word at a time. They therefore model how texts begin; for example, with the word ‘It’. Then they model how texts that begin with ‘It’ continue. And so on.
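To make that completion loop concrete, here is a minimal sketch in Python. The ‘model’ here is just a table of which word follows which, built from one short made-up corpus; that table is an illustrative stand-in, not how an actual LLM works internally, but the word-at-a-time generation loop is the same idea.

```python
import random
from collections import defaultdict

# Toy stand-in for a language model: record which word follows which
# in a tiny corpus. (A real LLM replaces this table with a neural
# network holding billions of parameters.)
corpus = "it was the best of times it was the worst of times".split()
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

# Complete a text one word at a time, starting from how texts begin.
text = ["it"]
for _ in range(6):
    text.append(random.choice(follows[text[-1]]))
print(" ".join(text))  # e.g. "it was the best of times it"
```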
The ‘large’ part comes from LLMs being, indeed, large. An LLM is a kind of artificial neural network. An artificial neural network is conceptually a collection of nodes, ‘neurons,’ connected in a network through which information flows from one end to the other. In effect, an artificial neural network is a complicated function, not unlike

$$f_a(x) = a \cdot x,$$

but less pretty. Here on the left we have an input x and a single parameter a; how we set this parameter affects the input-output relationship of the function. An LLM is a function whose input and output are text, with billions of parameters set during the process of ‘training the network.’
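To give a feel for what ‘setting parameters during training’ means, here is a minimal sketch, assuming the one-parameter function f_a(x) = a·x from above and some made-up example data; an LLM does the same thing with text and billions of parameters instead of one number:

```python
# Made-up data generated by the hidden rule y = 2x; 'training' should
# recover the parameter a = 2 without being told it.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]

a = 0.0                        # start from an arbitrary parameter setting
for _ in range(100):
    for x, y in zip(xs, ys):
        error = a * x - y      # how wrong is f_a(x) on this example?
        a -= 0.05 * error * x  # nudge a so the error shrinks
print(round(a, 3))             # ~2.0: training has set the parameter
```

How we set a determines the input-output behaviour, exactly as said above; training is just an automated search for a good setting.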