A curious relationship between IRT and achievement motivation

curiosity
reinforcement learning
Author

Alexandr Ten

Published

2026-Mar

TL;DR

There is a neat relationship between (1) item response theory, (2) Elo rating system, (3) achievement motivation theory and (4) the learning progress hypothesis (from the psychology of curiosity literature). This 4-way relationship is not obvious from the contexts and purposes of these four entities, yet it pops out quite effortlessly, once we look closely at each of them with the others in mind.

Item Response Theory

Item Response Theory (IRT) is a popular psychometric framework often used in education science to quantitatively characterize the abilities of students and the difficulties of test items. IRT proposes mathematical models for describing the relationship between item difficulty, student ability, and the probability that the student responds correctly to the item. Perhaps the simplest model is the so-called “one-parameter logistic” (1PL) Rasch model, named after the Danish mathematician Georg Rasch. Suppose we are interested in a binary random variable Z: \{\mathrm{incorrect},~\mathrm{correct}\} \mapsto \{0, 1\}. The 1PL model defines the probability of ‘correct’ as:

P(Z = 1 | \theta, \delta) = f(\theta; \delta) = \frac{1}{1 + e^{-(\theta - \delta)}}

where \theta is interpreted as the student’s ability and \delta is the difficulty parameter of the item. For a given item difficulty \delta, the probability of responding correctly increases with ability \theta. The difficulty parameter shifts the entire curve horizontally. Thus, at a given ability level (e.g., \theta=1), the probability of responding correctly to an item with \delta=0 is higher than to an item with \delta=1.
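To make this concrete, here is a minimal Python sketch of the 1PL model (the function and variable names are mine, not part of IRT):

```python
import math

def p_correct(theta, delta):
    """1PL (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# At ability theta = 1, an easier item (delta = 0) is more likely
# to be solved than a harder one (delta = 1):
easy = p_correct(1.0, 0.0)  # ≈ 0.731
hard = p_correct(1.0, 1.0)  # = 0.5, since ability matches difficulty exactly
```

Note that when ability equals difficulty, the probability of success is exactly 0.5; this fact will matter below.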

Elo Rating System

The Elo rating system is another popular framework, used in competitive games (e.g., chess) to rank players and determine matchmaking. It uses a logistic function similar to IRT’s to model the probability of one player winning against another. In the Elo system, each player has a numeric rating that predicts the outcome of a match between two players. I will borrow an example from the Wikipedia article. Suppose player A has a rating r_A and player B has a rating r_B. From these, the Elo system calculates a player’s expected score s_{XY} \in (0, 1), interpreted as the probability of some player X winning against an opponent Y, plus half the probability of a draw between them:

s_{AB} = \frac{1}{1 + 10^{(r_B - r_A)/400}}

This function is similar to f(\theta; \delta) from above, and that is not a coincidence. Both functions range between 0 and 1 and are interpreted as probabilities. The details differ, but the fundamental shape is the same and the interpretation is similar. At a fixed rating of player A, increasing player B’s rating implies a lower probability of player A winning or drawing. The use of 10 as the base of the exponent and the scaling constant of 400 are motivated by the specific problem setting.
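Here is the expected-score formula as a Python sketch (names are mine):

```python
def expected_score(r_a, r_b):
    """Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# With the constants 10 and 400, a 400-point rating advantage
# translates into an expected score of 10/11 ≈ 0.909:
s_ab = expected_score(1600, 1200)
# The two players' expected scores always sum to 1:
s_ba = expected_score(1200, 1600)  # ≈ 0.091
```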

Importantly, the Elo system prescribes a way to update the ratings of the players once a match between them concludes. Player A’s rating is updated into a new rating r'_A according to:

r'_A = r_A + K (s_A - s_{AB})

where s_A is the actual outcome for player A (0 for a loss, 1 for a win, 0.5 for a draw) and s_{AB} is the expected score from above; K, called “the K-factor”, is a positive parameter determining how much the outcome changes the rating. E.g., if K=0, the rating would not change at all. This rule is analogous to the Rescorla-Wagner learning rule: the rating moves in proportion to the prediction error, the difference between the actual and the expected outcome.
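As a sketch, the match-then-update cycle might look like this (K = 32 is a commonly used value, but the choice is up to the system designer):

```python
def update_rating(r_a, r_b, outcome, k=32):
    """Return A's new rating after a match against B.

    outcome: 1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    """
    expected = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (outcome - expected)

# An upset win against a much stronger opponent moves the rating a lot...
big_gain = update_rating(1200, 1600, 1.0)  # ≈ 1229.1
# ...while an expected win against a weaker opponent barely changes it.
small_gain = update_rating(1600, 1200, 1.0)  # ≈ 1602.9
```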

IRT and Elo in Adaptive Learning Systems

The relationship between the Elo system and IRT is leveraged in learning systems like “Math Garden”. Namely, each student and each practice item can be viewed as two players competing against one another. The student wants to “win” against the item by solving it; the item wants to “win” against the student by preventing the student from solving it. Every time a student is “matched up” against an item, we note the outcome (success if the student solves the item) and update the student’s ability and the item’s difficulty with an Elo-style (Rescorla-Wagner-like) rule. This scheme has nice intuitive properties. For example, if a student solves an item, we should raise their ability estimate. We don’t want to raise it by much if this outcome was expected (e.g., if a very strong student solves a very easy problem). However, if an average student solves a very difficult problem, we’d want to revise our view of the student as “average”. Applying the update rule from the Elo system to the ability-testing context of IRT, we can define a rule for updating the estimated ability and item difficulty:

\theta' = \theta + K_\theta \big(Z - f(\theta; \delta)\big) \\ \delta' = \delta + K_\delta \big((1 - Z) - (1 - f(\theta; \delta))\big)

In each case, the update is proportional to the difference between the actual and the expected outcome from that player’s perspective: the item “wins” when the student fails, so its actual outcome is 1 - Z and its expected outcome is 1 - f(\theta; \delta).
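A minimal sketch of this joint update (parameter values and names are mine, chosen arbitrarily):

```python
import math

def f(theta, delta):
    """1PL probability that the student solves the item."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def update(theta, delta, z, k_theta=0.1, k_delta=0.1):
    """Elo-style joint update after observing outcome z (1 = solved)."""
    p = f(theta, delta)
    theta_new = theta + k_theta * (z - p)              # student: actual - expected
    delta_new = delta + k_delta * ((1 - z) - (1 - p))  # item's perspective
    return theta_new, delta_new

# An average student (theta = 0) unexpectedly solves a hard item (delta = 2):
# the ability estimate rises a lot, and the difficulty estimate drops.
theta_new, delta_new = update(0.0, 2.0, 1)
```

With equal K-factors, whatever ability gains, difficulty loses: the two updates are mirror images of each other.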

It seems reasonable to define learning progress as “change in ability over time”, or \Delta \theta (allowing progress to be negative, i.e., allowing regress). The magnitude of learning progress is determined by K_\theta and the discrepancy between Z and f(\theta; \delta). Letting p = f(\theta; \delta) = P(Z = 1 \mid \theta, \delta), we can compute the expected magnitude of learning progress by summing over the two possible outcomes:

\begin{align*} \mathbb{E}_{Z}[|\Delta \theta|] = \sum_z P(Z = z \mid \theta, \delta) \, K_\theta \, |z - p| & = K_\theta \Big[ (1 - p) \, |0 - p| + p \, |1 - p| \Big] \\ & = 2 K_\theta \, p (1 - p) \end{align*}

In other words:

\mathbb{E}_{Z}[|\Delta \theta|] \propto p (1 - p)

The function has an inverted-U shape: it peaks at p = 0.5, suggesting that the student can expect to learn the most from items that match their ability level (in IRT terms). If the item is too easy or too difficult, expected learning progress is low.
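To double-check the algebra, here is a tiny sketch (names are mine) that computes the expectation by direct enumeration over the two outcomes and compares it with the closed form:

```python
def expected_abs_progress(p, k_theta=0.1):
    """E[|change in theta|], summed over the outcomes z in {0, 1}."""
    return (1 - p) * k_theta * abs(0 - p) + p * k_theta * abs(1 - p)

# Agrees with the closed form 2 * k_theta * p * (1 - p), and the
# inverted-U peaks at p = 0.5:
values = [expected_abs_progress(i / 10) for i in range(1, 10)]
peak_index = values.index(max(values))  # 4, i.e., p = 0.5
```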

Achievement Motivation

In 1957, the motivation psychologist John Atkinson proposed a model of achievement motivation. The model assumes that the motivation to pursue a particular goal g is a function of three factors: motive m, expectancy p, and incentive value v: g = mpv. Defining the incentive value as 1-p, we get:

g = mp(1-p)

Setting v = (1-p) corresponds to valuing goals that are difficult to achieve.
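Since Atkinson’s model is just a product of three terms, a short sketch (names mine) makes the inverted-U explicit:

```python
def atkinson_motivation(p, m=1.0):
    """Atkinson's achievement motivation: g = m * p * (1 - p)."""
    return m * p * (1 - p)

# Motivation is maximal at intermediate expectancy, p = 0.5:
peak = max(range(1, 10), key=lambda i: atkinson_motivation(i / 10))  # 5
```

Up to the constant out front (m versus 2 K_\theta), this is the same function as the expected learning progress derived above.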

Concluding Remark

It should be clear by now that Atkinson’s model is similar to the definition of expected learning progress we arrived at above. In 2025, I co-authored a paper in which we discussed several theories of curiosity that predict an inverted-U relationship between curiosity and knowledge. There, I speculated that motivational mechanisms like Atkinson’s may approximate an idealized notion of learning-progress maximization as an optimal way to grow one’s knowledge. Considering how learning progress might be defined in the context of IRT and automated learning systems, this speculation gains more traction (at least in my head). It seems that achievement motivation might be a mechanism that achieves (expected) learning-progress maximization without explicitly computing (or approximating) the derivative of some learning function.