Why ARC?

Intro

In November 2019, François Chollet released a paper called “On the Measure of Intelligence”. I highly recommend reading it if you have an interest in AGI. You can very roughly split the work into two lopsided pieces: the paper itself and the Abstraction and Reasoning Corpus (ARC) dataset.

There are plenty of great discussions of the paper, among them this post and a series of videos, which I recommend.

The ARC benchmark proposed in the paper consists of a number of psychometric tasks. From the paper itself:

ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test

Tasks in the dataset look like this:

The top row shows the starting prompts; the bottom row shows the correct answers.
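For concreteness, each task in the public ARC repository (github.com/fchollet/ARC) is distributed as a small JSON file: a handful of demonstration pairs under "train" and one or more held-out pairs under "test", where every grid is a list of rows of integer colour codes (0-9). A minimal loading sketch, with an illustrative file path:

```python
import json

# Each ARC task file has the shape:
# {"train": [{"input": [[...]], "output": [[...]]}, ...],
#  "test":  [{"input": [[...]], "output": [[...]]}, ...]}
# Grids are lists of rows; each cell is an integer colour code 0-9.
with open("data/training/0a938d79.json") as f:  # illustrative path
    task = json.load(f)

for pair in task["train"]:
    grid_in, grid_out = pair["input"], pair["output"]
    print(f"{len(grid_in)}x{len(grid_in[0])} -> "
          f"{len(grid_out)}x{len(grid_out[0])}")
```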

To solve the task above, you would have to understand that an object is contained by a frame made up of four points (one in each corner). You would select this object and recolour it using the colour of the frame’s points. Finally, you’d understand that the output is only the selected, recoloured object.
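To make the “program synthesis” framing concrete, here is what a hand-written program for a task like this might look like. This is a sketch, not a reference solution: the zero-as-background convention and the “frame colour appears exactly four times” heuristic are assumptions made purely for illustration.

```python
import numpy as np

def solve_frame_recolour(grid):
    """Sketch: find the four corner points framing an object,
    recolour the object with the frame's colour, and return
    just the recoloured object."""
    g = np.array(grid)
    # Assumption: the frame colour is the one appearing exactly
    # four times (one point per corner); 0 is the background.
    colours, counts = np.unique(g[g != 0], return_counts=True)
    frame_colour = int(colours[counts == 4][0])
    corners = np.argwhere(g == frame_colour)
    r0, c0 = corners.min(axis=0)
    r1, c1 = corners.max(axis=0)
    # Recolour everything strictly inside the frame.
    inner = g[r0 + 1:r1, c0 + 1:c1].copy()
    inner[inner != 0] = frame_colour
    # Crop to the object's bounding box so the output is the
    # selected, recoloured object and nothing else.
    rows, cols = np.nonzero(inner)
    obj = inner[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    return obj.tolist()
```

A program like this solves one task; the point of ARC is that every task needs a different one, which is what makes it a program synthesis benchmark.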

Why ARC?

I think the ARC dataset laid out in Chollet’s paper is one of the best characterisations of the requirements of robust machine intelligence that currently exists. The paper provides a clear and comprehensive definition of intelligence to aim for, and the dataset materialises its requirements.

Two fundamental considerations in the design of ARC tasks are:

  • Well-constrained assumptions about priors are required to solve a task. Tasks assume a closed set of prior knowledge, like what an object is (objectness) or that a dot may represent an agent trying to get to a specific spot (goal-directedness).
  • Limited experience, in the form of a small number of examples per task (1-10) and a small number of tasks altogether (400).

These two considerations set up an interesting interplay:

  • The limited experience (in the form of a small dataset) would likely rule out large end-to-end models that require a large number of parameters. There isn’t enough experience to learn or evolve the right model!
  • The constrained assumptions about priors, like the perception of topology or geometry, are good candidates for big models: those priors could be encoded from larger datasets that describe the ARC world.

So why ARC? I think ARC forces researchers to consider how two modes of building machine learning systems (symbolic and approximative) should co-exist in order to build systems that exhibit more general intelligence. In doing so, it will encourage practitioners to consider hybrid systems for far less ambitious tasks too, systems that could potentially be much more robust, efficient and perhaps more understandable.