An Explorative Study on Abstract Images
and Visual Representations
Learned from Them

The MIx Group
School of Computer Science, University of Birmingham
BMVC 2025

Abstract

Imagine living in a world composed solely of primitive shapes: could you still recognise familiar objects? Recent studies have shown that abstract images, constructed from primitive shapes, can indeed convey visual semantic information to deep learning models. However, representations obtained from such images often fall short of those derived from traditional raster images. In this paper, we study the reasons behind this performance gap and investigate how much high-level semantic content can be captured at different abstraction levels. To this end, we introduce the Hierarchical Abstraction Image Dataset (HAID), a novel data collection comprising abstract images generated from common raster image datasets at multiple levels of abstraction. We then train and evaluate conventional vision systems on HAID across classification, segmentation, and object detection tasks, providing a comprehensive comparison between rasterised and abstract image representations. We also discuss whether abstract images can serve as an effective format for conveying visual semantic information and contributing to vision tasks.
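To make the notion of an "abstract image" concrete, the toy renderer below composes a grayscale canvas from a budget of primitive shapes (random translucent triangles). It only illustrates the idea of an abstraction level as a shape budget; it is not the HAID generation pipeline, whose shapes are derived from source raster images rather than placed at random, and every name here is illustrative.

```python
import random

def point_in_triangle(p, tri):
    """p lies inside tri iff the cross-product signs for its three edges agree."""
    def sign(a, b, c):
        return (a[0] - c[0]) * (b[1] - c[1]) - (b[0] - c[0]) * (a[1] - c[1])
    d1 = sign(p, tri[0], tri[1])
    d2 = sign(p, tri[1], tri[2])
    d3 = sign(p, tri[2], tri[0])
    return not ((d1 < 0 or d2 < 0 or d3 < 0) and (d1 > 0 or d2 > 0 or d3 > 0))

def render_abstract_image(width, height, n_shapes, seed=0):
    """Compose a grayscale canvas (floats in [0, 1]) by alpha-blending
    n_shapes random translucent triangles, back to front.  A larger
    n_shapes corresponds to a finer abstraction level (cf. the 30- to
    1,000-shape levels in HAID)."""
    rng = random.Random(seed)
    canvas = [[0.0] * width for _ in range(height)]
    for _ in range(n_shapes):
        # One primitive: three random vertices, a shade, and an opacity.
        tri = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(3)]
        shade, alpha = rng.random(), rng.uniform(0.2, 0.6)
        for y in range(height):
            for x in range(width):
                if point_in_triangle((x + 0.5, y + 0.5), tri):
                    canvas[y][x] = (1 - alpha) * canvas[y][x] + alpha * shade
    return canvas
```

Rendering the same scene with 30, 100, or 1,000 such primitives gives a direct feel for how much structure each shape budget can carry.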

Hierarchical Abstraction Image Dataset

HAID-MiniImageNet

HAID-Caltech-256

HAID-CIFAR-10

Results

We evaluate conventional vision systems on HAID across classification, segmentation, and object detection tasks, comparing rasterised and abstract image representations.

Classification

Classification Result 1

Results of MobileNetv2 on HAID-MiniImageNet and MiniImageNet

Classification Result 2

Results of ResNet50 on HAID-MiniImageNet and MiniImageNet

Semantic Segmentation

Segmentation Result 1

Results of DeepLabv3 initialized by different MobileNetv2 backbones

Segmentation Result 2

Results of DeepLabv3 initialized by different ResNet50 backbones

Object Detection

Detection Result 1

Results of SSD-Lite initialized by MobileNetv2 backbones

Detection Result 2

Results of Faster-RCNN initialized by ResNet50 backbones

User study

We also conduct a user study to evaluate how well humans can recognise objects in abstract images at different levels of abstraction. We select 36 images in total across six levels (30, 50, 100, 500, and 1,000 shapes, plus original images) from HAID-MiniImageNet and MiniImageNet, with six images per level. Images at each abstraction level were balanced by a priori difficulty: three single-object images with simple backgrounds ("easy samples") and three images with multiple objects, complex textures, or cluttered scenes ("hard samples"). Participants viewed all 36 images in randomised order and gave each image a single 1-5 rating of how confidently they could recognise the object(s) (1 = cannot recognise at all, 5 = extremely confident). The user study shows that: (1) HAID abstractions retain perceptually relevant structure: at moderate-to-high fidelity (500 and 1,000 primitives), observers report recognition confidence close to that for original images; (2) harder samples require more primitives to reach comparable perceptual clarity, suggesting that adaptively allocating the abstraction budget may benefit recognition tasks requiring fine-grained discrimination.

User Study bar chart
User Study line chart

Left: Mean Opinion Score (MOS) of all samples. Right: MOS for easy and hard samples.
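Aggregating the 1-5 ratings into the Mean Opinion Score plotted above is a simple per-level average. A minimal sketch, using hypothetical placeholder ratings rather than the study's actual data:

```python
from statistics import mean

def mean_opinion_score(ratings):
    """MOS per abstraction level: the average of the 1-5 confidence
    ratings collected for that level across participants and images."""
    return {level: mean(scores) for level, scores in ratings.items()}

# Hypothetical ratings (NOT the study's data): a few scores for the
# coarsest (30-shape) and finest (1,000-shape) abstraction levels.
demo = {30: [1, 2, 2, 1], 1000: [4, 5, 4, 5]}
print(mean_opinion_score(demo))  # {30: 1.5, 1000: 4.5}
```

The easy/hard split in the right-hand plot is the same computation restricted to each difficulty subset.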

BibTeX

@inproceedings{li2025explorative,
  title     = {An Explorative Study on Abstract Images and Visual Representations Learned from Them},
  author    = {Li, Haotian and Jiao, Jianbo},
  booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
  publisher = {BMVA},
  year      = {2025}
}