An Explorative Study on Abstract Images
and Visual Representations
Learned from Them

The MIx Group
School of Computer Science, University of Birmingham
BMVC 2025

Abstract

Imagine living in a world composed solely of primitive shapes: could you still recognise familiar objects? Recent studies have shown that abstract images, constructed from primitive shapes, can indeed convey visual semantic information to deep learning models. However, representations obtained from such images often fall short of those derived from traditional raster images. In this paper, we study the reasons behind this performance gap and investigate how much high-level semantic content can be captured at different abstraction levels. To this end, we introduce the Hierarchical Abstraction Image Dataset (HAID), a novel data collection comprising abstract images generated from common raster image datasets at multiple levels of abstraction. We then train and evaluate conventional vision systems on HAID across classification, segmentation, and object detection tasks, providing a comprehensive comparison between rasterised and abstract image representations. We also discuss whether abstract images can serve as an effective format for conveying visual semantic information and contributing to vision tasks.
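To make the notion of an "abstract image" concrete, the toy renderer below composes a grayscale canvas from a budget of primitive shapes (random translucent triangles). It only illustrates the idea of an abstraction level as a shape budget; it is not the HAID generation pipeline, whose shapes are derived from source raster images rather than placed at random, and every name here is illustrative.

```python
import random

def point_in_triangle(p, tri):
    """p lies inside tri iff the cross-product signs for its three edges agree."""
    def sign(a, b, c):
        return (a[0] - c[0]) * (b[1] - c[1]) - (b[0] - c[0]) * (a[1] - c[1])
    d1 = sign(p, tri[0], tri[1])
    d2 = sign(p, tri[1], tri[2])
    d3 = sign(p, tri[2], tri[0])
    return not ((d1 < 0 or d2 < 0 or d3 < 0) and (d1 > 0 or d2 > 0 or d3 > 0))

def render_abstract_image(width, height, n_shapes, seed=0):
    """Compose a grayscale canvas (floats in [0, 1]) by alpha-blending
    n_shapes random translucent triangles, back to front.  A larger
    n_shapes corresponds to a finer abstraction level (cf. the 30- to
    1,000-shape levels in HAID)."""
    rng = random.Random(seed)
    canvas = [[0.0] * width for _ in range(height)]
    for _ in range(n_shapes):
        # One primitive: three random vertices, a shade, and an opacity.
        tri = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(3)]
        shade, alpha = rng.random(), rng.uniform(0.2, 0.6)
        for y in range(height):
            for x in range(width):
                if point_in_triangle((x + 0.5, y + 0.5), tri):
                    canvas[y][x] = (1 - alpha) * canvas[y][x] + alpha * shade
    return canvas
```

Rendering the same scene with 30, 100, or 1,000 such primitives gives a direct feel for how much structure each shape budget can carry.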

Hierarchical Abstraction Image Dataset

HAID-MiniImageNet

HAID-Caltech-256

HAID-CIFAR-10

Results

We evaluate conventional vision systems on HAID across classification, segmentation, and object detection tasks, comparing rasterised and abstract image representations.

Classification

Classification Result 1

Results of MobileNetv2 on HAID-MiniImageNet and MiniImageNet

Classification Result 2

Results of ResNet50 on HAID-MiniImageNet and MiniImageNet

Semantic Segmentation

Segmentation Result 1

Results of DeepLabv3 initialized by different MobileNetv2 backbones

Segmentation Result 2

Results of DeepLabv3 initialized by different ResNet50 backbones

Object Detection

Detection Result 1

Results of SSD-Lite initialized by MobileNetv2 backbones

Detection Result 2

Results of Faster-RCNN initialized by ResNet50 backbones

User study

We also conduct a user study to evaluate how well humans can recognise objects in abstract images at different levels of abstraction. We select 36 images in total across six levels (30, 50, 100, 500, and 1,000 shapes, plus original images) from HAID-MiniImageNet and MiniImageNet, with six images per level. Images at each abstraction level were balanced by a priori difficulty: three single-object images with simple backgrounds ("easy samples") and three images with multiple objects, complex textures, or cluttered scenes ("hard samples"). Participants viewed all 36 images in randomised order and gave each image a single 1-5 rating of how confidently they could recognise the object(s) (1 = cannot recognise at all, 5 = extremely confident). The user study shows that: (1) HAID abstractions retain perceptually relevant structure: at moderate-to-high fidelity (500 and 1,000 primitives), observers report recognition confidence close to that for original images; (2) harder samples require more primitives to reach comparable perceptual clarity, suggesting that adaptively allocating the abstraction budget may benefit recognition tasks requiring fine-grained discrimination.

User Study bar chart
User Study line chart

Left: Mean Opinion Score (MOS) of all samples. Right: MOS for easy and hard samples.
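Aggregating the 1-5 ratings into the Mean Opinion Score plotted above is a simple per-level average. A minimal sketch, using hypothetical placeholder ratings rather than the study's actual data:

```python
from statistics import mean

def mean_opinion_score(ratings):
    """MOS per abstraction level: the average of the 1-5 confidence
    ratings collected for that level across participants and images."""
    return {level: mean(scores) for level, scores in ratings.items()}

# Hypothetical ratings (NOT the study's data): a few scores for the
# coarsest (30-shape) and finest (1,000-shape) abstraction levels.
demo = {30: [1, 2, 2, 1], 1000: [4, 5, 4, 5]}
print(mean_opinion_score(demo))  # {30: 1.5, 1000: 4.5}
```

The easy/hard split in the right-hand plot is the same computation restricted to each difficulty subset.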

BibTeX

@inproceedings{li2025explorative,
  title     = {An Explorative Study on Abstract Images and Visual Representations Learned from Them},
  author    = {Li, Haotian and Jiao, Jianbo},
  booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
  publisher = {BMVA},
  year      = {2025}
}