Do LVLMs Understand Charts?

Analyzing and Correcting Factual Errors in Chart Captioning

1University of Illinois Urbana-Champaign
2Columbia University 3University of Macau
ACL 2024 Findings

Abstract

Recent advancements in Large Vision-language Models (LVLMs) have led to significant progress in generating natural language descriptions for visual content, enhancing a wide range of applications. One issue with these powerful models is that they sometimes produce text that is factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natural image captioning, the factuality of generated captions for structured document images, such as charts, has received far less scrutiny, posing a potential threat to information reliability in critical applications.

This work delves into the factuality aspect by introducing a comprehensive typology of factual errors in generated chart captions. A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models, ultimately forming the foundation of a novel dataset, Chocolate. The Chocolate dataset is split into three subsets based on the model that produced each caption: Lvlm (Large Vision-Language Model), Llm (Large Language Model), and Ft (Fine-tuned Model).

Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies. In response to this challenge, we establish the new task of Chart Caption Factual Error Correction and introduce ChartVE, a visual entailment model that outperforms proprietary and open-source LVLMs in evaluating factual consistency. Furthermore, we propose C2TFec, an interpretable two-stage framework that excels at correcting factual errors. This work inaugurates a new domain in factual error correction for chart captions, presents a novel evaluation mechanism, and demonstrates an effective approach to ensuring the factuality of generated chart captions.
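To make the two-stage idea concrete, the sketch below decomposes correction into chart-to-table conversion followed by table-grounded rewriting. The stage decomposition, function names, and prompt are illustrative assumptions, not the released C2TFec implementation; the model calls are left as stubs.

# Illustrative two-stage correction pipeline. All names and the prompt are
# assumptions for exposition; plug in real models where indicated.

def chart_to_table(chart_image) -> str:
    """Stage 1 (assumed): convert the chart into a linearized data table,
    e.g., with a chart-to-table model such as DePlot."""
    raise NotImplementedError("plug in a chart-to-table model here")

def rectify_caption(caption: str, table: str) -> str:
    """Stage 2 (assumed): ask an LLM to rewrite the caption so that every
    claim is supported by the extracted table."""
    prompt = (
        "Data table extracted from the chart:\n"
        f"{table}\n\n"
        f"Caption: {caption}\n\n"
        "Rewrite the caption, correcting any statement the table does not "
        "support and leaving supported statements unchanged."
    )
    raise NotImplementedError("send `prompt` to an LLM of choice")

def correct_chart_caption(chart_image, caption: str) -> str:
    return rectify_caption(caption, chart_to_table(chart_image))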

Leaderboard on Factual Inconsistency Detection in Chart Captioning

Kendall's Tau on the Chocolate dataset.

Model             Method                               Source  Chocolate-Lvlm  Chocolate-Llm  Chocolate-Ft
SummaC            Reference-based Text-only            Link            -0.011          0.023         0.036
QAFactEval        Reference-based Text-only            Link             0.064          0.045         0.054
LLaVA-1.5-13B     Large Vision-language Model          Link             0.002          0.057         0.214
ChartLlama        Large Vision-language Model          Link             0.010          0.065         0.141
ChartAssistant-S  Large Vision-language Model          Link             0.015          0.020         0.036
Bard              Large Vision-language Model          Link            -0.014          0.105         0.291
Gemini 1.5 Pro    Large Vision-language Model          Link             0.034          0.060         0.175
GPT-4V            Large Vision-language Model          Link             0.157          0.205         0.215
GPT-4o            Large Vision-language Model          Link             0.250          0.244         0.305
DePlot + GPT-4    Tool-augmented Large Language Model  Link             0.129          0.117         0.109
ChartVE           Small Vision-language Model          Link             0.178          0.091         0.215
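
For reference, the leaderboard metric is the rank correlation between a method's factual-consistency scores and the human annotations. A minimal SciPy sketch of the computation, using made-up placeholder scores rather than actual Chocolate annotations:

from scipy.stats import kendalltau

# Toy rank-correlation example: a detector's consistency scores vs. human
# factuality judgments for the same five captions. All numbers are made-up
# placeholders, not Chocolate annotations.
model_scores  = [0.91, 0.40, 0.75, 0.10, 0.66]  # higher = judged more consistent
human_ratings = [1.0, 0.0, 1.0, 0.0, 0.5]       # aggregated human labels

tau, p_value = kendalltau(model_scores, human_ratings)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")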

CHOCOLATE Dataset

Overview

CHOCOLATE is a benchmark for detecting and correcting factual inconsistencies in generated chart captions. It consists of captions produced by six of the most advanced models, which are categorized into three subsets:

  • Lvlm: GPT-4V, Bard (before Gemini)
  • Llm (LLM-based pipeline): DePlot + GPT-4
  • Ft: ChartT5, MatCha, UniChart

The charts are from two datasets: VisText and the Pew split of Chart-to-Text. In total, CHOCOLATE consists of 1,187 examples. Each instance in CHOCOLATE consists of a caption generated by one of the models, along with annotations of the factual errors in each caption sentence.

You can download the dataset from Hugging Face Datasets.
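
A minimal loading sketch with the Hugging Face datasets library is shown below; the repository id, split name, and field layout are assumptions, so consult the dataset card for the exact schema.

from datasets import load_dataset

# NOTE: the repository id, split name, and printed fields are assumptions
# for illustration; check the dataset card for the exact identifiers.
dataset = load_dataset("khhuang/CHOCOLATE", split="train")  # hypothetical id/split

example = dataset[0]
print(example)  # expected: a generated caption plus per-sentence factual error labels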


Key statistics of Chocolate.

Experiment Results

Evaluation and Qualitative Analysis

BibTeX

@inproceedings{huang-etal-2024-lvlms,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang  and
      Zhou, Mingyang and
      Chan, Hou Pong  and
      Fung, Yi R. and
      Wang, Zhenhailong and
      Zhang, Lingyu and
      Chang, Shih-Fu and
      Ji, Heng",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.85",
    doi = "10.18653/v1/2023.findings-acl.85",
    pages = "1314--1326",
}