CLOMO: Counterfactual LOgical MOdification with Large Language Models

1City University of Hong Kong, 2Tsinghua University,
3Tencent AI Lab, Seattle, 4The Hong Kong University of Science and Technology (Guangzhou)
5Shenzhen Campus of Sun Yat-sen University, 6MBZUAI, 7DarkMatter AI Research, 8City University of Hong Kong (Shenzhen) Research Institute

Task and Dataset

In the Counterfactual Logical Modification (CLOMO) task, a model is given an Argument and a Premise 1 that stand in a logical relation R, together with an additional Premise 2 that perturbs R. The model must modify the Argument into Argument' such that R holds between Argument' and Premise 2.
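As a rough sketch of this input/output contract, one CLOMO-style instance can be pictured as a record with four input fields plus the model's modified argument. The field names and the example text below are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical sketch of one CLOMO-style instance.
# Field names and contents are illustrative, not the dataset's real schema.
REQUIRED_FIELDS = {"relation", "argument", "premise_1", "premise_2"}

example = {
    "relation": "necessary assumption",  # the logical relation R
    "argument": "The city should add bike lanes, since cycling "
                "reduces traffic congestion.",
    # Premise 1 stands in relation R to the argument:
    "premise_1": "Cycling reduces traffic congestion.",
    # Premise 2 perturbs R:
    "premise_2": "New bike lanes in this city have not reduced car usage.",
}

def is_valid_instance(ex: dict) -> bool:
    """Check that a candidate instance carries all four input fields."""
    return REQUIRED_FIELDS <= set(ex)

# The model's job: produce argument_prime such that R holds between
# argument_prime and premise_2 (rather than premise_1).
```

This is only a mental model of the task interface; the released data's actual format is documented in the GitHub repo.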

We thus introduce the CLOMO dataset, containing 1,000 high-quality, challenging questions spanning four logical relations. The data were collected through multi-turn human annotation and verification.

You can download the dataset from the GitHub repo.

Statistics of CLOMO.

CLOMO input length statistics by number of tokens. CoT: The chain-of-thought setting. Few: The few-shot setting. Zero: The zero-shot setting.

Examples

Examples in CLOMO.



Examples of different prompting setups for CLOMO.

Self-Evaluation Score (SES)

Additionally, we introduce a Self-Evaluation Score (SES) for logically consistent generation in CLOMO. SES decomposes the evaluation into several basic discrimination tasks for LLMs, and is shown to be comparable with human evaluation.
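The aggregation step can be sketched as follows. The sub-task names below are hypothetical, and the paper defines the actual decomposition and scoring; the sketch only assumes that each decomposed discrimination task yields a binary judgment, and that these judgments are averaged into a single score:

```python
def self_evaluation_score(checks: dict) -> float:
    """Aggregate binary discrimination-task outcomes into one score.

    `checks` maps each decomposed sub-task name to a True/False judgment
    (e.g. produced by prompting an LLM as a discriminator). This averaging
    is an illustrative assumption, not the paper's exact formula.
    """
    if not checks:
        return 0.0
    return sum(bool(v) for v in checks.values()) / len(checks)

# Hypothetical sub-tasks for one generated Argument':
score = self_evaluation_score({
    "argument_was_modified": True,
    "relation_holds_with_premise_2": True,
    "relation_broken_with_premise_1": False,
})
# 2 of 3 checks pass, so score = 2/3
```

Decomposing evaluation this way trades one hard open-ended judgment for several easier yes/no questions, which is what makes LLM self-evaluation tractable here.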

Decomposed SES evaluation tasks.

BibTeX


      @inproceedings{huang2023clomo,
        author = {Huang, Yinya and Hong, Ruixin and Zhang, Hongming and Shao, Wei and Yang, Zhicheng and Yu, Dong and Zhang, Changshui and Liang, Xiaodan and Song, Linqi},
        title = {{CLOMO}: Counterfactual Logical Modification with Large Language Models},
        booktitle = {The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)},
        year = {2024}
       }