Background and Scope
---------------------
While interest in automatic approaches to counterspeech has been steadily growing,
including studies on data curation (Chung et al., 2019a; Fanton et al., 2021), detection (Chung
et al., 2021a; Mathew et al., 2018) and generation (Tekiroğlu et al., 2020; Chung et al., 2021b;
Zhu and Bhat, 2021; Tekiroğlu et al., 2022), the large majority of published experimental work on automatic counterspeech generation has been carried out for English. This is due both to the scarcity of manually curated non-English training data and to the overwhelming predominance of English in the ecosystem of generative Large Language Models (LLMs). We therefore propose a workshop on Multilingual Counterspeech Generation to promote and encourage research on multilingual approaches to this challenging topic.
This workshop thus aims to test monolingual and multilingual LLMs in particular, and Language Technology in general, for the automatic generation of counterspeech not only in English but also in languages with fewer resources. An important goal of the workshop is therefore to understand the impact of using LLMs, for example how to deal with pressing issues such as biases, hallucinated content, data scarcity and data contamination.
We seek to maximize the scientific and social impact of this workshop by promoting the
creation of a community of researchers from diverse fields, such as computer and social sciences, as well as policy makers and other stakeholders interested in automatic counterspeech generation. By doing so, we aim to gain a deeper understanding of how counterspeech is currently used by individuals, activists and organizations to tackle abuse,
and of how Natural Language Processing (NLP) and Natural Language Generation (NLG) can best be applied to counteract it.
Call for Papers
---------------------
We welcome submissions on topics including, but not limited to, the following:
- Models and methods for generating counterspeech in different languages.
- Automatic counterspeech generation for low-resource languages with scarce training data.
- Dialogue agents that use counterspeech to combat offensive messages directed at individuals or groups targeted on the basis of aspects such as ideology, gender, sexual orientation and religion.
- Methods for human and automatic evaluation of counterspeech.
- Multidisciplinary studies offering different perspectives on the topic, from fields such as computer science, social science and psychology.
- Development of taxonomies and high-quality datasets for counterspeech in multiple languages.
- Potential and limitations (e.g., fairness, biases, hallucinated content) of applying different NLP methods, such as LLMs, to generate counterspeech.
- Social impact and empirical studies of counterspeech in social networks, including research on the effectiveness of counterspeech in combating online hate and its consequences for users.
Submission
---------------------
We welcome two types of papers: regular workshop papers and non-archival submissions. Regular workshop papers will be included in the workshop proceedings. All submissions must be in PDF format and made through START [https://softconf.com/coling2025/MCG25/].
- Regular workshop papers: Authors can submit papers of up to 8 pages, with unlimited pages for references. Authors may also separately submit up to 100 MB of supplementary material, as well as their code for reproducibility. All submissions undergo a double-blind, single-track review. Accepted papers will be presented as posters, with the possibility of oral presentations.
- Non-archival submissions: Cross-submissions are welcome. Accepted papers will be presented at the workshop, but will not be included in the workshop proceedings. Papers must be in PDF format and will be reviewed in a double-blind fashion by workshop reviewers. We also welcome extended abstracts (up to 2 pages) of papers that are work in progress, under review or to be submitted to other venues. Papers in this category need to follow the COLING format.
Important Dates
---------------------
- Submission: November 20th, 2024
- Notification of Acceptance: December 2nd, 2024
- Camera-Ready Papers Due: December 10th, 2024
-----------------------------------------------------
Shared Task on Multilingual Counterspeech Generation
-----------------------------------------------------
In addition to paper contributions, we are organizing a shared task on multilingual counterspeech generation with the aim of bringing together current efforts in a central space, especially those for languages other than English.
We envisage that the shared task will allow the community to study how to improve counterspeech generation for lower-resource languages, while also reinforcing the strong body of research that already exists for English.
The counterspeech generated by participants should be respectful and non-offensive, and should contain information that is specific and truthful with respect to the following targets: Jews, LGBT+ people, immigrants, people of color and women.
Data
---------------------
We release new data consisting of 597 Hate Speech-Counter Narrative (HS-CN) pairs. In this dataset, the HS are taken from MTCONAN [https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN], while the CNs are newly generated. Together with each HS-CN pair, we also provide 5 background knowledge sentences, some of which are relevant for obtaining the Counter Narratives. The dataset is available in 4 languages (Basque, English, Italian and Spanish) and is divided into the following splits:
- Development: 100 pairs. [AVAILABLE NOW!] [https://huggingface.co/datasets/LanD-FBK/ML_MTCONAN_KN]
- Train: 397 pairs [AVAILABLE in OCTOBER!]
- Test: 100 pairs [TBA]
To score the shared task participants, the CNs of the test split will be kept hidden during the shared task, while the HS and the background knowledge will be released so that participants can prepare their submissions.
The four languages, Basque, English, Italian and Spanish, offer a varied spectrum of complexity: an agglutinative language isolate (Basque), two Romance languages (Italian, Spanish) and a Germanic one (English). The choice of languages reflects the linguistic expertise among the organizers required to successfully run the shared task.
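For participants planning a pipeline, one record of the data described above can be sketched as a small Python type. This is a minimal sketch under stated assumptions: the field names (`language`, `hs`, `knowledge`, `cn`), the language codes and the `SPLIT_SIZES` table are hypothetical and not the official schema of the released files. Only the counts (100 + 397 + 100 = 597 pairs, five knowledge sentences per pair) and the hidden test CNs come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical language codes for Basque, English, Italian and Spanish.
LANGUAGES = ("eu", "en", "it", "es")

# Pairs per split as announced (dev + train + test = 597).
SPLIT_SIZES = {"dev": 100, "train": 397, "test": 100}

NUM_KNOWLEDGE_SENTENCES = 5


@dataclass(frozen=True)
class HSCNExample:
    """One HS-CN item; field names are illustrative, not the official schema."""

    language: str
    hs: str
    knowledge: Tuple[str, ...]  # background sentences, only some relevant to the CN
    cn: Optional[str] = None  # None for test items, whose CNs are kept hidden

    def __post_init__(self) -> None:
        # Reject records that do not match the announced structure.
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language code: {self.language}")
        if len(self.knowledge) != NUM_KNOWLEDGE_SENTENCES:
            raise ValueError(
                f"expected {NUM_KNOWLEDGE_SENTENCES} knowledge sentences, "
                f"got {len(self.knowledge)}"
            )

    @property
    def is_test_item(self) -> bool:
        # Test items expose only the HS and the background knowledge.
        return self.cn is None


if __name__ == "__main__":
    assert sum(SPLIT_SIZES.values()) == 597
    item = HSCNExample(
        language="en",
        hs="We should ban all homosexuals.",
        knowledge=tuple(f"Background sentence {i}." for i in range(1, 6)),
    )
    print(item.is_test_item)  # True: no CN is provided
```

Validating the knowledge count at construction time makes a malformed record fail early, before any generation is attempted.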
Additional Multilingual Data
---------------------
Participants may also use the following manually curated CONAN data in English, Italian, Basque and Spanish:
- CONAN (English and Italian) [https://github.com/marcoguerini/CONAN/tree/master/CONAN]
- CONAN (Basque and Spanish) [https://huggingface.co/datasets/HiTZ/CONAN-EUS]
- CONAN-MT-SP (Spanish) [https://github.com/sinai-uja/CONAN-MT-SP] or [https://huggingface.co/datasets/SINAI/CONAN-MT-SP]
- Multitarget-CONAN (English) [https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN]
Participation
---------------------
Given an HS (and, optionally, any additional knowledge participants wish to use), the aim of the shared task is to generate a CN that counteracts the HS.
An example HS-CN pair in each of the four languages is shown below:
- English:
+ HS: We should ban all homosexuals.
+ CN: When will the love prosper and the hatred start to dissipate? I will not only respect my fellow LGBT+ people, I will promote their rights.
- Italian:
+ HS: Dovremmo espellere tutti gli omosessuali.
+ CN: Quand'è che l'amore prospererà e l'odio inizierà a dissiparsi? Non solo rispetterò i miei compagni LGBT+, ma promuoverò i loro diritti.
- Spanish:
+ HS: Deberíamos prohibir a todos los homosexuales.
+ CN: ¿Cuándo prosperará el amor y empezará a disiparse el odio? No sólo respetaré a mis compañeros LGBT+, sino que promoveré sus derechos.
- Basque:
+ HS: Homosexual guztiak debekatu beharko genituzke.
+ CN: Noiz hasiko da maitasuna irabazten eta gorrotoa desagertzen? LGBT+ pertsonak errespetatzeaz gain, haien eskubideak sustatuko ditut.
Participants will download the test HS for the 4 languages and generate at most three different CNs per HS for each language. The test window will last 5 days.
Participants are allowed to use any resource (language model, data, etc.) to generate the CN.
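Before uploading results, a quick self-check of the "at most three different CNs per HS" rule can save a rejected run. The Python sketch below assumes a hypothetical submission layout (a dict mapping a language code to a dict from HS identifier to a list of CNs); the actual submission format will be defined by the organizers. It does not require all four languages to be present, because the rules above do not say whether partial submissions are accepted.

```python
from __future__ import annotations

from typing import Dict, List, Sequence

# Hypothetical language codes; the official format may differ.
LANGUAGES = ("eu", "en", "it", "es")
MAX_CNS_PER_HS = 3  # "at most three different CNs per HS"

Submission = Dict[str, Dict[str, List[str]]]  # lang -> HS id -> CNs


def validate_submission(sub: Submission, test_hs_ids: Sequence[str]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    for lang, by_hs in sub.items():
        if lang not in LANGUAGES:
            problems.append(f"unknown language code: {lang}")
            continue
        for hs_id in test_hs_ids:
            cns = by_hs.get(hs_id, [])
            if not cns:
                problems.append(f"{lang}/{hs_id}: no CN submitted")
            elif len(cns) > MAX_CNS_PER_HS:
                problems.append(
                    f"{lang}/{hs_id}: {len(cns)} CNs (max {MAX_CNS_PER_HS})"
                )
            elif len(set(cns)) != len(cns):
                problems.append(f"{lang}/{hs_id}: duplicate CNs")
            elif any(not cn.strip() for cn in cns):
                problems.append(f"{lang}/{hs_id}: empty CN")
        # Flag ids that do not belong to the released test set.
        for hs_id in by_hs:
            if hs_id not in test_hs_ids:
                problems.append(f"{lang}/{hs_id}: not a test HS id")
    return problems


if __name__ == "__main__":
    sub = {"en": {"hs-001": ["CN a", "CN b"]}}
    print(validate_submission(sub, ["hs-001"]))  # []
```

Returning a list of problems rather than raising on the first one lets a participant fix every issue in a single pass.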
Evaluation
---------------------
The CNs submitted by the participants will be evaluated in two ways:
- Using traditional automatic metrics as in Tekiroğlu et al. (2022), which include BLEU, ROUGE, Novelty and Repetition Rate.
- Using an LLM as a judge, following the approach described in https://arxiv.org/abs/2406.15227.
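To give a feel for what the two less common metrics capture, the Python sketch below implements simplified versions of them. It assumes lowercasing and whitespace tokenization. Novelty is computed here as one minus the highest word-set Jaccard similarity between a generated CN and any training text, averaged over all outputs. Repetition Rate is simplified to the geometric mean, over n = 1..4, of the share of distinct n-grams that occur more than once in the whole output set (no sliding windows). These are illustrative approximations, not the official evaluation scripts, whose tokenization and settings may differ, so the numbers are not comparable to official scores. BLEU and ROUGE are better computed with established implementations.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; official scripts may differ.
    return text.lower().split()


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def novelty(generated: Sequence[str], training: Sequence[str]) -> float:
    """Mean of 1 - max Jaccard similarity to any training text (word sets)."""
    train_sets = [set(tokenize(t)) for t in training]
    scores = []
    for text in generated:
        words = set(tokenize(text))
        closest = max((jaccard(words, t) for t in train_sets), default=0.0)
        scores.append(1.0 - closest)
    return sum(scores) / len(scores) if scores else 0.0


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(texts: Sequence[str], max_n: int = 4) -> float:
    """Simplified RR: geometric mean over n of the share of distinct n-grams
    that occur more than once (no sliding windows). Higher = more repetitive."""
    ratios = []
    for n in range(1, max_n + 1):
        # n-grams are collected per text, so none span two outputs.
        counts = Counter(g for t in texts for g in ngrams(tokenize(t), n))
        if not counts:
            return 0.0  # texts too short to form n-grams of this order
        repeated = sum(1 for c in counts.values() if c > 1)
        ratios.append(repeated / len(counts))
    return math.prod(ratios) ** (1.0 / max_n)


if __name__ == "__main__":
    train = ["Love will prosper and hatred will dissipate."]
    gen = [
        "Love will prosper and hatred will dissipate.",
        "I will promote their rights.",
    ]
    print(round(novelty(gen, train), 2))  # 0.45
```

In the usage example, the first output copies a training text (novelty 0) and the second shares only the word "will" with it (Jaccard 1/10, novelty 0.9), giving a mean of 0.45.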
Important Dates
---------------------
- Test dataset release: October 21st, 2024
- Results submission: October 25th, 2024
- Results notification: November 10th, 2024
- Working papers submission: November 20th, 2024
- Notification of Acceptance: December 3rd, 2024
- Camera-Ready Papers Due: December 10th, 2024
- Workshop: January 19th, 2025
For more information, you can join the Google group [https://groups.google.com/g/multilingual-cs-generation-coling2025] or visit our website [https://sites.google.com/view/multilang-counterspeech-gen/home].
Best regards,
The Multilingual Counterspeech Generation Workshop Organizers.