The Battlefront of Combating Misinformation and Coping with Media Bias
Schedule
August 14, 2022 (Sunday)
Abstract
The growth of online platforms has greatly facilitated the way people communicate with each other and stay informed about trending events. However, it has also spawned unprecedented levels of inaccurate or misleading information, as traditional journalistic gatekeeping fails to keep up with the pace of media dissemination. These undesirable phenomena have left societies torn over irrational beliefs, cost money through impulsive stock market moves, and, through the infodemic that accompanied the COVID-19 pandemic, led to deaths that could have been avoided. Even people who do not believe the misinformation may still be plagued by the pollution of unhealthy content surrounding them, an unpleasant situation known as information disorder. Thus, it is of pertinent interest for our community to better understand misinformation and biased reporting, and to develop effective mechanisms for remedying them.
Target Audience and Prerequisites
Based on the level of interest in this topic, we expect around 150 participants. While no specific background knowledge is assumed of the audience, attendees will benefit from familiarity with basic deep learning, pre-trained word embeddings (e.g., Word2Vec), and language models (e.g., BERT).
Outline
Tutorial Slides (Please download the slides in PowerPoint format for best viewing.)
Background and Motivation
We begin by motivating the tutorial topic with a selection of real-world examples of fake news and their harmful impacts on society, followed by a pedagogical exercise on how humans tend to approach the problem of fake news detection, characterization, and correction. We will point out conceptual distinctions among various types of fake news, including serious fabrications in news journalism about misattributed or nonexistent events, over-sensationalized clickbait, hoaxes, which are false and intended to be picked up by traditional news websites, and satire, which mimics genuine news but contains irony and absurdity. For example, news articles are in general more likely to involve serious fabrications, while social media posts more often involve humour, such as satire and hoaxes. We will also describe the cognitive, social and affective factors that lead people to form or endorse misinformed views (e.g., intuitive thinking, illusory truths, source cues, and emotions), and the psychological barriers to knowledge revision after misinformation has been corrected, including the correction-not-integrated, selective-retrieval, and continued-influence theories.
Fake News Detection (60min)
Bearing in mind the different categories and psychological drivers of misinformation, we introduce detection based on:
- stylistic approaches that focus on lexical features, readability, and syntactic clues.
- fact-checking approaches that compare check-worthy content with background knowledge, such as external knowledge bases (FreeBase, WikiBase, etc.) and previously fact-checked claims.
- semantic-consistency approaches that extract features related to single-document discourse-level coherence and cross-document event-centric coherence in text. Extending to the cross-media domain, the common strategy is to check text-image and text-video consistency.
- propagation patterns that capture confounding factors from the dynamics of how a news topic spreads and the social network interactions.
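To make the first bullet concrete, the following is a minimal sketch of the kind of surface cues a stylistic detector might compute. It is not the feature set of any specific work covered in the tutorial: the function name `stylistic_features`, the choice of features, and the crude regex tokenization are all assumptions made for illustration. The readability score is the Automated Readability Index, approximated here over alphabetic tokens only.

```python
import re


def stylistic_features(text: str) -> dict:
    """Compute a few surface-level style cues for one document (illustrative only)."""
    # Crude sentence split on runs of sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Crude word tokenization: alphabetic tokens, apostrophes allowed.
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return {}

    n_chars = sum(len(w) for w in words)
    lowered = [w.lower() for w in words]
    n_all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())

    return {
        # Lexical diversity: distinct words / total words.
        "type_token_ratio": len(set(lowered)) / len(words),
        # Syntactic proxy: average sentence length in words.
        "words_per_sentence": len(words) / len(sentences),
        # Sensationalism proxies.
        "all_caps_ratio": n_all_caps / len(words),
        "exclamations": text.count("!"),
        # Automated Readability Index (approximate character count).
        "ari": 4.71 * n_chars / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43,
    }


if __name__ == "__main__":
    sample = "SHOCKING news! You will not believe this. Doctors HATE it!"
    for name, value in stylistic_features(sample).items():
        print(f"{name}: {value:.3f}")
```

In practice such features would be fed to a classifier trained on labeled articles; taken alone they are only weak signals.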
We will discuss the merits and the limitations of these different lines of fake news detection approaches. For example, fact-checking approaches may not fare well for early rumours or breaking news that cannot yet be grounded in established background knowledge, in which case the credibility of the news source can offer complementary assistance. Stylistic approaches may be simple yet effective for detecting low-quality human-written fake news, but are less effective against machine-generated misinformation, which is stylistically consistent regardless of the underlying motives. We then cover recent approaches that leverage a combination of these elements for greater representation power and robustness. Importantly, we also cover works that explore the diachronic bias of fake news detection and portability across data from different time periods and languages.
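The limitation of fact-checking on breaking news can be illustrated with a toy matcher against previously fact-checked claims. This is a hedged sketch: the `VERIFIED` store, the function names, and the 0.5 threshold are hypothetical, and plain token-overlap (Jaccard) similarity stands in for the learned retrieval and ranking models used in the literature. The matcher returns nothing when no stored claim is close enough, which is the situation an early rumour creates.

```python
import re
from typing import Optional

# Hypothetical toy store of previously fact-checked claims and their verdicts.
VERIFIED = {
    "the great wall of china is visible from space": "false",
    "water boils at 100 degrees celsius at sea level": "true",
}


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_fact_checked(
    claim: str, verified: dict, threshold: float = 0.5
) -> Optional[tuple]:
    """Return (matched_claim, verdict, score) for the best match, or None.

    None means the claim cannot be grounded in the store, e.g. breaking news.
    """
    claim_tokens = tokens(claim)
    best = None
    for known_claim, verdict in verified.items():
        score = jaccard(claim_tokens, tokens(known_claim))
        if score >= threshold and (best is None or score > best[2]):
            best = (known_claim, verdict, score)
    return best


if __name__ == "__main__":
    print(match_fact_checked("The Great Wall of China is visible from space!", VERIFIED))
    print(match_fact_checked("A bridge collapsed in the city this morning", VERIFIED))
```

When the matcher returns `None`, a system would have to fall back on other signals, such as the source credibility mentioned above.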
Special Note on Neural Fake News Generation & Detection:
Advancements in natural language generation have spawned news generation models, which represent a double-edged sword. On the one hand, malicious actors may irresponsibly take advantage of the technology to influence opinions and gain revenue. On the other hand, it can also be used as a source of machine-synthesized training data for detector models, to overcome the data scarcity that arises because real-world fake news tends to be eventually removed by platforms, as well as a tool for threat modeling to develop proactive defenses against potential threats. We review how popular detectors perform on fake news created by large-scale language and vision generator models. We also review progress in generating fake news that better aligns with the key topic and facts, and work towards applying topic/fact-constrained fake news generation to construct silver-standard data annotations for finer-grained fake news detection.
Fake News Characterization (30min)
To better understand and fight fake news, we next address some fundamental questions of characterizing fake news based on underlying source bias, reporting agenda, propaganda techniques, and target audience. First, we introduce modeling approaches for detecting political and socio-cultural biases in news articles. Next, we introduce the recent EMU benchmark, which requires models to answer open-ended questions capturing the intent and the implications of a media edit. We cover methodologies for identifying the specific propaganda techniques used, e.g., smears, glittering generalities, and association transfer. We also discuss the latest explorations in predicting the intended target of harmful media content, e.g., at the person, organization, community, or society level.
Corrective Actions for Misinformation and Biased News Reporting (30min)
After misinformation has been detected and categorized based on its various characteristics, there is naturally follow-up interest in corrective explanations of why a piece of information is fake or misleading, and in how to report less biased and more comprehensive news in general. Hence, we cover frameworks that leverage reader comments to explain why a given piece of news is fake, as well as appropriate strategies, informed by user studies, for placing the corrective explanations. We also cover research on mitigating media bias, such as through neutral article generation.
Industry Initiatives:
We further point out recent actions by tech companies with media-hosting platforms to fight fake news. At the government's urging, they are experimenting with removing economic incentives for traffickers of misinformation, promoting media literacy, suspending improper posts and accounts, and adding colored labels directly beneath misinformation posted by public figures, with corrections constructed from a community-based point system similar to Wikipedia (https://www.nbcnews.com/tech/tech-news/twitter-testing-new-ways-fight-misinformation-including-community-based-points-n1139931).
Concluding Remarks & Future Directions (30min)
Finally, we summarize the major remaining challenges in this space, including detecting subtle inconsistencies, enforcing schema or logical constraints during detection, identifying semantically consistent but misattributed cross-media pairings, and achieving greater precision in fine-grained explanations for the detected misinformation.
Tutors
Yi R. Fung is a second-year Ph.D. student at the Computer Science Department of UIUC, with research interests in knowledge reasoning, misinformation detection, and computation for the social good. Her recent works include the InfoSurgeon fake news detection framework and multiview news summarization. Yi is a recipient of the NAACL’21 Best Demo Paper, the UIUC Lauslen and Andrew fellowship, and the National Association of Asian American Professionals Future Leaders award. She was also previously selected to give an invited talk (a one-hour presentation) at the Harvard Medical School Bioinformatics Seminar. Additional information is available at https://yrf1.github.io.
Kung-Hsiang Huang is a first-year Ph.D. student at the Computer Science Department of UIUC. His research focuses on fact-checking and fake news detection. Prior to joining UIUC, he obtained his B.Eng. in Computer Science from the Hong Kong University of Science and Technology, and his M.S. in Computer Science from USC. He is also a co-founder of an AI startup, Rosetta.ai. Additional information is available at https://khuangaf.github.io.
Preslav Nakov is a Professor at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), who received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant). Dr. Nakov is President of ACL SIGLEX, Secretary of ACL SIGSLAV, a member of the EACL advisory board, as well as a member of the editorial board of Computational Linguistics, TACL, CS&L, IEEE TAC, NLE, AI Communications, and Frontiers in AI. His research on fake news has been featured by over 100 news outlets, including Forbes, Boston Globe, Aljazeera, MIT Technology Review, Science Daily, Popular Science, The Register, WIRED, and Engadget, among others. He has led related tutorials such as:
- WSDM’22: Fact-Checking, Fake News, Propaganda, Media Bias, and the COVID-19 Infodemic.
- CIKM’21: Fake News, Disinformation, Propaganda, and Media Bias.
- EMNLP’20: Fact-Checking, Fake News, Propaganda, and Media Bias: Truth Seeking in the Post-Truth Era.
Additional information is available at https://en.wikipedia.org/wiki/Preslav_Nakov.
Heng Ji is a Professor at the Computer Science Department of the University of Illinois Urbana-Champaign, and an Amazon Scholar. Her research interests focus on NLP, especially on Multimedia Multilingual Information Extraction, Knowledge Base Population and Knowledge-driven Generation. She was selected as “Young Scientist” and a member of the Global Future Council on the Future of Computing by the World Economic Forum. The awards she has received include the “AI’s 10 to Watch” Award, NSF CAREER award, Google Research Award, IBM Watson Faculty Award, Bosch Research Award, Amazon AWS Award, ACL2020 Best Demo Paper Award, and NAACL2021 Best Demo Paper Award. She has given a large number of keynotes and 20 tutorials on Information Extraction, Natural Language Understanding, and Knowledge Base Construction at many conferences, including but not limited to ACL, EMNLP, NAACL, NeurIPS, AAAI, SIGIR, WWW, IJCAI, COLING and KDD. A selection of her recent tutorials includes:
- AAAI’22: Deep Learning on Graphs for Natural Language Processing.
- EMNLP’21: Knowledge-Enriched Natural Language Generation.
- ACL’21: Event-Centric Natural Language Processing.
Reading List
Fake News Detection
Style
- The language of legal and illegal activity on the Darknet. Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend. ACL 2019
- Automatic detection of fake news. Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, Rada Mihalcea. COLING 2018
- Truth of varying shades: Analyzing language in fake news and political fact-checking. Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, Yejin Choi. EMNLP 2017
- This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News. Benjamin D. Horne, Sibel Adali. arXiv 2017
Fact Checking
- Predicting factuality of reporting and bias of news media sources. Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov. EMNLP 2018
- A survey on automated fact-checking. Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos. TACL 2022
- Compare to the knowledge: Graph neural fake news detection with external knowledge. Linmei Hu, Tianchi Yang, Luhao Zhang, Wanjun Zhong, Duyu Tang, Chuan Shi, Nan Duan, Ming Zhou. ACL 2021
- That is a known lie: Detecting previously fact-checked claims. Shaden Shaar, Nikolay Babulkov, Giovanni Da San Martino, Preslav Nakov. ACL 2020
- Fact or Fiction: Verifying Scientific Claims. David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi. EMNLP 2020
- FEVER: a Large-scale Dataset for Fact Extraction and VERification. James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal. NAACL 2018
- COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic. Arkadiy Saakyan, Tuhin Chakrabarty, Smaranda Muresan. ACL 2021
- “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. William Yang Wang. ACL 2017
- X-Fact: A New Benchmark Dataset for Multilingual Fact Checking. Ashim Gupta, Vivek Srikumar. ACL 2021
- Integrating Stance Detection and Fact Checking in a Unified Corpus. Ramy Baly, Mitra Mohtarami, James Glass, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov. NAACL 2018
- Stance Prediction and Claim Verification: An Arabic Perspective. Jude Khouja. FEVER 2020
- Joint Rumour Stance and Veracity Prediction. Anders Edelbo Lillie, Emil Refsgaard Middelboe, Leon Derczynski. NoDaLiDa 2019
- DanFEVER: claim verification dataset for Danish. Jeppe Nørregaard, Leon Derczynski. NoDaLiDa 2021
- MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen. EMNLP 2019
- Towards Debiasing Fact Verification Models. Tal Schuster, Darsh Shah, Yun Jie Serene Yeo, Daniel Roberto Filizzola Ortiz, Enrico Santus, Regina Barzilay. EMNLP 2019
- End-to-End Bias Mitigation by Modelling Biases in Corpora. Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson. ACL 2020
- Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance. Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych. ACL 2020
- Scientific Claim Verification with VerT5erini. Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin. LOUHI 2021
- Dense Passage Retrieval for Open-Domain Question Answering. Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih. EMNLP 2020
- What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context. Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov. ACL 2020
- Generating Fact Checking Explanations. Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. ACL 2020
- Explainable Automated Fact-Checking for Public Health Claims. Neema Kotonya, Francesca Toni. EMNLP 2020
Semantic Consistency
- Learning hierarchical discourse-level structure for fake news detection. Hamid Karimi, Jiliang Tang. NAACL 2019
- Cross-document misinformation detection based on event graph reasoning. Xueqing Wu, Kung-Hsiang Huang, Yi Fung, Heng Ji. NAACL 2022
- Cosmos: Catching out-of-context misinformation with self-supervised learning. Shivangi Aneja, Chris Bregler, Matthias Nießner. arXiv 2021
- Text-Image De-Contextualization Detection Using Vision-Language Models. Mingzhen Huang, Shan Jia, Ming-Ching Chang, Siwei Lyu. ICASSP 2022
- Detecting cross-modal inconsistency to defend against neural fake news. Reuben Tan, Bryan Plummer, Kate Saenko. EMNLP 2020
- Misinformation detection in social media video posts. Kehan Wang, David Chan, Seth Z. Zhao, J. Canny, A. Zakhor. arXiv 2022
- Compare to The Knowledge: Graph Neural Fake News Detection with External Knowledge. Linmei Hu, Tianchi Yang, Luhao Zhang, Wanjun Zhong, Duyu Tang, Chuan Shi, Nan Duan, Ming Zhou. ACL 2021
- Learning Transferable Visual Models From Natural Language Supervision. Alec Radford, Jong Wook Kim, Chris Hallacy, A. Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. ICML 2021
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media. Grace Luo, Trevor Darrell, Anna Rohrbach. EMNLP 2021
- Fine-Grained Visual Entailment. Christopher Thomas, Yipeng Zhang, Shih-Fu Chang. ECCV 2022
Propagation Patterns
- Causal understanding of fake news dissemination on social media. Lu Cheng, Ruocheng Guo, Kai Shu, Huan Liu. KDD 2021
- GCAN: Graph-aware co-attention networks for explainable fake news detection on social media. Yi-Ju Lu, Cheng-Te Li. ACL 2020
- Hierarchical propagation networks for fake news detection: Investigation and exploitation. Kai Shu, Deepak Mahudeswaran, Suhang Wang, Huan Liu. ICWSM 2020
- FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media. Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, Huan Liu. Big Data 2020
- FANG: Leveraging Social Context for Fake News Detection Using Graph Representation. Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, Min-Yen Kan. CIKM 2020
Special notes on Fake News Generation
- Defending against neural fake news. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. NeurIPS 2019
- Protecting world leaders against deep fakes. Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, Hao Li. CVPR Workshops 2019
- Deepfake video detection using recurrent neural networks. David Guera, E. Delp. AVSS 2018
- InfoSurgeon: Cross-media fine-grained information consistency checking for fake news detection. Yi Fung, Christopher Thomas, Revanth Gangi Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, Avi Sil. ACL 2021
- Fact-enhanced synthetic news generation. Kai Shu, Yichuan Li, Kaize Ding, Huan Liu. AAAI 2021
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation. Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji. arXiv 2022
Fake News Characterization
- In plain sight: Media bias through the lens of factual reporting. Lisa Fan, Marshall White, Eva Sharma, Ruisi Su, Prafulla Kumar Choubey, Ruihong Huang, Lu Wang. EMNLP 2019
- Social chemistry 101: Learning to reason about social and moral norms. Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi. EMNLP 2020
- Multi-view models for political ideology detection of news articles. Vivek Kulkarni, Junting Ye, Steve Skiena, William Yang Wang. EMNLP 2018
- Edited media understanding frames: Reasoning about the intent and implications of visual misinformation. Jeff Da, Maxwell Forbes, Rowan Zellers, Anthony Zheng, Jena D. Hwang, Antoine Bosselut, Yejin Choi. ACL 2021
- Detecting propaganda techniques in memes. Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino. ACL 2021
- MOMENTA: A multimodal framework for detecting harmful memes and their targets. Shraman Pramanick, Shivam Sharma, Dimitar I. Dimitrov, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty. Findings of EMNLP 2021
- Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, Yejin Choi. EMNLP 2017
Corrective Actions for Misinformation and Biased News Reporting
- Timing matters when correcting fake news. Nadia M. Brashier, Gordon Pennycook, A. Berinsky, David G. Rand. PNAS 2021
- dEFEND: Explainable fake news detection. Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, Huan Liu. KDD 2019
- NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias. Nayeon Lee, Yejin Bang, Tiezheng Yu, Andrea Madotto, Pascale Fung. NAACL 2022
Additional information about Heng Ji is available at https://blender.cs.illinois.edu/hengji.html.
For more information about this tutorial, please refer to our proposal.