Fact Checking: Theory and Practice (KDD 2018 Tutorial)


Was Da Vinci born in Florence? Does patient ‘Johnson’ really have 300 heartbeats per minute? Checking the accuracy of facts is vital for question answering, data cleaning, anomaly detection, fraud detection, and more.

Here we present three families of fact-checking approaches, based on the domains to which they apply:

  1. text documents,
  2. graphs and knowledge bases, and
  3. relational databases.

The emphasis is on the intuition behind each method, as well as on a practitioner’s guide, highlighting the applicability of each method to each setting.

For more details, see Proposal - Fact Checking: Theory and Practice (KDD 2018)


Fact Checking: Theory and Practice (KDD 2018) [All slides]

1. Introduction [Slides]

2. Fact Checking from Structured Data [Slides]

3. Fact Checking from Graph [Slides]

4. Anomaly Detection from Graphs [Slides]

5. Fact Checking from Text [Slides]

6. Conclusion [Slides]

Target audience

Data Scientists and practitioners, with interest in Knowledge Bases, Database Quality, Truth Finding and Discovery, Credibility Analysis.


Prerequisites

A B.Sc. in computer science should suffice. The tutorial assumes familiarity with basic linear algebra, calculus, and discrete math, as well as with the fundamentals of Machine Learning (classification, clustering, matrix factorization).


Instructors

  1. Xin Luna Dong (Amazon)
  2. Christos Faloutsos (Amazon and CMU)
  3. Xian Li (Amazon)
  4. Subhabrata Mukherjee (Amazon)
  5. Prashant Shiralkar (Amazon)

Instructors’ biography

Xin Luna Dong is a Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project and led the Knowledge-based Trust project, which the Washington Post called the “Google Truth Machine”. She received the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion”. She co-authored the book “Big Data Integration”, is the PC co-chair for SIGMOD 2018 and WAIM 2015, and serves on the VLDB advisory committee and the Board of Trustees of the VLDB Endowment. She has given several tutorials on data integration and knowledge management at top-tier conferences.

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Research Contributions Award at ICDM 2006 and the SIGKDD Innovations Award (2010). He has given over 40 tutorials and over 20 invited distinguished lectures. His research interests include large-scale data mining with emphasis on graphs and time sequences, anomaly detection, tensors, and fractals.

Xian Li is an Applied Scientist at Amazon, contributing to data quality and knowledge fusion in the Amazon Product Knowledge Graph. Before joining Amazon, she was a data scientist at LinkedIn, where she was a major contributor to building LinkedIn’s knowledge base of business entities. She received her Ph.D. from SUNY Binghamton, and her research interests include truth finding in structured and unstructured data sources, data quality, and knowledge management.

Subhabrata Mukherjee is a Machine Learning Scientist at Amazon building the Amazon Product Knowledge Graph. He works on large-scale machine learning models that extract knowledge from unstructured and semi-structured data. He received his Ph.D., summa cum laude, from the Max Planck Institute for Informatics, Germany. He previously worked at IBM Research on domain adaptation of question-answering systems and on sentiment analysis. His research interests include probabilistic graphical models, information extraction, and recommender systems.

Prashant Shiralkar is an Applied Scientist in the Product Graph team at Amazon. He currently works on knowledge extraction from semi-structured data. Previously, he received a Ph.D. from Indiana University Bloomington, where his dissertation focused on computational approaches to fact checking by mining knowledge graphs. His research interests include machine learning, data mining, information extraction, NLP, and Semantic Web technologies.