Knowledge acquisition from heterogeneous sources

Point of contact

Fabian Suchanek, Télécom Paris, suchanek@enst.fr

Summary

Many applications in AI require knowledge about the real world: chatbots, intelligent assistants, decision support systems, and dialog systems. The goal of this axis is thus to help machines acquire this knowledge. Quite often, this is achieved by extracting the knowledge from collections of text, such as encyclopedias, news articles, or other documents and Web pages. However, the knowledge can also be assembled from structured or semi-structured sources such as Web tables, databases (possibly with differing schemas), or knowledge bases. The knowledge can take a symbolic form (subject-predicate-object triples, logical axioms, or database rows), a statistical form (capturing correlations between terms), or a subsymbolic form (encoded, e.g., in a neural network or transformer). The challenges lie in distilling the relevant pieces of information from the data sources, canonicalizing them, integrating them into the existing knowledge, and mapping them to similar or equivalent pieces of information in other sources. These steps are key to better reusability, interoperability, and accessibility of the data (e.g., applying the FAIR principles in an Open Science vision).
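
As an illustration of the symbolic form and of the canonicalization and integration steps mentioned in the summary, here is a minimal Python sketch. It is not a method of the axis: the `Triple` type, the `ALIASES` table, the functions `canonicalize` and `integrate`, and the example facts are all hypothetical and chosen only for the example. It assumes that entity names can be canonicalized with a simple lookup table, which real systems replace with far more elaborate entity resolution.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement (hypothetical minimal schema)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES = {
    "paris": "Paris",
    "city of paris": "Paris",
    "france": "France",
    "french republic": "France",
}


def canonicalize(name: str) -> str:
    """Map a surface form to its canonical name; unknown names are only trimmed."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return ALIASES.get(key, name.strip())


def integrate(kb: set, extracted: list) -> set:
    """Return a new knowledge base containing kb plus the canonicalized triples."""
    merged = set(kb)
    for t in extracted:
        # Using a set means a triple that is already known is not duplicated.
        merged.add(Triple(canonicalize(t.subject), t.predicate, canonicalize(t.obj)))
    return merged


# Existing knowledge and triples extracted from another (hypothetical) source.
kb = {Triple("Paris", "capitalOf", "France")}
extracted = [
    Triple("City of Paris", "capitalOf", "French Republic"),
    Triple("paris", "locatedIn", "France"),
]
merged = integrate(kb, extracted)
```

In this sketch, the first extracted triple collapses onto the fact already present in the knowledge base once both names are canonicalized, while the second contributes a new fact.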

Keywords

information extraction, knowledge bases, BERT, heterogeneous sources, logic, word representations, database, natural language processing.

Researchers involved or interested

Related master programs