Extracting Descriptive Structures for RDF Data Sources

Point of contact


The web of data is a huge information space consisting of a growing number of interlinked datasets described using the Resource Description Framework (RDF). One important feature of such datasets is that they contain both the data and the schema describing the data. However, these schema-related declarations are not mandatory and are not always provided. As a consequence, the schema may be incomplete or missing. This offers great flexibility when creating interlinked datasets, but it can also limit their use: it is not easy to query or explore a dataset without any knowledge of its resources, classes or properties. Several approaches have been proposed for automatic schema discovery, each producing a schema composed of a set of classes and the relationships between them. In this context, the schema is not treated as a structural constraint but as a description of the underlying structure of an RDF dataset. Among other techniques, data mining algorithms have been widely used to generate such schemas. We are interested in designing schema discovery approaches that address key challenges such as scalability and incrementality, while ensuring that the resulting schema is of good quality.
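
To make the idea of a descriptive schema concrete, here is a minimal sketch in Python. It is not the approach developed in this project: it assumes a toy dataset of untyped triples (the `ex:` names are hypothetical) and uses the simplest possible grouping, placing entities that share exactly the same set of properties into one class. Approaches based on data mining would typically replace this exact match with similarity-based clustering.

```python
from collections import defaultdict

# Hypothetical toy dataset: (subject, predicate, object) triples
# with no class or type declarations at all.
TRIPLES = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:label", '"Acme"'),
    ("ex:acme", "ex:locatedIn", "ex:paris"),
    ("ex:paris", "ex:label", '"Paris"'),
]


def discover_schema(triples):
    """Return (classes, links) describing the structure of the triples.

    Naive illustration only: entities with identical property sets
    are grouped into the same class.
    """
    # Collect the set of outgoing properties of every subject.
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)

    # Group subjects that have exactly the same property set.
    groups = defaultdict(set)
    for s, ps in props.items():
        groups[frozenset(ps)].add(s)

    # Name the classes deterministically (ordered by their property lists).
    classes, class_of = {}, {}
    ordered = sorted(groups.items(), key=lambda kv: sorted(kv[0]))
    for i, (ps, members) in enumerate(ordered):
        name = f"C{i}"
        classes[name] = {"properties": sorted(ps), "instances": sorted(members)}
        for m in members:
            class_of[m] = name

    # A relationship between two classes exists when a triple connects
    # an instance of one to an instance of the other.
    links = set()
    for s, p, o in triples:
        if o in class_of:
            links.add((class_of[s], p, class_of[o]))
    return classes, sorted(links)


if __name__ == "__main__":
    classes, links = discover_schema(TRIPLES)
    for name, info in classes.items():
        print(name, info)
    for link in links:
        print(link)
```

On this toy input, `ex:alice` and `ex:bob` end up in the same class, and the sketch reports a `ex:worksFor` relationship from their class to the class containing `ex:acme`. Note that `ex:acme` and `ex:paris` are kept apart because their property sets differ, which is precisely the kind of rigidity that similarity-based grouping is meant to relax.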

Researchers involved