TUTSTI @ ISWC2024

The TUTSTI tutorial introduces Semantic Table Interpretation (STI), covering both theoretical and practical considerations. In particular, the tutorial will provide a comprehensive analysis of how STI approaches have evolved from heuristic-based to ML-based and, most recently, LLM-based approaches. The analysis will consider the specific characteristics of these classes, providing insights into their respective advantages and limitations in order to identify suitable contexts of use. The final part will describe a case study that demonstrates the application of two state-of-the-art approaches.

Objectives

L1

Present key aspects of STI: the role of semantic annotation of tabular data; why STI remains relevant in the generative AI era; phases of STI design and deployment; and a review of existing solutions, highlighting their strengths and limitations

L2

Present a link-and-extend paradigm for STI as a unifying abstraction for developing semantic data enrichment solutions

L3

Provide a practical guide to creating an STI process using heuristic, ML and Large Language Model (LLM) based techniques, taking into account the key challenges that an STI approach must address

L4

Provide a use-case example that applies two SOTA approaches (one feature-based ML and one LLM-based), highlighting their salient characteristics and defining guidelines for choosing between them in relation to the user’s objectives

L5

Discuss open research questions to stimulate further research

Program

The half-day tutorial will be split into two slots:

Slot 1: where we discuss the main concepts and review SOTA (L1, L2, L3 and L5);

Slot 2: where we present two SOTA approaches and use them in a hands-on session, walking the audience through a use case (L4)

  1. 50 minutes

    Semantic Table Interpretation

    Topics: definitions; challenges; examples; semantic interpretation as a key enabler.

  2. 50 minutes

    State-of-the-art

    Topics: semantic table interpretation tools and techniques; lessons learned and limitations.

  3. 30 minutes

    Break

  4. 60 minutes

    Semantic Table Interpretation tasks

    Topics: how to implement different approaches for CEA, CTA and CPA using heuristic, ML and LLM techniques.

  5. 40 minutes

    Hands-on session

    Analysis and use of SOTA approaches, s-elBat/Alligator and TableLlama

Intended audience

This is an introductory tutorial within a specific domain, aimed at an intermediate level of proficiency. Intended attendees are:

i) researchers with expertise in semantic technologies and, in particular, in their application to data integration problems (e.g., ontology matching, semantic reconciliation, table annotation), who will discover new industry-driven, real-world application scenarios

ii) young researchers who recently joined the semantic web community (e.g., PhD students and postdocs), who will be exposed to challenging problems and possible solutions

iii) data scientists and data engineers, who will learn how semantic technologies can be exploited to support data-driven innovation

Presenters

Matteo Palmonari
Professor at the University of Milan-Bicocca
His research interests are at the intersection of Artificial Intelligence and Data Management. He has been a coordinator and partner in projects about data enrichment, and he is particularly interested in combining machine learning and human-in-the-loop mechanisms to support knowledge-based applications.
Fabio D'Adda
Research Assistant at the University of Milan-Bicocca
He specialises in the application of ML techniques in the Semantic Web. He is chair of SemTab 2024, and organiser of the “STI vs LLMs Track”
Marco Cremaschi
Assistant Professor at the University of Milan-Bicocca
He specialises in the application of ML techniques in the Semantic Web. He is chair of SemTab 2024, and organiser of the “STI vs LLMs Track”
Ernesto Jimenez-Ruiz
Lecturer in Artificial Intelligence and Senior Tutor for Research at City, University of London, affiliated to the Research Center for Adaptive Computer Systems and Machine Learning
His current research interests focus on applying Semantic Technology to Data Science workflows and combining Knowledge Representation and Machine Learning techniques. He is one of the founders of the SemTab challenge