Semantic Table Interpretation

This page explains the Semantic Table Interpretation (STI) process. You'll learn about the steps involved in annotating relational tables using knowledge graphs: column type annotation (CTA), cell entity annotation (CEA), and column property annotation (CPA).

The Importance of Tables in Information Management

Tables are crucial for creating, organizing, and sharing information. Their use dates back to about 2500 BC, when Merer, an Egyptian naval inspector, documented his activities in a table on papyrus. Today, tables are widely used in business, science, and on the web, especially with the rise of the Open Data movement.

Some key statistics:

  • Web Tables: In 2008, 14.1 billion HTML tables were extracted from the web, of which 154 million were high quality. The Common Crawl 2015 repository contains 233 million content tables.
  • Wikipedia Tables: The 2022 English Wikipedia snapshot includes 2,803,424 tables from 21,149,260 articles.
  • Spreadsheets: Between 750 million and 2 billion people use Google Sheets or Microsoft Excel globally.

Understanding tabular data can be challenging, even when headers are present.

The Role of Knowledge Graphs

Knowledge Graphs (KGs) represent relationships between entities (e.g., people, places, events) in graph structures. They use RDF (Resource Description Framework) to encode data in a meaningful way, which facilitates the integration of data from various formats. Ontologies in RDF specify the meanings of types and properties through logical axioms.
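
To make the idea of a KG concrete, here is a minimal sketch in plain Python. It stores statements as (subject, predicate, object) triples and applies one kind of ontology axiom, a subclass relation, to derive implied types. The "ex:" identifiers and the helper names objects and types_of are hypothetical and chosen only for illustration; rdf:type and rdfs:subClassOf are standard RDF/RDFS terms. A real system would typically use an RDF library and a triple store rather than a Python set.

```python
# A tiny in-memory KG: each statement is a (subject, predicate, object) triple.
# All "ex:" names are hypothetical; rdf:type and rdfs:subClassOf are standard terms.
TRIPLES = {
    ("ex:MonteCervino", "rdf:type", "ex:Mountain"),
    ("ex:MonteCervino", "ex:partOfRange", "ex:PennineAlps"),
    ("ex:PennineAlps", "rdf:type", "ex:MountainRange"),
    # Ontology axioms: every mountain and every mountain range is a landform.
    ("ex:Mountain", "rdfs:subClassOf", "ex:Landform"),
    ("ex:MountainRange", "rdfs:subClassOf", "ex:Landform"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects o such that (subject, predicate, o) is in the KG."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def types_of(entity, triples=TRIPLES):
    """Return the asserted types of an entity plus those implied by rdfs:subClassOf."""
    found = set()
    frontier = list(objects(entity, "rdf:type", triples))
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.add(current)
            # Follow subclass axioms upward to collect more general types.
            frontier.extend(objects(current, "rdfs:subClassOf", triples))
    return found


if __name__ == "__main__":
    print(sorted(types_of("ex:MonteCervino")))
    print(objects("ex:MonteCervino", "ex:partOfRange"))
```

The subclass walk shows in miniature how an ontology adds meaning beyond the stated facts: the entity is asserted only to be a mountain, yet the axioms let us conclude it is also a landform.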

Developing KGs enhances data integration and enriches data. Semantic Table Interpretation (STI) is essential for constructing and extending KGs from semi-structured data, and it has attracted significant attention in fields such as the Semantic Web, Data Management, AI, and NLP.

Semantic Table Interpretation

Semantic Table Interpretation (STI) takes two inputs:

  • A relational table T
  • A Knowledge Graph (KG) that includes entities, statements, and an ontology of types and properties

Example table T:

Name            | Coordinates                   | Height | Range
Le Mount Blanc  | 45° 49' 57" N, 06° 51' 52" E  | 4808   | M Blanc massif
Hohtälli        | 45° 59' 20" N, 07° 48' 10" E  | 3275   | Pennine Alps
Monte Cervino   | 45° 58' 35" N, 07° 39' 31" E  | 4478   | Pennine Alps

[Figure: graph3]

A table T is annotated when:

  • Each column is associated with one or more KG types (column type annotation, CTA)
  • Each cell in "entity columns" is annotated with a KG entity, or marked as NIL if the entity doesn't exist in the KG (cell entity annotation, CEA)
  • Pairs of columns are annotated with a binary KG property (column property annotation, CPA)
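
The three conditions above can be sketched as a small data structure plus a completeness check. This is an illustrative sketch, not a reference implementation: the Annotations class, the is_annotated function, and every "ex:" identifier are hypothetical, and the choice of Name and Range as the entity columns is an assumption. Marking Hohtälli as NIL is done purely to show how a missing entity would be recorded; it says nothing about any real KG. The check also assumes that at least one column pair must carry a property, which the definition does not state explicitly.

```python
from dataclasses import dataclass, field

NIL = None  # marks a cell whose entity does not exist in the KG

HEADER = ["Name", "Coordinates", "Height", "Range"]
ROWS = [
    ["Le Mount Blanc", "45° 49' 57\" N, 06° 51' 52\" E", "4808", "M Blanc massif"],
    ["Hohtälli", "45° 59' 20\" N, 07° 48' 10\" E", "3275", "Pennine Alps"],
    ["Monte Cervino", "45° 58' 35\" N, 07° 39' 31\" E", "4478", "Pennine Alps"],
]
ENTITY_COLUMNS = [0, 3]  # assumption: Name and Range hold entity mentions


@dataclass
class Annotations:
    cta: dict = field(default_factory=dict)  # column index -> list of KG types
    cea: dict = field(default_factory=dict)  # (row, column) -> KG entity or NIL
    cpa: dict = field(default_factory=dict)  # (subject column, object column) -> KG property


def is_annotated(rows, n_columns, entity_columns, ann):
    """Check the three conditions: typed columns, decided entity cells, column-pair properties."""
    columns_typed = all(bool(ann.cta.get(c)) for c in range(n_columns))
    # A cell counts as decided if it has an entry, even when that entry is NIL.
    cells_decided = all(
        (r, c) in ann.cea for r in range(len(rows)) for c in entity_columns
    )
    has_property = len(ann.cpa) > 0
    return columns_typed and cells_decided and has_property


# Hypothetical annotations for the example table (all "ex:" names are made up).
ANNOTATIONS = Annotations(
    cta={0: ["ex:Mountain"], 1: ["ex:Coordinates"], 2: ["ex:Length"], 3: ["ex:MountainRange"]},
    cea={
        (0, 0): "ex:MontBlanc", (0, 3): "ex:MontBlancMassif",
        (1, 0): NIL,            (1, 3): "ex:PennineAlps",  # NIL here is illustrative only
        (2, 0): "ex:MonteCervino", (2, 3): "ex:PennineAlps",
    },
    cpa={(0, 1): "ex:coordinates", (0, 2): "ex:elevation", (0, 3): "ex:partOfRange"},
)

if __name__ == "__main__":
    print(is_annotated(ROWS, len(HEADER), ENTITY_COLUMNS, ANNOTATIONS))
```

Keeping CTA, CEA, and CPA in separate mappings mirrors the three annotation tasks, so each can be produced or evaluated independently.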

The result of the annotation is shown in the figure below.

[Figure: graph1]