Alligator

Alligator is a Python library for entity linking over tabular data. Given a CSV table, it automatically:

Identifies Named Entity (NE) columns (or accepts manual assignments)
Fetches candidate Wikidata entities for each cell via the LAMAPI retrieval service
Computes ~27 similarity and overlap features per candidate
Runs a two-stage Keras ML pipeline (rank → rerank) to score and rank candidates
Produces three types of SemTab-compatible annotations:
- CEA — Cell Entity Annotation: links each NE cell to a Wikidata QID
- CTA — Column Type Annotation: infers the Wikidata type for each NE column
- CPA — Column Property Annotation: infers relationships between NE columns

Results are stored in MongoDB and can optionally be written back to CSV.

Key Features

Automatic column type classification (NE / LIT / IGNORED) via column-classifier
SHA-256 keyed MongoDB TTL cache for API responses (avoids redundant lookups)
Fuzzy-retry candidate retrieval when exact match returns no results
5-attempt exponential backoff on retrieval failures
Two-stage ML ranking: rank (local features) → rerank (global CTA/CPA frequency features)
Parallel multiprocessing workers for scalable throughput
FastAPI REST backend for integration into larger systems
Docker Compose setup for easy deployment

Architecture Overview

CSV / DataFrame
      │
      ▼
DataManager       — column classification, MongoDB onboarding
      │
      ▼
WorkerManager     — N async workers: entity extraction → candidate fetch → feature computation
      │
      ▼
MLManager         — rank → compute global frequencies → rerank
      │
      ▼
OutputManager     — CSV output + MongoDB annotations

Quick Links

Installation — how to install and configure Alligator
Quick Start — run your first table annotation in minutes
CLI Reference — full command-line options
Configuration Reference — all parameters explained
Pipeline Architecture — deep dive into how the pipeline works

Key Features​

Architecture Overview​

Quick Links​

Key Features

Architecture Overview

Quick Links