Quick Start
Make sure you have installed Alligator and have MongoDB and LAMAPI running before proceeding.
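A quick way to check that both services are up is to confirm their TCP ports accept connections. The sketch below uses only the standard library and is not part of Alligator. Port 27017 comes from the mongo_uri used in the examples on this page; the LAMAPI host and port are placeholders you need to replace with your own deployment's values.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    services = {
        "MongoDB": ("localhost", 27017),  # matches mongo_uri in the examples
        "LAMAPI": ("localhost", 5000),    # hypothetical port, adjust to your setup
    }
    for name, (host, port) in services.items():
        status = "reachable" if is_port_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
```

An open port only shows that something is listening; it does not verify that the service behind it is healthy or correctly configured.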
Python API
Minimal example
Alligator will automatically classify columns and run the full pipeline:
from alligator import Alligator
gator = Alligator(
    input_csv="tables/my_table.csv",
    num_workers=4,
    num_ml_workers=2,
    worker_batch_size=64,
    candidate_retrieval_limit=16,
    mongo_uri="mongodb://localhost:27017/",
)
gator.run()
With Wikidata type constraints
For higher precision, restrict candidate search in each named-entity (NE) column to specific Wikidata entity types:
from alligator import Alligator
# Map column index (as string) → list of Wikidata QIDs
column_types = {
    "0": ["Q11424"],  # film
    "3": ["Q483394"],  # music genre
    "7": ["Q5"],  # human
    "8": ["Q5"],  # human
}
gator = Alligator(
    input_csv="tables/imdb_top_100.csv",
    column_types=column_types,
    num_workers=4,
    num_ml_workers=2,
    worker_batch_size=64,
    candidate_retrieval_limit=16,
    mongo_uri="mongodb://localhost:27017/",
)
gator.run()
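Counting column positions by hand is error-prone on wide tables. As a convenience, the mapping can be derived from the CSV header instead. The helper below is a hypothetical sketch, not part of Alligator; it assumes the first row of the file is a header and that column indexes are zero-based, as the example above suggests.

```python
import csv


def column_types_from_header(csv_path, types_by_name):
    """Translate {column name: [QIDs]} into {column index as string: [QIDs]}."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))  # first row is assumed to be the header

    # Map each header name to its zero-based position, stored as a string.
    index_of = {name: str(i) for i, name in enumerate(header)}

    missing = [name for name in types_by_name if name not in index_of]
    if missing:
        raise KeyError(f"Columns not found in {csv_path}: {missing}")

    return {index_of[name]: list(qids) for name, qids in types_by_name.items()}
```

With a header such as title,year,director (hypothetical column names), calling column_types_from_header(path, {"title": ["Q11424"], "director": ["Q5"]}) returns {"0": ["Q11424"], "2": ["Q5"]}, which can be passed as column_types.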
CLI
python3 -m alligator.cli \
    --gator.input_csv tables/my_table.csv \
    --gator.num_workers 4 \
    --gator.num_ml_workers 2 \
    --gator.worker_batch_size 64 \
    --gator.candidate_retrieval_limit 16 \
    --gator.mongo_uri mongodb://localhost:27017/
See the full CLI Reference for all available options.
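When launching the CLI from a script, it can help to build the argument list from a dictionary rather than concatenating strings. The sketch below follows the --gator.<name> <value> pattern shown in the command above; whether every constructor parameter has a matching flag is an assumption, so check the CLI Reference before relying on it. The function name is hypothetical.

```python
import shlex
import sys


def build_cli_command(options):
    """Build the argv list for `python -m alligator.cli` from keyword options."""
    command = [sys.executable, "-m", "alligator.cli"]
    for name, value in options.items():
        # Each option becomes a "--gator.<name>" flag followed by its value.
        command += [f"--gator.{name}", str(value)]
    return command


if __name__ == "__main__":
    command = build_cli_command(
        {
            "input_csv": "tables/my_table.csv",
            "num_workers": 4,
            "mongo_uri": "mongodb://localhost:27017/",
        }
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) runs it without going through a shell, which avoids quoting problems with paths that contain spaces.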
What Happens Next
After run() completes:
- Annotations are stored in MongoDB (alligator_db database by default)
- If save_output_to_csv=True, a CSV file is written alongside the input
- CEA / CTA / CPA results are accessible via the MongoDB input_data collection
For more details on the output format, see the Pipeline Architecture page.
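To get a first look at what was stored, you can count the documents in the input_data collection and list the top-level fields of one of them. The sketch below is a minimal, hedged example: the document schema is not described on this page, so it only reports field names rather than assuming any. The helper names are hypothetical, and connecting requires the third-party pymongo package.

```python
def summarize_collection(collection):
    """Return the document count and the top-level field names of one document."""
    sample = collection.find_one() or {}  # find_one() returns None when empty
    return {
        "documents": collection.count_documents({}),
        "fields": sorted(sample.keys()),
    }


def open_input_data(mongo_uri="mongodb://localhost:27017/"):
    """Open the default results collection named on this page."""
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(mongo_uri, serverSelectionTimeoutMS=2000)
    return client["alligator_db"]["input_data"]
```

After a run has completed, print(summarize_collection(open_input_data())) shows how many documents were written and which fields they carry. If you configured a different database name, adjust open_input_data accordingly.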