CLI Reference

The CLI is invoked via:

python3 -m alligator.cli [OPTIONS]

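jsonargparse automatically exposes every parameter of Alligator.__init__ as a command-line option under the --gator.* namespace. The one option in the table below outside that namespace is --disable-logging.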
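
To make the idea of auto-exposed options concrete, here is a minimal stdlib-only sketch that turns the parameters of an __init__ method into namespaced flags using inspect and argparse. It is an illustration of the general technique, not jsonargparse's implementation; the Gator class and the build_parser helper are hypothetical stand-ins, not part of Alligator.

```python
import argparse
import inspect


class Gator:
    """Hypothetical stand-in for Alligator; only the signature matters here."""

    def __init__(self, input_csv: str, num_workers: int = 4, worker_batch_size: int = 64):
        self.input_csv = input_csv
        self.num_workers = num_workers
        self.worker_batch_size = worker_batch_size


def build_parser(cls, namespace: str) -> argparse.ArgumentParser:
    """Create one --<namespace>.<param> option per __init__ parameter."""
    parser = argparse.ArgumentParser()
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        # Parameters without a default become required options.
        required = param.default is inspect.Parameter.empty
        has_annotation = param.annotation is not inspect.Parameter.empty
        parser.add_argument(
            f"--{namespace}.{name}",
            dest=name,
            type=param.annotation if has_annotation else str,
            required=required,
            default=None if required else param.default,
        )
    return parser


parser = build_parser(Gator, "gator")
args = parser.parse_args(
    ["--gator.input_csv", "tables/my_table.csv", "--gator.num_workers", "8"]
)
gator = Gator(**vars(args))
print(gator.input_csv, gator.num_workers, gator.worker_batch_size)
```

Options that are not passed keep the default from the signature, which is the behaviour the Default column below describes.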

Arguments

Argument                           Default                        Description
---------------------------------  -----------------------------  ------------------------------------------------------------
--gator.input_csv                  (required)                     Path to the input CSV file
--gator.entity_retrieval_endpoint  $ENTITY_RETRIEVAL_ENDPOINT     Entity lookup API URL
--gator.entity_retrieval_token     $ENTITY_RETRIEVAL_TOKEN        API auth token
--gator.mongo_uri                  mongodb://gator-mongodb:27017  MongoDB connection URI
--gator.target_columns             None                           JSON string with NE/LIT/IGNORED column assignments
--gator.column_types               {}                             JSON map of col_idx → [QID, ...] for type-constrained search
--gator.num_workers                cpu_count // 2                 Number of parallel retrieval workers
--gator.worker_batch_size          64                             Number of rows per worker batch
--gator.candidate_retrieval_limit  20                             Max candidates fetched per entity
--gator.max_candidates_in_result   5                              Max candidates kept in final output
--gator.num_ml_workers             2                              Number of ML pipeline workers
--gator.ml_worker_batch_size       256                            Batch size for ML prediction
--gator.candidate_retrieval_only   False                          Stop after Phase 2 (skip ML ranking)
--gator.save_output                False                          Persist output to MongoDB
--gator.save_output_to_csv         False                          Write results to a CSV file
--disable-logging                  False                          Fully suppress all logging output

Examples

Run on a CSV with defaults

python3 -m alligator.cli --gator.input_csv tables/my_table.csv

With explicit column types

python3 -m alligator.cli \
  --gator.input_csv tables/imdb_top_100.csv \
  --gator.column_types '{"0": ["Q11424"], "7": ["Q5"]}' \
  --gator.num_workers 8
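
When the column types are produced by a script, hand-writing the JSON string and its shell quoting is error-prone. The sketch below builds the same command as the example above with json.dumps and shlex.join from the standard library. It only constructs and prints the command; nothing here calls Alligator itself.

```python
import json
import shlex

# Column index (as a string key) -> list of Wikidata QIDs, as in the example above.
# Q11424 is "film" and Q5 is "human" in Wikidata.
column_types = {"0": ["Q11424"], "7": ["Q5"]}

argv = [
    "python3", "-m", "alligator.cli",
    "--gator.input_csv", "tables/imdb_top_100.csv",
    "--gator.column_types", json.dumps(column_types),
    "--gator.num_workers", "8",
]

# shlex.join quotes the JSON argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing argv as a list to subprocess.run avoids shell quoting altogether, since each element reaches the program as one argument.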

Manual column type assignment

python3 -m alligator.cli \
  --gator.input_csv tables/my_table.csv \
  --gator.target_columns '{"NE": {"0": "OTHERS", "2": "LOC"}, "LIT": {"1": "NUMBER"}, "IGNORED": [3]}' \
  --gator.num_workers 4
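
The --gator.target_columns value can be assembled the same way. The sketch below reuses the labels from the example above (the full set of accepted labels is not listed on this page). The build_target_columns helper is hypothetical, and its check that no column index appears in more than one group is an assumption about sensible input, not a documented rule of the CLI.

```python
import json

# Values copied from the example above.
ne_columns = {"0": "OTHERS", "2": "LOC"}
lit_columns = {"1": "NUMBER"}
ignored_columns = [3]


def build_target_columns(ne, lit, ignored):
    """Return the JSON string for --gator.target_columns.

    Raises ValueError if a column index is assigned to more than one group
    (an assumed sanity check, not something the CLI is documented to enforce).
    """
    seen = set()
    for idx in [*ne, *lit, *ignored]:
        key = str(idx)
        if key in seen:
            raise ValueError(f"column {key} is assigned more than once")
        seen.add(key)
    return json.dumps({"NE": ne, "LIT": lit, "IGNORED": ignored})


target_columns = build_target_columns(ne_columns, lit_columns, ignored_columns)
print(target_columns)
```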

Candidate retrieval only (skip ML)

python3 -m alligator.cli \
  --gator.input_csv tables/my_table.csv \
  --gator.candidate_retrieval_only true

Save results to CSV and suppress logging

python3 -m alligator.cli \
  --gator.input_csv tables/my_table.csv \
  --gator.save_output true \
  --gator.save_output_to_csv true \
  --disable-logging

Config File

Because the CLI uses jsonargparse, you can also pass a YAML or JSON config file:

python3 -m alligator.cli --config my_config.yaml

Where my_config.yaml might look like:

gator:
  input_csv: tables/my_table.csv
  num_workers: 8
  num_ml_workers: 4
  worker_batch_size: 64
  candidate_retrieval_limit: 20
  save_output: true
  save_output_to_csv: true
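
Since JSON config files are accepted as well, a config can be generated from Python with only the standard library. The sketch below writes the same settings as the YAML above, assuming the JSON form uses the same nesting under a gator key. The temporary directory and the my_config.json file name are illustrative choices.

```python
import json
import tempfile
from pathlib import Path

# Same settings as the YAML example, nested under the "gator" namespace.
config = {
    "gator": {
        "input_csv": "tables/my_table.csv",
        "num_workers": 8,
        "num_ml_workers": 4,
        "worker_batch_size": 64,
        "candidate_retrieval_limit": 20,
        "save_output": True,
        "save_output_to_csv": True,
    }
}

# Write to a temporary directory so the sketch has no side effects on the project.
config_path = Path(tempfile.mkdtemp()) / "my_config.json"
config_path.write_text(json.dumps(config, indent=2))
print(config_path.read_text())
```

The resulting file would then be passed with --config in place of my_config.yaml.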