Scoring, Normalisation & Thresholds

This document explains how Alligator scores candidates, normalises them, and decides whether a cell is matched.

Pipeline overview

ML model prediction  →  raw score  →  normalisation  →  match decision

The rerank stage is the final stage that produces the scores and match flags stored in MongoDB and returned by the API.

1. Raw ML score

The ML model outputs a raw probability in [0, 1] for each candidate, representing how likely that candidate is the correct entity for the cell.

raw_score = model.predict(features)[:, 1]   # class-1 probability

Each candidate's raw score is preserved in raw_score before any further transformation so it can be recovered if needed.

2. Per-cell score normalisation

All candidates within the same cell are normalised relative to the top candidate in that cell:

norm_score = raw_score / max_raw_score_in_cell

candidate	raw_score	norm_score
best	0.80	1.00
second	0.64	0.80
third	0.24	0.30

Why relative normalisation? The absolute raw score is model-dependent and hard to threshold universally. Relative normalisation expresses each candidate's confidence as a fraction of the best available option, making MATCH_THRESHOLD interpretable as "accept if at least X% as confident as the top candidate".
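The per-cell normalisation can be sketched in a few lines of Python. This is a minimal illustration rather than Alligator's actual code: the helper name `normalise_cell`, the plain-list input, and the guard for an all-zero cell are assumptions made for the example.

```python
def normalise_cell(raw_scores):
    """Divide every raw score in one cell by the cell's highest raw score.

    Hypothetical helper, for illustration only.
    """
    if not raw_scores:
        return []
    max_raw = max(raw_scores)
    if max_raw <= 0:
        # Assumption: avoid dividing by zero when every candidate scores 0.0.
        return [0.0 for _ in raw_scores]
    return [raw / max_raw for raw in raw_scores]


# Raw scores from the table above: best, second, third.
norm_scores = normalise_cell([0.80, 0.64, 0.24])
print([round(s, 2) for s in norm_scores])  # [1.0, 0.8, 0.3]
```

Rounding is applied only for display; the unrounded quotients can differ from the table values in the last floating-point digit.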

3. `RAW_MIN_CONFIDENCE` — the confidence floor

Environment variable: RAW_MIN_CONFIDENCE (default 0.1)

Before normalisation, the highest raw score in the cell is checked against this floor:

if max_raw_score < RAW_MIN_CONFIDENCE:
    all norm_scores = 0.0          # cell is below confidence floor
    score = raw_score              # restore raw score for display
    match = False for all          # no candidate is accepted

Purpose: guard against cells where the model is uncertain about every candidate. Relative normalisation always gives the top candidate a norm_score of 1.0, even when the model's absolute confidence for the entire cell is very low (e.g. a vague or ambiguous mention). When max_raw_score < RAW_MIN_CONFIDENCE the cell is treated as unresolvable: candidates are still returned for human review, but the top candidate is not auto-matched.
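A small sketch of the floor check shows why it is needed. The function name `below_confidence_floor` and the sample cells are hypothetical; only the default value 0.1 comes from this document.

```python
RAW_MIN_CONFIDENCE = 0.1  # default documented above


def below_confidence_floor(raw_scores, floor=RAW_MIN_CONFIDENCE):
    """Return True when even the best raw score in the cell is under the floor.

    Assumption: a cell with no candidates counts as below the floor.
    """
    return max(raw_scores, default=0.0) < floor


vague_cell = [0.06, 0.05, 0.02]   # model is unsure about every candidate
clear_cell = [0.80, 0.64, 0.24]

print(below_confidence_floor(vague_cell))  # True  -> no auto-match
print(below_confidence_floor(clear_cell))  # False -> proceed to normalisation
```

Without the floor, the 0.06 candidate in `vague_cell` would be normalised to 1.0 and look just as convincing as the 0.80 candidate in `clear_cell`.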

Tuning guide

Value	Effect
`0.0`	Disable the floor entirely — every cell is eligible for auto-matching
`0.1` (default)	Require at least 10% raw confidence before auto-matching
`0.3`	Stricter — only high-confidence cells are auto-matched

4. `MATCH_THRESHOLD` — the acceptance threshold

Environment variable: MATCH_THRESHOLD (default 0.5)

Applied to the normalised score of the top candidate:

top_auto_match = norm_score_of_top >= MATCH_THRESHOLD

Because the top candidate always has norm_score = 1.0 whenever the cell clears the confidence floor, this condition holds for any MATCH_THRESHOLD ≤ 1.0, so the threshold by itself would always match the top candidate. Its real role becomes apparent when combined with MATCH_MARGIN_DELTA.

Tuning guide

Value	Effect
`0.5` (default)	Accept if the top candidate's norm score ≥ 0.5
`0.9`	Only accept when the top candidate dominates the cell clearly
`1.0`	Only accept when the top candidate is the single candidate above the floor

5. `MATCH_MARGIN_DELTA` — the margin guard

Environment variable: MATCH_MARGIN_DELTA (default 0.1)

An alternative accept condition based on how far ahead the top candidate is from the second:

top_auto_match = (norm_score_top >= MATCH_THRESHOLD)
              OR (norm_score_top - norm_score_second >= MATCH_MARGIN_DELTA)

Purpose: catch cases where the top candidate has a slightly lower normalised score (e.g. 0.45, just below the threshold) but is clearly better than the runner-up (e.g. second at 0.10, margin 0.35 >> 0.1). Without the margin guard those cells would be left unmatched despite an unambiguous winner.
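The combined accept condition can be written as a small Python function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and only the two defaults and the OR logic come from this document.

```python
MATCH_THRESHOLD = 0.5      # defaults documented on this page
MATCH_MARGIN_DELTA = 0.1


def top_auto_match(norm_top, norm_second,
                   threshold=MATCH_THRESHOLD,
                   margin_delta=MATCH_MARGIN_DELTA):
    """Accept the top candidate if either condition holds."""
    passes_threshold = norm_top >= threshold
    passes_margin = (norm_top - norm_second) >= margin_delta
    return passes_threshold or passes_margin


# The example from the text: top at 0.45, runner-up at 0.10.
print(top_auto_match(0.45, 0.10))  # True: margin 0.35 >= 0.1
# A close race below the threshold: neither condition holds.
print(top_auto_match(0.45, 0.40))  # False
```

Because the two conditions are joined with OR, raising `margin_delta` can only make matching stricter for candidates that already fail the threshold; it never rejects a candidate that passes it.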

Tuning guide

Value	Effect
`1.0`	Disable the margin guard — only `MATCH_THRESHOLD` decides
`0.1` (default)	Accept when the top candidate's normalised score leads the runner-up by at least 0.1
`0.5`	Only accept when the top candidate is dramatically ahead

6. Match flag assignment

Only the top candidate can ever receive match = True. All other candidates always receive match = False.

for cand in cands_to_save:
    cand["match"] = False
cands_to_save[0]["match"] = top_auto_match   # True only if conditions above are met

7. Full decision flowchart

For each cell:
│
├─ max_raw < RAW_MIN_CONFIDENCE?
│   ├─ YES → all norm_scores = 0, restore raw scores for display
│   │         match = False for all candidates   ← STOP
│   └─ NO  → continue
│
├─ normalise: norm_score = raw_score / max_raw
│
├─ top_auto_match = (norm_top >= MATCH_THRESHOLD)
│                OR (norm_top - norm_second >= MATCH_MARGIN_DELTA)
│
├─ all candidates: match = False
└─ top candidate:  match = top_auto_match
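The flowchart above can be followed end to end in one function. This is a hedged sketch, not the rerank stage itself: `decide_cell` is a hypothetical name, the dictionary keys mirror the field names used on this page, and three behaviours are assumptions (the `score` field holds the normalised value when the cell clears the floor, a missing runner-up counts as 0.0, and a cell whose best raw score is 0.0 is treated like a below-floor cell to avoid dividing by zero).

```python
def decide_cell(raw_scores, raw_min_confidence=0.1,
                match_threshold=0.5, match_margin_delta=0.1):
    """Return one dict per candidate, ordered by descending raw score."""
    ranked = sorted(raw_scores, reverse=True)
    cands = [{"raw_score": r, "norm_score": 0.0, "score": r, "match": False}
             for r in ranked]
    if not cands:
        return cands

    max_raw = ranked[0]
    if max_raw < raw_min_confidence or max_raw <= 0:
        # Below the floor: norm scores stay 0.0, raw scores are kept
        # for display, and no candidate is matched.
        return cands

    for cand in cands:
        cand["norm_score"] = cand["raw_score"] / max_raw
        cand["score"] = cand["norm_score"]  # assumption, see lead-in

    norm_top = cands[0]["norm_score"]
    norm_second = cands[1]["norm_score"] if len(cands) > 1 else 0.0
    # Only the top candidate can ever be flagged as a match.
    cands[0]["match"] = (norm_top >= match_threshold
                         or norm_top - norm_second >= match_margin_delta)
    return cands


result = decide_cell([0.64, 0.80, 0.24])
print([c["match"] for c in result])  # [True, False, False]

low = decide_cell([0.06, 0.05])
print([c["match"] for c in low])     # [False, False]
```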

8. Environment variable summary

Variable	Default	Description
`RAW_MIN_CONFIDENCE`	`0.1`	Minimum raw ML score the best candidate must reach for the cell to be eligible for auto-matching
`MATCH_THRESHOLD`	`0.5`	Minimum normalised score the top candidate must have to be auto-matched
`MATCH_MARGIN_DELTA`	`0.1`	Alternative accept condition: top candidate is auto-matched if its normalised score exceeds the second by at least this delta

All three can be set in the Docker environment (e.g. docker-compose.yml) or as regular environment variables before starting the service.
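For reference, a service could read these three variables with the documented defaults roughly as follows. How Alligator actually parses them is not described here, so `read_thresholds` and its return shape are purely illustrative.

```python
import os


def read_thresholds(env=os.environ):
    """Read the three settings, falling back to the documented defaults."""
    return {
        "RAW_MIN_CONFIDENCE": float(env.get("RAW_MIN_CONFIDENCE", "0.1")),
        "MATCH_THRESHOLD": float(env.get("MATCH_THRESHOLD", "0.5")),
        "MATCH_MARGIN_DELTA": float(env.get("MATCH_MARGIN_DELTA", "0.1")),
    }


# A plain dict stands in for the process environment in this example.
settings = read_thresholds({"MATCH_THRESHOLD": "0.9"})
print(settings)
# {'RAW_MIN_CONFIDENCE': 0.1, 'MATCH_THRESHOLD': 0.9, 'MATCH_MARGIN_DELTA': 0.1}
```

Environment variables are always strings, so the explicit `float(...)` conversion is what turns a value such as `"0.9"` into a usable threshold.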
