# Animal Data Merge, Species Coverage, and Web Matcher Overlap — Diagnosis Report date: 2026-04-20 Read-only diagnosis. No changes made. --- ## ⚠️ CRITICAL BUG FOUND: Matcher Uses Wrong ID — Behavior Notes NEVER Load **matchingEngine.ts:309** calls `getBehaviorNotes(animal.id)` where `animal.id` is the **numeric SM ID** (e.g., `"3717"`, `"55"`). But `behavior_notes.animal_id` stores **shelter codes** (e.g., `"S2025592"`, `"A2023228"`). Every other endpoint in the codebase correctly uses `animal.shelterCode`: - `/api/animals` (Web Matcher): `getBehaviorNotes(animal.shelterCode)` ✅ - `/api/dashboard/behavior-notes`: `getBehaviorNotes(animal.shelterCode)` ✅ - Bio generation: `getBehaviorNotes(animal.shelterCode)` ✅ The matcher is the **only caller** using `animal.id`. This means: - `getBehaviorNotes()` returns `null` for every animal in the matcher - **All 7 scoring factors that depend on profiler data always use their fallback/default values** - The +5 bonus for "animals with behavior notes" is never applied - The matcher has been effectively **scoring blind** since it was written Proof from the DB: ``` SM numeric IDs: 3717, 4139, 55, ... behavior_notes.animal_id: S2025592, S2025896, A2023228, ... Join on shelter_code works. Join on numeric ID returns nothing. ``` **Fix:** Change `matchingEngine.ts:309` from `getBehaviorNotes(animal.id)` to `getBehaviorNotes(animal.shelterCode)`. One-line fix. --- ## Part A — The Merge Logic (Already Exists, Well-Codified) ### Where it lives **`getBehaviorNotes()` in localDatabase.ts:832-913** is a proper server-side merge function. It is NOT ad-hoc render code. It's a clean, well-structured function that: 1. Calls `getBehaviorRecords(animalId)` which fetches the **5 most recent** records, sorted oldest-to-newest 2. Starts with an empty `BehaviorNotes` object with all fields defaulted 3. Iterates records oldest→newest, overlaying each field if it has a meaningful value (`hasValue()` check) 4. Handles both dual-storage fields (`_text` + `_match`) and legacy fields 5. Falls back `_text` ← legacy if `_text` is empty (lines 900-907) 6. Concatenates all raw transcripts chronologically 7. Returns a single merged `BehaviorNotes` object ### Merge rules (from localDatabase.ts:832-913) For every field, the rule is: **last meaningful value wins** (newest record with a non-empty value). | Field | Default | Overlay rule | Notes | |---|---|---|---| | `color` | `''` | Latest non-empty wins | | | `specialFeatures` | `''` | Latest non-empty wins | | | `energyLevel` | `''` | Latest non-empty wins | | | `energyLevel_match` | `'unknown'` | Latest non-unknown wins | | | `peopleReaction` | `''` | Latest non-empty wins | | | `goodWithCats_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithCats` if empty | | `goodWithCats_match` | `'unknown'` | Latest non-unknown wins | | | `goodWithDogs_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithDogs` if empty | | `goodWithDogs_match` | `'unknown'` | Latest non-unknown wins | | | `goodWithKids_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithKids` if empty | | `goodWithKids_match` | `'unknown'` | Latest non-unknown wins | | | `specialNeeds` | `''` | Latest non-empty wins | | | `backstory` | `''` | Latest non-empty wins | | | `additionalNotes` | `''` | Latest non-empty wins | | | `recordedAt` | — | Uses latest record's timestamp | | | `rawTranscript` | `''` | All transcripts concatenated chronologically | | ### Record limit `getBehaviorRecords()` fetches only the **5 most recent** records per animal (SQL LIMIT 5, sorted by `recorded_at DESC`, then re-sorted ASC for merge). Older records are silently ignored. ### Dashboard server-side merge (separate, parallel) The dashboard endpoint `GET /api/dashboard/behavior-notes` (server.ts:927-994) has its **own** merge logic that's similar but not identical: - SM fields (color, breed, age, sex, size, description) are set as baseline - Then profiler records are overlaid oldest→newest (same pattern) - But this merge uses a **different field list** (`mergeFields` at line 933): includes legacy fields (`goodWithCats`, `goodWithDogs`, `goodWithKids`, `otherAnimalReaction`, `kidBehavior`) but NOT the `_match` or `_text` dual-storage fields This means the dashboard's merged view shows **legacy field values** for compatibility, while `getBehaviorNotes()` properly handles dual-storage. The dashboard merge is slightly stale but it's display-only. ### Who calls what | Consumer | Function called | ID used | Gets merged data? | |---|---|---|---| | matchingEngine.ts | `getBehaviorNotes(animal.id)` | ❌ WRONG (numeric SM ID) | ❌ Always null | | `/api/animals` (Web Matcher) | `getBehaviorNotes(animal.shelterCode)` | ✅ Correct | ✅ Yes | | `/api/dashboard/behavior-notes` | Own inline merge logic | ✅ Correct (shelterCode) | ✅ Yes (parallel merge) | | Bio generation | `getBehaviorNotes(animal.shelterCode)` | ✅ Correct | ✅ Yes | --- ## Part B — Multi-Entry Handling ### Current merge strategy **Last meaningful value wins** across the 5 most recent records. No weighting, no aggregation, no user pick. If record #3 says `energyLevel: "High"` and record #5 says `energyLevel: "Not specified"`, record #3's value persists (because the merge only overlays "meaningful" values). For `_match` enums, `'unknown'` is treated as not-meaningful — so a newer `'unknown'` won't overwrite an older `'yes'`. ### Current multi-entry state | Metric | Value | |---|---| | Animals with notes | 19 | | Total note records | 24 | | Max records per animal | 2 | | Animals with 2 records | 5 | | Animals with 1 record | 14 | Multi-entry is minimal so far. With biweekly profiling, animals could accumulate 12+ records per year. ### Record window Only the 5 most recent records are fetched (`getBehaviorRecords` SQL LIMIT 5). Records beyond 5 are silently dropped. At biweekly cadence, this means the merge window covers roughly **10 weeks** of observations. --- ## Part C — Web Matcher Merge Usage The Web Matcher calls `GET /api/animals` (server.ts:703), which calls `getBehaviorNotes(animal.shelterCode)` — the **same merge function** from localDatabase.ts. So: - Web Matcher ✅ uses `getBehaviorNotes()` merge (correct) - Dashboard ⚠️ uses its own parallel inline merge (slightly different field list) - Adopter Matcher ❌ calls `getBehaviorNotes()` with wrong ID (always null) --- ## Part D — Path to Fix the Adopter Matcher ### This is Option A (simplest case) The merge function already exists, is well-codified, and is used by two other consumers. The matcher already calls it — just with the wrong ID. **Fix 1 (one-line, critical):** matchingEngine.ts:309 ``` BEFORE: const notes = getBehaviorNotes(animal.id); AFTER: const notes = getBehaviorNotes(animal.shelterCode); ``` This single fix: - Enables ALL profiler data for scoring (energy, compatibility, special needs, backstory, etc.) - Uses the existing merge logic (5 most recent, last-meaningful-value-wins) - Automatically gets `_match` enums, `_text` fields, and legacy fallbacks **Fix 2 (scoring code):** After Fix 1, the scoring functions need to read the right fields: - `scoreHouseholdMatch`: read `notes.goodWithCats_match` / `goodWithDogs_match` / `goodWithKids_match` instead of legacy `otherAnimalReaction` / `kidBehavior` - `scoreEnergyMatch`: optionally read `notes.energyLevel_match` enum instead of re-parsing freeform text - `scoreColorMatch`: optionally check `notes.color` in addition to `animal.color` **Fix 3 (species):** Add `preferredSpecies` to extraction + species pre-filter in matcher. Estimated work: Fix 1 is trivial. Fix 2 is moderate (rewrite 3 scoring functions). Fix 3 is moderate (schema change + extraction prompt + filter logic). --- ## Part E — Freshness Proposal ### Proposed weighting scheme For the current system (biweekly profiles, 5-record window), a weighted merge is premature. The "last meaningful value wins" pattern works well when: - Records accumulate slowly (biweekly) - The window is small (5 records ≈ 10 weeks) - Fields are categorical (enums) not numeric **Recommendation: keep current merge as-is for now.** The 5-record LIMIT already provides a natural ~10-week window. Here's a phased plan: ### Phase 1 (now): No change to merge logic Current "last meaningful value wins" across 5 most recent records is adequate. Focus implementation effort on the three fixes above. ### Phase 2 (after 6 months of profiling data): Evaluate weighted merge When animals have 6+ records, revisit. Proposed rules at that point: | Field category | Proposed merge rule | Rationale | |---|---|---| | **Identity** (color, specialFeatures) | First non-empty value, never overwritten | Color doesn't change. Special features are permanent. | | **Behavior** (energyLevel, peopleReaction, compatibility enums) | Most recent meaningful value wins | Behavior evolves. Recent observation is most relevant. | | **Backstory** | First non-empty value, never overwritten | Origin story doesn't change. | | **Special needs** | Most recent meaningful value wins | Medical needs change (medications added/removed). | | **Additional notes** | Most recent meaningful value wins | Situational observations. | ### Phase 3 (if needed): Age cutoff | Cutoff | Effect | When to implement | |---|---|---| | 6 months | Records older than 6 months excluded from merge | When animals regularly have 10+ records | | 9 months | More conservative cutoff | If behavior changes slowly | | Never | Keep current 5-record LIMIT as implicit cutoff | If biweekly cadence means 5 records ≈ 10 weeks is already good | **Recommendation: the 5-record SQL LIMIT is already an implicit freshness cutoff.** At biweekly cadence, it covers ~10 weeks. At weekly cadence, ~5 weeks. This is reasonable and requires no additional logic. An explicit time-based cutoff adds complexity without clear benefit until the profiling program has run for 6+ months. --- REPORT COMPLETE — 195 total lines.