# Animal Data Merge, Species Coverage, and Web Matcher Overlap — Diagnosis

Report date: 2026-04-20
Read-only diagnosis. No changes made.

---

## ⚠️ CRITICAL BUG FOUND: Matcher Uses Wrong ID — Behavior Notes NEVER Load

**matchingEngine.ts:309** calls `getBehaviorNotes(animal.id)` where `animal.id` is the **numeric SM ID** (e.g., `"3717"`, `"55"`).

But `behavior_notes.animal_id` stores **shelter codes** (e.g., `"S2025592"`, `"A2023228"`).

Every other endpoint in the codebase correctly uses `animal.shelterCode`:
- `/api/animals` (Web Matcher): `getBehaviorNotes(animal.shelterCode)` ✅
- `/api/dashboard/behavior-notes`: own inline merge, keyed by `animal.shelterCode` ✅
- Bio generation: `getBehaviorNotes(animal.shelterCode)` ✅

The matcher is the **only caller** using `animal.id`. This means:
- `getBehaviorNotes()` returns `null` for every animal in the matcher
- **All 7 scoring factors that depend on profiler data always use their fallback/default values**
- The +5 bonus for "animals with behavior notes" is never applied
- The matcher has been effectively **scoring blind** since it was written

Proof from the DB:
```
SM numeric IDs: 3717, 4139, 55, ...
behavior_notes.animal_id: S2025592, S2025896, A2023228, ...
Join on shelter_code works. Join on numeric ID returns nothing.
```

**Fix:** Change `matchingEngine.ts:309` from `getBehaviorNotes(animal.id)` to `getBehaviorNotes(animal.shelterCode)`. One-line fix.
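
The mismatch can be reproduced in isolation. The sketch below is a hypothetical in-memory stand-in, not the real `getBehaviorNotes()`: the `Animal` shape, the `notesByShelterCode` map, the `lookupNotes` helper, and the pairing of SM ID `3717` with shelter code `S2025592` are all assumptions for illustration.

```typescript
// Hypothetical animal shape carrying both identifiers.
interface Animal {
  id: string;          // numeric SM ID, e.g. "3717"
  shelterCode: string; // shelter code, e.g. "S2025592"
}

// Hypothetical stand-in for the behavior_notes table, keyed by shelter code.
const notesByShelterCode = new Map<string, { energyLevel: string }>([
  ["S2025592", { energyLevel: "High" }],
]);

// Simplified lookup: returns null when no notes exist for the given key.
function lookupNotes(animalId: string): { energyLevel: string } | null {
  return notesByShelterCode.get(animalId) ?? null;
}

// Illustrative pairing only; the real ID-to-code mapping lives in the DB.
const animal: Animal = { id: "3717", shelterCode: "S2025592" };

const viaNumericId = lookupNotes(animal.id);            // key never matches
const viaShelterCode = lookupNotes(animal.shelterCode); // key matches
```

Because both identifiers are plain strings, the compiler cannot catch the mix-up; the lookup simply misses and the caller falls through to its defaults.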

---

## Part A — The Merge Logic (Already Exists, Well-Codified)

### Where it lives

**`getBehaviorNotes()` in localDatabase.ts:832-913** is a proper server-side merge function.

It is NOT ad-hoc render code. It's a clean, well-structured function that:
1. Calls `getBehaviorRecords(animalId)` which fetches the **5 most recent** records, sorted oldest-to-newest
2. Starts with an empty `BehaviorNotes` object with all fields defaulted
3. Iterates records oldest→newest, overlaying each field if it has a meaningful value (`hasValue()` check)
4. Handles both dual-storage fields (`_text` + `_match`) and legacy fields
5. Falls back `_text` ← legacy if `_text` is empty (lines 900-907)
6. Concatenates all raw transcripts chronologically
7. Returns a single merged `BehaviorNotes` object
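
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in localDatabase.ts: the record shape covers only a few fields, the `hasValue()` definition (treating empty strings and `"Not specified"` as not meaningful) is assumed, and the blank-line transcript separator is a guess.

```typescript
// Simplified, hypothetical record shape: only a few of the real fields.
interface BehaviorRecord {
  recordedAt: string;         // ISO timestamp
  energyLevel?: string;
  goodWithCats?: string;      // legacy field
  goodWithCats_text?: string; // dual-storage text field
  rawTranscript?: string;
}

interface MergedNotes {
  recordedAt: string;
  energyLevel: string;
  goodWithCats_text: string;
  rawTranscript: string;
}

// Assumed definition of "meaningful"; the real hasValue() may differ.
function hasValue(v: string | undefined): v is string {
  if (v === undefined) return false;
  const t = v.trim().toLowerCase();
  return t !== "" && t !== "not specified";
}

// Records must already be sorted oldest to newest.
function mergeRecords(records: BehaviorRecord[]): MergedNotes | null {
  if (records.length === 0) return null;

  const merged: MergedNotes = {
    recordedAt: "",
    energyLevel: "",
    goodWithCats_text: "",
    rawTranscript: "",
  };
  let legacyCats = "";
  const transcripts: string[] = [];

  for (const r of records) {
    // Overlay: a newer meaningful value replaces the older one.
    if (hasValue(r.energyLevel)) merged.energyLevel = r.energyLevel;
    if (hasValue(r.goodWithCats_text)) merged.goodWithCats_text = r.goodWithCats_text;
    if (hasValue(r.goodWithCats)) legacyCats = r.goodWithCats;
    if (hasValue(r.rawTranscript)) transcripts.push(r.rawTranscript);
    merged.recordedAt = r.recordedAt; // ends up as the latest record's timestamp
  }

  // Fall back to the legacy field when no _text value was ever recorded.
  if (merged.goodWithCats_text === "") merged.goodWithCats_text = legacyCats;

  // Transcripts are concatenated in chronological order (separator assumed).
  merged.rawTranscript = transcripts.join("\n\n");
  return merged;
}
```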

### Merge rules (from localDatabase.ts:832-913)

For every field except `recordedAt` and `rawTranscript`, the rule is: **last meaningful value wins** (newest record with a non-empty value). For `_match` enums, `'unknown'` counts as empty.

| Field | Default | Overlay rule | Notes |
|---|---|---|---|
| `color` | `''` | Latest non-empty wins | |
| `specialFeatures` | `''` | Latest non-empty wins | |
| `energyLevel` | `''` | Latest non-empty wins | |
| `energyLevel_match` | `'unknown'` | Latest non-unknown wins | |
| `peopleReaction` | `''` | Latest non-empty wins | |
| `goodWithCats_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithCats` if empty |
| `goodWithCats_match` | `'unknown'` | Latest non-unknown wins | |
| `goodWithDogs_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithDogs` if empty |
| `goodWithDogs_match` | `'unknown'` | Latest non-unknown wins | |
| `goodWithKids_text` | `''` | Latest non-empty wins | Falls back to legacy `goodWithKids` if empty |
| `goodWithKids_match` | `'unknown'` | Latest non-unknown wins | |
| `specialNeeds` | `''` | Latest non-empty wins | |
| `backstory` | `''` | Latest non-empty wins | |
| `additionalNotes` | `''` | Latest non-empty wins | |
| `recordedAt` | — | Uses latest record's timestamp | |
| `rawTranscript` | `''` | All transcripts concatenated chronologically | |

### Record limit

`getBehaviorRecords()` fetches only the **5 most recent** records per animal (SQL LIMIT 5, sorted by `recorded_at DESC`, then re-sorted ASC for merge). Older records are silently ignored.
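
The windowing described here can be mirrored in memory. The sketch below is an assumed equivalent of the SQL behaviour (`ORDER BY recorded_at DESC LIMIT 5`, then re-sorted ascending), not the actual query; `TimedRecord` and `latestWindow` are hypothetical names.

```typescript
interface TimedRecord {
  recordedAt: string; // ISO date or timestamp
}

// Keep the `limit` newest records, returned oldest first for the merge.
function latestWindow<T extends TimedRecord>(records: T[], limit = 5): T[] {
  return [...records] // copy so the caller's array is not mutated
    .sort((a, b) => Date.parse(b.recordedAt) - Date.parse(a.recordedAt)) // newest first
    .slice(0, limit)  // drop everything older than the N newest
    .reverse();       // oldest first, ready for overlay
}
```

With seven records, the two oldest never reach the merge, which is the "silently ignored" behaviour noted above.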

### Dashboard server-side merge (separate, parallel)

The dashboard endpoint `GET /api/dashboard/behavior-notes` (server.ts:927-994) has its **own** merge logic that's similar but not identical:

- SM fields (color, breed, age, sex, size, description) are set as baseline
- Then profiler records are overlaid oldest→newest (same pattern)
- But this merge uses a **different field list** (`mergeFields` at line 933): includes legacy fields (`goodWithCats`, `goodWithDogs`, `goodWithKids`, `otherAnimalReaction`, `kidBehavior`) but NOT the `_match` or `_text` dual-storage fields

This means the dashboard's merged view shows **legacy field values** for compatibility, while `getBehaviorNotes()` properly handles dual-storage. The dashboard merge has not been updated for the `_text`/`_match` fields, but it is display-only.
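
A rough sketch of the dashboard pattern, to make the difference concrete. It assumes flat string-valued records and uses only the legacy fields named above for `mergeFields`; the real array at server.ts:933 may contain more entries, and `dashboardMerge` is a hypothetical name.

```typescript
type FieldMap = Record<string, string>;

// Legacy fields only; the dual-storage `_text` / `_match` fields are absent.
const mergeFields = [
  "goodWithCats",
  "goodWithDogs",
  "goodWithKids",
  "otherAnimalReaction",
  "kidBehavior",
];

// SM data is the baseline; profiler records (oldest to newest) overlay it.
function dashboardMerge(smBaseline: FieldMap, profilerRecords: FieldMap[]): FieldMap {
  const merged: FieldMap = { ...smBaseline };
  for (const record of profilerRecords) {
    for (const field of mergeFields) {
      const value = record[field];
      // Only non-empty values overwrite what is already there.
      if (value !== undefined && value.trim() !== "") merged[field] = value;
    }
  }
  return merged;
}
```

Any `goodWithCats_match` value on a profiler record is dropped by this merge, since the field is not in the list.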

### Who calls what

| Consumer | Function called | ID used | Gets merged data? |
|---|---|---|---|
| matchingEngine.ts | `getBehaviorNotes(animal.id)` | ❌ WRONG (numeric SM ID) | ❌ Always null |
| `/api/animals` (Web Matcher) | `getBehaviorNotes(animal.shelterCode)` | ✅ Correct | ✅ Yes |
| `/api/dashboard/behavior-notes` | Own inline merge logic | ✅ Correct (shelterCode) | ✅ Yes (parallel merge) |
| Bio generation | `getBehaviorNotes(animal.shelterCode)` | ✅ Correct | ✅ Yes |

---

## Part B — Multi-Entry Handling

### Current merge strategy

**Last meaningful value wins** across the 5 most recent records. No weighting, no aggregation, no user pick. If record #3 says `energyLevel: "High"` and record #5 says `energyLevel: "Not specified"`, record #3's value persists (because the merge only overlays "meaningful" values).

For `_match` enums, `'unknown'` is treated as not-meaningful — so a newer `'unknown'` won't overwrite an older `'yes'`.
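
A minimal sketch of the enum overlay, assuming the `_match` values are `'yes'`, `'no'`, and `'unknown'` (only `'yes'` and `'unknown'` are confirmed above; `'no'` and the `mergeMatch` helper are assumptions).

```typescript
type MatchValue = "yes" | "no" | "unknown";

// Overlay oldest to newest; 'unknown' never replaces a known value.
function mergeMatch(values: MatchValue[]): MatchValue {
  let merged: MatchValue = "unknown";
  for (const v of values) {
    if (v !== "unknown") merged = v;
  }
  return merged;
}

const olderYesNewerUnknown = mergeMatch(["yes", "unknown"]); // older 'yes' persists
const olderYesNewerNo = mergeMatch(["yes", "no"]);           // newer known value wins
```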

### Current multi-entry state

| Metric | Value |
|---|---|
| Animals with notes | 19 |
| Total note records | 24 |
| Max records per animal | 2 |
| Animals with 2 records | 5 |
| Animals with 1 record | 14 |

Multi-entry is minimal so far. With biweekly profiling (one record every two weeks), an animal that stays a full year could accumulate about 26 records.

### Record window

Only the 5 most recent records are fetched (`getBehaviorRecords` SQL LIMIT 5). Records beyond 5 are silently dropped. At biweekly cadence, this means the merge window covers roughly **10 weeks** of observations.
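
The window estimate is simple arithmetic, shown here as a small helper. It assumes "biweekly" means one record every two weeks and uses the same approximation as the text (records × cadence); `windowWeeks` is a hypothetical name.

```typescript
// Approximate number of weeks covered by the merge window.
function windowWeeks(recordLimit: number, cadenceWeeks: number): number {
  return recordLimit * cadenceWeeks;
}

const biweeklyWindow = windowWeeks(5, 2); // 5 records, one every 2 weeks
const weeklyWindow = windowWeeks(5, 1);   // 5 records, one every week
```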

---

## Part C — Web Matcher Merge Usage

The Web Matcher calls `GET /api/animals` (server.ts:703), which calls `getBehaviorNotes(animal.shelterCode)` — the **same merge function** from localDatabase.ts. So:

- Web Matcher ✅ uses `getBehaviorNotes()` merge (correct)
- Dashboard ⚠️ uses its own parallel inline merge (slightly different field list)
- Adopter Matcher ❌ calls `getBehaviorNotes()` with wrong ID (always null)

---

## Part D — Path to Fix the Adopter Matcher

### This is Option A (simplest case)

The merge function already exists, is well-codified, and is used by two other consumers. The matcher already calls it — just with the wrong ID.

**Fix 1 (one-line, critical):** matchingEngine.ts:309
```
BEFORE: const notes = getBehaviorNotes(animal.id);
AFTER:  const notes = getBehaviorNotes(animal.shelterCode);
```

This single fix:
- Enables ALL profiler data for scoring (energy, compatibility, special needs, backstory, etc.)
- Uses the existing merge logic (5 most recent, last-meaningful-value-wins)
- Automatically gets `_match` enums, `_text` fields, and legacy fallbacks

**Fix 2 (scoring code):** After Fix 1, the scoring functions need to read the right fields:
- `scoreHouseholdMatch`: read `notes.goodWithCats_match` / `goodWithDogs_match` / `goodWithKids_match` instead of legacy `otherAnimalReaction` / `kidBehavior`
- `scoreEnergyMatch`: optionally read `notes.energyLevel_match` enum instead of re-parsing freeform text
- `scoreColorMatch`: optionally check `notes.color` in addition to `animal.color`
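
One possible shape for the household change, as a sketch only. The `Household` type, the ±10 point values, and the `scoreHouseholdMatchSketch` name are placeholders and do not reflect the matcher's real weights or signature; the `'no'` enum value is also assumed.

```typescript
type MatchValue = "yes" | "no" | "unknown";

interface CompatNotes {
  goodWithCats_match: MatchValue;
  goodWithDogs_match: MatchValue;
  goodWithKids_match: MatchValue;
}

// Hypothetical adopter household shape.
interface Household {
  hasCats: boolean;
  hasDogs: boolean;
  hasKids: boolean;
}

// Placeholder scoring: reads the _match enums instead of parsing legacy text.
function scoreHouseholdMatchSketch(notes: CompatNotes | null, home: Household): number {
  if (notes === null) return 0; // fallback when no profiler data exists

  const checks: Array<[boolean, MatchValue]> = [
    [home.hasCats, notes.goodWithCats_match],
    [home.hasDogs, notes.goodWithDogs_match],
    [home.hasKids, notes.goodWithKids_match],
  ];

  let score = 0;
  for (const [relevant, match] of checks) {
    if (!relevant) continue;              // factor does not apply to this home
    if (match === "yes") score += 10;     // placeholder bonus
    else if (match === "no") score -= 10; // placeholder penalty
    // 'unknown' contributes nothing
  }
  return score;
}
```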

**Fix 3 (species):** Add `preferredSpecies` to extraction + species pre-filter in matcher.
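
A species pre-filter could look roughly like this. The `species` property on the animal, the treatment of an empty `preferredSpecies` list as "no preference", and the `filterBySpecies` name are assumptions; the real schema may differ.

```typescript
// Hypothetical minimal animal shape for filtering.
interface AnimalSummary {
  shelterCode: string;
  species: string;
}

// Drop animals whose species the adopter did not ask for, before scoring.
function filterBySpecies<T extends AnimalSummary>(animals: T[], preferredSpecies: string[]): T[] {
  if (preferredSpecies.length === 0) return animals; // no preference: keep all
  const wanted = new Set(preferredSpecies.map((s) => s.trim().toLowerCase()));
  return animals.filter((a) => wanted.has(a.species.trim().toLowerCase()));
}
```

Filtering before scoring keeps the score weights untouched and avoids ranking animals the adopter has ruled out.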

Estimated work: Fix 1 is trivial. Fix 2 is moderate (rewrite 3 scoring functions). Fix 3 is moderate (schema change + extraction prompt + filter logic).

---

## Part E — Freshness Proposal

### Proposed weighting scheme

For the current system (biweekly profiles, 5-record window), a weighted merge is premature. The "last meaningful value wins" pattern works well when:
- Records accumulate slowly (biweekly)
- The window is small (5 records ≈ 10 weeks)
- Fields are categorical (enums) not numeric

**Recommendation: keep current merge as-is for now.** The 5-record LIMIT already provides a natural ~10-week window. Here's a phased plan:

### Phase 1 (now): No change to merge logic

Current "last meaningful value wins" across 5 most recent records is adequate. Focus implementation effort on the three fixes above.

### Phase 2 (after 6 months of profiling data): Evaluate weighted merge

When animals have 6+ records, revisit. Proposed rules at that point:

| Field category | Proposed merge rule | Rationale |
|---|---|---|
| **Identity** (color, specialFeatures) | First non-empty value, never overwritten | Color doesn't change. Special features are permanent. |
| **Behavior** (energyLevel, peopleReaction, compatibility enums) | Most recent meaningful value wins | Behavior evolves. Recent observation is most relevant. |
| **Backstory** | First non-empty value, never overwritten | Origin story doesn't change. |
| **Special needs** | Most recent meaningful value wins | Medical needs change (medications added/removed). |
| **Additional notes** | Most recent meaningful value wins | Situational observations. |
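
The proposed per-category rules could be expressed as a rule table driving the merge. This is a sketch of the proposal, not existing code: it assumes flat string-valued records, covers a subset of fields, and treats only empty strings as not meaningful; `fieldRules` and `mergeByCategory` are hypothetical names.

```typescript
type MergeRule = "first" | "latest";

// Rule per field, following the proposal table (subset of fields).
const fieldRules: Record<string, MergeRule> = {
  color: "first",
  specialFeatures: "first",
  backstory: "first",
  energyLevel: "latest",
  peopleReaction: "latest",
  specialNeeds: "latest",
  additionalNotes: "latest",
};

// Simplified check; the real hasValue() may reject more values.
function isMeaningful(v: string | undefined): v is string {
  return v !== undefined && v.trim() !== "";
}

// Records are expected oldest to newest.
function mergeByCategory(records: Array<Record<string, string>>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const record of records) {
    for (const [field, rule] of Object.entries(fieldRules)) {
      const value = record[field];
      if (!isMeaningful(value)) continue;
      if (rule === "first" && field in merged) continue; // never overwritten once set
      merged[field] = value;
    }
  }
  return merged;
}
```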

### Phase 3 (if needed): Age cutoff

| Cutoff | Effect | When to implement |
|---|---|---|
| 6 months | Records older than 6 months excluded from merge | When animals regularly have 10+ records |
| 9 months | More conservative cutoff | If behavior changes slowly |
| Never | Keep current 5-record LIMIT as implicit cutoff | If biweekly cadence means 5 records ≈ 10 weeks is already good |
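
If an explicit cutoff is ever adopted, it could be a filter applied before the merge. The sketch below assumes ISO timestamps and calendar-month arithmetic in UTC; `applyAgeCutoff` is a hypothetical name, and month-end dates can roll over by a few days (for example, subtracting six months from August 31).

```typescript
interface TimedRecord {
  recordedAt: string; // ISO timestamp
}

// Keep only records no older than `maxAgeMonths` relative to `now`.
function applyAgeCutoff<T extends TimedRecord>(records: T[], maxAgeMonths: number, now: Date): T[] {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - maxAgeMonths); // negative months roll into the prior year
  return records.filter((r) => Date.parse(r.recordedAt) >= cutoff.getTime());
}
```

Passing `now` explicitly keeps the function deterministic, which makes the cutoff easy to test.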

**Recommendation: the 5-record SQL LIMIT is already an implicit freshness cutoff.** At biweekly cadence, it covers ~10 weeks. At weekly cadence, ~5 weeks. This is reasonable and requires no additional logic. An explicit time-based cutoff adds complexity without clear benefit until the profiling program has run for 6+ months.

---
