Commit 638f289
fix(protocol): re-solicit A/AAAA for stale cache entries on commissionable discovery (#3695)
* chore(chip-testing): add discovery diagnostic logs and run matterjs on PRs
Intermittent CI failure on the matterjs controller variant of the CHIP
Discovery test: step 7 returns 12 commissionables, step 8 (~3s later)
returns empty. Insufficient instrumentation to determine whether the
mDNS scanner cache was emptied between calls, addresses got pruned, or
the Discovery callback chain dropped them.
Add INFO logs at the relevant chokepoints:
- CommissionableMdnsScanner: log cache size on entry, per-entry match
decision (matched / rejected: no addresses / rejected: filter), waiter
notifications for new arrivals, cache add and cache delete events,
and final result size.
- Discovery: log whether the scanner callback reused an existing
ClientNode or created a new one (and whether find() hit an
already-commissioned node).
- LegacyControllerCommandHandler.handleDiscovery: log incoming findBy,
raw result count, picked entry, and empty-return.
Also temporarily extend the matterjs controller matrix gate in
build-test.chip.yml to run on pull_request so this branch can capture
fresh logs from CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(protocol): avoid BigInt-unsafe JSON.stringify in discovery diagnostic logs
Previous logging commit serialized the CommissionableDeviceIdentifiers
object via JSON.stringify, which throws "Do not know how to serialize
a BigInt" when the identifier carries a BigInt-typed field, breaking
every commissionable discovery on the matterjs CHIP test variant.
Replace with the field-key list (sufficient to know what was queried)
and replace the legacy handler's raw result dump with a slim
{id, D, CM} projection so neither path can hit a BigInt at log time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(protocol): re-solicit A/AAAA for stale cache entries on discovery
CHIP Discovery test on the matterjs controller reproducibly returned an
empty result on the second discover-commissionables call within a step
(step 8 "Check Hostname" failed with hostName: NoneType). Diagnostic
logs added in the previous commit showed the scanner cache held all 13
entries but every entry was rejected by the addresses-length filter on
the second call:
1st call: matched: 13, rejectedNoAddr: 0, resultSize: 13
2nd call: matched: 0, rejectedNoAddr: 13, resultSize: 0
Cache itself was intact (DnssdName.isDiscovered still true) but each
entry's IpService.addresses had been pruned because the A/AAAA TTL on
matter commissionable broadcasts is short (~4s) and elapses between
unsolicited rebroadcasts. #cacheDevice solicited and armed an
onAddresses observer when a device was first cached without addresses,
but the observer deregistered itself after the first delivery, so a
later expiry could not re-trigger it, and #startDiscovery solicits PTR
records only.
Extract the "solicit SRV-target A/AAAA + arm onAddresses observer"
logic from #cacheDevice into a new idempotent helper
#solicitAndArmAddresses, and call it from the cache iteration loop
whenever a cached entry matches the identifier but has zero addresses.
The current discovery's waiter is already registered, so when A/AAAA
records resolve before the discovery timeout fires, #deliverDeviceIfResolved
notifies the waiter and the device flows through the callback chain.
Idempotent: skips re-arming if the onAddresses observer is already
attached, so concurrent overlapping discoveries do not double-register.
Diagnostic logs from the previous commit are kept so the next CI run
verifies the fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: remove diagnostic logs and PR-trigger gate after fix verified
Two consecutive matterjs CHIP runs on this branch passed cleanly with
the A/AAAA re-solicit fix in place. Drop the diagnostic logs added
across CommissionableMdnsScanner, Discovery, and LegacyControllerCommandHandler
and revert the temporary pull_request gate in build-test.chip.yml so
the branch contains only the surgical fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(protocol): add regression for stale-cache A/AAAA re-solicit
Cover the path fixed in the parent commit: a cached commissionable
device whose A record TTL has elapsed while SRV/TXT remain valid is
re-delivered on a subsequent findCommissionableDevicesContinuously
call once the responder replies to the re-solicited hostname query.
Verified: the test fails on the pre-fix code (expected 1 to equal 0
after the second discovery returns no devices) and passes after the
#solicitAndArmAddresses helper is reused from the cache iteration loop.
Addresses copilot-pull-request-reviewer review comment on PR #3695.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent ccd892f commit 638f289
2 files changed
Lines changed: 150 additions & 21 deletions
File tree
- packages/protocol
- src/mdns
- test/mdns
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
190 | 190 | | |
191 | 191 | | |
192 | 192 | | |
193 | | - | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
194 | 197 | | |
195 | 198 | | |
196 | 199 | | |
197 | 200 | | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
198 | 206 | | |
199 | 207 | | |
200 | 208 | | |
| |||
383 | 391 | | |
384 | 392 | | |
385 | 393 | | |
386 | | - | |
387 | | - | |
388 | | - | |
389 | | - | |
390 | | - | |
391 | | - | |
392 | | - | |
393 | | - | |
394 | | - | |
395 | | - | |
396 | | - | |
397 | | - | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
398 | 412 | | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
399 | 419 | | |
400 | | - | |
401 | | - | |
402 | | - | |
403 | | - | |
404 | | - | |
405 | | - | |
406 | | - | |
407 | | - | |
| 420 | + | |
| 421 | + | |
408 | 422 | | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
409 | 432 | | |
410 | 433 | | |
411 | 434 | | |
| |||
Lines changed: 106 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
823 | 823 | | |
824 | 824 | | |
825 | 825 | | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
| 832 | + | |
| 833 | + | |
| 834 | + | |
| 835 | + | |
| 836 | + | |
| 837 | + | |
| 838 | + | |
| 839 | + | |
| 840 | + | |
| 841 | + | |
| 842 | + | |
| 843 | + | |
| 844 | + | |
| 845 | + | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
| 851 | + | |
| 852 | + | |
| 853 | + | |
| 854 | + | |
| 855 | + | |
| 856 | + | |
| 857 | + | |
| 858 | + | |
| 859 | + | |
| 860 | + | |
| 861 | + | |
| 862 | + | |
| 863 | + | |
| 864 | + | |
| 865 | + | |
| 866 | + | |
| 867 | + | |
| 868 | + | |
| 869 | + | |
| 870 | + | |
| 871 | + | |
| 872 | + | |
| 873 | + | |
| 874 | + | |
| 875 | + | |
| 876 | + | |
| 877 | + | |
| 878 | + | |
| 879 | + | |
| 880 | + | |
| 881 | + | |
| 882 | + | |
| 883 | + | |
| 884 | + | |
| 885 | + | |
| 886 | + | |
| 887 | + | |
| 888 | + | |
| 889 | + | |
| 890 | + | |
| 891 | + | |
| 892 | + | |
| 893 | + | |
| 894 | + | |
| 895 | + | |
| 896 | + | |
| 897 | + | |
| 898 | + | |
| 899 | + | |
| 900 | + | |
| 901 | + | |
| 902 | + | |
| 903 | + | |
| 904 | + | |
| 905 | + | |
| 906 | + | |
| 907 | + | |
| 908 | + | |
| 909 | + | |
| 910 | + | |
| 911 | + | |
| 912 | + | |
| 913 | + | |
| 914 | + | |
| 915 | + | |
| 916 | + | |
| 917 | + | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
| 921 | + | |
| 922 | + | |
| 923 | + | |
| 924 | + | |
| 925 | + | |
| 926 | + | |
| 927 | + | |
| 928 | + | |
| 929 | + | |
| 930 | + | |
| 931 | + | |
826 | 932 | | |
0 commit comments