Conversation
…n PRs Intermittent CI failure on the matterjs controller variant of the CHIP Discovery test: step 7 returns 12 commissionables, step 8 (~3s later) returns empty. Insufficient instrumentation to determine whether the mDNS scanner cache was emptied between calls, addresses got pruned, or the Discovery callback chain dropped them. Add INFO logs at the relevant chokepoints: - CommissionableMdnsScanner: log cache size on entry, per-entry match decision (matched / rejected: no addresses / rejected: filter), waiter notifications for new arrivals, cache add and cache delete events, and final result size. - Discovery: log whether the scanner callback reused an existing ClientNode or created a new one (and whether find() hit an already-commissioned node). - LegacyControllerCommandHandler.handleDiscovery: log incoming findBy, raw result count, picked entry, and empty-return. Also temporarily extend the matterjs controller matrix gate in build-test.chip.yml to run on pull_request so this branch can capture fresh logs from CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stic logs
Previous logging commit serialized the CommissionableDeviceIdentifiers
object via JSON.stringify, which throws "Do not know how to serialize
a BigInt" when the identifier carries a BigInt-typed field, breaking
every commissionable discovery on the matterjs CHIP test variant.
Replace with the field-key list (sufficient to know what was queried)
and replace the legacy handler's raw result dump with a slim
{id, D, CM} projection so neither path can hit a BigInt at log time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHIP Discovery test on the matterjs controller reproducibly returned an empty result on the second discover-commissionables call within a step (step 8 "Check Hostname" failed with hostName: NoneType). Diagnostic logs added in the previous commit showed the scanner cache held all 13 entries but every entry was rejected by the addresses-length filter on the second call: 1st call: matched: 13, rejectedNoAddr: 0, resultSize: 13 2nd call: matched: 0, rejectedNoAddr: 13, resultSize: 0 Cache itself was intact (DnssdName.isDiscovered still true) but each entry's IpService.addresses had been pruned because the A/AAAA TTL on matter commissionable broadcasts is short (~4s) and elapses between unsolicited rebroadcasts. #cacheDevice solicited and armed an onAddresses observer when a device was first cached without addresses, but the observer deregistered itself after the first delivery, so a later expiry could not re-trigger it, and #startDiscovery solicits PTR records only. Extract the "solicit SRV-target A/AAAA + arm onAddresses observer" logic from #cacheDevice into a new idempotent helper #solicitAndArmAddresses, and call it from the cache iteration loop whenever a cached entry matches the identifier but has zero addresses. The current discovery's waiter is already registered, so when A/AAAA records resolve before the discovery timeout fires, #deliverDeviceIfResolved notifies the waiter and the device flows through the callback chain. Idempotent: skips re-arming if the onAddresses observer is already attached, so concurrent overlapping discoveries do not double-register. Diagnostic logs from the previous commit are kept so the next CI run verifies the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two consecutive matterjs CHIP runs on this branch passed cleanly with the A/AAAA re-solicit fix in place. Drop the diagnostic logs added across CommissionableMdnsScanner, Discovery, and LegacyControllerCommandHandler and revert the temporary pull_request gate in build-test.chip.yml so the branch contains only the surgical fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a commissionable mDNS discovery regression where consecutive discovery calls can return empty results when cached entries remain discovered but their A/AAAA records have expired (short TTL), by re-soliciting host A/AAAA and re-arming address resolution for stale cached entries.
Changes:
- Adjust cached-result iteration to re-solicit A/AAAA and re-arm address observation when a matching cached device has zero addresses.
- Extract “solicit SRV-target A/AAAA + arm onAddresses observer” into a shared helper (
#solicitAndArmAddresses) and reuse it from both the initial caching path and the stale-cache refresh path.
Cover the path fixed in the parent commit: a cached commissionable device whose A record TTL has elapsed while SRV/TXT remain valid is re-delivered on a subsequent findCommissionableDevicesContinuously call once the responder replies to the re-solicited hostname query. Verified: the test fails on the pre-fix code (expected 1 to equal 0 after the second discovery returns no devices) and passes after the #solicitAndArmAddresses helper is reused from the cache iteration loop. Addresses copilot-pull-request-reviewer review comment on PR #3695. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
CommissionableMdnsScanner.findCommissionableDevicesContinuouslyreturns an empty result on the second consecutive discover-commissionables call within a CHIP test step on the matterjs controller, breaking the YAML constraint check (e.g.Discoverystep 8Check Hostnamefails withhostName: NoneType).Cause: the scanner's
#cacheretains the entry (itsDnssdName.isDiscoveredstays true), butIpService.addressesfor the entry has been pruned because matter commissionable A/AAAA TTL is short (~4s) and lapses between the responder's unsolicited rebroadcasts. The cache iteration infindCommissionableDevicesContinuouslyrejects entries withaddresses.length === 0.#cacheDevicedid solicit and arm anonAddressesobserver when a device was first cached without addresses, but that observer deregistered itself after the first delivery, so a subsequent A/AAAA expiry could not re-trigger it.#startDiscoveryonly solicits PTR records, not A/AAAA, so a fresh discovery window does nothing to refresh hostname records on already-cached devices.Fix
onAddressesobserver" logic out of#cacheDeviceinto a new idempotent helper#solicitAndArmAddresses(cached).#deliverDeviceIfResolvednotifies the waiter and the device flows through the callback chain into the result.cached.onAddressesis already attached, so concurrent overlapping discoveries do not double-register onipService.changed.The helper is also reused by
#cacheDeviceso the original initial-resolution path is preserved verbatim.Verification
matterjsCHIPDiscoverytest variantnpm test -- -p packages/protocol— 736/736 passTest plan
npm run build -w @matter/protocol)npm run format-verify)npm test -- -p packages/protocol)Discoverytest passes onmatterjscontroller variant in CI🤖 Generated with Claude Code