Skip to content

fix(node): make endpoint.state[id] and ClientNode.toString teardown-safe#3680

Merged
Apollon77 merged 3 commits intomainfrom
fix-snapshot-store-access
May 3, 2026
Merged

fix(node): make endpoint.state[id] and ClientNode.toString teardown-safe#3680
Apollon77 merged 3 commits intomainfrom
fix-snapshot-store-access

Conversation

@Apollon77
Copy link
Copy Markdown
Collaborator

@Apollon77 Apollon77 commented May 3, 2026

Summary

Fixes a misleading WalStorageDriver Snapshot failed: [destroyed-dependency] client stores is closing warning during ServerNode shutdown.

The shutdown sequence accesses endpoint.state.<behaviorId> on a ClientNode whose behaviors have already been closed. The augmented descriptors on endpoint.state survive Behaviors.close() (so factory-reset can re-activate the same getter), but post-close access re-entered #backingFor on a missing backing and threw BehaviorInitializationError. Constructing that error embedded ${this.#endpoint}toString()identitypeerAddressthis.storeServerNodeStore.clientStores (already in Closing state) → re-threw [destroyed-dependency], masking the original cause.

Two layered fixes:

  1. Behaviors.#augmentEndpoint getter now checks endpoint construction status and returns undefined for Destroying/Destroyed endpoints. Active and Initializing paths (including late activation) are unchanged, so factory-reset still works. This kills the root cause.
  2. ClientNode.peerAddress mirrors live commissioning state into a cache and accepts DependencyLifecycleError around the storage probe. toString() no longer throws even if a similar lifecycle race surfaces in another code path.

🤖 Generated with Claude Code

Apollon77 and others added 2 commits May 2, 2026 19:47
…ardown-safe

ClientNode.toString() — invoked from Behaviors and Endpoint error message
construction — could throw [destroyed-dependency] during ServerNode shutdown:
the storage fallback in peerAddress went through ServerNodeStore.clientStores,
which asserts active construction and is already in Closing state by the time
the WAL final snapshot runs.  The throw then masked whatever original error
triggered toString().

peerAddress now mirrors the commissioning behavior state into a cache field
whenever the backing is loaded (set OR cleared, so a decommission does not
leave a stale value), falls back to the cache when the backing is gone, and
accepts DependencyLifecycleError around the storage probe so teardown reads
return undefined rather than throwing.  eraseWithMutex clears the cache so
re-commissioning can publish a new address.

Adds a regression test that closes the controller and verifies peerAddress
and toString continue to resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ardown

Behaviors.close() destroys backings but leaves the augmented endpoint.state[id]
property descriptors in place because reuse paths (factory reset) re-activate
through the same getter without re-running the constructor.  Post-close access
therefore re-entered #backingFor on a missing backing and threw
BehaviorInitializationError; constructing that error involved ${this.#endpoint}
which cascaded through toString → identity → peerAddress → storage and
surfaced as a misleading "Snapshot failed" warning during ServerNode shutdown.

The getter now checks the endpoint construction status: in Destroying or
Destroyed it returns undefined, leaving Active/Initializing paths (including
late activation) unchanged.

Adds a regression test verifying state access on a closed peer returns
undefined rather than throwing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 3, 2026 11:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves teardown/shutdown robustness for ClientNode and behavior state access to avoid lifecycle-related exceptions (and misleading storage warnings) during ServerNode shutdown.

Changes:

  • Make endpoint.state[behaviorId] return undefined for Destroying/Destroyed endpoints to prevent re-entering behavior backing initialization during teardown.
  • Make ClientNode.peerAddress resilient during shutdown by caching the peer address and tolerating DependencyLifecycleError from the storage fallback path.
  • Add regression tests covering peerAddress/toString() teardown safety and endpoint.state[behaviorId] behavior after close.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
packages/node/src/endpoint/properties/Behaviors.ts Guard behavior state getter during endpoint teardown to avoid late backing initialization/errors.
packages/node/src/node/ClientNode.ts Add peer address caching and tolerate dependency lifecycle errors so logging/toString paths don’t throw during shutdown.
packages/node/test/node/ClientNodeTest.ts Add regression tests for peerAddress/toString teardown resilience and post-close behavior state access.

Comment thread packages/node/src/node/ClientNode.ts
Copilot review on #3680 noted that ClientNode.reset() (soft reset, without
storage erase) closes behaviors but leaves #cachedPeerAddress in place.  If
the node is then re-commissioned to a different fabric, peerAddress would
return the stale cached value until the new commissioning state loads.

Add a resetWithMutex override that clears the cache.  The explicit clear in
eraseWithMutex stays because that path uses super.resetWithMutex() (static
dispatch), which skips this override.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread packages/node/src/node/ClientNode.ts
@Apollon77 Apollon77 added the automerge Set this label if the PR is ready to automatically merged after approval label May 3, 2026
@Apollon77 Apollon77 merged commit 9577473 into main May 3, 2026
47 checks passed
@Apollon77 Apollon77 deleted the fix-snapshot-store-access branch May 3, 2026 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge Set this label if the PR is ready to automatically merged after approval

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants