fix(node): make endpoint.state[id] and ClientNode.toString teardown-safe#3680
Merged
fix(node): make endpoint.state[id] and ClientNode.toString teardown-safe#3680
Conversation
…ardown-safe ClientNode.toString() — invoked from Behaviors and Endpoint error message construction — could throw [destroyed-dependency] during ServerNode shutdown: the storage fallback in peerAddress went through ServerNodeStore.clientStores, which asserts active construction and is already in Closing state by the time the WAL final snapshot runs. The throw then masked whatever original error triggered toString(). peerAddress now mirrors the commissioning behavior state into a cache field whenever the backing is loaded (set OR cleared, so a decommission does not leave a stale value), falls back to the cache when the backing is gone, and accepts DependencyLifecycleError around the storage probe so teardown reads return undefined rather than throwing. eraseWithMutex clears the cache so re-commissioning can publish a new address. Adds a regression test that closes the controller and verifies peerAddress and toString continue to resolve. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ardown
Behaviors.close() destroys backings but leaves the augmented endpoint.state[id]
property descriptors in place because reuse paths (factory reset) re-activate
through the same getter without re-running the constructor. Post-close access
therefore re-entered #backingFor on a missing backing and threw
BehaviorInitializationError; constructing that error involved ${this.#endpoint}
which cascaded through toString → identity → peerAddress → storage and
surfaced as a misleading "Snapshot failed" warning during ServerNode shutdown.
The getter now checks the endpoint construction status: in Destroying or
Destroyed it returns undefined, leaving Active/Initializing paths (including
late activation) unchanged.
Adds a regression test verifying state access on a closed peer returns
undefined rather than throwing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Improves teardown/shutdown robustness for ClientNode and behavior state access to avoid lifecycle-related exceptions (and misleading storage warnings) during ServerNode shutdown.
Changes:
- Make
endpoint.state[behaviorId]returnundefinedforDestroying/Destroyedendpoints to prevent re-entering behavior backing initialization during teardown. - Make
ClientNode.peerAddressresilient during shutdown by caching the peer address and toleratingDependencyLifecycleErrorfrom the storage fallback path. - Add regression tests covering
peerAddress/toString()teardown safety andendpoint.state[behaviorId]behavior after close.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/node/src/endpoint/properties/Behaviors.ts | Guard behavior state getter during endpoint teardown to avoid late backing initialization/errors. |
| packages/node/src/node/ClientNode.ts | Add peer address caching and tolerate dependency lifecycle errors so logging/toString paths don’t throw during shutdown. |
| packages/node/test/node/ClientNodeTest.ts | Add regression tests for peerAddress/toString teardown resilience and post-close behavior state access. |
Copilot review on #3680 noted that ClientNode.reset() (soft reset, without storage erase) closes behaviors but leaves #cachedPeerAddress in place. If the node is then re-commissioned to a different fabric, peerAddress would return the stale cached value until the new commissioning state loads. Add a resetWithMutex override that clears the cache. The explicit clear in eraseWithMutex stays because that path uses super.resetWithMutex() (static dispatch), which skips this override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a misleading
WalStorageDriver Snapshot failed: [destroyed-dependency] client stores is closingwarning during ServerNode shutdown.The shutdown sequence accesses
endpoint.state.<behaviorId>on a ClientNode whose behaviors have already been closed. The augmented descriptors onendpoint.statesurviveBehaviors.close()(so factory-reset can re-activate the same getter), but post-close access re-entered#backingForon a missing backing and threwBehaviorInitializationError. Constructing that error embedded${this.#endpoint}→toString()→identity→peerAddress→this.store→ServerNodeStore.clientStores(already inClosingstate) → re-threw[destroyed-dependency], masking the original cause.Two layered fixes:
Behaviors.#augmentEndpointgetter now checks endpoint construction status and returnsundefinedforDestroying/Destroyedendpoints. Active and Initializing paths (including late activation) are unchanged, so factory-reset still works. This kills the root cause.ClientNode.peerAddressmirrors live commissioning state into a cache and acceptsDependencyLifecycleErroraround the storage probe.toString()no longer throws even if a similar lifecycle race surfaces in another code path.🤖 Generated with Claude Code