fix(neon_local): allow restart of crashed compute endpoints by hawthornic · Pull Request #12898 · neondatabase/neon

hawthornic · 2026-05-20T15:08:43Z

Problem

When a compute endpoint is killed externally (kill -9), neon_local cannot restart it:

cargo neon endpoint stop fails because pg_ctl cannot signal a dead PID
cargo neon endpoint start fails because check_conflicting_endpoints sees the crashed endpoint as a "duplicate primary" (status is Crashed, not Stopped)
With both commands failing, the user has no way to bring the endpoint back up

Root cause

check_conflicting_endpoints iterates all endpoints and never excludes the endpoint being operated on. The status filter only excluded Stopped, not Crashed. Additionally, stop() unconditionally called pg_ctl stop which fails when the postmaster is already dead.
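To make the self-conflict concrete, here is a minimal sketch of a conflict check with an optional excluded id. The `Endpoint`, `Status`, and `Mode` types and the `find_conflicting_primary` name are hypothetical simplifications, not the actual neon_local code, and the per-timeline matching rule is an assumption. Passing `None` reproduces the behavior described above, where a crashed endpoint blocks its own restart.

```rust
// Hypothetical, simplified stand-ins for the neon_local types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Stopped,
    Crashed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Primary,
    Replica,
}

struct Endpoint {
    id: String,
    timeline: String,
    mode: Mode,
    status: Status,
}

/// Returns the id of the first endpoint that would conflict with a primary
/// on `timeline`, skipping the endpoint named by `exclude_id` (if any).
fn find_conflicting_primary<'a>(
    endpoints: &'a [Endpoint],
    timeline: &str,
    exclude_id: Option<&str>,
) -> Option<&'a str> {
    endpoints
        .iter()
        // An endpoint must never conflict with itself.
        .filter(|ep| Some(ep.id.as_str()) != exclude_id)
        // Assumption: only primaries on the same timeline can conflict.
        .filter(|ep| ep.timeline == timeline && ep.mode == Mode::Primary)
        // As in the description above, only Stopped endpoints are ignored,
        // so a Crashed endpoint still counts as a conflict.
        .filter(|ep| ep.status != Status::Stopped)
        .map(|ep| ep.id.as_str())
        .next()
}
```

With a single crashed primary `ep1`, `find_conflicting_primary(&eps, "tl", None)` reports `ep1` as a conflict, while passing `Some("ep1")` reports none. A different crashed primary on the same timeline would still be flagged, since the status filter itself is unchanged in this sketch.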

Fix

check_conflicting_endpoints: Added exclude_id: Option<&str> parameter so an endpoint does not conflict with itself when being restarted. The Create path passes None (no-op), the Start path passes the endpoint being started.
stop(): Before attempting pg_ctl stop, check if the endpoint is Crashed. If so, verify the postmaster process is truly dead via kill(pid, None) (signal 0 — existence check only), then clean up the stale postmaster.pid and let compute_ctl shut down. If the postmaster is alive despite the Crashed heuristic (300ms TCP timeout false-positive during startup), fall through to the normal stop path.
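The Crashed branch of stop() described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `kill(pid, None)` in the PR is presumably the nix crate's signal API, whereas this sketch stays dependency-free by running `kill -0` through `sh`. The function names and the `StopAction` enum are hypothetical, and the compute_ctl shutdown step is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Outcome of handling an endpoint whose status looks Crashed (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum StopAction {
    /// Postmaster was dead; the stale postmaster.pid was removed.
    StalePidRemoved,
    /// No usable postmaster.pid was found, so there was nothing to clean up.
    NoPidFile,
    /// Postmaster is actually alive; the caller should run the normal stop path.
    FallThroughToNormalStop,
}

/// Reads the postmaster PID from the first line of `postmaster.pid`.
fn read_postmaster_pid(pgdata: &Path) -> io::Result<Option<u32>> {
    match fs::read_to_string(pgdata.join("postmaster.pid")) {
        Ok(content) => Ok(content
            .lines()
            .next()
            .and_then(|line| line.trim().parse::<u32>().ok())
            // PID 0 would address the caller's own process group; reject it.
            .filter(|&pid| pid > 0)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Signal-0 existence check: nothing is delivered, only existence and
/// permission are tested. Note that a process owned by another user is
/// also reported as absent by this simplified check.
fn process_exists(pid: u32) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -0 {pid}"))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Decides what to do for an endpoint that the status heuristic marked as Crashed.
fn handle_crashed_stop(pgdata: &Path) -> io::Result<StopAction> {
    match read_postmaster_pid(pgdata)? {
        // False positive of the Crashed heuristic: the postmaster is alive.
        Some(pid) if process_exists(pid) => Ok(StopAction::FallThroughToNormalStop),
        // Postmaster is gone: remove the stale pid file so a later start succeeds.
        Some(_) => {
            fs::remove_file(pgdata.join("postmaster.pid"))?;
            Ok(StopAction::StalePidRemoved)
        }
        None => Ok(StopAction::NoPidFile),
    }
}
```

Checking liveness before deleting the pid file matters because the Crashed status is only a heuristic: removing `postmaster.pid` from under a live postmaster would hide a running process from later pg_ctl invocations.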

Testing

Manual testing confirmed all three paths work:

kill -9 → direct cargo neon endpoint start ✅
kill -9 → cargo neon endpoint stop → start ✅
Normal stop / start unaffected ✅

A regression test in test_runner/ is planned but not included in this PR.

When a compute endpoint is killed externally (kill -9), its status becomes Crashed (stale postmaster.pid + no TCP connectivity). The user then cannot restart it:

- 'endpoint stop' fails because pg_ctl cannot signal a dead process
- 'endpoint start' fails because check_conflicting_endpoints treats the crashed endpoint as a duplicate primary of itself

Fix:

1. Add exclude_id parameter to check_conflicting_endpoints so an endpoint does not conflict with itself when being restarted.
2. Handle the Crashed state in stop(): verify the postmaster is truly dead via kill(pid, None), clean up the stale postmaster.pid, and return success instead of bailing.

hawthornic wants to merge 1 commit into neondatabase:main from hawthornic:fix/crashed-endpoint-restart
