process-segments consumer crash-loops after upgrading #4346

### Self-Hosted Version

25.11.1

### CPU Architecture

x86_64

### Docker Version

25.0.3

### Docker Compose Version

2.32.3

### Machine Specification

- [x] My system meets the minimum system requirements of Sentry

### Installation Type

Upgrade from 25.8.0 to 25.11.1

### Steps to Reproduce

### Environment
- **Self-hosted version:** 25.11.1
- **Upgraded from:** 25.8.0
- **Host OS:** Linux x86_64
- **Available RAM:** ~12 GiB free of 48 GiB total
- **Shared memory (`/dev/shm`):** 22 GiB available, <1 MB in use
- **Disk:** 1.3 TB available
- **Swap:** Within expected system requirements (16 GB)

### Description

After upgrading from 25.8.0 to 25.11.1 (with a data migration to SeaweedFS), the `process-segments` consumer enters a continuous crash-restart loop and never starts successfully. All other containers are healthy and the Sentry application itself is functional. Only `process-segments` is affected.

The container starts, initializes its multiprocessing pool (including running `parallel_worker_initializer`), begins consuming from the `buffered-segments` Kafka topic, and then crashes approximately 20 to 30 seconds into processing every time, without exception.

### Observed behaviour

The crash cycle follows this consistent pattern:

1. Container starts, multiprocessing pool initializes successfully
2. Consumer is assigned `Partition(topic=Topic(name='buffered-segments'), index=0)`
3. Worker begins processing, then emits one or more incomplete batch warnings
4. Child process terminates; the parent receives signal 17 (SIGCHLD)
5. Parent process crashes with `ChildProcessTerminated: 17`
6. Container restarts and the cycle repeats
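
The `17` in `ChildProcessTerminated: 17` is the number of the signal the parent received (SIGCHLD on Linux x86_64), not the reason the child died. As a minimal sketch of how a child's real exit reason can be decoded, the snippet below uses plain `subprocess` on a POSIX host. The helper names `describe_exit` and `run_child` are hypothetical and are not part of arroyo or Sentry; the sketch only illustrates the convention that a negative return code means "killed by that signal".

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess-style return code into a readable reason."""
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by signal -returncode.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({-returncode})"
    return f"exited with status {returncode}"


def run_child(code: str) -> int:
    """Run a throwaway Python child process and return its return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    # SIGCHLD is what the parent is told; it is 17 on Linux x86_64.
    print(int(signal.SIGCHLD))
    # A child that exits on its own.
    print(describe_exit(run_child("import sys; sys.exit(3)")))
    # A child that is killed, as an OOM kill or manual kill would do.
    print(describe_exit(run_child(
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    )))
```

For workers started with `multiprocessing.Process`, the `exitcode` attribute follows the same negative-means-signal convention, so the same decoding would apply if that value can be obtained.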

### Error output

```
WARNING arroyo.processing.strategies.run_task_with_multiprocessing: Received incomplete batch (57.00% complete), resubmitting

Traceback (most recent call last):
  File ".../run_task_with_multiprocessing.py", line 860, in __reset_batch_builder
    input_block = self.__input_blocks.pop()
IndexError: pop from empty list

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../processor.py", line 440, in _run_once
    self.__processing_strategy.submit(message)
  File ".../healthcheck.py", line 29, in submit
    self.__next_step.submit(message)
  File ".../run_task_with_multiprocessing.py", line 879, in submit
    self.__reset_batch_builder()
  File ".../run_task_with_multiprocessing.py", line 862, in __reset_batch_builder
    raise MessageRejected("no available input blocks") from e
arroyo.processing.strategies.abstract.MessageRejected: no available input blocks

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
  File ".../kafka.py", line 67, in run_processor_with_signals
    processor.run()
    ...
    raise ChildProcessTerminated(signum)
arroyo.processing.strategies.run_task_with_multiprocessing.ChildProcessTerminated: 17

ERROR arroyo.processing.processor: Caught exception, shutting down...
```
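
To compare crash cycles across restarts, the relevant lines can be pulled out of a saved log. The following is a rough sketch that assumes the log lines look exactly like the excerpt above; the `summarize` function and `CrashSummary` type are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass, field

# Patterns taken from the log excerpt in this report.
INCOMPLETE_BATCH = re.compile(
    r"Received incomplete batch \((\d+(?:\.\d+)?)% complete\)"
)
CHILD_TERMINATED = re.compile(r"ChildProcessTerminated: (\d+)")
MESSAGE_REJECTED = "MessageRejected: no available input blocks"


@dataclass
class CrashSummary:
    incomplete_batch_pcts: list = field(default_factory=list)
    child_terminated_signals: list = field(default_factory=list)
    message_rejected_count: int = 0


def summarize(log_text: str) -> CrashSummary:
    """Collect the crash-related markers from a block of log text."""
    summary = CrashSummary()
    for line in log_text.splitlines():
        batch = INCOMPLETE_BATCH.search(line)
        if batch:
            summary.incomplete_batch_pcts.append(float(batch.group(1)))
        terminated = CHILD_TERMINATED.search(line)
        if terminated:
            summary.child_terminated_signals.append(int(terminated.group(1)))
        if MESSAGE_REJECTED in line:
            summary.message_rejected_count += 1
    return summary
```

With a local copy of the attached log (the path is an assumption), `summarize(open("logs.txt").read())` would show whether the completion percentage and signal number are the same in every cycle.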

### Investigation

We investigated the following potential causes:

- **Memory pressure:** Ruled out. 12 GiB RAM available, `/dev/shm` has 22 GiB available with negligible usage. No OOM entries in `dmesg`.
- **Shared memory exhaustion:** Ruled out. `df -h /dev/shm` confirms ample space, and only a single unrelated semaphore is present.
- **Kafka connectivity:** Ruled out. The consumer successfully connects, receives its partition assignment, and begins consuming before crashing.
- **Kafka topic reset:** Ruled out. In addition to recreating the consumer groups, we also deleted and recreated all Kafka topics related to this consumer, including `buffered-segments` itself, to ensure no stale messages, corrupt data, or leftover topic configuration was contributing to the crash. The issue persists on a completely fresh topic with no existing messages.

The error originates in `arroyo`'s `run_task_with_multiprocessing.py`, where the parent process attempts to reset the batch builder after the child process is killed, finds no available input blocks, and crashes. The root cause appears to be the child worker process terminating silently (the parent only sees SIGCHLD) before the parent can recover. However, the child's own stderr output is not surfaced in the Docker logs, so the reason for its exit is not visible.
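
The three-part traceback can be read as one exception chain. The toy model below is not arroyo's implementation: the two exception classes are stand-ins that merely share names with the ones in the log, and `reset_batch_builder` / `run_once` are simplified placeholders. It only reproduces the shape of the chain, where "direct cause" corresponds to `raise ... from e` and "During handling of the above exception" corresponds to an exception raised while another is being handled. How and where the real library raises `ChildProcessTerminated` is not modelled.

```python
class MessageRejected(Exception):
    """Stand-in for the backpressure exception seen in the log."""


class ChildProcessTerminated(Exception):
    """Stand-in for the exception carrying the received signal number."""


def reset_batch_builder(input_blocks: list) -> object:
    """Take a free block, or signal backpressure if none is left."""
    try:
        return input_blocks.pop()
    except IndexError as e:
        # Explicit chaining: printed as "The above exception was the
        # direct cause of the following exception".
        raise MessageRejected("no available input blocks") from e


def run_once(input_blocks: list, child_died: bool) -> None:
    """Submit one message; surface a dead child if one was observed."""
    try:
        reset_batch_builder(input_blocks)
    except MessageRejected:
        if child_died:
            # Implicit chaining: printed as "During handling of the above
            # exception, another exception occurred".
            raise ChildProcessTerminated(17)
        raise


if __name__ == "__main__":
    try:
        run_once([], child_died=True)
    except ChildProcessTerminated as exc:
        rejected = exc.__context__          # the MessageRejected
        root = rejected.__cause__           # the IndexError
        print(type(root).__name__, "->", type(rejected).__name__,
              "->", type(exc).__name__)
```

Read this way, `IndexError` and `MessageRejected` describe a pool that ran out of free blocks, and `ChildProcessTerminated` is the last link that actually shuts the consumer down.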

### Steps to reproduce

1. Run a healthy self-hosted Sentry 25.8.0 instance
2. Upgrade to 25.11.1 following the standard upgrade procedure
3. Observe `process-segments` container entering a crash-restart loop

### What we have tried

- Restarting the full application
- Re-running the full `./install.sh` / `docker compose up` sequence
- Verifying resource availability (RAM, shm, disk, swap)
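
The resource checks in the last item can also be scripted so the same numbers are collected on the host and inside the `process-segments` container. This is a minimal sketch assuming a Linux system with `/proc/meminfo`; `usage_gib` and `parse_meminfo` are hypothetical helper names. One caveat worth checking: a container has its own `/dev/shm` mount (Docker's default size is 64 MB unless configured otherwise), so host figures may not reflect what the consumer sees.

```python
import shutil


def usage_gib(path: str = "/dev/shm") -> dict:
    """Return total/used/free space for a mount point, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
    }


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values are returned as reported by the kernel, which is kB for
    fields such as MemAvailable and SwapFree.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


if __name__ == "__main__":
    print(usage_gib("/dev/shm"))
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for name in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
        print(name, info.get(name))
```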

### Full logs
[logs.txt](https://github.com/user-attachments/files/28149976/logs.txt)

### Expected Result

All containers should be healthy after the upgrade. In practice, the other containers are healthy and the application is online, but because of the crash loop on `process-segments` it is not fully operational.

### Actual Result

See the attached logs under "Full logs" above.


### Event ID

_No response_
