I’ve been working with LLMs a lot lately, and one consistent UX bottleneck is inference speed.
Many tasks follow this pattern: process small chunks → batch for inference → split results again. Parallelizing helps, but naive asyncio.gather approaches often backfire: every stage blocks until its slowest batch finishes, so downstream work sits idle and responsiveness suffers. Mixing fast per-item logic with slower batch steps needs smarter coordination.
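To make the failure mode concrete, here is a minimal sketch of the naive pattern. The names (embed_batch, naive_pipeline) are mine, purely for illustration: nothing reaches the caller until asyncio.gather has waited out the slowest batch.

```python
import asyncio
import random

async def embed_batch(batch: list[str]) -> list[str]:
    # Stand-in for a batched inference call with variable latency.
    await asyncio.sleep(random.uniform(0.1, 1.0))
    return [f"embedding({item})" for item in batch]

async def naive_pipeline(items: list[str], batch_size: int = 4) -> list[str]:
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # gather blocks until the slowest batch finishes, so no result is
    # available downstream before everything is done.
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    return [r for batch in results for r in batch]

if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(16)]
    print(asyncio.run(naive_pipeline(docs))[:3])
```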
Technical approach:
I built a pipeline library that handles the streaming coordination automatically. It uses async generators throughout, with intelligent queuing for order preservation when needed.
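As a rough sketch of the idea (not the library's actual API), a batching stage built on an async generator might look like this: it groups items from an upstream async iterator, awaits a batch function, and streams per-item results back out as each batch completes.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def batch_stage(
    source: AsyncIterator[T],
    fn: Callable[[list[T]], Awaitable[list[R]]],
    batch_size: int,
) -> AsyncIterator[R]:
    """Group items from an async source into batches, call a batch function,
    and stream per-item results back out as each batch completes."""
    batch: list[T] = []
    async for item in source:
        batch.append(item)
        if len(batch) == batch_size:
            for result in await fn(batch):
                yield result
            batch = []
    if batch:  # flush the final partial batch
        for result in await fn(batch):
            yield result

async def _demo() -> None:
    async def numbers() -> AsyncIterator[int]:
        for i in range(10):
            yield i

    async def double_all(batch: list[int]) -> list[int]:
        await asyncio.sleep(0.01)  # pretend this is a batched model call
        return [n * 2 for n in batch]

    async for value in batch_stage(numbers(), double_all, batch_size=4):
        print(value)  # values stream out batch by batch, not all at the end

if __name__ == "__main__":
    asyncio.run(_demo())
```

Because each stage both consumes and produces an async iterator, stages compose by chaining, and memory stays bounded by the batch currently in flight rather than the whole dataset.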
Architecture decisions:
Stream-first design: Results flow by default, with optional collection
Flexible ordering: Choose between speed (unordered) and sequence (ordered)
Memory efficiency: O(batch_size) memory usage, not O(dataset_size)
Backpressure handling: Automatic coordination between fast and slow stages
Error boundaries: Configurable failure strategies at the task level (see the usage sketch below)
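To tie these decisions together, here is a hypothetical end-to-end sketch; none of these names come from the library, they only illustrate how streaming output, bounded queues for backpressure, and a per-item error boundary fit together.

```python
import asyncio
from typing import AsyncIterator

async def produce(items: list[str]) -> AsyncIterator[str]:
    for item in items:
        yield item

async def worker(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    while True:
        item = await in_q.get()
        if item is None:               # sentinel: shut down
            await out_q.put(None)
            break
        try:
            await asyncio.sleep(0.05)  # stand-in for slow inference
            await out_q.put(f"result({item})")
        except Exception as exc:       # error boundary: record and keep going
            await out_q.put(f"error({item}: {exc})")

async def main() -> None:
    in_q: asyncio.Queue = asyncio.Queue(maxsize=8)   # bounded queues give backpressure
    out_q: asyncio.Queue = asyncio.Queue(maxsize=8)
    consumer_task = asyncio.create_task(worker(in_q, out_q))

    async def feed() -> None:
        async for item in produce([f"doc-{i}" for i in range(20)]):
            await in_q.put(item)       # blocks when the worker falls behind
        await in_q.put(None)

    feeder = asyncio.create_task(feed())
    while (result := await out_q.get()) is not None:
        print(result)                  # results stream out as they are ready
    await asyncio.gather(feeder, consumer_task)

if __name__ == "__main__":
    asyncio.run(main())
```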