optimal chunk size is strongly query-dependent - very true.
Faced similar issues. Ended up adding an agentic tool-call layer on top to retrieve the nearby chunks, to handle cases where a relevant answer was only partially available in a chunk (like a 7-step procedure of which only 4 steps landed in one chunk). It worked OK.
Interesting. Can you elaborate a bit more, please?
The RAG was set up over a bunch of documents, most of them manuals with steps for measuring, troubleshooting, and replacing components of industrial machines.
The issue was that many of these step sequences were long (above 512 tokens), so a typical chunk window wouldn't capture the full procedure. We added a tool-calling capability through which the LLM can request the chunks adjacent to a given chunk. This worked well in practice, but burned more $$.
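For anyone curious what that looks like in practice, here's a rough sketch. The names (`get_neighbor_chunks`, the in-memory `CHUNKS` list, the window sizes) and the OpenAI-style tool schema are my own illustrative assumptions, not the exact setup described above:

```python
# Minimal sketch of a "fetch nearby chunks" tool for a RAG pipeline.
# Assumes chunks were stored in document order with sequential ids,
# so neighbors can be looked up by index. All names are illustrative.

from typing import Dict, List

# Toy chunk store: in practice this would be your vector DB / document store,
# keyed so that order within a manual is preserved.
CHUNKS: List[Dict] = [
    {"id": i, "doc": "pump_manual.pdf", "text": f"Step text for chunk {i}..."}
    for i in range(20)
]

def get_neighbor_chunks(chunk_id: int, before: int = 1, after: int = 2) -> List[Dict]:
    """Return the chunks immediately surrounding `chunk_id` in document order.

    The LLM calls this when a retrieved chunk looks like a truncated
    procedure (e.g. steps 1-4 of 7) and it needs the continuation.
    """
    lo = max(0, chunk_id - before)
    hi = min(len(CHUNKS), chunk_id + after + 1)
    return CHUNKS[lo:hi]

# Tool schema exposed to the model (OpenAI-style function calling; adapt to your stack).
NEIGHBOR_TOOL = {
    "type": "function",
    "function": {
        "name": "get_neighbor_chunks",
        "description": "Fetch chunks adjacent to a given chunk when an "
                       "instruction sequence appears cut off mid-way.",
        "parameters": {
            "type": "object",
            "properties": {
                "chunk_id": {"type": "integer"},
                "before": {"type": "integer", "default": 1},
                "after": {"type": "integer", "default": 2},
            },
            "required": ["chunk_id"],
        },
    },
}
```

Each expansion is an extra model round-trip plus the extra chunk tokens in context, which is where the added cost comes from.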