The Impact of Retrieval and Chunking on Finance: Why Long-Context Isn’t Enough
The advent of very long-context large language models (LLMs) has sparked debate about whether retrieval is still necessary for AI answer generation. Some argue that with these models you can simply fit all documents into the context window and let the LLM find the relevant information itself. A case study on financial document analysis suggests otherwise: retrieval and chunking strategies remain essential to high-quality AI answer generation, and matter even more than the quality of the generating model itself.
Complex documents such as SEC 10-K and 10-Q filings pose unique challenges for retrieval-augmented generation (RAG) systems. Precision is paramount when details like a $2.1 trillion increase in assets under management (AUM) or a regulatory change affecting capital requirements must be extracted accurately from thousands of lengthy PDFs. Financial teams therefore need a RAG system that not only retrieves information but does so with fine-tuned precision.
A typical RAG pipeline involves several steps: parsing documents to extract text, chunking to segment the text into meaningful units, retrieval to find the most relevant chunks, and generation to synthesize those chunks into a coherent response. Each component can be tuned independently to improve the quality of the answers the system produces. In a detailed evaluation on a curated dataset of SEC filings, the study finds that, even with today's long-context models, chunking and retrieval strategies matter more than the choice of generation model in determining output quality.
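To make the moving parts concrete, here is a minimal sketch of such a pipeline, not the study's implementation: it uses fixed-size character chunking and a toy bag-of-words retriever, and the final prompt would be sent to whatever LLM handles generation. In practice each piece (parser, chunker, embedding model, retriever) would be swapped for a production-grade component.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class Chunk:
    doc_id: str
    text: str

def chunk_text(text: str, doc_id: str, chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Fixed-size character chunking with overlap (a simple stand-in for recursive chunking)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(Chunk(doc_id, text[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks

def bow(text: str) -> Counter:
    """Bag-of-words 'embedding'; a real system would use a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[Chunk], k: int = 5) -> list[Chunk]:
    """Rank chunks by similarity to the query and keep the top k."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c.text)), reverse=True)[:k]

def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    """Generation step: assemble the retrieved chunks into a prompt for the LLM."""
    context = "\n\n".join(f"[{c.doc_id}] {c.text}" for c in retrieved)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```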
The study's findings point to several key levers for answer quality. First, appending LLM-generated global document context to each chunk significantly boosts response accuracy, and outperforms LLM-generated chunk-specific context. Second, chunk size and the number of chunks retrieved play a pivotal role: overly large chunks dilute relevance, while retrieving several moderately sized chunks improves accuracy. Finally, an optimized retrieval pipeline with moderate chunk sizes and stronger retrieval techniques lifts performance enough to narrow the gap between different generation models.
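Continuing the sketch above, attaching global document context to each chunk could look like the following. The summary string itself would come from an LLM call over the whole document (for example, company name, filing type, fiscal period, and key topics); that call is assumed here rather than shown.

```python
def add_document_context(chunks: list[Chunk], doc_summary: str) -> list[Chunk]:
    """Prefix every chunk with an LLM-generated, document-level summary so that
    retrieval and generation both see global context (company, filing type,
    fiscal period) that an isolated chunk would otherwise lack."""
    return [
        Chunk(c.doc_id, f"Document context: {doc_summary}\n\n{c.text}")
        for c in chunks
    ]
```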
Snowflake Cortex Search, together with the broader Snowflake Cortex AI ecosystem, provides a robust solution to the challenges posed by extensive financial filings. As a flexible, production-ready retrieval service, Snowflake Cortex Search enables precise extraction of relevant information from complex financial documents.
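Once the filing chunks have been indexed in a Cortex Search service, querying it from Python looks roughly like the snippet below. The database, schema, service, and column names are placeholders, and the call pattern follows the Cortex Search Python API as documented; treat it as a sketch rather than a drop-in integration.

```python
from snowflake.core import Root
from snowflake.snowpark import Session

# Placeholder connection parameters; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}).create()

root = Root(session)

# Look up an existing Cortex Search service built over the filing chunks.
svc = (
    root.databases["<database>"]
    .schemas["<schema>"]
    .cortex_search_services["<service_name>"]
)

# Retrieve the top chunks for a question; the returned rows feed the generation step.
resp = svc.search(
    query="How much did assets under management increase year over year?",
    columns=["chunk_text", "doc_id"],
    limit=5,
)
print(resp.to_json())
```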
The study also compares the chunking strategies explored in the SEC-filing experiments: recursive chunking, semantic chunking, and markdown-header-based chunking, each augmented with LLM-based metadata injection to improve retrieval accuracy without complicating the pipeline. Injecting LLM-generated metadata at both the document and chunk levels yields more nuanced, contextually rich chunks and improves the system's overall retrieval accuracy; a sketch of the header-based variant follows below.
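As one illustration of these strategies (reusing the Chunk and chunk_text helpers from the earlier snippet, and again only a sketch of the general technique rather than the study's exact code), markdown-header-based chunking over a filing that has been parsed to markdown might look like this:

```python
import re

def chunk_by_markdown_headers(markdown: str, doc_id: str, max_chars: int = 4000) -> list[Chunk]:
    """Split a markdown-formatted filing at headers, keeping each header with
    the text beneath it so a chunk stays a self-contained section.
    Sections longer than max_chars fall back to fixed-size splitting."""
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)  # split just before each '#' header line
    chunks: list[Chunk] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(Chunk(doc_id, section))
        else:
            chunks.extend(chunk_text(section, doc_id, chunk_size=max_chars))
    return chunks
```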
In conclusion, the study underscores the pivotal role of retrieval and chunking strategies in AI answer generation, particularly for the analysis of complex documents such as financial filings. By prioritizing precision, context, and optimized retrieval, financial teams can improve the quality and accuracy of AI-generated answers even in the era of advanced long-context language models.