Document Indexes let you index documents using a few different index types, and then query those indexes in natural language. Document Indexes use GPT Index under the hood, so I suggest getting familiar with some of these concepts:
GPT Index Usage Pattern - GPT Index documentation
GPT Index Use Cases - GPT Index documentation
<aside> 💡 Warning: indexing documents can make a lot of calls to your underlying LLM! Read more here: https://gpt-index.readthedocs.io/en/latest/how_to/cost_analysis.html
</aside>
A DocumentIndex consists of an index type, a document loader, and a config object.
https://gpt-index.readthedocs.io/en/latest/guides/index_guide.html#vector-store-index
https://gpt-index.readthedocs.io/en/latest/guides/index_guide.html#list-index
This index type uses Python’s Pandas library to answer questions about tabular data. To use it, you must add exactly one well-formed CSV or Excel file. Make sure the file includes column headers; for Excel files, only the first sheet is used.
When using this index type, your natural language query (e.g. ‘What was the average monthly expense?’) will be translated into a Pandas command (e.g. df[df.columns[1]].mean()) and executed.
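To make the translation step concrete, here is a minimal sketch of what the generated Pandas command does once it runs against a loaded file. The CSV contents, the column names, and the `load_table` helper are hypothetical and only illustrate the idea; the sketch does not show how the index itself loads files or produces the command.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first row holds the column headers.
CSV_TEXT = """month,expense
Jan,100.0
Feb,150.0
Mar,200.0
"""


def load_table(csv_text: str) -> pd.DataFrame:
    """Parse CSV text into a DataFrame, treating the first row as headers."""
    return pd.read_csv(io.StringIO(csv_text))


df = load_table(CSV_TEXT)

# The query 'What was the average monthly expense?' might be translated
# into this command: take the second column and compute its mean.
average_expense = df[df.columns[1]].mean()
print(average_expense)
```

Because the example command selects a column by position (`df.columns[1]`), a file with missing headers or reordered columns can produce a misleading answer, which is one reason a well-formed file with clear column headers matters.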