Hamish: Research Discovery is obviously more than just search. But search is quite a high priority for research consumers to make sure they can find specific content they need. How are people currently approaching search and how, in your opinion, should they approach it?
Simon: There is no single sensible approach to search. You should be guided by what you want to get out of it. If you’ve got a very wide corpus of information and a wide audience who want to search absolutely anything, and you’re happy to provide them with generic results, then using full-text or vector search as the core component makes sense.
However, if you have a relatively niche, nuanced and semi-structured domain where control, auditability, accuracy and relevance are more important, then having a knowledge graph at the centre of it makes more sense.
“The great thing is that knowledge graphs can be integrated to enhance existing search technologies, so the ultimate solution in this space looks like a combination of the two.”
Hamish: How would a combination of these technologies come together best for the Research industry? People are talking about “RAG” quite a lot. Is that what you mean?
Simon: RAG (Retrieval Augmented Generation) is a great example of this.
The most common example of RAG at the moment is Microsoft Bing Chat, which lets users search the web via a chatbot. It uses traditional Bing search to find content related to the user’s query, then uses OpenAI’s GPT to generate an answer grounded in the content of those results.
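Stripped to its essentials, the retrieve-then-generate flow described here can be sketched as follows. This is a minimal illustration, not a real system: the tiny corpus, the keyword-overlap retriever, and the stubbed `generate()` are all assumptions for demonstration, standing in for a production search index and an LLM API call.

```python
# Minimal RAG sketch: retrieve candidate documents for a query,
# then hand the top results to a generator as grounding context.

# Toy corpus (illustrative stand-in for a real document index).
CORPUS = [
    "Central bank rate decisions and their impact on bond yields.",
    "Equity research note on semiconductor supply chains.",
    "Knowledge graphs for tagging investment research paragraphs.",
]

def retrieve(query, corpus, k=2):
    """Rank documents by simple keyword overlap with the query.
    A real system would use full-text or vector search instead."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query, context):
    """Stub for the LLM step: a real system would prompt the model
    with the query plus the retrieved passages."""
    return f"Answer to {query!r} grounded in {len(context)} retrieved passages."

def rag_answer(query):
    # Retrieval first, then generation conditioned on the results.
    context = retrieve(query, CORPUS)
    return generate(query, context)
```

The key design point is that generation is conditioned on retrieved source content, so the answer quality is bounded by the quality of the retrieval step.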
“When it comes to searching for complex content, understanding the user’s intent is important.”
Ideally, when a user asks a question, your system should understand what they were trying to ask rather than interpreting it too literally. Bing Chat is a great example of this working better than traditional search.
But for Investment Research, you need to be a little bit careful with this.
RAG is ultimately only as good as the quality of the search results and its ability to surface concise, relevant information based on the prompt. There is a lot of active research into how to optimise this search/LLM interaction, and at Limeglass we already have a neat solution that works out of the box to provide LLMs with relevant, concise and clean information. We can also go one step further: we’re able to show the actual relevant source content alongside the result.
Because we have a head start in this area, we’ve moved on to other ideas where the power of LLMs can be used to help interpret the user’s intent and rank the relevant extracted original content, without exposing the user to the risks of hallucinations or overly assertive incorrect answers.