Copilot for Microsoft 365 has been generally available to corporate customers since November 2023. Around the same time, Microsoft published some exciting research results from the Copilot for Microsoft 365 Early Access Program. They show that Copilot can make users more productive, efficient and creative: 73 per cent of respondents say they complete tasks faster with it, 87 per cent find it easier to get started on a first draft in Word, 64 per cent spend less time handling emails, and even more save time on everyday tasks (71 per cent) and when searching for files (75 per cent). 84 per cent said that Copilot makes it easier for them to take targeted action after a meeting.
Context data increases performance
One reason for these benefits is, of course, the performance of large language models (LLMs) such as GPT-4, which are pre-trained with huge amounts of data and thus learn to understand, summarise, predict and generate content. But there is more to it than that: Microsoft has succeeded in increasing the performance of LLMs significantly further. The method: the AI is given access (in a controlled and secure manner) to the data and documents of an organisation that are particularly relevant to the Copilot users and their tasks. This "grounds" the AI model in the reality of the organisation, which further improves the quality of its answers.
The principle: Retrieval-Augmented Generation
We already mentioned it briefly in the article linked above: the principle behind Microsoft's AI grounding technology for Copilot is called "Retrieval-Augmented Generation", or RAG for short. Because it is essential for Copilot’s use of data, we want to go into it in more detail today.
The "generation" in RAG refers to the responses generated by an LLM: this is to be strengthened or improved ("augmentation") through the “retrieval” of relevant and up-to-date information.
RAG thus combines the strengths of generative LLMs and systems that store knowledge in a retrievable form. While LLMs can solve a large number of general tasks, they quickly become inaccurate because they can only store the knowledge they acquire during training implicitly, in the weights of their neural network. This also makes it difficult to understand exactly how an LLM arrives at its answer. Query-based systems (think of Wikipedia, for example), on the other hand, answer very precisely and can cite their sources, but cannot generate anything themselves – they only reproduce what someone entered at some point.
In fact, the inventors of the term "Retrieval-Augmented Generation" supplemented a generative language model with information from Wikipedia and thereby significantly improved its performance on knowledge-intensive tasks. More precisely, a user query is first passed to a retrieval AI model, which selects suitable text fragments from Wikipedia articles, weights them and forwards the most relevant ones to the LLM as context for the query (see illustration).
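To make this retrieve-then-generate flow more tangible, here is a minimal sketch in Python. It is purely illustrative and not Copilot's actual implementation: vector_index.search() and llm.complete() are placeholders for whatever retrieval component and language model a given RAG system uses.

```python
# Illustrative RAG flow: retrieve relevant fragments, then generate a grounded answer.
# vector_index and llm are placeholders for your own retrieval component and LLM client.

def answer_with_rag(question: str, vector_index, llm, top_k: int = 3) -> str:
    # 1. Retrieval: fetch the text fragments most relevant to the query.
    passages = vector_index.search(question, top_k=top_k)

    # 2. Augmentation: prepend the retrieved fragments to the prompt as context.
    context = "\n\n".join(f"[Source {i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation: the LLM produces the final, grounded answer.
    return llm.complete(prompt)
```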
Data sources for Microsoft Copilot
You could also link your Copilots to Wikipedia. However, it is even more useful to make your company's own documents from your users' day-to-day work accessible to the Copilot AI. Only then will it have the necessary context to understand your users' queries correctly and provide really useful assistance.
Copilot receives the data for RAG via Microsoft Graph. This platform ensures secure and high-performance data access in Microsoft 365 and the Microsoft Cloud. By default, all data from the M365 applications, such as Office documents, emails or SharePoint pages, is already available to Copilot, provided that the requesting user is permitted to access it in their authorisation context.
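You can get a feel for this permission-trimmed access by calling Microsoft Graph's search API yourself. The sketch below does so with Python's requests library; the query string is just an example, and acquiring the OAuth access token (for instance via MSAL) is not shown.

```python
import requests

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"

def search_m365_files(access_token: str, query: str) -> dict:
    """Search Office documents on behalf of the signed-in user.
    Microsoft Graph only returns items that this user is allowed to see."""
    body = {
        "requests": [
            {
                "entityTypes": ["driveItem"],     # files in OneDrive/SharePoint
                "query": {"queryString": query},  # e.g. "Q3 budget review"
                "size": 5,                        # a handful of hits is enough for grounding
            }
        ]
    }
    response = requests.post(
        GRAPH_SEARCH_URL,
        json=body,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# hits = search_m365_files(token, "Q3 budget review")  # token acquisition (e.g. via MSAL) not shown
```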
Imagine you want Copilot to summarise the last meeting in Teams for you. The tool then not only uses the recording and chat history of the meeting, but can also analyse associated documents, appointments or emails that the participants recently sent you. Copilot therefore knows the context of individual meeting topics, makes fewer mistakes when summarising and can suggest more relevant follow-up actions where appropriate.
External data – outside the Microsoft Cloud – can also be accessed via Microsoft Graph. After all, the work context of your users will likely also include data from ERP, CRM, HR or project management systems, documents in Azure storage resources and more. To connect such external systems, so-called Microsoft Graph connectors must be set up. Microsoft already offers various ready-to-use Graph connectors for popular data sources, for example Salesforce, Jira, Confluence or SQL databases. Customised connectors can also be created via an API. In addition, a growing number of third-party connectors is available – currently more than 100.
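To give a rough idea of what a customised connector involves: at its core, an external connection is created via the Microsoft Graph connectors API, after which a schema is registered and items are pushed into it. The sketch below covers only that first step, again using Python's requests library; the connection id, name and description are made-up examples, and token acquisition is once more omitted.

```python
import requests

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def create_external_connection(access_token: str) -> dict:
    """Create an (initially empty) external connection for custom content.
    The id, name and description below are placeholder examples."""
    body = {
        "id": "projectdb",
        "name": "Project database",
        "description": "Projects from our internal project management system",
    }
    response = requests.post(
        f"{GRAPH_BASE}/external/connections",
        json=body,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Next steps (not shown): register a schema on the connection and push items into it
# so that the external content becomes searchable in Microsoft 365.
```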
An index for your data
Another way to provide your own data and functions to Copilot is to use plugins that call web services via their APIs. Plugins allow real-time access to certain data, but they also have a disadvantage that should not be underestimated: their data cannot easily be indexed.
Simply connecting to data sources is often not enough for AI applications, as it must be possible to find and retrieve the required information in fractions of a second. This is what indexing ensures. In the original presentation of Retrieval-Augmented Generation mentioned above, a vector index was created for the connected Wikipedia articles, because vector-based indices are well suited to capturing and comparing the meaning of text segments.
This also works for your company data: data or text fragments from your documents are converted into high-dimensional numerical representations (also called vector embeddings), all the way down to word level. For a search query, the retrieval AI compares the numerical similarity ("distance") between the vectorised query and the vector embeddings of the documents, and thus also captures semantic similarities between the corresponding text passages.
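The comparison itself is plain vector arithmetic. The sketch below uses NumPy and assumes a hypothetical embed() function standing in for whatever embedding model you use; cosine similarity then ranks document fragments by how close they are to the query.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors: close to 1.0 means very similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_fragments(query_vec: np.ndarray, fragment_vecs: dict) -> list:
    """Order text fragments by semantic closeness to the query vector.
    fragment_vecs maps fragment text -> its embedding vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in fragment_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Usage sketch, with embed() standing in for your embedding model:
# query_vec = embed("holiday policy for part-time staff")
# ranking = rank_fragments(query_vec, {text: embed(text) for text in fragments})
```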
The Copilot system also uses vector indexing, with the help of the Semantic Index. This index supplements the conventional indexing by Microsoft Graph – full-text keyword search plus user-related signals – with vector embeddings.
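Conceptually, such a hybrid index can be pictured as blending a keyword relevance score with a vector similarity score for each document. The toy sketch below does exactly that with a weighted sum; it is not how the Semantic Index actually ranks results, and the weighting factor is an arbitrary example.

```python
def hybrid_score(keyword_score: float, vector_score: float, alpha: float = 0.5) -> float:
    """Blend full-text (keyword) relevance with vector (semantic) similarity.
    alpha = 0.5 weights both signals equally; real ranking logic is far richer."""
    return alpha * keyword_score + (1 - alpha) * vector_score

def rank_hybrid(results: list, alpha: float = 0.5) -> list:
    """results: e.g. [{"doc": "policy.docx", "keyword": 0.8, "vector": 0.3}, ...]"""
    return sorted(
        results,
        key=lambda r: hybrid_score(r["keyword"], r["vector"], alpha),
        reverse=True,
    )
```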
In short: if you make data and documents accessible in Microsoft Graph – respecting your users' access permissions and, where necessary, via connectors – then Copilot can access them, and the vector index is available in addition to the full-text search. If, on the other hand, you use plugins, you can by default only retrieve their data, but not index it via Microsoft Search without technical workarounds.
Conclusion: Using your own data for Copilot
As you can see, you have various options for making your own data available to Copilot and tailoring the tool to your requirements. This gives you powerful functions – but you also need to take appropriate precautions to prevent information from being exposed to the wrong people or the AI from using data that is not intended for this purpose. You can find further information on the secure use of data with Copilot in our blog article Getting ready for Microsoft 365 Copilot – It's all about data.
Get AI-ready
Reach out to our experts to schedule an AI strategy session for you and your team.