AI summarisers are arguably among the most-used AI features on a UC platform.
Having your weekly, hour-long team meeting summarised into a bite-sized package at the end not only saves time and allows people to focus on the conversation rather than on note-taking, but also ensures the thrust of the discussion can be referred back to and any important takeaways actioned. Or does it?
A new BBC study put AI chatbots to the test, asking them to summarise news content from its website and then answer questions about those stories.
What it reported were answers containing “significant inaccuracies” and distortions.
Testing OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Perplexity AI, it found the majority (51 per cent) of all AI answers to questions about the news were judged to have significant issues of some form.
Most worrying for those in the UC sphere, Microsoft’s Copilot and Google’s Gemini, the two AI copilots embedded in UCaaS solutions, exhibited the most significant issues of the test group.
Examining the Study
The BBC study tasked the AI systems with summarising 100 news stories, with their responses evaluated by journalists with expertise in the relevant fields, who assessed the quality of the AI assistants’ answers.
In addition to the distortions and inaccuracies noted above, the study found that 19 per cent of AI answers citing BBC content included factual errors, such as incorrect statements, numbers, and dates.
Examples of inaccuracies include Gemini misrepresenting medical advice from the UK’s National Health Service (NHS) on the use of vaping as a smoking cessation aid.
ChatGPT and Copilot claimed that Rishi Sunak and Nicola Sturgeon were still serving as UK Prime Minister and Scottish First Minister, respectively, months after leaving office.
Perplexity misquoted BBC News regarding the Middle East, portraying Iran’s initial actions as showing “restraint” and describing Israel’s actions as “aggressive.”
The report concluded the chatbots “struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context,” in addition to factual inaccuracies.
How it Affects a UC Setting
Now, sentiment analysis of news is arguably more difficult than summarising a meeting.
There are many more nuances to take into account, such as the use of language when describing a political situation. Take the Perplexity example above, where Iran was said to have shown “restraint” while Israel’s actions were described as “aggressive”.
In the context of a news story, that language would be considered subjective, and is generally not associated with objective reporting.
Thus, such intricacies are unlikely to change the meaning too much in a meeting about what this week’s marketing focus should be.
However, a wide range of users, enterprises, and industries use UC tools.
However, should a summary go awry on the minutes of a client call at an engineering firm, or a meeting between legal counsel ahead of a case, the potential for damage increases.
What This Means for AI Use in UCaaS
Striking a balance between relying on AI summarisation and taking its output with a pinch of salt may be hard, as some distortions can be so subtle that you don’t know what has been changed.
However, for those using it in less stringent and nuanced situations, a quick manual read-over can catch the more obvious errors.
Equally, with the report showing OpenAI’s ChatGPT and Perplexity faring better than the AI copilots built into UC providers’ platforms, integrating alternative AI systems via third-party APIs could prove beneficial too, as sketched below.
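For teams that want to experiment along these lines, here is a minimal sketch of what such an integration could look like, assuming a Python backend that already has access to a meeting transcript and an OpenAI API key. The model choice, prompt wording, and the summarise_transcript helper are illustrative assumptions, not a vendor-recommended recipe.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarise_transcript(transcript: str) -> str:
    """Send a meeting transcript to a third-party model and return a summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0.2,      # keep the summary close to the source material
        messages=[
            {"role": "system",
             "content": "Summarise the meeting transcript into key decisions, "
                        "action items with owners, and open questions. "
                        "Do not add information that is not in the transcript."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# Example: transcript pulled from the UC platform's recording/transcription API
transcript = "Alice: Let's ship the pricing page update on Friday. Bob: I'll own QA."
print(summarise_transcript(transcript))
```

A low-temperature setting and an explicit instruction not to add unstated information are simple guardrails; they reduce, but do not eliminate, the kind of distortions the BBC study describes.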
Beyond these workarounds, whether this is a teething problem or something inherent in AI systems remains to be seen.
Yes, AI systems suffer from hallucinations that sometimes introduce inaccuracies into summaries.
Microsoft has created ways to measure, detect and mitigate these hallucinations based on expertise from developing its own AI products like Microsoft Copilot.
Engineers spent months grounding Copilot’s model with Bing search data through retrieval-augmented generation (RAG), a technique that adds extra knowledge to a model without having to retrain it.
Although that work might not apply directly to summarisation, and the BBC study still shows flaws, the attention being paid to the problem shows how much importance these companies place on AI copilot accuracy.
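Microsoft has not published Copilot’s internals, so the following is only a rough sketch of the general RAG pattern the article refers to: retrieve relevant source passages, inject them into the prompt, and instruct the model to answer from that context. The toy document list, the naive keyword retrieval, and the model name are all assumptions made for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy "knowledge base"; in a real RAG pipeline this would be a search index
# (e.g. web search results or an enterprise document store).
DOCUMENTS = [
    "Q3 marketing review: the team agreed to prioritise the EMEA webinar series.",
    "Engineering standup notes: the load-balancer migration slipped to next sprint.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a real search index."""
    scored = sorted(
        DOCUMENTS,
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer_with_grounding(question: str) -> str:
    """Answer a question using retrieved passages injected into the prompt."""
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "If the context does not contain the answer, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_grounding("What did the marketing team agree to prioritise?"))
```

The point of grounding is that the model answers from supplied facts rather than from memory alone, which is why it is seen as one of the main levers for reducing hallucinations.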
To summarise (pardon the pun), a balanced approach may see users treat tools like Microsoft’s Copilot in their UC solution not as a summariser for everything, but as a tool suited to summarising some things, some of the time.