Microsoft’s new Copilot agents have sparked a little controversy since Ignite 2024. These pre-built autonomous agents, such as the new Microsoft Teams Interpreter and Facilitator agents, as well as the various bots designed for sales, marketing, and employee support, aren’t just set to revolutionize human-AI collaboration. They could fundamentally change the future of work.
Obviously, not everyone is a fan of Microsoft’s approach to agentic AI. Salesforce even publicly slammed Microsoft following Ignite, suggesting that its “Copilot Agents” are really nothing new. But I’m not so sure. The Interpreter agent, in particular, is interesting for a few reasons.
I think this tool really could boost inclusivity and accessibility in meetings, bridge gaps between mobile and remote workers, and enhance “globalization” in the workplace. Of course – there will be a few challenges to overcome too.
Here’s a behind-the-scenes look at the Microsoft Teams Interpreter Agent, what it can do, and what benefits and challenges it’s likely to present in the years ahead.
What is the Microsoft Teams Interpreter Agent?
The Microsoft Teams Interpreter Agent is an autonomous agent that leverages speech-to-speech technology to translate language in real-time during Teams meetings. It was introduced alongside a series of other “pre-built” Copilot Agents designed for Microsoft products (like Teams and Dynamics 365) at Microsoft’s Ignite Conference in 2024.
This isn’t the first time Microsoft has allowed users to access “interpreters” in Teams meetings. But before now, if you needed someone to translate conversations for you, you’d have to assign the Interpreter role to another human being in the meeting.
The Interpreter Agent is different. It takes advantage of Microsoft’s AI technology to understand and translate language automatically – no human expert necessary. Currently, the agent is still in its early stages. It’s set to roll out in public preview at the beginning of 2025, and will require both a Microsoft 365 Copilot license and a Microsoft 365 license with Teams to access.
Initially, users can choose from nine languages for the Interpreter agent: English, French, German, Japanese, Korean, Italian, Chinese (Mandarin), Portuguese, and Spanish. However, if the agent is a success, Microsoft may introduce new languages in the future.
How to Use the Microsoft Teams Interpreter Agent
Starting in 2025, anyone with the correct Microsoft 365 license (with access to Teams), and a Microsoft 365 Copilot license will be able to access a range of pre-built autonomous agents in their apps – including the Interpreter Agent.
Microsoft hasn’t shared whether administrators will need to enable access to specific features (like voice cloning) first. However, it seems likely that business leaders will be able to set up policies that govern how teams use AI tools. That’s already the case for Microsoft’s Copilot services.
Once users have access to the Interpreter Agent, they’ll be able to find it within the three dots (more options) menu in a Microsoft Teams meeting. All you’ll need to do is click on the “Language and Speech” options, choose “Turn on AI Interpretation”, and select the language you want.
Users who want to access the “voice cloning” feature will need to set up their own voice profile, essentially training Microsoft’s AI tools on the nuances of their voice.
The Benefits of the Microsoft Teams Interpreter Agent
First, let’s look at the things that make this particular “autonomous agent” from Microsoft special. On the surface, it seems like your average translation bot. But it’s worth remembering this agent isn’t just listening to speech and translating the words into live captions. It’s actually converting spoken words into spoken translations.
Speech-to-speech translation is something most other early translation tools haven’t offered, and I honestly think it will be beneficial in meetings. Usually, people have enough to focus on in a weekly stand-up or brainstorming session without having to actively read captions too.
With the interpreter agent, you’ll be able to keep your attention on what’s happening on screen (what your colleagues are saying or doing, or the slides and content they’re presenting).
Even more interesting (or creepy, depending on your perspective), the agent can actually mimic a user’s voice. That’s a pretty unique feature. Most “AI translation” tools sound pretty robotic or generic – they’re not really supposed to mimic “human beings” to a huge extent.
Microsoft’s decision to allow its bot to replicate human voices could lead to more personal, engaging interactions between people who speak in different languages. Users will need to set up a voice profile to access this feature – something many already had to do for noise suppression in the past.
On a broad scale, the Microsoft Teams Interpreter agent could do much more than simply help teams overcome language barriers. It could enable more inclusive, immersive discussions between teams, and have a major impact on meeting and productivity levels too.
But there are still problems to overcome.
Autonomous Agents and Voice Cloning: The Challenges
Agentic AI bots, or autonomous agents, naturally raise a few concerns among companies – much like many modern AI tools. Unlike your standard generative AI bot, these solutions can make decisions on their own, complete multi-stage tasks, and basically act like a virtual team member.
Obviously, there are benefits to these kinds of bots. They can help make teams more efficient, minimize more repetitive tasks, and improve engagement. Leaders at companies like Zoom even think autonomous agents could drive adoption of the 4-day work week, and improve work-life balance for today’s frequently overwhelmed employees.
But there are risks to giving AI tools too much decision-making control too. AI bots can still make mistakes, or make decisions that inadvertently harm people. Then there’s the challenge of “job displacement” to think about too.
Adding voice cloning into the mix surfaces a range of new issues. Deepfakes have been spreading like wildfire across social media and online publications. They’re already making it harder than ever to distinguish the truth from “fake news”. President Joe Biden, VP Kamala Harris, Taylor Swift, and countless others have already been impersonated by bots.
Early in 2024, a team of cybercriminals even reportedly staged a meeting with a company’s executive team that featured deepfakes so convincing, the company wired about $25 million to the thieves.
Issues like this have even prevented other AI leaders from releasing voice cloning technology. For instance, OpenAI decided against broadly releasing its “Voice Engine” tech.
Will Microsoft Teams Interpreter Agent Be Safe?
So, if Microsoft allows its Interpreter Agent to realistically “mimic” genuine human voices – is it really going to be safe? Right now, it’s hard to tell. Microsoft has said that the Interpreter Agent won’t store any biometric data, even after users create their voice profiles.
That could mean the risk of a criminal “breaking in” to a user’s Teams account and stealing their voice is a lot lower. However, there’s always a chance that a criminal could gain access to voice or biometric information in another way – such as by recording a meeting and using it to train their own agent with external tools.
The company also says that the bot won’t change the “tone” of the user’s voice. For instance, it won’t inadvertently make you sound snarky or sarcastic for no particular reason.
Additionally, users don’t have to use the “voice cloning” feature. This capability will only be enabled when users provide consent and switch it on in their settings. Everyone on Teams can also change their mind and disable the feature later.
Still, even if Microsoft implements plenty of guardrails and controls to minimize the threats associated with voice cloning, businesses will need to do their own due diligence. Figuring out how to protect “cloned voices” in meetings and biometric data may become a standard part of most AI governance and security strategies in 2025 and beyond.
The Future of Microsoft Agents and Voice Cloning
Video conferencing companies have been experimenting with real-time language translation services for a while now. Zoom and Google Meet, for instance, have both offered various translation options in the past – with varying degrees of success.
Many AI-powered tools have faced a few issues. As mentioned above, most AI translations aren’t as “lexically rich” or engaging as those delivered by human interpreters. Plus, AI translators can struggle to convey analogies, colloquialisms, and cultural nuances. OpenAI’s transcription tool (Whisper), for instance, was once found to introduce inaccuracies in roughly eight out of ten transcriptions.
Still, progress is being made. Meta recently introduced new AI technology at Connect 2024, showing how its AI voices could mimic celebrities when answering user questions on platforms like Facebook, Instagram, and WhatsApp.
Microsoft has definitely made several important improvements to Copilot in the last year too. At this year’s Ignite, the company noted that Copilot responses are two times faster, and response satisfaction is up to three times higher. On top of that, Microsoft’s AI tools for Teams are becoming more multimodal. For instance, users in Teams can now ask Copilot for a quick summary of anything shared on screen, from a link to a slide in a PowerPoint presentation.
In the future, the tech giant might even update the Microsoft Teams Interpreter Agent with the same multimodal capabilities, allowing it to instantly translate written content on screen, as well as speech.
Getting Started with the Microsoft Interpreter Agent
The only way to really tell if the Microsoft Teams Interpreter Agent is going to be more “revolutionary” than it is risky is to put it to the test. Soon, everyone will be able to experiment with this feature themselves, create their own voice profiles, and hopefully engage in more powerful multilingual meetings.
Personally, though, I’d recommend proceeding with caution. Using the Interpreter Agent on its own shouldn’t cause any problems, but ask yourself if you really want to allow an AI to clone your voice before you set up your voice profile.
Want to learn more about Microsoft’s Copilot Agents? Check out our complete guide here
from UC Today https://ift.tt/P9zXeDI