Google Gemini, the latest multimodal AI solution from the Google team, has finally arrived.
First introduced at the Google I/O developer conference in May 2023, Google Gemini represents a crucial step forward in the brand’s artificial intelligence roadmap. It stems from the work of Google’s now-combined DeepMind and Brain AI labs, which joined forces on a new LLM journey.
The initial announcement for Gemini emerged just after the launch of Bard, Duet AI, and Google’s PaLM 2 LLM. However, the tech giant only introduced the first iteration of the solution on December 6th, along with a clear roadmap for future progression.
If nothing else, Google Gemini highlights Google’s ongoing quest to regain some AI market share from competitors like Meta and Microsoft as the demand for generative AI grows.
Here’s everything you need to know about Google Gemini and how to use it.
What is Google Gemini? The Basics
Google Gemini is a set of large language models (LLMs) that leverages training techniques from AlphaGo, such as tree search and reinforcement learning. It’s intended to become Google’s “flagship AI,” powering many products and services within the Google portfolio.
According to CEO and Co-Founder of Google DeepMind, Demis Hassabis, Gemini is the most “capable” model they’ve ever built. It’s the result of significant collaborative efforts by multiple teams across Google and Google Research.
Unlike other models in the emerging LLM arms race, Google Gemini was built to be multimodal from the ground up. It can seamlessly generalize, understand, and combine different data types, such as text, code, audio, video, and images.
The solution was trained on Google’s in-house AI chips and tensor processing units, such as the TPU v4 and v5e. It’s one of the most flexible and efficient models on the market. Where other multimodal models would demand vast amounts of computing power, Gemini can run on everything from data centers to mobile devices.
What is Google Gemini Nano, Ultra, and Pro?
The version of Google Gemini released in December 2023 is just the first iteration of the model – labeled “Gemini 1.0”. It has been optimized for three different “sizes”:
Google Gemini Nano
Gemini Nano is the pared-down “lite” version of the LLM, available in two sizes: Nano-1 (1.8 billion parameters) and Nano-2 (3.25 billion parameters).
This version of Gemini is designed to run on mobile devices and will soon preview in Google’s AICore app via Android 14, starting with the Pixel 8 Pro. Though Nano is exclusive to the Pixel 8 Pro for now, developers can apply for a sneak peek at the technology.
Nano will power various features previewed by Google during the Pixel 8 Pro unveiling in October, such as summarization in the Recorder app and suggested replies in messaging apps.
Google Gemini Pro
Google Gemini Pro runs on Google’s data centers and powers things like Google Bard, the chatbot similar to Microsoft’s Copilot solution. It will soon roll out into other Google tools, such as Duet AI, Google Chrome, Google Ads, and the Google Generative Search experience.
Google Gemini Pro will launch on December 13th for customers using Vertex AI (Google’s fully-managed machine learning platform). It will also be integrated into Google’s Generative AI developer suite going forward.
According to Google, Gemini Pro is more effective at tasks like brainstorming, writing, and summarizing content – outperforming OpenAI’s GPT-3.5 in six of eight core benchmarks.
Google Gemini Ultra
Gemini Ultra, still unavailable for widespread use at this point, is the most capable model in the collection. Like Pro, it’s trained to be natively multimodal and was pre-trained and fine-tuned on various codebases.
Gemini Ultra can comprehend nuanced information in text, code, and audio and answer questions related to complicated topics. Ultra exceeds current state-of-the-art results on 30 of the 32 widely used benchmarks in LLM development.
How Powerful is Google Gemini? Performance Insights
Ever since Google first announced the impending arrival of Gemini, analysts have been trying to predict just how powerful it could be. We finally have some genuine data shared by Google in the latest “Gemini Technical Report.”
The AI team said they’ve been carefully testing their Gemini models for the last few months, evaluating their performance in various tasks. Although insights into the performance of Gemini Nano and Gemini Pro are limited, there’s plenty of data to suggest Ultra bulldozes its LLM competitors.
With a score of around 90%, Gemini Ultra is the first solution capable of outperforming human experts in Massive Multitask Language Understanding (MMLU) tests. These tests use a combination of 57 different subjects, such as physics, math, history, and ethics, to examine real-world knowledge and problem-solving capabilities.
According to the team, Google’s new benchmark approach to MMLU means Gemini can use its reasoning abilities to “think more carefully” before it answers questions.
Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark. This benchmark looks at the performance of LLMs on multimodal tasks that require deliberate reasoning.
Google says Gemini Ultra outperformed other leading models without assistance from optical character recognition (OCR) systems, highlighting the native multimodal capabilities of the solution.
This doesn’t necessarily mean Google Gemini won’t suffer from the same issues other language models face, such as AI hallucination. Even the best generative AI models can respond problematically when prompted in specific ways.
Is Gemini Better than GPT?
As demand for generative AI solutions and LLM models grows, Google has plenty of competition in the current market. Tons of up-and-coming models, such as Falcon 180B, could outperform Gemini if they continue to evolve.
However, many tech enthusiasts are only interested in answering one question: “Is it better than GPT-4?” GPT-4, OpenAI’s multimodal large language model, is pretty much the benchmark all developers are using to assess the potential of new LLMs.
Fortunately, Google has made comparing the performance of Gemini and GPT-4 simple, publishing a comparison chart alongside the announcement. According to Google, GPT-4 only outperforms Gemini in one area: “HellaSwag,” a benchmark for the commonsense reasoning used in everyday tasks.
GPT-4 scored 95.3% in this area, compared to Gemini’s 87.8%.
In every other area, Gemini Ultra came out on top. Here’s a quick insight into the “text” stats:
| Capability | Benchmark | Gemini Ultra | GPT-4 |
| --- | --- | --- | --- |
| General | MMLU (Representation of various questions in 57 subjects) | 90.0% | 86.4% |
| Reasoning | Big-Bench Hard (Challenging tasks requiring multi-step reasoning) | 83.6% | 83.1% |
| Reasoning | DROP (Reading comprehension) | 82.4% | 80.9% |
| Math | GSM8K (Basic arithmetic manipulation) | 94.4% | 92.0% |
| Math | MATH (Challenging math problems) | 53.2% | 52.9% |
| Code | HumanEval (Python code generation) | 74.4% | 67.0% |
| Code | Natural2Code (Python code generation) | 74.9% | 73.9% |
While these stats only show us the power of Gemini Ultra, it’s worth noting that Google also found Gemini (in general) outperforms GPT-4 in every multimodal task. Remember, GPT-4 might be multimodal but can only process images and text.
Gemini, on the other hand, can process video, audio, images, and text. As Google continues to train its toolkit, it could significantly surpass the performance of various other models.
What Makes Google Gemini Different?
When Google first introduced Gemini to the masses, Demis Hassabis said the model would have advanced abilities in problem-solving and intelligent reasoning. He even noted Gemini might use memory to fact-check sources against Google Search and improved reinforcement learning to reduce hallucinated content. However, that still hasn’t been confirmed.
What we know is that Google Gemini sets itself apart from the competitors in the LLM market in various ways, starting with its architecture.
Until now, the typical approach to creating multimodal models involved training various components for different models and stitching them together.
Gemini was designed to be natively multimodal. It was pre-trained on different modalities and then fine-tuned with additional multimodal data.
Google Gemini is incredibly effective at:
Sophisticated multimodal reasoning
The sophisticated multimodal reasoning capabilities of Gemini 1.0 mean the model can make sense of more complex written and visual information. It’s uniquely skilled at drawing insights from vast amounts of data. The tool can even filter through hundreds of thousands of documents to deliver breakthrough insights at remarkable speeds.
Plus, because Gemini can recognize and understand images, audio, text, and more, at the same time, it better understands nuanced information. It can answer complex questions and assist with everything from math to physics queries.
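To illustrate what “natively multimodal” means in practice at the API level, the sketch below builds a single request that mixes a text part and an image part. The payload shape and field names (`contents`, `parts`, `inline_data`, `mime_type`) are assumptions modeled on Google’s public Gemini REST documentation and may differ between API versions – treat this as illustrative, not authoritative.

```python
import base64
import json

def build_multimodal_request(prompt: str, image_bytes: bytes) -> dict:
    """Combine a text part and an image part in one request body.

    Field names are assumptions based on Google's public Gemini REST
    docs; verify against the current API reference before use.
    """
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }

# Build a request pairing a question with (placeholder) image bytes.
body = build_multimodal_request("What trend does this chart show?", b"\x89PNG")
print(json.dumps(body)[:40])
```

Because both modalities travel in one `parts` list, the model reasons over the text and the image together rather than handing each to a separate, stitched-on component.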
Advanced coding
The first version of Gemini can understand, generate, and explain high-quality code in some of the world’s most popular programming languages, including Python, Java, C++, and Go. Gemini excels in various coding benchmarks and can be used as the engine for advanced coding systems.
For instance, Google presented “AlphaCode” two years ago, the first AI code-generation system to perform exceptionally in programming competitions. Using a specialized version of Gemini, Google has created “AlphaCode 2,” which takes these results to the next level.
Compared to the original AlphaCode, the new model solves almost twice as many problems and performs better than 85% of other competition participants.
Efficient scalability
According to Google, Gemini 1.0 has been trained at scale on AI-optimized infrastructure using proprietary Tensor Processing Units. On TPUs, Gemini runs even faster than smaller, less-capable models. Plus, Google announced a new TPU system coming soon.
Developers will soon be able to access the Cloud TPU v5p to train their own cutting-edge AI models. According to the brand, this will help accelerate Gemini’s development further and assist enterprise customers in building their own AI solutions.
Is Google Gemini Safe? Ethics and Security
As LLM and generative AI models continue to develop, so do concerns about their safety. Like most market leaders, Google has a set of specific “AI principles” to ensure its technology is safe, ethical, and secure for users.
Gemini has some of the most comprehensive safety evaluations of any of Google’s AI models. The company is carefully analyzing the technology for evidence of bias and toxicity. Plus, they’ve conducted research into risk areas like persuasion and autonomy.
Google is working with a diverse selection of experts to stress-test its models going forward. Additionally, they’re using benchmarks like “Real Toxicity Prompts” to diagnose content safety problems during Gemini’s training phases.
To further limit potential harm, Google has built dedicated safety classifiers to identify content involving stereotypes or violence. The team also says they’re continuing to work on known challenges like attribution, grounding, and corroboration.
How to Access and Use Google Gemini
Currently, Google Gemini 1.0 is rolling out across various products and platforms. The easiest place to try the “Pro” version of the solution is in Bard, Google’s ChatGPT competitor. This app is now powered by a fine-tuned version of Gemini Pro.
According to Google, this marks the biggest update to Bard since it launched. Initially, it will be available in English across 170 countries and territories. However, new languages should roll out in the future. Notably, Google will also be introducing “Bard Advanced” next year.
Gemini will also be available in Google Search, Ads, and Duet in the coming months.
Already, Google is beginning to experiment with Gemini in Search, where it says it’s making search experiences faster for users, with a 40% reduction in latency.
Elsewhere, Gemini Nano will be available in the Pixel 8 Smartphone, helping with “smart reply” features in tools like WhatsApp and recording summarization.
Developers interested in experimenting with Gemini can access the “Pro” service via the API in Google AI Studio or through Google Cloud’s Vertex AI. AI Studio is probably the easiest option, as it’s a free, web-based developer tool, great for prototyping and quickly launching apps.
However, Vertex AI allows for more comprehensive customization of Gemini, with complete data control and extra Google Cloud security, safety, and governance features.
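As a rough sketch of the AI Studio route: requests go to a `generateContent` endpoint authenticated with an API key. The endpoint URL, model name (`gemini-pro`), and request/response shapes below are assumptions based on Google’s public documentation at the time of writing, so check the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint and model name; verify against current Google AI docs.
API_KEY = "YOUR_AI_STUDIO_KEY"
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       f"models/gemini-pro:generateContent?key={API_KEY}")

def build_text_request(prompt: str) -> bytes:
    """Encode a minimal text-only generateContent request body."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return json.dumps(body).encode("utf-8")

def ask_gemini(prompt: str) -> str:
    """POST the prompt and return the first candidate's text
    (response path is an assumption based on public docs)."""
    req = urllib.request.Request(
        URL,
        data=build_text_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]

# ask_gemini("Summarize multimodal LLMs in one sentence")  # needs a real key
```

Google also ships official client SDKs that wrap this endpoint, so raw HTTP like the above is mainly useful for understanding what travels over the wire.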
Gemini Ultra, on the other hand, isn’t available yet. Google is running more safety and trust checks to ensure the solution suits the current market. As part of this process, it is making Gemini Ultra available to certain developers and partners in “beta mode.”
Looking to the Future with Google Gemini
We still have a while to wait before experimenting with the whole Google Gemini experience and the “Ultra” edition. However, so far, Google seems to be on track toward its goal of becoming a true market leader in the AI landscape once again.
Gemini seems to be setting a new standard for Google’s AI journey. The company says it represents the start of a new era of LLM development. The team will continue to extend its capabilities for future versions. Already, they’re planning new advancements in planning and memory and will soon increase the “context window” for bulk information processing.
Google believes we’re heading into a future powered by “responsible” AI. They say this future will pave the way for new levels of innovation, creativity, and knowledge sharing for billions of people worldwide. We can’t wait to see what the next generation of developers will accomplish with a solution as powerful as Google Gemini.
from UC Today https://ift.tt/G5QKNMr