What is Google Gemini, and why is it such a significant part of the tech giant’s plan to become a leader in artificial intelligence? Google Gemini is a collection of multimodal LLMs (Large Language Models) created by Google’s AI-focused team.

It’s also now the name of Google’s generative AI app (previously Bard), which offers a similar experience to ChatGPT and Microsoft Copilot. Simply put, Gemini is the name of the next stage of Google’s generative AI revolution. The title confusingly encompasses both the new models created by the company and a handful of intelligent apps.

Today, we’re going to cover everything you need to know about Gemini in all of its forms.

What is Google Gemini? The Gemini Models

To answer the question, “What is Google Gemini?” fully, let’s start by looking at the heart of the Gemini ecosystem – Google’s Gemini models.

Google Gemini is a collection of proprietary large language models (LLMs) that leverages training techniques from AlphaGo, such as tree search and reinforcement learning. It’s intended to become Google’s “flagship AI,” powering many products and services within the Google portfolio.

According to CEO and Co-Founder of Google DeepMind, Demis Hassabis, Gemini is the most “capable” model they’ve ever built. It’s the result of significant collaborative efforts by multiple teams across Google and Google Research.

Unlike other models in the emerging LLM arms race, Google Gemini was built to be multimodal from the ground up. It can seamlessly generalize, understand, and combine different data types, such as text, code, audio, video, and images.

The solution was trained on Google’s in-house AI chips, its Tensor Processing Units (TPUs), specifically the TPU v4 and v5e. Google positions it as one of the most flexible and efficient models on the market: rather than being confined to power-hungry server hardware, Gemini can run on everything from data centers to mobile devices.

How Do Google Gemini Models Work?

The Google Gemini models were trained on a massive corpus of data and refined with various neural network techniques to define how they understand content and interact with users.

Specifically, like many modern large language models, the Gemini solutions use a transformer-based neural network architecture. These models have been carefully enhanced to process lengthy contextual sequences in a multimodal format. This means they can understand and interact with text, audio, and video (unlike most competitors).

The Google DeepMind team used various attention mechanisms in the transformer decoder to enable this, and the models were trained on various multilingual and multimodal data sets.
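To make the idea of attention in a transformer decoder more concrete, here is a minimal sketch of standard scaled dot-product attention with a causal mask. It is a textbook illustration only, not Google’s implementation: the function names and the toy input are hypothetical, and Gemini’s actual attention variants are not described in this article.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention with a causal (decoder-style) mask.

    Each argument is a list of equal-length vectors, one per position.
    Position i may only attend to positions 0..i.
    """
    dim = len(keys[0])
    outputs = []
    for i, query in enumerate(queries):
        # Score the query against every key it is allowed to see.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Blend the visible value vectors using the attention weights.
        blended = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs


# Toy input: three positions, two-dimensional vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = causal_attention(tokens, tokens, tokens)
```

The first position can only see itself, so its output is simply its own value vector. Later positions mix in information from everything that came before them, which is how a decoder builds up context across a long sequence.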

Like Copilot and ChatGPT, Gemini models can generate, summarize, translate, and understand text. However, according to Google, they also excel in a few key areas, such as:

Sophisticated multimodal reasoning

The sophisticated multimodal reasoning capabilities of Gemini 1.0 mean the model can make sense of more complex written and visual information. It’s uniquely skilled at drawing insights from vast amounts of data. The tool can even filter through hundreds of thousands of documents to deliver breakthrough insights at remarkable speeds.

Plus, because Gemini can recognize and understand images, audio, text, and more, at the same time, it better understands nuanced information. It can answer complex questions and assist with everything from math to physics queries.

Advanced coding

The first version of Gemini can understand, generate, and explain high-quality code in some of the world’s most popular programming languages, including Python, Java, C++, and Go. Gemini excels in numerous coding tasks and can be used as the engine for advanced coding solutions.

For instance, Google presented “AlphaCode” two years ago, the first AI code generation system to perform at a competitive level in programming competitions. Using a specialized version of Gemini, Google has created “AlphaCode 2,” which takes these results to the next level.

Compared to the original AlphaCode, the new model solves almost twice as many problems and, according to Google, performs better than an estimated 85% of competition participants.

Efficient scalability

According to Google, Gemini 1.0 has been trained at scale on AI-optimized infrastructure using proprietary Tensor Processing Units. On TPUs, Gemini runs significantly faster than earlier, smaller, and less-capable models. Plus, Google has announced a new TPU system that is coming soon.

Developers will soon be able to access the Cloud TPU v5p to train their own cutting-edge AI models. According to the brand, this will help accelerate Gemini’s development further and assist enterprise customers in building their own AI solutions.

What is Google Gemini? The Model Versions

Here’s where things get a little complex. Google Gemini isn’t just a single LLM; it’s a series of LLMs designed for different use cases. There are currently three different “sizes” of Gemini, which are available in different formats and offer specific levels of functionality.

What is Google Gemini Nano?

Gemini Nano is the “lite” pared-down model of the LLM, available in two sizes: Nano-1 (1.8 billion parameters) and Nano-2 (3.25 billion parameters). So far, the Nano version of the model powers two features on the Pixel 8 Pro: Smart Reply in Gboard and Summarize in the recorder app.

The recorder app uses Gemini to create intelligent summarizations of recorded conversations, interviews, and presentations, even when a Wi-Fi connection isn’t available. Notably, the system also ensures no data leaves your phone in the process.

In Gboard, Gemini Nano allows users to generate contextual responses to conversations in apps like WhatsApp rapidly.
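The parameter counts above help explain why Nano can run on a phone. As a rough back-of-the-envelope sketch, the storage needed for a model’s weights is approximately the number of parameters multiplied by the bits used per weight. The bit widths below are illustrative assumptions, not figures from this article, and the helper name is hypothetical. Real memory use also includes activations and runtime overhead.

```python
def weight_gigabytes(parameters, bits_per_weight):
    """Approximate storage for model weights, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


# Parameter counts quoted in the article for the two Nano sizes.
NANO_SIZES = {"Nano-1": 1_800_000_000, "Nano-2": 3_250_000_000}

for name, parameters in NANO_SIZES.items():
    # 16, 8, and 4 bits per weight are assumed precisions for illustration.
    for bits in (16, 8, 4):
        size = weight_gigabytes(parameters, bits)
        print(f"{name} at {bits}-bit weights: ~{size:.2f} GB")
```

Under these assumptions, Nano-1 would need roughly 0.9 GB for its weights at 4 bits each, which is within reach of a modern smartphone, while the same model at 16 bits would need about four times as much.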

What is Google Gemini Pro?

Google Gemini Pro is the “main” version of the LLM most users will interact with. It’s the solution that powers the free version of the renamed Bard chatbot. According to independent studies, Gemini Pro is more effective than tools like OpenAI’s GPT-3.5 at handling longer, more complex reasoning chains. However, it does struggle somewhat with complex math problems.

Gemini Pro is also more powerful than the previous models used to power Google’s apps. According to Google, the newer Gemini 1.5 Pro release can process up to 700,000 words or 30,000 lines of code in a single prompt. It can also analyze up to 11 hours of audio or an hour of video in various languages.

Aside from powering the new “Gemini chatbot,” Gemini Pro is also available via an API in Vertex AI, where developers can customize the system for specific contexts and use cases.

Gemini Pro is also available in AI Studio, where users can find tools for building chat prompts using the LLM.
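For developers wondering what API access looks like in practice, here is a minimal sketch that builds (but does not send) a text prompt request using only Python’s standard library. The endpoint URL, the “gemini-pro” model name, and the request body shape are assumptions based on the public Gemini API associated with Google AI Studio at the time of writing, so check them against Google’s current documentation. The function name and placeholder API key are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint and model name; confirm both in Google's current docs.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/{model}:generateContent?key={api_key}"
)


def build_gemini_request(prompt, api_key, model="gemini-pro"):
    """Build (but do not send) an HTTP request carrying a text prompt."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        API_URL.format(model=model, api_key=api_key),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gemini_request("Summarize this meeting transcript.", "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request), then parse the JSON reply.
```

Building the request separately from sending it makes it easy to inspect the payload before spending any API quota. Vertex AI uses its own endpoints and authentication, so this sketch would not apply there unchanged.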

What is Google Gemini Ultra?

Google Gemini Ultra is the most advanced of Google’s LLM models. The multimodal solution can help with everything from physics homework to identifying scientific formulas. It also supports image generation, similar to tools like Midjourney. However, this feature hasn’t yet found its way into the Gemini apps available to consumers.

Users can access Gemini Ultra through the Gemini Advanced chatbot (previously Bard Advanced). This requires a subscription to the Google One AI Premium plan, priced at $19.99 monthly (with a two-month free trial).

How Powerful are the Google Gemini Models?

A follow-up question to “What is Google Gemini?” is, “How powerful is it?”

Ever since Google first announced the impending arrival of Gemini, analysts have been trying to predict just how powerful it could be. In the early stages of the model collection’s release, Google attempted to shed some light on Gemini’s potential with their “Gemini Technical Report.”

The AI team said they’ve been carefully testing their Gemini models for the last few months, evaluating their performance in various tasks. Although insights into the performance of Gemini Nano and Gemini Pro are limited, there’s plenty of data to suggest Ultra can outperform competing LLMs.

With a score of around 90%, Gemini Ultra is the first solution capable of outperforming human experts in Massive Multitask Language Understanding (MMLU) tests. These tests use a combination of 57 different subjects, such as physics, math, history, and ethics, to examine real-world knowledge and problem-solving capabilities.

According to the team, Google’s new benchmark approach to MMLU means Gemini can use its reasoning abilities to “think more carefully” before it answers questions.

Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark. This benchmark looks at the performance of LLMs on multimodal tasks that require deliberate reasoning.

Google says Gemini Ultra outperformed other leading models without assistance from optical character recognition (OCR), highlighting the native multimodal capabilities of the solution.

This doesn’t necessarily mean Google Gemini won’t suffer from the same issues other language models face, such as AI hallucination. Even the best generative AI models can respond problematically when prompted in specific ways.

Is Gemini Better than GPT?

As demand for generative AI solutions and LLMs grows, Google has plenty of competition in the current market. Plenty of up-and-coming models, like Falcon 180B, could outperform Gemini if they continue to evolve.

However, many tech enthusiasts are only interested in answering one question: “Is it better than GPT-4?” GPT-4, OpenAI’s multimodal large language model, is pretty much the benchmark all developers are using to assess the potential of new LLMs.

Fortunately, Google has made comparing the performance of Gemini and GPT-4 straightforward by publishing a side-by-side comparison chart. According to Google, GPT-4 only outperforms Gemini in one area, called “HellaSwag reasoning.” That’s the commonsense reasoning used for everyday tasks.

GPT-4 scored 95.3% in this area, compared to Gemini’s 87.8%.

In every other area, Gemini Ultra came out on top. Here’s a quick insight into the “text” stats:

| Capability | Benchmark | Gemini Ultra | GPT-4 |
| --- | --- | --- | --- |
| General | MMLU (Representation of various questions in 57 subjects) | 90.0% | 86.4% |
| Reasoning | Big-Bench Hard (Challenging tasks requiring multi-step reasoning) | 83.6% | 83.1% |
| Reasoning | DROP (Reading comprehension) | 82.4% | 80.9% |
| Math | GSM8K (Basic arithmetic manipulation) | 94.4% | 92.0% |
| Math | MATH (Challenging math problems) | 53.2% | 52.9% |
| Code | HumanEval (Python code generation) | 74.4% | 67.0% |
| Code | Natural2Code (Python code generation) | 74.9% | 73.9% |
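To put these figures in perspective, the short sketch below computes the gap between the two models on each benchmark, in percentage points. The scores are the ones quoted in this article (including the HellaSwag result mentioned earlier), and the function and variable names are hypothetical.

```python
# (Gemini Ultra %, GPT-4 %) as quoted in this article from Google's figures.
SCORES = {
    "MMLU": (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "DROP": (82.4, 80.9),
    "HellaSwag": (87.8, 95.3),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
    "Natural2Code": (74.9, 73.9),
}


def margins(scores):
    """Gemini Ultra's lead over GPT-4 in percentage points.

    A negative value means GPT-4 scored higher on that benchmark.
    """
    return {
        name: round(gemini - gpt4, 1)
        for name, (gemini, gpt4) in scores.items()
    }


leads = margins(SCORES)
gpt4_ahead = [name for name, gap in leads.items() if gap < 0]

# Print the benchmarks from Gemini's biggest lead to its biggest deficit.
for name, gap in sorted(leads.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16}{gap:+.1f}")
```

Seen this way, most of Gemini Ultra’s leads are narrow. Several are under two percentage points, while the largest gaps are on HumanEval (in Gemini’s favor) and HellaSwag (in GPT-4’s favor).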

While these stats only show us the power of Gemini Ultra, it’s worth noting that Google also found Gemini (in general) outperforms GPT-4 in every multimodal task. Remember, GPT-4 might be multimodal but can only process images and text.

Gemini, on the other hand, can process video, audio, images, and text. As Google continues to train its toolkit, it could significantly surpass the performance of various other models.

Is Google Gemini Safe? Ethics and Security

As LLM and generative AI models continue to develop, so do concerns about their safety. Like most market leaders, Google has a set of specific “AI principles” to ensure its technology is safe, ethical, and secure for users.

Gemini has some of the most comprehensive safety systems in place of any of Google’s AI models. The company is carefully analyzing the technology for evidence of bias and toxicity. Plus, they’ve conducted research into risk areas like persuasion and autonomy.

Google is working with a diverse selection of experts to stress-test its models going forward. Additionally, they’re using benchmarks like “Real Toxicity Prompts” to diagnose content safety problems during Gemini’s training phases.

Google has built dedicated safety classifiers to identify content involving stereotypes or violence to further limit potential harm. The team also says they’re continuing to work on known challenges like attribution, grounding, and corroboration.

What is Google Gemini? The Gemini Apps

Here’s why the answer to “What is Google Gemini?” is so complex today. Google is using the same name for both its large language models and its generative AI apps. After updating the “Bard” ecosystem with Gemini Pro, the team decided to rename Bard as Gemini.

This has obviously created a little confusion among consumers. However, the Gemini apps on the web and mobile aren’t the same as the Gemini models. They’re just an interface through which you can access these models, in the same way that ChatGPT is an interface for interacting with GPT-3.5 or GPT-4.

Different versions of the core “Gemini” apps are available. The first basic version, accessible through the web and mobile, is just “Gemini.” It’s the app that’s available for free and powered by Gemini Pro. The alternative is Gemini Advanced, powered by Gemini Ultra.

This is the upgraded version of the Gemini chatbot, available to users through the Google One AI Premium plan for $19.99 monthly.

The Google One AI Premium plan combines the latest in Google’s generative AI-powered applications with the features of the existing Google One Premium plan (like 2TB of cloud storage). It also gives users access to Gemini in Gmail, Docs, Slides, and Sheets, replacing Duet AI.

Outside of Google Gemini and Gemini Advanced, the company is also infusing its Gemini models into various other applications, such as:

  • AlphaCode 2: A code generation tool using a custom version of Gemini Pro.
  • Android 14: The latest Android OS, enabling developers to access Gemini Nano.
  • Vertex AI: Google’s service for developers building AI applications.
  • Google AI Studio: A prototyping and building tool for intelligent applications.
  • Google Search: Google is experimenting with adding Gemini to its search engine.

How to Access Google Gemini

Google is rolling the first versions of its Gemini LLM models out into various products and platforms. Gemini Pro is available in 40 languages and 230 countries and territories through multiple apps and services.

Gemini Ultra, available through Gemini Advanced, is only available in 150 countries and in English, though Google plans to expand very soon.

The easiest way to experiment with Gemini is through the “Gemini” app (previously Bard) on your desktop or smartphone.

Of course, you’ll need to upgrade to Gemini Advanced to explore the benefits of Gemini Ultra. Elsewhere, developers can access Gemini Pro and Ultra through Vertex AI via an API and in Google AI Studio.

Alternatively, if you want a pared-down experience of Gemini, you can experiment with the “Nano” version of the model on a Google Pixel 8 Pro, or work with the model as a developer through Android 14.

In the months ahead, we can expect to see more examples of Gemini rolling out into everything Google offers, from the Google Workspace suite to Google Search. Already, Google says Gemini is making search experiences faster for users, with a 40% reduction in latency.

Looking to the Future with Google Gemini

Though understanding the whole “Gemini ecosystem” can be a little complex, this new AI revolution marks a significant step forward in Google’s journey. According to Google, Gemini represents the start of a new era in LLM development.

Already, the company is planning on introducing new ways to make the Gemini models more compelling, and outshine competitors like OpenAI and Microsoft. It will be interesting to see just what Gemini accomplishes for Google.

FAQs:

What is Google Gemini used for?

Since Google Gemini’s models are multimodal, they can perform many tasks, from transcribing and translating speech to captioning images and generating artwork. However, the functionality of Gemini will vary depending on the model you use.

Is Google Gemini free to use?

The basic versions of Google Gemini are free to use. You can access Gemini Nano on a Pixel 8 Pro phone or Gemini Pro in the Gemini chat app, Google AI Studio, and Vertex AI. Gemini Ultra is available within Gemini Advanced through the Google One AI Premium plan for $19.99 monthly.

What are the disadvantages of Gemini AI?

Currently, Google says Gemini AI can struggle with specific tasks, like precise object location in images or processing data from long videos. It’s not suitable for medical use and may be less accurate in understanding some mathematical equations.

 



from UC Today https://ift.tt/yJdPUb7