Go to previous page

Gemini: A guide to Google’s free AI LLM offering in 2024

Google Gemini
#
AI
#
LLM
Sanskar Agrawal
Sanskar
Android Engineer
February 26, 2024

Gemini is a free-to-use large language model introduced by Google. Large Language Models (LLM) are neural networks that are used to generate general-purpose text and can help you write everything from emails to code. Lately, LLMs are being increasingly used everywhere in the digital world, from taking the place of virtual assistants on websites like Amazon to providing summarized search results on Google. They fall under the category of Generative AI, which is a class of AI programs that generate a result such as an image, video, or text based on an input (typically called a prompt).

In this article, we’ll look at precursors and how you can access Google’s free LLM.

Little Bit of History

Let’s look a little into how LLMs came to be. In 2017, researchers at Google Brain released a seminal paper called Attention is All you Need. The paper was extremely significant and introduced a new neural network architecture called Transformers. They allowed much faster training times, the generation of long-range sequences (such as paragraphs) that made sense, and were much more accessible to researchers.

This spurred work on a GPT (Generative Pre-Trained) model at OpenAI, which was capable of generating large amounts of coherent text from a simple prompt like “Write a mail to X company to inquire about Y product”.

Fast forward to 2020, OpenAI released their GPT-3 model. Microsoft was extremely impressed by its capability and considered it the next big thing. They invested a significant amount into OpenAI and gained exclusive rights to the model. They later built products on top of it, like GitHub Copilot.

Google also got into competition and released their LaMDA model at I/O 2021. A Google Engineer claimed that the model had gained sentience, a claim that was largely debunked by the research community. They also released an improved version of the model in 2022 and made it available to developers for use in their applications.

Later in 2022, OpenAI released ChatGPT, a tool that allowed users to converse with a fine-tuned GPT-3.5 model. It reached 100 Million users in less than 2 months, becoming the fastest-growing consumer software application in history. Google had to introduce a response since ChatGPT threatened their most lucrative product: Search.

ChatGPT prompt
ChatGPT’s interface with GPT-3.5, From: https://openai.com/chatgpt

Bard

Bard was launched in March 2023. It was supposed to be Google’s answer to the rising dominance of ChatGPT and offered a similar conversational experience. It initially ran Google’s LaMDA model, which didn’t quite match up to GPT 3.5. However, it did have access to the latest events and data thanks to Google’s enormous datasets and constant updates. The reaction was initially mixed.

Meanwhile, OpenAI servers by getting stampeded by the millions of requests that ChatGPT users were sending, and lost $540 Million in 2022 alone. LLMs require a significant amount of computing power, and OpenAI, a research company, wasn’t equipped to handle a consumer product of this scale. They slowed down the free ChatGPT version and introduced a paid $20 subscription to speed up the responses. Some time later, they also made their latest model, GPT-4 available to the paid users. Microsoft had introduced its own free GPT-4 experience, initially called Bing Chat and the Copilot, for free.

Bard’s Interface, From: https://blog.google/technology/ai/bard-google-ai-search-updates/

AI Wars

ChatGPT had taken the world by storm, and Microsoft and OpenAI formed a partnership to challenge Google’s AI superiority, Google countered with a heavy focus on AI at I/O 2023. Duet AI was integrated into Google workspace products like Gmail and Google Docs, while other Vertex AI consolidated different branches of Google’s AI offering developers on its cloud.

Google updated Bard to use PaLM 2, Google’s new foundational LLM. PaLM 2 performed significantly better, but still not quite as good as GPT-3.5’s latest version. They simultaneously announced a much better model called Gemini that was still being internally developed and positioned it as the true challenger to GPT.

Later in the year, Google merged their other AI effort, Google Brain, into Deepmind, their subsidiary that had created such products as AlphaGo. Bard gained the ability to access Gmail, YouTube, Maps, and other services in the form of extensions, giving it an advantage over ChatGPT in information retrieval. Both companies constantly launched features to try and one-up the other to become the de-facto player in the AI space.

Google also introduced Generative AI into its most important product: Search, as an experiment. It generates a summary of information based on the search results, largely running the PaLM 2 model.

Search Generative Experience, From: https://blog.google/technology/ai/bard-google-ai-search-updates/

Gemini: Launch

Gemini was heavily anticipated before launch. It was designed to be multi-modal from scratch, meaning it could process inputs beyond text, such as images, audio, and video. OpenAI hastily released multimodal capabilities in GPT-4 in the run-up to the Gemini launch.

Google’s co-founder, Sergey Brin came back to an active role at the company to work on the model, and it was trained on the extremely massive amount of data Google possesses (including YouTube videos).

On December 6, Google launched Gemini 1.0 to the wider public with text and image input in three variants:

  1. Nano: On-Device LLM, capable of running locally on phones like the Pixel 8 Pro, providing private inference
  2. Pro: Designed to outperform GPT-3.5, suitable for generation and coherence, but not for analytical and logical tasks
  3. Ultra: The most powerful model, capable of performing advanced reasoning, logic, and analysis. Google reported it to have surpassed GPT-4, Claude 2, and every other large-scale general-purpose model then available on a variety of benchmarks. It wasn’t made available to the public in 2024 to ensure it aligns with Google’s guardrails.

All variants of Gemini 1.0 support a context window of 32k tokens, meaning it could input about 50 pages of text.

Gemini Benchmarks, From: https://blog.google/technology/ai/google-gemini-ai

Gemini Era

Bard was updated to use Gemini Pro, and later entirely rebranded to Gemini, now available at gemini.google.com. The standard version is free to use and has seemingly no limits, which makes sense given Google’s enormous investments into their custom AI processing hardware, called TPUs. To use Gemini Pro, all you have to do is head over to gemini.google.com, and assuming that you’re already signed in with your Google account (who isn’t?), type a message and hit send. Gemini has stricter guardrails than GPT and other models, but that tracks since Google has a certain reputation to protect. Gemini also features image generation through the Imagen model, and the results are quite astonishing. Image Generation has currently been paused due to misalignment, but is expected to resume once Google fixes the issues.

Any and all data you provide to Gemini is stored by Google for further training and evaluation by Human reviewers, so be wary of sharing any private information that you may not want to expose.

Other major giants such as Microsoft and Amazon have to rely on GPU manufacturers like nVidia to provide AI processing hardware, significantly slowing their growth and processing capabilities. Both have unveiled their plans to make custom chips, but this is an area where Google has already won, since they have clusters of their extremely powerful TPUs deployed in every major geographical locale on the planet, positioning them to provide the most reliable and fastest inference to their consumers.

Google provides access to Gemini 1.0 Advanced through a pricing model similar to OpenAI’s, at $20/month. However, it offers additional features such as Gemini integration in Workspace products like Sheets and 2 TB of storage across its products, and the plan is completely free for evaluation for the first 2 months.

Developer Advantage

Since Gemini’s launch, Google has provided a free evaluation tier, where software developers can easily generate API keys and make up to 60 requests per minute without any charges at all.

For production, the Gemini Pro model costs $0.00025 per 1000 input characters, which at the time of release, was an order of magnitude cheaper than GPT-3.5. The pricing, coupled with faster response times and better availability means that Gemini is a better contender for developers than OpenAI or Microsoft with their GPT models.

Gemini Pro Pricing, From: https://blog.google/technology/ai/gemini-api-developers-cloud/

Google also followed up the model release with SDKs for every major technology including Android, Flutter, and Web, capitalizing on developer experience to increase the adoption of Gemini across external products. Google already makes the most used operating system in the world (Android) and is actively pushing for Gemini’s integration into it. Developers can test Gemini on Google AI Studio.

Google AI Studio, From: https://ai.google.dev/tutorials/ai-studio_quickstart

Google Assistant

Google Assistant was recently replaced by Gemini on Android devices. While it seems that it still can’t perform many tasks that the Assistant could do (such as setting reminders), updates are being made at a rapid pace, and this is the first major evolution of the Google Assistant since its launch.

I asked Gemini, on my Pixel 8, to find a burger place to eat at with a couple of friends in Tottenham Court Road on Friday. I also wanted it to recommend a few pubs, bars and maybe a late-night spot to visit after the meal. It did both and, knowing the area well, the recommendations were solid. It saved some planning time.- Janhoi McGregor, Forbes

Gemini 1.5: The 800-pound Gorilla dancing

Soon after the release of Gemini Ultra model accessible through Gemini Advanced app with a Google One AI subscription (they could’ve thought a tad bit more about the naming), Google dropped a major upgrade to the model, called Gemini 1.5, with currently just the Pro variant. It represents a major step up from the previous models in the LLM space, offering a context window of 1 Million tokens (currently being scaled to 10M).

Gemini 1.5 Pro, From: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

Gemini 1.5 Pro is equivalent to Gemini 1.0 Ultra in reasoning and other capabilities while consuming less computing power. It used a Mixture of Experts architecture, which through gross oversimplification can be thought of as multiple expert models being fused in one model.

Another significant factor is its ability to input Videos, being the first model of its kind to do so. It can do so because of its enormous context window, and numerous experiments with it have left users stunned so far by its capabilities. Gemini 1.0 was an attempt to position Google in the same space as OpenAI and Microsoft, but Gemini 1.5 cemented its status as the front-runner, and the Ultra variant hasn’t even been released yet.

It’s available in a limited preview and can be evaluated through Google AI Studio if you have received access.

Conclusion

Since the release of the 2017 Transformers Paper by Google, AI development has accelerated significantly. Companies like OpenAI have become major players, eventually wanting to create Artificial General Intelligence through partnerships with major players like Microsoft and a ton of money. Google, despite being the origin of the seminal paper, didn't quite build on top of it like OpenAI, perhaps because of fears of not disturbing their biggest revenue source: Advertising. Since ChatGPT’s growth, Google has increasingly faced pressure from stakeholders to provide a better alternative. Large number of Open Source models have also come forward. In February 2023, Microsoft’s CEO, Satya Nadella remarked that he wanted to see the 800-pound Gorilla in search space dance. I guess this is Google finally doing something about it, and we should take advantage of it.

Recommended Posts

Open new blog
Ruby vs. Javascript
#
Ruby
#
Javascript

Ruby vs JavaScript: Comparing Two Dynamic Languages for Web Development

Bharat
September 26, 2023
Open new blog
figma user flow
#
Figma
#
Design

Navigating Seamless Design: A Deep Dive into Figma User Flows

Ishpreet
January 9, 2024
Open new blog
android 14
#
Android
#
technology

14 Changes in Android 14: Vital Insights That Android Developers Need to Know

Harsh
October 30, 2023