
Google I/O 2024: The Future of AI is Private (and Local)

Sanskar Agrawal
Android Engineer
May 22, 2024

At the end of this year’s I/O keynote, Google’s CEO stepped on stage and told the audience that Google had run the entire presentation through its Gemini model to count how many times the term AI was mentioned over the almost 2-hour runtime. The answer? More than 120 times. Clearly, this year was just as focused on AI as the last. In this article, we’ll unpack the major changes for developers, AI-centered or otherwise.

<aside>💡 This article focuses strictly on what’s new for developers and NOT on what’s new for end-users. For end-user announcements, we recommend this TechCrunch article. It also does not cover all the newly launched products like Gemini Flash, but rather what we think would end up having the most impact on developers.

</aside>

Google AI Edge

I was happy when Google announced in December last year that Gemini Nano was running on the Pixel 8 Pro. Private models running completely locally on a device are an excellent direction for responsible AI, and I was glad to see Google announce AI Edge at I/O. Back in December, Google mentioned that Gemini Nano would be accessible on supported devices via a system service called AICore (like ARCore). The new MediaPipe LLM Inference API makes other models easily accessible too, even if you don’t have AICore on your device (yet).

Google AI Edge works on both Android and the Web.

<aside>💡 Smaller foundational models like Gemini Nano running on the client device need to be fine-tuned through Low-Rank Adaptation (LoRA) to yield accurate results. If you’re using MediaPipe, you can deliver your own fine-tuned model to edge devices.

</aside>

With on-device models, developers know that their app’s and their users’ data never leave the phone, and they don’t have to worry about the cost of running the model on their own servers or paying for an existing API. On-device models can also be extremely fast thanks to their small size, but at the cost of accuracy. With proper fine-tuning though, developers stand to gain a lot from foundational models running on client devices.

Gemini Nano

Gemini Nano is a multimodal LLM, meaning it can take both text and images as input. It’s a smaller LLM designed to run locally on the device and is currently available on the Pixel 8 Pro (coming soon to the Pixel 8 too) and the Samsung Galaxy S24 series. It will also be available to all Chrome users from version 126 onwards.

Nano has two variants, one with 1.8 billion parameters and one with 3.25 billion. The context window is currently unknown, but it is likely to be smaller than that of the server models owing to Nano’s on-device nature. For comparison, gemma-2b and gemma-7b, both open models by Google intended for local use, have a context window of 8192 tokens.

MediaPipe

MediaPipe is a suite of tools that lets you run models on your device. The name may suggest that it’s specialized for media tasks, but it handles text too. It has built-in solutions for tasks like object detection, text classification, and segmentation, and supports several models, even letting you plug in your own when needed. MediaPipe has been around for a while, though, and isn’t a new launch. What’s new is the LLM Inference API, launched as part of Google’s increased push for on-device AI. Out of the box, it currently supports Google’s own Gemma-2b and Gemma-7b, along with some external models. It’s also possible to export a PyTorch model to the framework and deploy it on-device.

Through MediaPipe, we ran Gemma-2b (which occupied about 1.3 GB on disk) on a Pixel 6a. It didn’t perform too well, but that’s to be expected without LoRA fine-tuning. It’s also not optimized for general queries or chatting, but rather for tasks like classification, so the responses were in line with expectations and were fast.

MediaPipe LLM Inference

MediaPipe is available on Android as well as on the Web, where it uses the <span class="text-color-code">WebGPU</span> API to run inference on GPU-based models. Check out the studio here! The current big hurdle with using LLMs through MediaPipe is their size. Generally ranging between 1 and 2 GB, LLMs can’t be shipped with apps directly and instead need to be downloaded or accessed in some other way after install (or after page load on the Web). This is why foundational models with just LoRA tuning seem more promising than every application running its own local LLM.
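
Because small on-device models do better on narrow tasks such as classification than on open-ended chat, it can help to wrap the model in a task-specific function. The TypeScript sketch below is a minimal illustration, not MediaPipe’s own API: `TextGenerator`, `buildPrompt`, `parseLabel`, and `classifyReview` are hypothetical names, and the only assumption about the real library is that the web LLM Inference task exposes a `generateResponse(prompt)` method that resolves to a string.

```typescript
// Minimal shape of the object we need. Assumption: the MediaPipe web LLM task
// offers generateResponse(prompt) resolving to a string. Everything else in
// this snippet is hypothetical glue code written for illustration.
interface TextGenerator {
  generateResponse(prompt: string): Promise<string>;
}

const LABELS = ["positive", "negative", "neutral"] as const;
type Label = (typeof LABELS)[number];

// Build a tightly constrained prompt so a small model only has to emit one word.
function buildPrompt(review: string): string {
  return [
    `Classify the review as one of: ${LABELS.join(", ")}.`,
    "Answer with a single word.",
    `Review: ${review}`,
    "Answer:",
  ].join("\n");
}

// Map free-form model output onto a known label, or null if none is present.
// This is a naive substring match: the first label found in the text wins.
function parseLabel(raw: string): Label | null {
  const text = raw.toLowerCase();
  return LABELS.find((label) => text.includes(label)) ?? null;
}

// Ask the model for a label and normalize whatever text comes back.
async function classifyReview(
  llm: TextGenerator,
  review: string
): Promise<Label | null> {
  const raw = await llm.generateResponse(buildPrompt(review));
  return parseLabel(raw);
}
```

On the Web, the `llm` argument would be the task object created through MediaPipe once the model file has been downloaded. Keeping the dependency behind a one-method interface also makes it easy to swap in a stub while the 1 to 2 GB model is not available.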

Android

Android 15 is in beta 2, and ships with many changes for developers. The changes that we found significant were:

Default Edge to Edge

Edge-to-edge UI is enabled by default. Apps will draw into the display cutouts by default, so developers will have to ensure they’re using the right inset settings so that their apps look good across devices. Most Material 3 components (<span class="text-color-code">TopAppBar</span>, <span class="text-color-code">NavigationBar</span>) are built to handle insets automatically, while Material 2 components are not.

Edge-to-edge UI
From https://www.youtube.com/watch?v=_yWxUp86TGg

Camera Changes

CameraX 1.4 is in beta and supports Ultra HDR and in-preview camera effects, as well as stabilization. Android 15 also debuts a new low-light boost that’s applied in real time in previews. It may reduce frame rate and stability since the device is doing more work, but it promises better low-light photos.

There’s also new support for a viewfinder composable. Earlier, you needed to use the viewfinder built for the Android View System and wrap it in an <span class="text-color-code">AndroidView</span> composable. This made state updates, like toggling the flashlight, difficult. With this new composable, it’s easier to use preview features and deal with the lifecycle. No need to use views anymore!

Compose: Better Performance

The Compose compiler will now be bundled as a Gradle plugin with Kotlin releases, starting with Kotlin 2.0 (the K2 release), meaning you no longer need to ensure compatibility between Compose and Kotlin yourself.

Strong skipping mode will be enabled by default soon. What this means is that Compose will skip even more composables during recompositions, and there’ll be far less need for annotating parameters with the <span class="text-color-code">@Stable</span> annotation. This is poised to lead to 20% faster recompositions.

Due to new pre-fetching improvements and changes in the slot table, the first draw time compared to the January Compose BOM is supposed to go down by 17%!

<aside>💡 Developers don’t need to take any action for either the strong skipping mode or the first draw time improvements. These will be enabled automatically in the future.

</aside>

There’s also support for HTML and links built into the <span class="text-color-code">Text</span> composable now. <span class="text-color-code">TextField</span>s will also be able to accept rich content like images and audio directly without additional code.

Kotlin Multiplatform

Kotlin has been the way to go when building Android apps for a while, and due to its multi-paradigm and developer-friendly nature, developer productivity has seen an uptick in the Android ecosystem. Along with their long-term commitment to the language itself, Google has also announced official support for Kotlin Multiplatform, a way to build apps that work across Android, iOS, Desktop, and Web, all using Kotlin.

Kotlin Multiplatform at first glance may seem similar to other tools like Flutter or React Native, but is much more nuanced than that, letting you share anywhere between 1-100% of your Kotlin code across platforms, meaning you get to decide what parts to share, and what to keep native.

The current recommendation is to use it for sharing business logic, encapsulated in features like ViewModel and Database, while Compose Multiplatform is being developed to share UI (it’s in Alpha for iOS, with rapid progress being made). To that end, Jetpack libraries like ViewModel, DataStore, and Room (in alpha) now work with Kotlin Multiplatform.

Google also announced the migration of Google Docs, one of its most important apps, to Kotlin Multiplatform across Android, iOS, and Web, signaling its long-term investment in the framework, which it develops together with JetBrains.

<aside>💡 Google now officially supports two multiplatform tools, Flutter and KMP, and this has developers questioning whether Flutter support can be abandoned in the future. This is unlikely, since UI sharing using KMP (Compose Multiplatform) is still a work in progress, especially on iOS, and Google has a significant number of apps that use Flutter.

</aside>

Web

Google owns and builds the most used web browser in the world and, along with Mozilla, has largely been a leading torchbearer of web innovation. While some changes, like Manifest V3, have been poorly received in the general web development community, others have been more positive. Here’s a rundown of the features we found interesting:

Gemini Nano integration in Chrome

From Chrome 126 onwards, Gemini Nano will be bundled with the browser, meaning web developers can use the foundational model directly if their users are on Chrome. Just like on Android, web developers can adjust the model’s weights through LoRA fine-tuning, making it more useful for their use case. Read more here.

Gemini will also be available in Chrome DevTools, suggesting possible solutions to issues that pop up in the console.

Speculation Rules API

Chrome can now pre-fetch some pages based on how confident it is that the user will navigate to them. This is supposed to enable instantaneous navigation, bringing web apps one step closer to mobile.

In this demo, the load time is reduced by ~20x, marking a significant performance gain. Chrome will ship with a new Speculation Rules API that lets developers customize what they want to pre-fetch on the browser side, preventing some potentially intrusive behaviors. The confidence generated for fetches will also be available at <span class="text-color-code"> chrome://predictors </span>.

prerender pages
From: https://developer.chrome.com/docs/web-platform/prerender-pages
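
Rules for this API (called the Speculation Rules API in the Chrome documentation linked above) are plain JSON placed in a `<script type="speculationrules">` tag. As a rough sketch, the TypeScript below builds such a tag as a string, for example for server-side rendering. The helper names and the example URLs are made up for illustration; the `prerender`, `prefetch`, `urls`, `where`, `href_matches`, and `eagerness` keys follow the Chrome documentation as we understand it.

```typescript
// How readily the browser should act on a rule.
type Eagerness = "immediate" | "eager" | "moderate" | "conservative";

interface SpeculationRule {
  urls?: string[];                  // explicit list of URLs to speculate on
  where?: { href_matches: string }; // or a pattern matched against links on the page
  eagerness?: Eagerness;
}

interface SpeculationRules {
  prefetch?: SpeculationRule[];
  prerender?: SpeculationRule[];
}

// Hypothetical helper: prerender a few known next pages, and prefetch
// anything matching a URL pattern only when the user is about to navigate.
function buildSpeculationRules(
  prerenderUrls: string[],
  prefetchPattern: string
): SpeculationRules {
  return {
    prerender: [{ urls: prerenderUrls, eagerness: "moderate" }],
    prefetch: [
      { where: { href_matches: prefetchPattern }, eagerness: "conservative" },
    ],
  };
}

// Serialize the rules into the script tag the browser looks for.
// "<" is escaped so a URL can never close the script element early.
function renderSpeculationRulesTag(rules: SpeculationRules): string {
  const json = JSON.stringify(rules).replace(/</g, "\\u003c");
  return `<script type="speculationrules">${json}</script>`;
}

console.log(
  renderSpeculationRulesTag(buildSpeculationRules(["/checkout"], "/products/*"))
);
```

Browsers that don’t understand the `speculationrules` script type simply ignore the tag, so emitting it should be safe as a progressive enhancement.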

Multi-page application transitions

Chrome already supports view transitions within the same document, which work in SPAs (single-page applications), typically built using a library like React. However, when you navigate to a different page in a multi-page web app, the page is loaded from the network, and the switch between the current and the next page can be jarring. Chrome 126 is shipping a new cross-document view transition, which triggers when the user navigates to a new page in the app. Combined with the new pre-fetching API, this may bring web apps a lot closer to the native feel that Google has been aiming for.

Developers need to make a single change to the CSS for both the current and next page to have the transition trigger:

	
@view-transition {
  navigation: auto;
}
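
For comparison, same-document transitions are started from script through `document.startViewTransition()`, which not every browser supports yet. A minimal feature-detection sketch in TypeScript follows; `ViewTransitionHost` and `updateWithTransition` are hypothetical names, and the host is passed in as a parameter so the logic can be exercised without a browser.

```typescript
// Structural type for the one method we rely on. In a browser that supports
// same-document view transitions, `document` provides this method.
interface ViewTransitionHost {
  startViewTransition?: (update: () => void) => unknown;
}

// Run a DOM update inside a view transition when the API is available,
// otherwise apply the update directly. Returns true if a transition was started.
function updateWithTransition(
  host: ViewTransitionHost,
  update: () => void
): boolean {
  if (typeof host.startViewTransition === "function") {
    // Called as a method so `this` stays bound to the host object.
    host.startViewTransition(update);
    return true;
  }
  update();
  return false;
}
```

In a page, the host would be `document` and the callback would swap the content, so unsupported browsers still get the update, just without the animation.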

Conclusion

Google has tough competition in the AI space from the likes of Microsoft, OpenAI, Anthropic, and Meta. Generative tools like ChatGPT threaten its core business, Search, and Google is trying to counter their influence by adapting to the change. The focus on on-device AI is evident, and welcome. It was also nice to see, on digging beyond the keynotes, that new changes have been introduced that don’t focus so much on AI, but rather on user experience and developer productivity. We’re excited to make the best use of the new tools that have been introduced, and we hope you are too!
