The Vertex AI plugin provides interfaces to several AI services:
- Google generative AI models:
- Gemini text generation
- Imagen2 and Imagen3 image generation
- Text embedding generation
- A subset of evaluation metrics through the Vertex AI Rapid Evaluation API:
- Vector Search
Installation
npm i --save @genkit-ai/vertexai
If you want to locally run flows that use this plugin, you also need the Google Cloud CLI tool installed.
Configuration
To use this plugin, specify it when you initialize Genkit:
import { genkit } from 'genkit';
import { vertexAI } from '@genkit-ai/vertexai';
const ai = genkit({
plugins: [
vertexAI({ location: 'us-central1' }),
],
});
The plugin requires you to specify your Google Cloud project ID, the region to which you want to make Vertex API requests, and your Google Cloud project credentials.
- You can specify your Google Cloud project ID either by setting
projectId
in thevertexAI()
configuration or by setting theGCLOUD_PROJECT
environment variable. If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on),GCLOUD_PROJECT
is automatically set to the project ID of the environment. - You can specify the API location either by setting
location
in thevertexAI()
configuration or by setting theGCLOUD_LOCATION
environment variable. To provide API credentials, you need to set up Google Cloud Application Default Credentials.
To specify your credentials:
- If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.
On your local dev environment, do this by running:
gcloud auth application-default login
For other environments, see the Application Default Credentials docs.
In addition, make sure the account is granted the Vertex AI User IAM role (
roles/aiplatform.user
). See the Vertex AI access control docs.
Usage
Generative AI Models
This plugin statically exports references to its supported generative AI models:
import { gemini15Flash, gemini15Pro, imagen3 } from '@genkit-ai/vertexai';
You can use these references to specify which model ai.generate()
uses:
const ai = genkit({
plugins: [vertexAI({ location: 'us-central1' })],
});
const llmResponse = await ai.generate({
model: gemini15Flash,
prompt: 'What should I do when I visit Melbourne?',
});
This plugin also supports grounding Gemini text responses using Google Search or your own data.
Example:
const ai = genkit({
plugins: [vertexAI({ location: 'us-central1' })],
});
await ai.generate({
model: gemini15Flash,
prompt: '...',
config: {
googleSearchRetrieval: {
disableAttribution: true,
}
vertexRetrieval: {
datastore: {
projectId: 'your-cloud-project',
location: 'us-central1',
collection: 'your-collection',
},
disableAttribution: true,
}
}
})
This plugin also statically exports a reference to the Gecko text embedding model:
import { textEmbedding004 } from '@genkit-ai/vertexai';
You can use this reference to specify which embedder an indexer or retriever uses. For example, if you use Chroma DB:
const ai = genkit({
plugins: [
chroma([
{
embedder: textEmbedding004,
collectionName: 'my-collection',
},
]),
],
});
Or you can generate an embedding directly:
const ai = genkit({
plugins: [vertexAI({ location: 'us-central1' })],
});
const embedding = await ai.embed({
embedder: textEmbedding004,
content: 'How many widgets do you have in stock?',
});
Imagen3 model allows generating images from user prompt:
import { imagen3 } from '@genkit-ai/vertexai';
const ai = genkit({
plugins: [vertexAI({ location: 'us-central1' })],
});
const response = await ai.generate({
model: imagen3,
output: { format: 'media' },
prompt: 'a banana riding a bicycle',
});
return response.media();
and even advanced editing of existing images:
const ai = genkit({
plugins: [vertexAI({ location: 'us-central1' })],
});
const baseImg = fs.readFileSync('base.png', { encoding: 'base64' });
const maskImg = fs.readFileSync('mask.png', { encoding: 'base64' });
const response = await ai.generate({
model: imagen3,
output: { format: 'media' },
prompt: [
{ media: { url: `data:image/png;base64,${baseImg}` }},
{
media: { url: `data:image/png;base64,${maskImg}` },
metadata: { type: 'mask' },
},
{ text: 'replace the background with foo bar baz' },
],
config: {
editConfig: {
editMode: 'outpainting',
},
},
});
return response.media();
Refer to Imagen model documentation for more detailed options.
Anthropic Claude 3 on Vertex AI Model Garden
If you have access to Claude 3 models (haiku, sonnet or opus) in Vertex AI Model Garden you can use them with Genkit.
Here's sample configuration for enabling Vertex AI Model Garden models:
import { genkit } from 'genkit';
import {
claude3Haiku,
claude3Sonnet,
claude3Opus,
vertexAIModelGarden,
} from '@genkit-ai/vertexai/modelgarden';
const ai = genkit({
plugins: [
vertexAIModelGarden({
location: 'us-central1',
models: [claude3Haiku, claude3Sonnet, claude3Opus],
}),
],
});
Then use them as regular models:
const llmResponse = await ai.generate({
model: claude3Sonnet,
prompt: 'What should I do when I visit Melbourne?',
});
Llama 3.1 405b on Vertex AI Model Garden
First you'll need to enable Llama 3.1 API Service in Vertex AI Model Garden.
Here's sample configuration for Llama 3.1 405b in Vertex AI plugin:
import { genkit } from 'genkit';
import { llama31, vertexAIModelGarden } from '@genkit-ai/vertexai/modelgarden';
const ai = genkit({
plugins: [
vertexAIModelGarden({
location: 'us-central1',
models: [llama31],
}),
],
});
Then use it as regular models:
const llmResponse = await ai.generate({
model: llama31,
prompt: 'Write a function that adds two numbers together',
});
Mistral Models on Vertex AI Model Garden
If you have access to Mistral models (Mistral Large, Mistral Nemo, or Codestral) in Vertex AI Model Garden, you can use them with Genkit.
Here's sample configuration for enabling Vertex AI Model Garden models:
import { genkit } from 'genkit';
import {
mistralLarge,
mistralNemo,
codestral,
vertexAIModelGarden,
} from '@genkit-ai/vertexai/modelgarden';
const ai = genkit({
plugins: [
vertexAIModelGarden({
location: 'us-central1',
models: [mistralLarge, mistralNemo, codestral],
}),
],
});
Then use them as regular models:
const llmResponse = await ai.generate({
model: mistralLarge,
prompt: 'Write a function that adds two numbers together',
config: {
version: 'mistral-large-2411', // Optional: specify model version
temperature: 0.7, // Optional: control randomness (0-1)
maxOutputTokens: 1024, // Optional: limit response length
topP: 0.9, // Optional: nucleus sampling parameter
stopSequences: ['###'], // Optional: stop generation at sequences
}
});
The models support:
- mistralLarge
: Latest Mistral large model with function calling capabilities
- mistralNemo
: Optimized for efficiency and speed
- codestral
: Specialized for code generation tasks
Each model supports streaming responses and function calling:
const response = await ai.generateStream({
model: mistralLarge,
prompt: 'What should I cook tonight?',
tools: ['recipe-finder'],
config: {
version: 'mistral-large-2411',
temperature: 1,
},
});
for await (const chunk of response.stream) {
console.log(chunk.text);
}
Evaluators
To use the evaluators from Vertex AI Rapid Evaluation, add an evaluation
block to your vertexAI
plugin configuration.
import { genkit } from 'genkit';
import {
vertexAIEvaluation,
VertexAIEvaluationMetricType,
} from '@genkit-ai/vertexai/evaluation';
const ai = genkit({
plugins: [
vertexAIEvaluation({
location: 'us-central1',
metrics: [
VertexAIEvaluationMetricType.SAFETY,
{
type: VertexAIEvaluationMetricType.ROUGE,
metricSpec: {
rougeType: 'rougeLsum',
},
},
],
}),
],
});
The configuration above adds evaluators for the Safety
and ROUGE
metrics. The example shows two approaches- the Safety
metric uses the default specification, whereas the ROUGE
metric provides a customized specification that sets the rouge type to rougeLsum
.
Both evaluators can be run using the genkit eval:run
command with a compatible dataset: that is, a dataset with output
and reference
fields. The Safety
evaluator can also be run using the genkit eval:flow -e vertexai/safety
command since it only requires an output
.
Indexers and retrievers
The Genkit Vertex AI plugin includes indexer and retriever implementations backed by the Vertex AI Vector Search service.
(See the Retrieval-augmented generation page to learn how indexers and retrievers are used in a RAG implementation.)
The Vertex AI Vector Search service is a document index that works alongside the document store of your choice: the document store contains the content of documents, and the Vertex AI Vector Search index contains, for each document, its vector embedding and a reference to the document in the document store. After your documents are indexed by the Vertex AI Vector Search service, it can respond to search queries, producing lists of indexes into your document store.
The indexer and retriever implementations provided by the Vertex AI plugin use either Cloud Firestore or BigQuery as the document store. The plugin also includes interfaces you can implement to support other document stores.
To use Vertex AI Vector Search:
- Choose an embedding model. This model is responsible for creating vector embeddings from text. Advanced users might use an embedding model optimized for their particular data sets, but for most users, Vertex AI's
text-embedding-004
model is a good choice for English text and thetext-multilingual-embedding-002
model is good for multilingual text. In the Vector Search section of the Google Cloud console, create a new index. The most important settings are:
- Dimensions: Specify the dimensionality of the vectors produced by your chosen embedding model. The
text-embedding-004
andtext-multilingual-embedding-002
models produce vectors of 768 dimensions. - Update method: Select streaming updates.
After you create the index, deploy it to a standard (public) endpoint.
- Dimensions: Specify the dimensionality of the vectors produced by your chosen embedding model. The
Get a document indexer and retriever for the document store you want to use:
Cloud Firestore
import { getFirestoreDocumentIndexer, getFirestoreDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch'; import { initializeApp } from 'firebase-admin/app'; import { getFirestore } from 'firebase-admin/firestore'; initializeApp({ projectId: PROJECT_ID }); const db = getFirestore(); const firestoreDocumentRetriever = getFirestoreDocumentRetriever(db, FIRESTORE_COLLECTION); const firestoreDocumentIndexer = getFirestoreDocumentIndexer(db, FIRESTORE_COLLECTION);
BigQuery
import { getBigQueryDocumentIndexer, getBigQueryDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch'; import { BigQuery } from '@google-cloud/bigquery'; const bq = new BigQuery({ projectId: PROJECT_ID }); const bigQueryDocumentRetriever = getBigQueryDocumentRetriever(bq, BIGQUERY_TABLE, BIGQUERY_DATASET); const bigQueryDocumentIndexer = getBigQueryDocumentIndexer(bq, BIGQUERY_TABLE, BIGQUERY_DATASET);
Other
To support other documents stores you can provide your own implementations of
DocumentRetriever
andDocumentIndexer
:const myDocumentRetriever = async (neighbors) => { // Return the documents referenced by `neighbors`. // ... } const myDocumentIndexer = async (documents) => { // Add `documents` to storage. // ... }
For an example, see Sample Vertex AI Plugin Retriever and Indexer with Local File.
Add a
vectorSearchOptions
block to yourvertexAI
plugin configuration:import { genkit } from 'genkit'; import { textEmbedding004 } from '@genkit-ai/vertexai'; import { vertexAIVectorSearch } from '@genkit-ai/vertexai/vectorsearch'; const ai = genkit({ plugins: [ vertexAIVectorSearch({ projectId: PROJECT_ID, location: LOCATION, vectorSearchOptions: [ { indexId: VECTOR_SEARCH_INDEX_ID, indexEndpointId: VECTOR_SEARCH_INDEX_ENDPOINT_ID, deployedIndexId: VECTOR_SEARCH_DEPLOYED_INDEX_ID, publicDomainName: VECTOR_SEARCH_PUBLIC_DOMAIN_NAME, documentRetriever: firestoreDocumentRetriever, documentIndexer: firestoreDocumentIndexer, embedder: textEmbedding004, }, ], }), ], });
Provide the embedder you chose in the first step and the document indexer and retriever you created in the previous step.
To configure the plugin to use the Vector Search index you created earlier, you need to provide several values, which you can find in the Vector Search section of the Google Cloud console:
indexId
: listed on the Indexes tabindexEndpointId
: listed on the Index Endpoints tabdeployedIndexId
andpublicDomainName
: listed on the "Deployed index info" page, which you can open by clicking the name of the deployed index on either of the tabs mentioned earlier
Now that everything is configured, you can use the indexer and retriever in your Genkit application:
import { vertexAiIndexerRef, vertexAiRetrieverRef, } from '@genkit-ai/vertexai/vectorsearch'; // ... inside your flow function: await ai.index({ indexer: vertexAiIndexerRef({ indexId: VECTOR_SEARCH_INDEX_ID, }), documents, }); const res = await ai.retrieve({ retriever: vertexAiRetrieverRef({ indexId: VECTOR_SEARCH_INDEX_ID, }), query: queryDocument, });
See the code samples for:
Context Caching
The Vertex AI Genkit plugin supports Context Caching, which allows models to reuse previously cached content to optimize token usage when dealing with large pieces of content. This feature is especially useful for conversational flows or scenarios where the model references a large piece of content consistently across multiple requests.
How to Use Context Caching
To enable context caching, ensure your model supports it. For example, gemini15Flash
and gemini15Pro
are models that support context caching, and you will have to specify version number 001
.
You can define a caching mechanism in your application like this:
const ai = genkit({
plugins: [googleAI()],
});
const llmResponse = await ai.generate({
messages: [
{
role: 'user',
content: [{ text: 'Here is the relevant text from War and Peace.' }],
},
{
role: 'model',
content: [
{
text: 'Based on War and Peace, here is some analysis of Pierre Bezukhov’s character.',
},
],
metadata: {
cache: {
ttlSeconds: 300, // Cache this message for 5 minutes
},
},
},
],
model: gemini15Flash,
prompt: 'Describe Pierre’s transformation throughout the novel.',
});
In this setup:
- messages
: Allows you to pass conversation history.
- metadata.cache.ttlSeconds
: Specifies the time-to-live (TTL) for caching a specific response.
Example: Leveraging Large Texts with Context
For applications referencing long documents, such as War and Peace or Lord of the Rings, you can structure your queries to reuse cached contexts:
const textContent = await fs.readFile('path/to/war_and_peace.txt', 'utf-8');
const llmResponse = await ai.generate({
messages: [
{
role: 'user',
content: [{ text: textContent }], // Include the large text as context
},
{
role: 'model',
content: [
{
text: 'This analysis is based on the provided text from War and Peace.',
},
],
metadata: {
cache: {
ttlSeconds: 300, // Cache the response to avoid reloading the full text
},
},
},
],
model: gemini15Flash,
prompt: 'Analyze the relationship between Pierre and Natasha.',
});
Benefits of Context Caching
- Improved Performance: Reduces the need for repeated processing of large inputs.
- Cost Efficiency: Decreases API usage for redundant data, optimizing token consumption.
- Better Latency: Speeds up response times for repeated or related queries.
Supported Models for Context Caching
Only specific models, such as gemini15Flash
and gemini15Pro
, support context caching, and currently only on version numbers 001
. If an unsupported model is used, an error will be raised, indicating that caching cannot be applied.
Further Reading
See more information regarding context caching on Vertex AI in their documentation.