Vertex AI plugin

The Vertex AI plugin provides interfaces to several AI services:

  • Google generative AI models: Gemini text generation, Imagen image generation, and text embedding
  • A subset of the models in Vertex AI Model Garden, such as Anthropic Claude 3, Llama 3.1, and Mistral
  • Evaluators from Vertex AI Rapid Evaluation
  • Vector Search indexers and retrievers

Installation

npm i --save @genkit-ai/vertexai

If you want to run flows that use this plugin locally, you also need the Google Cloud CLI installed.

Configuration

To use this plugin, specify it when you initialize Genkit:

import { genkit } from 'genkit';
import { vertexAI } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [
    vertexAI({ location: 'us-central1' }),
  ],
});

The plugin requires you to specify your Google Cloud project ID, the region to which you want to make Vertex API requests, and your Google Cloud project credentials.

  • You can specify your Google Cloud project ID either by setting projectId in the vertexAI() configuration or by setting the GCLOUD_PROJECT environment variable. If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), GCLOUD_PROJECT is automatically set to the project ID of the environment.
  • You can specify the API location either by setting location in the vertexAI() configuration or by setting the GCLOUD_LOCATION environment variable.
  • To provide API credentials, you need to set up Google Cloud Application Default Credentials.

    1. To specify your credentials:

      • If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.
      • On your local dev environment, do this by running:

        gcloud auth application-default login
      • For other environments, see the Application Default Credentials docs.

    2. In addition, make sure the account is granted the Vertex AI User IAM role (roles/aiplatform.user). See the Vertex AI access control docs.
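For example, a minimal configuration that sets the project ID and location explicitly in the vertexAI() call instead of relying on the GCLOUD_PROJECT and GCLOUD_LOCATION environment variables (the project ID shown is a placeholder):

import { genkit } from 'genkit';
import { vertexAI } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [
    vertexAI({
      projectId: 'your-cloud-project', // placeholder project ID
      location: 'us-central1',
    }),
  ],
});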

Usage

Generative AI Models

This plugin statically exports references to its supported generative AI models:

import { gemini15Flash, gemini15Pro, imagen3 } from '@genkit-ai/vertexai';

You can use these references to specify which model ai.generate() uses:

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

const llmResponse = await ai.generate({
  model: gemini15Flash,
  prompt: 'What should I do when I visit Melbourne?',
});

This plugin also supports grounding Gemini text responses using Google Search or your own data.

Example:

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

await ai.generate({
  model: gemini15Flash,
  prompt: '...',
  config: {
    googleSearchRetrieval: {
      disableAttribution: true,
    },
    vertexRetrieval: {
      datastore: {
        projectId: 'your-cloud-project',
        location: 'us-central1',
        collection: 'your-collection',
      },
      disableAttribution: true,
    },
  },
});

This plugin also statically exports a reference to the Gecko text embedding model:

import { textEmbedding004 } from '@genkit-ai/vertexai';

You can use this reference to specify which embedder an indexer or retriever uses. For example, if you use Chroma DB:

import { chroma } from 'genkitx-chromadb'; // Chroma DB plugin for Genkit

const ai = genkit({
  plugins: [
    chroma([
      {
        embedder: textEmbedding004,
        collectionName: 'my-collection',
      },
    ]),
  ],
});

Or you can generate an embedding directly:

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

const embedding = await ai.embed({
  embedder: textEmbedding004,
  content: 'How many widgets do you have in stock?',
});

The Imagen3 model can generate images from a user prompt:

import { imagen3 } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

const response = await ai.generate({
  model: imagen3,
  output: { format: 'media' },
  prompt: 'a banana riding a bicycle',
});

return response.media();

It also supports advanced editing of existing images:

import fs from 'fs';

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

const baseImg = fs.readFileSync('base.png', { encoding: 'base64' });
const maskImg = fs.readFileSync('mask.png', { encoding: 'base64' });

const response = await ai.generate({
  model: imagen3,
  output: { format: 'media' },
  prompt: [
    { media: { url: `data:image/png;base64,${baseImg}` }},
    {
      media: { url: `data:image/png;base64,${maskImg}` },
      metadata: { type: 'mask' },
    },
    { text: 'replace the background with foo bar baz' },
  ],
  config: {
    editConfig: {
      editMode: 'outpainting',
    },
  },
});

return response.media();

Refer to the Imagen model documentation for more detailed options.

Anthropic Claude 3 on Vertex AI Model Garden

If you have access to Claude 3 models (Haiku, Sonnet, or Opus) in Vertex AI Model Garden, you can use them with Genkit.

Here's a sample configuration for enabling Vertex AI Model Garden models:

import { genkit } from 'genkit';
import {
  claude3Haiku,
  claude3Sonnet,
  claude3Opus,
  vertexAIModelGarden,
} from '@genkit-ai/vertexai/modelgarden';

const ai = genkit({
  plugins: [
    vertexAIModelGarden({
      location: 'us-central1',
      models: [claude3Haiku, claude3Sonnet, claude3Opus],
    }),
  ],
});

Then use them as regular models:

const llmResponse = await ai.generate({
  model: claude3Sonnet,
  prompt: 'What should I do when I visit Melbourne?',
});

Llama 3.1 405b on Vertex AI Model Garden

First, you'll need to enable the Llama 3.1 API Service in Vertex AI Model Garden.

Here's a sample configuration for Llama 3.1 405b in the Vertex AI plugin:

import { genkit } from 'genkit';
import { llama31, vertexAIModelGarden } from '@genkit-ai/vertexai/modelgarden';

const ai = genkit({
  plugins: [
    vertexAIModelGarden({
      location: 'us-central1',
      models: [llama31],
    }),
  ],
});

Then use it as a regular model:

const llmResponse = await ai.generate({
  model: llama31,
  prompt: 'Write a function that adds two numbers together',
});

Mistral Models on Vertex AI Model Garden

If you have access to Mistral models (Mistral Large, Mistral Nemo, or Codestral) in Vertex AI Model Garden, you can use them with Genkit.

Here's a sample configuration for enabling these Model Garden models:

import { genkit } from 'genkit';
import {
  mistralLarge,
  mistralNemo,
  codestral,
  vertexAIModelGarden,
} from '@genkit-ai/vertexai/modelgarden';

const ai = genkit({
  plugins: [
    vertexAIModelGarden({
      location: 'us-central1',
      models: [mistralLarge, mistralNemo, codestral],
    }),
  ],
});

Then use them as regular models:

const llmResponse = await ai.generate({
  model: mistralLarge,
  prompt: 'Write a function that adds two numbers together',
  config: {
    version: 'mistral-large-2411', // Optional: specify model version
    temperature: 0.7,              // Optional: control randomness (0-1)
    maxOutputTokens: 1024,         // Optional: limit response length
    topP: 0.9,                     // Optional: nucleus sampling parameter
    stopSequences: ['###'],        // Optional: stop generation at sequences
  }
});

The available models:

  • mistralLarge: the latest Mistral large model, with function calling capabilities
  • mistralNemo: optimized for efficiency and speed
  • codestral: specialized for code generation tasks

Each model supports streaming responses and function calling:

const response = await ai.generateStream({
  model: mistralLarge,
  prompt: 'What should I cook tonight?',
  tools: ['recipe-finder'],
  config: {
    version: 'mistral-large-2411',
    temperature: 1,
  },
});

for await (const chunk of response.stream) {
  console.log(chunk.text);
}
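The recipe-finder tool referenced above is hypothetical. A minimal sketch of how such a tool could be defined with ai.defineTool (the tool name, schemas, and lookup logic are placeholder assumptions):

import { genkit, z } from 'genkit';
import { mistralLarge, vertexAIModelGarden } from '@genkit-ai/vertexai/modelgarden';

const ai = genkit({
  plugins: [
    vertexAIModelGarden({
      location: 'us-central1',
      models: [mistralLarge],
    }),
  ],
});

// Hypothetical tool: replace the lookup with your own data source.
const recipeFinder = ai.defineTool(
  {
    name: 'recipe-finder',
    description: 'Finds recipes that use the given ingredients',
    inputSchema: z.object({ ingredients: z.array(z.string()) }),
    outputSchema: z.array(z.string()),
  },
  async ({ ingredients }) => {
    // Query your recipe database or API here.
    return [`Stir-fry with ${ingredients.join(', ')}`];
  },
);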

Evaluators

To use the evaluators from Vertex AI Rapid Evaluation, add an evaluation block to your vertexAI plugin configuration.

import { genkit } from 'genkit';
import {
  vertexAIEvaluation,
  VertexAIEvaluationMetricType,
} from '@genkit-ai/vertexai/evaluation';

const ai = genkit({
  plugins: [
    vertexAIEvaluation({
      location: 'us-central1',
      metrics: [
        VertexAIEvaluationMetricType.SAFETY,
        {
          type: VertexAIEvaluationMetricType.ROUGE,
          metricSpec: {
            rougeType: 'rougeLsum',
          },
        },
      ],
    }),
  ],
});

The configuration above adds evaluators for the Safety and ROUGE metrics. The example shows two approaches: the Safety metric uses the default specification, whereas the ROUGE metric provides a customized specification that sets the ROUGE type to rougeLsum.

Both evaluators can be run using the genkit eval:run command with a compatible dataset: that is, a dataset with output and reference fields. The Safety evaluator can also be run using the genkit eval:flow -e vertexai/safety command since it only requires an output.
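A compatible dataset for genkit eval:run is a JSON file containing an array of test cases. The shape below is a minimal sketch, written as a TypeScript value for clarity; the output and reference fields are the ones required above, while input is an assumed additional field:

// Contents of a hypothetical eval_dataset.json file, shown as a TypeScript value.
const evalDataset = [
  {
    input: 'What should I do when I visit Melbourne?',                // assumed field
    output: 'Visit the Royal Botanic Gardens and the laneways.',      // required for both metrics
    reference: 'Recommend sights such as the Royal Botanic Gardens.', // required for ROUGE
  },
];

You would then point the command at the file, for example genkit eval:run eval_dataset.json (the filename is a placeholder).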

Indexers and retrievers

The Genkit Vertex AI plugin includes indexer and retriever implementations backed by the Vertex AI Vector Search service.

(See the Retrieval-augmented generation page to learn how indexers and retrievers are used in a RAG implementation.)

The Vertex AI Vector Search service is a document index that works alongside the document store of your choice: the document store contains the content of documents, and the Vertex AI Vector Search index contains, for each document, its vector embedding and a reference to the document in the document store. After your documents are indexed by the Vertex AI Vector Search service, it can respond to search queries, producing lists of indexes into your document store.

The indexer and retriever implementations provided by the Vertex AI plugin use either Cloud Firestore or BigQuery as the document store. The plugin also includes interfaces you can implement to support other document stores.

To use Vertex AI Vector Search:

  1. Choose an embedding model. This model is responsible for creating vector embeddings from text. Advanced users might use an embedding model optimized for their particular data sets, but for most users, Vertex AI's text-embedding-004 model is a good choice for English text and the text-multilingual-embedding-002 model is good for multilingual text.
  2. In the Vector Search section of the Google Cloud console, create a new index. The most important settings are:

    • Dimensions: Specify the dimensionality of the vectors produced by your chosen embedding model. The text-embedding-004 and text-multilingual-embedding-002 models produce vectors of 768 dimensions.
    • Update method: Select streaming updates.

    After you create the index, deploy it to a standard (public) endpoint.

  3. Get a document indexer and retriever for the document store you want to use:

    Cloud Firestore

    import { getFirestoreDocumentIndexer, getFirestoreDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch';
    
    import { initializeApp } from 'firebase-admin/app';
    import { getFirestore } from 'firebase-admin/firestore';
    
    initializeApp({ projectId: PROJECT_ID });
    const db = getFirestore();
    
    const firestoreDocumentRetriever = getFirestoreDocumentRetriever(db, FIRESTORE_COLLECTION);
    const firestoreDocumentIndexer = getFirestoreDocumentIndexer(db, FIRESTORE_COLLECTION);
    

    BigQuery

    import { getBigQueryDocumentIndexer, getBigQueryDocumentRetriever } from '@genkit-ai/vertexai/vectorsearch';
    import { BigQuery } from '@google-cloud/bigquery';
    
    const bq = new BigQuery({ projectId: PROJECT_ID });
    
    const bigQueryDocumentRetriever = getBigQueryDocumentRetriever(bq, BIGQUERY_TABLE, BIGQUERY_DATASET);
    const bigQueryDocumentIndexer = getBigQueryDocumentIndexer(bq, BIGQUERY_TABLE, BIGQUERY_DATASET);
    

    Other

To support other document stores, you can provide your own implementations of DocumentRetriever and DocumentIndexer:

    const myDocumentRetriever = async (neighbors) => {
      // Given the `neighbors` returned by Vector Search, look up and return
      // the corresponding documents from your document store.
      // ...
    };
    const myDocumentIndexer = async (documents) => {
      // Add `documents` to your document store.
      // ...
    };
    

    For an example, see Sample Vertex AI Plugin Retriever and Indexer with Local File.

  4. Add a vectorSearchOptions block to your vertexAI plugin configuration:

    import { genkit } from 'genkit';
    import { textEmbedding004 } from '@genkit-ai/vertexai';
    import { vertexAIVectorSearch } from '@genkit-ai/vertexai/vectorsearch';
    
    const ai = genkit({
      plugins: [
        vertexAIVectorSearch({
          projectId: PROJECT_ID,
          location: LOCATION,
          vectorSearchOptions: [
            {
              indexId: VECTOR_SEARCH_INDEX_ID,
              indexEndpointId: VECTOR_SEARCH_INDEX_ENDPOINT_ID,
              deployedIndexId: VECTOR_SEARCH_DEPLOYED_INDEX_ID,
              publicDomainName: VECTOR_SEARCH_PUBLIC_DOMAIN_NAME,
              documentRetriever: firestoreDocumentRetriever,
              documentIndexer: firestoreDocumentIndexer,
              embedder: textEmbedding004,
            },
          ],
        }),
      ],
    });
    

    Provide the embedder you chose in the first step and the document indexer and retriever you created in the previous step.

    To configure the plugin to use the Vector Search index you created earlier, you need to provide several values, which you can find in the Vector Search section of the Google Cloud console:

    • indexId: listed on the Indexes tab
    • indexEndpointId: listed on the Index Endpoints tab
    • deployedIndexId and publicDomainName: listed on the "Deployed index info" page, which you can open by clicking the name of the deployed index on either of the tabs mentioned earlier
  5. Now that everything is configured, you can use the indexer and retriever in your Genkit application:

    import {
      vertexAiIndexerRef,
      vertexAiRetrieverRef,
    } from '@genkit-ai/vertexai/vectorsearch';
    
    // ... inside your flow function:
    
    await ai.index({
      indexer: vertexAiIndexerRef({
        indexId: VECTOR_SEARCH_INDEX_ID,
      }),
      documents,
    });
    
    const res = await ai.retrieve({
      retriever: vertexAiRetrieverRef({
        indexId: VECTOR_SEARCH_INDEX_ID,
      }),
      query: queryDocument,
    });
    

See the plugin's code samples for complete, end-to-end examples.

Context Caching

The Vertex AI Genkit plugin supports Context Caching, which allows models to reuse previously cached content to optimize token usage when dealing with large pieces of content. This feature is especially useful for conversational flows or scenarios where the model references a large piece of content consistently across multiple requests.

How to Use Context Caching

To enable context caching, make sure your model supports it. For example, gemini15Flash and gemini15Pro support context caching, and you must specify version 001 (a version-pinning sketch follows the example below).

You can define a caching mechanism in your application like this:

const ai = genkit({
  plugins: [vertexAI({ location: 'us-central1' })],
});

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: 'Here is the relevant text from War and Peace.' }],
    },
    {
      role: 'model',
      content: [
        {
          text: 'Based on War and Peace, here is some analysis of Pierre Bezukhov’s character.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache this message for 5 minutes
        },
      },
    },
  ],
  model: gemini15Flash,
  prompt: 'Describe Pierre’s transformation throughout the novel.',
});

In this setup:

  • messages: Allows you to pass conversation history.
  • metadata.cache.ttlSeconds: Specifies the time-to-live (TTL) for caching a specific response.
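Because caching is only available on version 001 of these models, you may need to pin the model version explicitly. A minimal sketch, assuming the common version config option (the same option used in the Mistral example earlier) and the gemini-1.5-flash-001 version string:

const llmResponse = await ai.generate({
  model: gemini15Flash,
  prompt: 'Describe how Pierre changes over the course of the novel.',
  config: {
    // Assumption: pin a model version that supports context caching.
    version: 'gemini-1.5-flash-001',
  },
});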

Example: Leveraging Large Texts with Context

For applications referencing long documents, such as War and Peace or Lord of the Rings, you can structure your queries to reuse cached contexts:


import fs from 'fs/promises';

const textContent = await fs.readFile('path/to/war_and_peace.txt', 'utf-8');

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: textContent }], // Include the large text as context
    },
    {
      role: 'model',
      content: [
        {
          text: 'This analysis is based on the provided text from War and Peace.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache the response to avoid reloading the full text
        },
      },
    },
  ],
  model: gemini15Flash,
  prompt: 'Analyze the relationship between Pierre and Natasha.',
});

Benefits of Context Caching

  1. Improved Performance: Reduces the need for repeated processing of large inputs.
  2. Cost Efficiency: Decreases API usage for redundant data, optimizing token consumption.
  3. Better Latency: Speeds up response times for repeated or related queries.

Supported Models for Context Caching

Only specific models, such as gemini15Flash and gemini15Pro, support context caching, and currently only on version 001. If an unsupported model is used, an error is raised indicating that caching cannot be applied.

Further Reading

For more information about context caching on Vertex AI, see the Vertex AI documentation.