Chat with a PDF file

You can use Genkit to build an app that lets its user chat with a PDF file. To do this, follow these steps:

  1. Set up your project
  2. Import the required dependencies
  3. Configure Genkit and the default model
  4. Load and parse the PDF file
  5. Set up the prompt
  6. Implement the UI
  7. Implement the chat loop
  8. Run the app

This guide explains how to perform each of these tasks.

Dependencies

Before starting work, you should have these dependencies set up:

Tasks

After setting up your dependencies, you can build the project itself.

1. Set up your project

  1. Create a directory structure and a file to hold your source code.

    $ mkdir -p chat-with-a-pdf/src && \
    cd chat-with-a-pdf && \
    touch src/index.ts
    
  2. Initialize a new TypeScript project.

    $ npm init -y
    
  3. Install the pdf-parse module:

    $ npm i pdf-parse
    
  4. Install the following Genkit dependencies to use Genkit in your project:

    $ npm install genkit @genkit-ai/googleai
    
  • genkit provides Genkit core capabilities.
  • @genkit-ai/googleai provides access to the Google AI Gemini models.

  5. Get and configure your model API key.

    To use the Gemini API, which this codelab uses, you must first configure an API key. If you don't already have one, create a key in Google AI Studio. The Gemini API provides a generous free-of-charge tier and does not require a credit card to get started. After creating your API key, set the GOOGLE_GENAI_API_KEY environment variable to your key with the following command:

    $ export GOOGLE_GENAI_API_KEY=<your API key>
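
If the environment variable is missing, the problem may only surface when the app first calls the model. As a minimal sketch, assuming you want the app to fail fast at startup instead, a small check like the one below could run before anything else. The helper name requireEnv is hypothetical and is not part of Genkit.

```typescript
// Hypothetical helper (not part of Genkit): returns the value of an
// environment variable, or throws a descriptive error if it is unset or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Environment variable ${name} is not set.`);
  }
  return value;
}
```

For example, calling requireEnv("GOOGLE_GENAI_API_KEY") near the top of index.ts would stop the app with a clear message when the key has not been exported.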


Note: Although this tutorial uses the Gemini API from AI Studio, Genkit supports a wide variety of model providers.

2. Import the required dependencies

In the index.ts file that you created, add the following lines to import the dependencies required for this project:

   import { gemini15Flash, googleAI } from '@genkit-ai/googleai';
   import { genkit } from 'genkit';
   import pdf from 'pdf-parse';
   import fs from 'fs';
   import { createInterface } from 'node:readline/promises';

  • The first two lines import Genkit and the Google AI plugin.
  • The next two lines import the PDF parser and the Node.js fs module, which reads the PDF file from disk.
  • The fifth line imports createInterface, which you use to implement the UI.

3. Configure Genkit and the default model

Add the following lines to configure Genkit and set Gemini 1.5 Flash as the default model.

   const ai = genkit({
     plugins: [googleAI()],
     model: gemini15Flash,
   });

Then add a skeleton for the main function and its error handling:

   (async () => {
     try {
       // Step 1: get command line arguments

       // Step 2: load PDF file

       // Step 3: construct prompt

       // Step 4: start chat

       // Step 5: chat loop

     } catch (error) {
       console.error("Error parsing PDF or interacting with Genkit:", error);
     }
   })(); // <-- don't forget the trailing parentheses to call the function!

4. Load and parse the PDF file

  1. Under Step 1, add code to read the PDF filename that was passed in from the command line.

      const filename = process.argv[2];
      if (!filename) {
        console.error("Please provide a filename as a command line argument.");
        process.exit(1);
      }
    
  2. Under Step 2, add code to load the contents of the PDF file.

      const dataBuffer = fs.readFileSync(filename);
      const { text } = await pdf(dataBuffer);
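
The two steps above only check that a filename was supplied. As a minimal sketch, assuming you also want to reject non-PDF paths and missing files before any parsing is attempted, the validation could be gathered into one function. The names parseArgs and CliArgs are hypothetical, and the file-existence check is passed in as a parameter so that the function can be exercised without touching the disk.

```typescript
import { existsSync } from "node:fs";
import { extname } from "node:path";

// Hypothetical shape for the validated command-line arguments.
interface CliArgs {
  filename: string;
  promptPrefix?: string;
}

// Hypothetical helper: validates the arguments before the PDF is loaded.
function parseArgs(
  argv: string[],
  fileExists: (path: string) => boolean = existsSync,
): CliArgs {
  // argv[0] is the node binary and argv[1] is the script path,
  // so user-supplied arguments start at index 2.
  const [filename, promptPrefix] = argv.slice(2);
  if (!filename) {
    throw new Error("Please provide a filename as a command line argument.");
  }
  if (extname(filename).toLowerCase() !== ".pdf") {
    throw new Error(`Expected a .pdf file, got: ${filename}`);
  }
  if (!fileExists(filename)) {
    throw new Error(`File not found: ${filename}`);
  }
  return { filename, promptPrefix };
}
```

Called as parseArgs(process.argv), this returns the filename and the optional prompt prefix, or throws an error that the surrounding try/catch block can report.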
    

5. Set up the prompt

Under Step 3, add code to set up the prompt:

   const prefix = process.argv[3] || "Sample prompt: Answer the user's questions about the contents of this PDF file.";
   const prompt = `
     ${prefix}
     Context:
     ${text}
   `;
  • The first const declaration defines a default prompt if the user doesn't pass in one of their own from the command line.
  • The second const declaration interpolates the prompt prefix and the full text of the PDF file into the prompt for the model.
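
Because the full text of the PDF is interpolated into the prompt, a very large document might exceed what the model accepts in a single request. As a minimal sketch, assuming a simple character budget is an acceptable rough proxy for the model's limit, the prompt could be assembled by a helper that truncates the text. The name buildPrompt, the maxChars parameter, and its default value are all hypothetical; the default is an arbitrary illustration, not a documented model limit.

```typescript
// Hypothetical helper: assembles the prompt and caps the PDF text at
// maxChars characters, appending a notice when text was cut off.
function buildPrompt(
  prefix: string,
  pdfText: string,
  maxChars = 100_000, // arbitrary illustrative budget, not a real model limit
): string {
  const truncated = pdfText.length > maxChars;
  const context = truncated ? pdfText.slice(0, maxChars) : pdfText;
  const notice = truncated ? "\n[Context truncated]" : "";
  return `${prefix}\nContext:\n${context}${notice}`;
}
```

The truncation notice tells the model that it is seeing only part of the document, so that it can say so when a question falls outside the visible text.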

6. Implement the UI

Under Step 4, add the following code to start the chat and implement the UI:

   const chat = ai.chat({ system: prompt });
   const readline = createInterface(process.stdin, process.stdout);
   console.log("You're chatting with Gemini. Ctrl-C to quit.\n");

The first const declaration starts the chat with the model by calling the chat method, passing the prompt (which includes the full text of the PDF file). The rest of the code instantiates a text input, then displays a message to the user.

7. Implement the chat loop

Under Step 5, add code to receive user input and send that input to the model using chat.send. This part of the app loops until the user presses Ctrl-C.

       while (true) {
         const userInput = await readline.question("> ");
         const {text} = await chat.send(userInput);
         console.log(text);
       }
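
The loop above ends only when the user presses Ctrl-C, and any failed request falls through to the outer catch block, which ends the session. As a minimal sketch, assuming you would rather offer an explicit exit command and keep the session alive after a failed request, the loop could be written as a function whose input and output are passed in. The name runChatLoop, the exit and quit commands, and the Ask and Send types are hypothetical; the only assumption about the model call is that it resolves to an object with a text property, as chat.send does in the code above.

```typescript
// Hypothetical types for the injected input and model-call functions.
type Ask = (promptText: string) => Promise<string>;
type Send = (message: string) => Promise<{ text: string }>;

// Hypothetical helper: a chat loop that skips blank input, stops on
// "exit" or "quit", and reports a failed request without ending the session.
async function runChatLoop(
  ask: Ask,
  send: Send,
  print: (line: string) => void = console.log,
): Promise<void> {
  while (true) {
    const userInput = (await ask("> ")).trim();
    if (userInput === "") continue; // ignore empty lines
    if (userInput === "exit" || userInput === "quit") break;
    try {
      const { text } = await send(userInput);
      print(text);
    } catch (error) {
      // Report the failure and wait for the next question.
      const message = error instanceof Error ? error.message : String(error);
      print(`Request failed: ${message}`);
    }
  }
}
```

In the app, this could be called as runChatLoop((q) => readline.question(q), (m) => chat.send(m)), followed by readline.close() once the loop returns.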

8. Run the app

Run the app from your terminal. Open the terminal in the root folder of your project, then run the following command:

    $ npx tsx src/index.ts path/to/some.pdf

You can then start chatting with the PDF file.

Code-first framework for orchestrating, deploying, and monitoring generative AI workflows.

Updated Feb 7, 2025
