Beta: Firebase Genkit is in Beta, which means that it is not subject to any SLA or deprecation policy and could change in backwards-incompatible ways. Throughout the Beta period, Firebase Genkit and its documentation will be updated and improved.

This page was translated by the Cloud Translation API.

Writing Genkit evaluators

Firebase Genkit can be extended to support custom evaluation of test case output. You can do this either by using an LLM as a judge or purely programmatically.

Evaluator definition

Evaluators are functions that assess the content given to an LLM and the content it generates. There are two main approaches to automated evaluation (testing): heuristic assessment and LLM-based assessment. In the heuristic approach, you define a function in code, just as you would in traditional software development. In an LLM-based assessment, the content is fed back to an LLM, and the LLM is asked to score the output according to criteria set in a prompt.
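To make the two approaches concrete, here is a minimal, dependency-free sketch. It assumes simplified stand-in types (DataPoint, EvalScore) instead of Genkit's own BaseEvalDataPoint and Score, and it represents the judge model as a plain async function (JudgeFn). The names nonEmptyOutputScore and llmJudgeScore are illustrative only, not Genkit APIs.

```typescript
// Hypothetical stand-ins for Genkit's BaseEvalDataPoint and Score types.
interface DataPoint {
  testCaseId: string;
  input: unknown;
  output?: unknown;
  context?: unknown[];
}

interface EvalScore {
  score?: number | string | boolean;
  details?: { reasoning?: string };
}

// Heuristic approach: an ordinary function, tested like any other code.
function nonEmptyOutputScore(dp: DataPoint): EvalScore {
  const ok = typeof dp.output === 'string' && dp.output.trim().length > 0;
  return {
    score: ok,
    details: { reasoning: ok ? 'Output is non-empty' : 'Output is missing or empty' },
  };
}

// LLM-based approach: send the output back to a judge model together with
// grading criteria, and use the judge's reply as the score.
// JudgeFn is a placeholder for a real model call.
type JudgeFn = (prompt: string) => Promise<string>;

async function llmJudgeScore(
  dp: DataPoint,
  criteria: string,
  judge: JudgeFn
): Promise<EvalScore> {
  const prompt = `${criteria}\n\nOutput to grade:\n${String(dp.output)}\n\nAnswer "yes" or "no".`;
  const verdict = (await judge(prompt)).trim().toLowerCase();
  return { score: verdict, details: { reasoning: `Judge replied "${verdict}"` } };
}
```

Because the judge is just a function here, a test can pass a stub such as async () => 'yes' instead of calling a real model.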

LLM-based evaluators

An LLM-based evaluator uses an LLM to evaluate the input, context, or output of your generative AI feature.

LLM-based evaluators in Genkit are made up of three components:

A prompt
A scoring function
An evaluator action

Define the prompt

In this example, the prompt asks the LLM to judge how delicious the output is. First, give the LLM some context. Next, describe what you want it to do. Finally, give it a few examples to base its response on.
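The context, then task, then examples ordering can be sketched as a plain string builder. buildJudgePrompt and FewShotExample are hypothetical names used only for illustration. Genkit's definePrompt uses a template with an {{output}} placeholder instead of manual string concatenation.

```typescript
// Hypothetical helper mirroring the structure of the deliciousness prompt:
// context, then the task, then few-shot examples, then the item to grade.
interface FewShotExample {
  output: string;
  reason: string;
  verdict: 'yes' | 'no' | 'maybe';
}

function buildJudgePrompt(
  context: string,
  task: string,
  examples: FewShotExample[],
  newOutput: string
): string {
  // Each example shows the judge an output and the exact JSON reply expected for it.
  const shots = examples
    .map(
      (ex) =>
        `Output: ${ex.output}\nResponse: ${JSON.stringify({ reason: ex.reason, verdict: ex.verdict })}`
    )
    .join('\n\n');
  return [context, task, 'Examples:', shots, 'New Output:', newOutput, 'Response:'].join('\n\n');
}

const examplePrompt = buildJudgePrompt(
  'You are a food critic.',
  'Assess whether the provided output sounds delicious, giving only "yes", "no", or "maybe" as the verdict.',
  [
    { output: 'Chicken parm sandwich', reason: 'A classic and beloved dish.', verdict: 'yes' },
    { output: 'Boston Logan Airport tarmac', reason: 'Not edible.', verdict: 'no' },
  ],
  'Stale, flavorless cereal'
);
console.log(examplePrompt);
```

Ending the prompt with an open "Response:" line invites the model to continue in the same JSON format as the examples.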

Genkit's definePrompt utility provides an easy way to define prompts with input and output validation. Here's how to set up the evaluation prompt with definePrompt.

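import { z } from 'genkit';

// `ai` is the Genkit instance created with genkit(...), as shown under "Configure Genkit".

const DELICIOUSNESS_VALUES = ['yes', 'no', 'maybe'] as const;

const DeliciousnessDetectionResponseSchema = z.object({
  reason: z.string(),
  verdict: z.enum(DELICIOUSNESS_VALUES),
});
type DeliciousnessDetectionResponse = z.infer<typeof DeliciousnessDetectionResponseSchema>;

const DELICIOUSNESS_PROMPT = ai.definePrompt(
  {
    name: 'deliciousnessPrompt',
    // Schema for the variables used in the template ({{output}})
    input: {
      schema: z.object({
        output: z.string(),
      }),
    },
    // The judge's reply is parsed and validated against this schema
    output: {
      schema: DeliciousnessDetectionResponseSchema,
    },
  },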
  `You are a food critic. Assess whether the provided output sounds delicious, giving only "yes" (delicious), "no" (not delicious), or "maybe" (undecided) as the verdict.

  Examples:
  Output: Chicken parm sandwich
  Response: { "reason": "A classic and beloved dish.", "verdict": "yes" }

  Output: Boston Logan Airport tarmac
  Response: { "reason": "Not edible.", "verdict": "no" }

  Output: A juicy piece of gossip
  Response: { "reason": "Metaphorically 'tasty' but not food.", "verdict": "maybe" }

  New Output:
  {{output}}
  Response:
  `
);
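The output schema asks Genkit to check the judge's reply against DeliciousnessDetectionResponseSchema. To give a rough idea of what such a check involves, the hypothetical parseJudgeResponse below validates a raw JSON reply by hand, without Zod. This is not how Genkit implements output parsing. It only shows which replies the schema accepts and which it rejects.

```typescript
// Dependency-free illustration of the checks the response schema expresses.
const VERDICTS = ['yes', 'no', 'maybe'] as const;
type Verdict = (typeof VERDICTS)[number];

interface JudgeResponse {
  reason: string;
  verdict: Verdict;
}

// Returns the parsed response, or null if the reply does not fit the schema.
function parseJudgeResponse(raw: string): JudgeResponse | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return null;
  const obj = value as Record<string, unknown>;
  const reason = obj.reason;
  const verdict = obj.verdict;
  if (typeof reason !== 'string' || typeof verdict !== 'string') return null;
  // The verdict must be one of the enum values, like z.enum(DELICIOUSNESS_VALUES).
  if (!(VERDICTS as readonly string[]).includes(verdict)) return null;
  return { reason, verdict: verdict as Verdict };
}
```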

Define the scoring function

Now define the function that takes an example containing output, as the prompt requires, and scores the result. Genkit test cases include input as a required field, with optional fields for output and context. It is the evaluator's responsibility to validate that every field required for the evaluation is present.

import { ModelArgument, z } from 'genkit';
import { BaseEvalDataPoint, Score } from 'genkit/evaluator';

/**
 * Score an individual test case for deliciousness.
 */
export async function deliciousnessScore<
  CustomModelOptions extends z.ZodTypeAny,
>(
  judgeLlm: ModelArgument<CustomModelOptions>,
  dataPoint: BaseEvalDataPoint,
  judgeConfig?: z.infer<CustomModelOptions>
): Promise<Score> {
  const d = dataPoint;
  // Validate that the data point has the fields this evaluator needs
  if (!d.output) {
    throw new Error('Output is required for Deliciousness detection');
  }

  // Render DELICIOUSNESS_PROMPT (defined above) with the test case output
  // and send it to the judge LLM
  const response = await DELICIOUSNESS_PROMPT(
    { output: d.output as string },
    { model: judgeLlm, config: judgeConfig }
  );

  // The prompt's output schema makes the reply available as a parsed object
  const parsedResponse = response.output;
  if (!parsedResponse) {
    throw new Error(`Unable to parse evaluator response: ${response.text}`);
  }

  // Return a scored response
  return {
    score: parsedResponse.verdict,
    details: { reasoning: parsedResponse.reason },
  };
}
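The scoring function depends on a live judge model. To unit-test its control flow (validate the data point, ask the judge, reject an unusable reply, map the reply to a Score) without a network call, one option is to inject the judge as a function. The sketch below does this. DataPoint, Score, Judge, deliciousnessScoreSketch, and fakeJudge are hypothetical stand-ins, and fakeJudge is a canned stub, not a model.

```typescript
// Hypothetical stand-ins for BaseEvalDataPoint and Score from 'genkit/evaluator'.
interface DataPoint {
  testCaseId: string;
  input: unknown;
  output?: unknown;
}
interface Score {
  score?: number | string | boolean;
  details?: { reasoning?: string };
}
type Verdict = 'yes' | 'no' | 'maybe';
interface JudgeReply {
  reason: string;
  verdict: Verdict;
}
// The judge returns a parsed reply, or undefined if its reply could not be parsed.
type Judge = (output: string) => Promise<JudgeReply | undefined>;

async function deliciousnessScoreSketch(judge: Judge, dp: DataPoint): Promise<Score> {
  // 1. Validate that the field this evaluator needs is present.
  const output = dp.output;
  if (typeof output !== 'string' || output.length === 0) {
    throw new Error('Output is required for Deliciousness detection');
  }
  // 2. Ask the judge for a verdict.
  const reply = await judge(output);
  // 3. Fail loudly instead of inventing a score when the reply is unusable.
  if (!reply) {
    throw new Error('Unable to parse evaluator response');
  }
  // 4. Map the reply onto a Score.
  return { score: reply.verdict, details: { reasoning: reply.reason } };
}

// Canned judge for tests: says "yes" to mangoes and "maybe" to everything else.
async function fakeJudge(output: string): Promise<JudgeReply | undefined> {
  if (output.toLowerCase().includes('mango')) {
    return { reason: 'Ripe fruit sounds tasty.', verdict: 'yes' };
  }
  return { reason: 'Hard to say.', verdict: 'maybe' };
}
```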

Define the evaluator action

The final step is to write a function that defines the evaluator action itself.

import { Genkit, ModelReference, z } from 'genkit';
import { BaseEvalDataPoint, EvaluatorAction } from 'genkit/evaluator';

/**
 * Create the Deliciousness evaluator action.
 */
export function createDeliciousnessEvaluator<
  ModelCustomOptions extends z.ZodTypeAny,
>(
  ai: Genkit,
  judge: ModelReference<ModelCustomOptions>,
  judgeConfig?: z.infer<ModelCustomOptions>
): EvaluatorAction {
  // ai.defineEvaluator registers the evaluator with this Genkit instance
  return ai.defineEvaluator(
    {
      name: 'myAwesomeEval/deliciousness',
      displayName: 'Deliciousness',
      definition: 'Determines if output is considered delicious.',
    },
    async (datapoint: BaseEvalDataPoint) => {
      const score = await deliciousnessScore(judge, datapoint, judgeConfig);
      // Tag the score with the test case it belongs to
      return {
        testCaseId: datapoint.testCaseId,
        evaluation: score,
      };
    }
  );
}
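Conceptually, the evaluator action wraps the scoring function so that every result is tagged with the testCaseId of the data point it came from. The sketch below shows only that data flow. It uses hypothetical names (toEvaluator, runOverDataset) and stand-in types. defineEvaluator also registers the action with Genkit, and this sketch does not attempt that.

```typescript
// Hypothetical stand-ins for Genkit's evaluator types.
interface DataPoint {
  testCaseId: string;
  input: unknown;
  output?: unknown;
}
interface Score {
  score?: number | string | boolean;
  details?: { reasoning?: string };
}
interface EvalResult {
  testCaseId: string;
  evaluation: Score;
}

type ScoreFn = (dp: DataPoint) => Promise<Score>;
type Evaluator = (dp: DataPoint) => Promise<EvalResult>;

// Wrap a scoring function so that each result carries its testCaseId.
function toEvaluator(scoreFn: ScoreFn): Evaluator {
  return async (dp) => ({ testCaseId: dp.testCaseId, evaluation: await scoreFn(dp) });
}

// Run an evaluator over a dataset, one data point at a time, preserving order.
async function runOverDataset(evaluator: Evaluator, dataset: DataPoint[]): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const dp of dataset) {
    results.push(await evaluator(dp));
  }
  return results;
}
```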

Heuristic evaluators

Any function used to evaluate the input, context, or output of your generative AI feature can be a heuristic evaluator.
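For example, the plugin options later in this guide list a WORD_COUNT metric without showing how it is implemented. Below is one possible heuristic scoring function for it. wordCountScore and the stand-in types are hypothetical, and a "word" is assumed to mean a run of non-whitespace characters.

```typescript
// Hypothetical stand-ins for BaseEvalDataPoint and Score.
interface DataPoint {
  testCaseId: string;
  input: unknown;
  output?: unknown;
}
interface Score {
  score?: number | string | boolean;
  details?: { reasoning?: string };
}

// Count whitespace-separated words in the output; no model call is involved.
function wordCountScore(dp: DataPoint): Score {
  const text = dp.output;
  if (typeof text !== 'string') {
    throw new Error('String output is required for word counting');
  }
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  return {
    score: words.length,
    details: { reasoning: `Output contains ${words.length} words` },
  };
}
```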

Heuristic evaluators in Genkit are made up of two components:

A scoring function
An evaluator action

Define the scoring function

As with the LLM-based evaluator, define the scoring function. In this case, the scoring function does not need to know about a judge LLM or its configuration.

import { BaseEvalDataPoint, Score } from 'genkit/evaluator';

const US_PHONE_REGEX =
  /^[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4}$/i;

/**
 * Scores whether an individual datapoint matches a US Phone Regex.
 */
export async function usPhoneRegexScore(
  dataPoint: BaseEvalDataPoint
): Promise<Score> {
  const d = dataPoint;
  if (!d.output || typeof d.output !== 'string') {
    throw new Error('String output is required for regex matching');
  }
  const matches = US_PHONE_REGEX.test(d.output);
  // Report the pattern that was used so that failures are easy to diagnose
  const reasoning = matches
    ? `Output matched regex ${US_PHONE_REGEX.source}`
    : `Output did not match regex ${US_PHONE_REGEX.source}`;
  return {
    score: matches,
    details: { reasoning },
  };
}
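The pattern is anchored with ^ and $, so it matches only when the entire output is a phone number. An output that mentions a number inside a sentence does not match. The standalone check below copies US_PHONE_REGEX and runs it against a few sample strings. This can help confirm that the pattern behaves as intended before you wire it into an evaluator.

```typescript
// Same pattern as US_PHONE_REGEX above.
const US_PHONE_REGEX = /^[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4}$/i;

// Each sample is paired with whether the pattern is expected to match it.
const samples: Array<[string, boolean]> = [
  ['(555) 123-4567', true],
  ['555.123.4567', true],
  ['5551234567', true],
  ['555-1234', false], // only 7 digits
  ['+1 555 123 4567', false], // country code plus separator is not covered
  ['Call me at 555-123-4567', false], // anchors reject surrounding text
];

for (const [text, expected] of samples) {
  const actual = US_PHONE_REGEX.test(text);
  console.log(`${JSON.stringify(text)} -> ${actual}${actual === expected ? '' : ' (unexpected)'}`);
}
```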

Define the evaluator action

import { Genkit } from 'genkit';
import { BaseEvalDataPoint, EvaluatorAction } from 'genkit/evaluator';

/**
 * Configures a regex evaluator to match a US phone number.
 */
export function createUSPhoneRegexEvaluator(ai: Genkit): EvaluatorAction {
  return ai.defineEvaluator(
    {
      name: 'myAwesomeEval/us_phone_regex_match',
      displayName: 'US Phone Regex Match',
      definition:
        'Runs the output against a US phone number regex and returns true if it matches and false otherwise.',
      // Heuristic evaluator: it makes no model calls
      isBilled: false,
    },
    async (datapoint: BaseEvalDataPoint) => {
      // usPhoneRegexScore is the scoring function defined above
      const score = await usPhoneRegexScore(datapoint);
      return {
        testCaseId: datapoint.testCaseId,
        evaluation: score,
      };
    }
  );
}

Configuration

Plugin options

Define the PluginOptions that the custom evaluator plugin uses. This object has no strict requirements. Its contents depend on the types of evaluators that are defined.

At a minimum, it needs to specify which metrics to register.

export enum MyAwesomeMetric {
  WORD_COUNT = 'WORD_COUNT',
  US_PHONE_REGEX_MATCH = 'US_PHONE_REGEX_MATCH',
}

export interface PluginOptions {
  metrics?: Array<MyAwesomeMetric>;
}
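Because metrics is optional, the plugin has to decide what to do when it is omitted. One possible policy is sketched here as a hypothetical resolveMetrics helper: enable every metric by default, respect an explicitly empty list, and drop duplicates. This is a design assumption, not Genkit behavior.

```typescript
enum MyAwesomeMetric {
  WORD_COUNT = 'WORD_COUNT',
  US_PHONE_REGEX_MATCH = 'US_PHONE_REGEX_MATCH',
}

interface PluginOptions {
  metrics?: Array<MyAwesomeMetric>;
}

// Hypothetical policy: omitted metrics means "all metrics"; duplicates are removed.
function resolveMetrics(options: PluginOptions = {}): MyAwesomeMetric[] {
  // String enums have no reverse mapping, so Object.values yields only the member values.
  const requested = options.metrics ?? (Object.values(MyAwesomeMetric) as MyAwesomeMetric[]);
  return Array.from(new Set(requested));
}
```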

If the new plugin uses an LLM as a judge and lets users choose which LLM to use, define additional parameters in the PluginOptions object.

import { ModelReference, z } from 'genkit';

export enum MyAwesomeMetric {
  DELICIOUSNESS = 'DELICIOUSNESS',
  US_PHONE_REGEX_MATCH = 'US_PHONE_REGEX_MATCH',
}

export interface PluginOptions<ModelCustomOptions extends z.ZodTypeAny> {
  judge: ModelReference<ModelCustomOptions>;
  judgeConfig?: z.infer<ModelCustomOptions>;
  metrics?: Array<MyAwesomeMetric>;
}
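The PluginOptions above always require a judge, even when only heuristic metrics are enabled. A possible variation is sketched below: make judge optional and fail fast only when an LLM-based metric is requested without one. SketchPluginOptions, LLM_METRICS, and validateOptions are hypothetical names. The judge is represented as a model name string instead of a ModelReference to keep the sketch standalone.

```typescript
enum MyAwesomeMetric {
  DELICIOUSNESS = 'DELICIOUSNESS',
  US_PHONE_REGEX_MATCH = 'US_PHONE_REGEX_MATCH',
}

// Hypothetical variant of PluginOptions with an optional judge.
interface SketchPluginOptions {
  judge?: string;
  judgeConfig?: Record<string, unknown>;
  metrics?: MyAwesomeMetric[];
}

// Metrics whose evaluators call a judge LLM.
const LLM_METRICS = new Set<MyAwesomeMetric>([MyAwesomeMetric.DELICIOUSNESS]);

// Throws early if an LLM-based metric is requested without a judge.
function validateOptions(options: SketchPluginOptions): void {
  const needsJudge = (options.metrics ?? []).some((m) => LLM_METRICS.has(m));
  if (needsJudge && !options.judge) {
    throw new Error('A judge model is required when an LLM-based metric is enabled');
  }
}
```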

Plugin definition

Plugins are registered with the framework when you initialize Genkit in your project, by adding them to the plugins array passed to genkit() (see "Configure Genkit"). To configure a new plugin, define a function that defines a GenkitPlugin and configures it with the PluginOptions defined above.

In this case, there are two evaluators, DELICIOUSNESS and US_PHONE_REGEX_MATCH. This is where they are registered with the plugin and with Firebase Genkit.

import { Genkit, z } from 'genkit';
import { GenkitPlugin, genkitPlugin } from 'genkit/plugin';

export function myAwesomeEval<ModelCustomOptions extends z.ZodTypeAny>(
  options: PluginOptions<ModelCustomOptions>
): GenkitPlugin {
  // Define the new plugin; the initializer receives the Genkit instance
  return genkitPlugin('myAwesomeEval', async (ai: Genkit) => {
    const { judge, judgeConfig, metrics = [] } = options;
    // Each create*Evaluator call registers its evaluator through ai.defineEvaluator
    for (const metric of metrics) {
      switch (metric) {
        case MyAwesomeMetric.DELICIOUSNESS:
          // This evaluator requires an LLM as judge
          createDeliciousnessEvaluator(ai, judge, judgeConfig);
          break;
        case MyAwesomeMetric.US_PHONE_REGEX_MATCH:
          // This evaluator does not require an LLM
          createUSPhoneRegexEvaluator(ai);
          break;
      }
    }
  });
}
export default myAwesomeEval;
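The switch in myAwesomeEval has no default branch. If you add a member to MyAwesomeMetric but forget a matching case, that metric is silently left unhandled. A common TypeScript pattern is an exhaustiveness check with the never type, which turns a missing case into a compile-time error. The sketch below uses string placeholders instead of real evaluator actions. factoryFor, assertNever, and EvaluatorFactory are hypothetical names.

```typescript
enum MyAwesomeMetric {
  DELICIOUSNESS = 'DELICIOUSNESS',
  US_PHONE_REGEX_MATCH = 'US_PHONE_REGEX_MATCH',
}

// Placeholder for whatever each create*Evaluator call produces.
type EvaluatorFactory = () => string;

// Only reachable at runtime if a value outside the enum slips through;
// at compile time, any unhandled enum member makes this call a type error.
function assertNever(x: never): never {
  throw new Error(`Unhandled metric: ${String(x)}`);
}

function factoryFor(metric: MyAwesomeMetric): EvaluatorFactory {
  switch (metric) {
    case MyAwesomeMetric.DELICIOUSNESS:
      return () => 'myAwesomeEval/deliciousness';
    case MyAwesomeMetric.US_PHONE_REGEX_MATCH:
      return () => 'myAwesomeEval/us_phone_regex_match';
    default:
      return assertNever(metric);
  }
}
```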

Configure Genkit

Add the newly defined plugin to your Genkit configuration.

For evaluation with Gemini, disable the safety settings so that the evaluator can accept, detect, and score potentially harmful content.

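import { genkit } from 'genkit';
import { gemini15Flash } from '@genkit-ai/googleai';
// Also import myAwesomeEval and MyAwesomeMetric from the module where you defined them.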

const ai = genkit({
  plugins: [
    ...
    myAwesomeEval({
      judge: gemini15Flash,
      judgeConfig: {
        safetySettings: [
          {
            category: 'HARM_CATEGORY_HATE_SPEECH',
            threshold: 'BLOCK_NONE',
          },
          {
            category: 'HARM_CATEGORY_DANGEROUS_CONTENT',
            threshold: 'BLOCK_NONE',
          },
          {
            category: 'HARM_CATEGORY_HARASSMENT',
            threshold: 'BLOCK_NONE',
          },
          {
            category: 'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            threshold: 'BLOCK_NONE',
          },
        ],
      },
      metrics: [
        MyAwesomeMetric.DELICIOUSNESS,
        MyAwesomeMetric.US_PHONE_REGEX_MATCH
      ],
    }),
  ],
  ...
});
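The four safetySettings entries above differ only in their category. If you prefer not to repeat them, you can generate them from a list, as in this sketch. The SafetySetting type is a hypothetical local type that covers only the values used above. It is not the Google AI plugin's own type.

```typescript
// The four harm categories listed in the judgeConfig above.
const HARM_CATEGORIES = [
  'HARM_CATEGORY_HATE_SPEECH',
  'HARM_CATEGORY_DANGEROUS_CONTENT',
  'HARM_CATEGORY_HARASSMENT',
  'HARM_CATEGORY_SEXUALLY_EXPLICIT',
] as const;

type HarmCategory = (typeof HARM_CATEGORIES)[number];

interface SafetySetting {
  category: HarmCategory;
  threshold: 'BLOCK_NONE';
}

// One BLOCK_NONE entry per category, in the same order as above.
const judgeSafetySettings: SafetySetting[] = HARM_CATEGORIES.map(
  (category): SafetySetting => ({ category, threshold: 'BLOCK_NONE' })
);
```

With this array, the plugin options can use judgeConfig: { safetySettings: judgeSafetySettings }.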

Testing

The same issues that apply to evaluating the quality of a generative AI feature's output also apply to evaluating the judging capacity of an LLM-based evaluator.

To get a sense of whether the custom evaluator performs at the expected level, create a set of test cases that have a clear right and wrong answer.

For deliciousness, for example, that might be a JSON file named deliciousness_dataset.json:

[
  {
    "testCaseId": "delicious_mango",
    "input": "What is a super delicious fruit",
    "output": "A perfectly ripe mango – sweet, juicy, and with a hint of tropical sunshine."
  },
  {
    "testCaseId": "disgusting_soggy_cereal",
    "input": "What is something that is tasty when fresh but less tasty after some time?",
    "output": "Stale, flavorless cereal that's been sitting in the box too long."
  }
]

You can write these examples by hand, or you can ask an LLM to help create a set of test cases that you then curate. Many benchmark datasets are also available for this purpose.
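Whether a dataset is written by hand, generated, or taken from a benchmark, it can help to sanity-check the file before running it. The hypothetical parseDataset below requires a JSON array in which every entry has a string testCaseId and an input field (input is the required field described earlier), and it rejects duplicate IDs. These checks are assumptions for illustration, not Genkit's own validation rules.

```typescript
// Hypothetical shape mirroring the fields described earlier:
// input is required; output and context are optional.
interface TestCase {
  testCaseId: string;
  input: unknown;
  output?: unknown;
  context?: unknown[];
}

function parseDataset(json: string): TestCase[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error('Dataset must be a JSON array');
  }
  const seen = new Set<string>();
  return data.map((item: unknown, i: number): TestCase => {
    if (typeof item !== 'object' || item === null) {
      throw new Error(`Entry ${i} is not an object`);
    }
    const rec = item as Record<string, unknown>;
    const id = rec.testCaseId;
    if (typeof id !== 'string') {
      throw new Error(`Entry ${i} has no string testCaseId`);
    }
    if (!('input' in rec)) {
      throw new Error(`Test case ${id} has no input`);
    }
    if (seen.has(id)) {
      throw new Error(`Duplicate testCaseId: ${id}`);
    }
    seen.add(id);
    // Fields have been checked above, so the cast only narrows the type.
    return rec as unknown as TestCase;
  });
}
```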

Then use the Genkit CLI to run the evaluator against these test cases.

genkit eval:run deliciousness_dataset.json

View your results in the Genkit UI.

genkit start

Navigate to localhost:4000/evaluate.
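Because each test case has a clear right answer, one way to judge the evaluator itself is to compare its verdicts with the verdicts you expected. The dataset format shown above has no field for an expected verdict, so this sketch assumes you keep those labels separately and pair them with the evaluator's results yourself. LabeledResult, agreementRate, and disagreements are hypothetical names.

```typescript
// Hypothetical pairing of an expected verdict with the evaluator's actual verdict.
interface LabeledResult {
  testCaseId: string;
  expected: string;
  actual: string;
}

// Fraction of test cases where the evaluator agreed with the expected verdict.
function agreementRate(results: LabeledResult[]): number {
  if (results.length === 0) return 0;
  const agreed = results.filter((r) => r.expected === r.actual).length;
  return agreed / results.length;
}

// IDs of the test cases worth inspecting by hand.
function disagreements(results: LabeledResult[]): string[] {
  return results.filter((r) => r.expected !== r.actual).map((r) => r.testCaseId);
}
```

A low agreement rate, or disagreements concentrated on one kind of case, usually points to the prompt or its few-shot examples rather than the scoring code.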