diff --git a/docs/technical/ai/structured-data.md b/docs/technical/ai/structured-data.md new file mode 100644 index 0000000..12711b1 --- /dev/null +++ b/docs/technical/ai/structured-data.md @@ -0,0 +1,409 @@ +# Generating Structured Data + +While text generation can be useful, your use case will likely call for generating structured data. +For example, you might want to extract information from text, classify data, or generate synthetic data. + +Many language models are capable of generating structured data, often defined as using "JSON modes" or "tools". +However, you need to manually provide schemas and then validate the generated data as LLMs can produce incorrect or incomplete structured data. + +The AI SDK standardises structured object generation across model providers +with the [`generateObject`](/docs/reference/ai-sdk-core/generate-object) +and [`streamObject`](/docs/reference/ai-sdk-core/stream-object) functions. +You can use both functions with different output strategies, e.g. `array`, `object`, `enum`, or `no-schema`, +and with different generation modes, e.g. `auto`, `tool`, or `json`. +You can use [Zod schemas](/docs/reference/ai-sdk-core/zod-schema), [Valibot](/docs/reference/ai-sdk-core/valibot-schema), or [JSON schemas](/docs/reference/ai-sdk-core/json-schema) to specify the shape of the data that you want, +and the AI model will generate data that conforms to that structure. + + + You can pass Zod objects directly to the AI SDK functions or use the + `zodSchema` helper function. + + +## Generate Object + +The `generateObject` generates structured data from a prompt. +The schema is also used to validate the generated data, ensuring type safety and correctness. + +```ts +import { generateObject } from "ai"; +import { z } from "zod"; + +const { object } = await generateObject({ + model: "openai/gpt-4.1", + schema: z.object({ + recipe: z.object({ + name: z.string(), + ingredients: z.array(z.object({ name: z.string(), amount: z.string() })), + steps: z.array(z.string()), + }), + }), + prompt: "Generate a lasagna recipe.", +}); +``` + + + See `generateObject` in action with [these examples](#more-examples) + + +### Accessing response headers & body + +Sometimes you need access to the full response from the model provider, +e.g. to access some provider-specific headers or body content. + +You can access the raw response headers and body using the `response` property: + +```ts +import { generateObject } from "ai"; + +const result = await generateObject({ + // ... +}); + +console.log(JSON.stringify(result.response.headers, null, 2)); +console.log(JSON.stringify(result.response.body, null, 2)); +``` + +## Stream Object + +Given the added complexity of returning structured data, model response time can be unacceptable for your interactive use case. +With the [`streamObject`](/docs/reference/ai-sdk-core/stream-object) function, you can stream the model's response as it is generated. + +```ts +import { streamObject } from "ai"; + +const { partialObjectStream } = streamObject({ + // ... +}); + +// use partialObjectStream as an async iterable +for await (const partialObject of partialObjectStream) { + console.log(partialObject); +} +``` + +You can use `streamObject` to stream generated UIs in combination with React Server Components (see [Generative UI](../ai-sdk-rsc))) or the [`useObject`](/docs/reference/ai-sdk-ui/use-object) hook. + +See `streamObject` in action with [these examples](#more-examples) + +### `onError` callback + +`streamObject` immediately starts streaming. +Errors become part of the stream and are not thrown to prevent e.g. servers from crashing. + +To log errors, you can provide an `onError` callback that is triggered when an error occurs. + +```tsx highlight="5-7" +import { streamObject } from "ai"; + +const result = streamObject({ + // ... + onError({ error }) { + console.error(error); // your error logging logic here + }, +}); +``` + +## Output Strategy + +You can use both functions with different output strategies, e.g. `array`, `object`, `enum`, or `no-schema`. + +### Object + +The default output strategy is `object`, which returns the generated data as an object. +You don't need to specify the output strategy if you want to use the default. + +### Array + +If you want to generate an array of objects, you can set the output strategy to `array`. +When you use the `array` output strategy, the schema specifies the shape of an array element. +With `streamObject`, you can also stream the generated array elements using `elementStream`. + +```ts highlight="7,18" +import { openai } from "@ai-sdk/openai"; +import { streamObject } from "ai"; +import { z } from "zod"; + +const { elementStream } = streamObject({ + model: openai("gpt-4.1"), + output: "array", + schema: z.object({ + name: z.string(), + class: z + .string() + .describe("Character class, e.g. warrior, mage, or thief."), + description: z.string(), + }), + prompt: "Generate 3 hero descriptions for a fantasy role playing game.", +}); + +for await (const hero of elementStream) { + console.log(hero); +} +``` + +### Enum + +If you want to generate a specific enum value, e.g. for classification tasks, +you can set the output strategy to `enum` +and provide a list of possible values in the `enum` parameter. + +Enum output is only available with `generateObject`. + +```ts highlight="5-6" +import { generateObject } from "ai"; + +const { object } = await generateObject({ + model: "openai/gpt-4.1", + output: "enum", + enum: ["action", "comedy", "drama", "horror", "sci-fi"], + prompt: + "Classify the genre of this movie plot: " + + '"A group of astronauts travel through a wormhole in search of a ' + + 'new habitable planet for humanity."', +}); +``` + +### No Schema + +In some cases, you might not want to use a schema, +for example when the data is a dynamic user request. +You can use the `output` setting to set the output format to `no-schema` in those cases +and omit the schema parameter. + +```ts highlight="6" +import { openai } from "@ai-sdk/openai"; +import { generateObject } from "ai"; + +const { object } = await generateObject({ + model: openai("gpt-4.1"), + output: "no-schema", + prompt: "Generate a lasagna recipe.", +}); +``` + +## Schema Name and Description + +You can optionally specify a name and description for the schema. These are used by some providers for additional LLM guidance, e.g. via tool or schema name. + +```ts highlight="6-7" +import { generateObject } from "ai"; +import { z } from "zod"; + +const { object } = await generateObject({ + model: "openai/gpt-4.1", + schemaName: "Recipe", + schemaDescription: "A recipe for a dish.", + schema: z.object({ + name: z.string(), + ingredients: z.array(z.object({ name: z.string(), amount: z.string() })), + steps: z.array(z.string()), + }), + prompt: "Generate a lasagna recipe.", +}); +``` + +## Accessing Reasoning + +You can access the reasoning used by the language model to generate the object via the `reasoning` property on the result. This property contains a string with the model's thought process, if available. + +```ts +import { openai, OpenAIResponsesProviderOptions } from "@ai-sdk/openai"; +import { generateObject } from "ai"; +import { z } from "zod/v4"; + +const result = await generateObject({ + model: openai("gpt-5"), + schema: z.object({ + recipe: z.object({ + name: z.string(), + ingredients: z.array( + z.object({ + name: z.string(), + amount: z.string(), + }) + ), + steps: z.array(z.string()), + }), + }), + prompt: "Generate a lasagna recipe.", + providerOptions: { + openai: { + strictJsonSchema: true, + reasoningSummary: "detailed", + } satisfies OpenAIResponsesProviderOptions, + }, +}); + +console.log(result.reasoning); +``` + +## Error Handling + +When `generateObject` cannot generate a valid object, it throws a [`AI_NoObjectGeneratedError`](/docs/reference/ai-sdk-errors/ai-no-object-generated-error). + +This error occurs when the AI provider fails to generate a parsable object that conforms to the schema. +It can arise due to the following reasons: + +- The model failed to generate a response. +- The model generated a response that could not be parsed. +- The model generated a response that could not be validated against the schema. + +The error preserves the following information to help you log the issue: + +- `text`: The text that was generated by the model. This can be the raw text or the tool call text, depending on the object generation mode. +- `response`: Metadata about the language model response, including response id, timestamp, and model. +- `usage`: Request token usage. +- `cause`: The cause of the error (e.g. a JSON parsing error). You can use this for more detailed error handling. + +```ts +import { generateObject, NoObjectGeneratedError } from "ai"; + +try { + await generateObject({ model, schema, prompt }); +} catch (error) { + if (NoObjectGeneratedError.isInstance(error)) { + console.log("NoObjectGeneratedError"); + console.log("Cause:", error.cause); + console.log("Text:", error.text); + console.log("Response:", error.response); + console.log("Usage:", error.usage); + } +} +``` + +## Repairing Invalid or Malformed JSON + + + The `repairText` function is experimental and may change in the future. + + +Sometimes the model will generate invalid or malformed JSON. +You can use the `repairText` function to attempt to repair the JSON. + +It receives the error, either a `JSONParseError` or a `TypeValidationError`, +and the text that was generated by the model. +You can then attempt to repair the text and return the repaired text. + +```ts highlight="7-10" +import { generateObject } from "ai"; + +const { object } = await generateObject({ + model, + schema, + prompt, + experimental_repairText: async ({ text, error }) => { + // example: add a closing brace to the text + return text + "}"; + }, +}); +``` + +## Structured outputs with `generateText` and `streamText` + +You can generate structured data with `generateText` and `streamText` by using the `experimental_output` setting. + + + Some models, e.g. those by OpenAI, support structured outputs and tool calling + at the same time. This is only possible with `generateText` and `streamText`. + + + + Structured output generation with `generateText` and `streamText` is + experimental and may change in the future. + + +### `generateText` + +```ts highlight="2,4-18" +// experimental_output is a structured object that matches the schema: +const { experimental_output } = await generateText({ + // ... + experimental_output: Output.object({ + schema: z.object({ + name: z.string(), + age: z.number().nullable().describe("Age of the person."), + contact: z.object({ + type: z.literal("email"), + value: z.string(), + }), + occupation: z.object({ + type: z.literal("employed"), + company: z.string(), + position: z.string(), + }), + }), + }), + prompt: "Generate an example person for testing.", +}); +``` + +### `streamText` + +```ts highlight="2,4-18" +// experimental_partialOutputStream contains generated partial objects: +const { experimental_partialOutputStream } = await streamText({ + // ... + experimental_output: Output.object({ + schema: z.object({ + name: z.string(), + age: z.number().nullable().describe("Age of the person."), + contact: z.object({ + type: z.literal("email"), + value: z.string(), + }), + occupation: z.object({ + type: z.literal("employed"), + company: z.string(), + position: z.string(), + }), + }), + }), + prompt: "Generate an example person for testing.", +}); +``` + +## More Examples + +You can see `generateObject` and `streamObject` in action using various frameworks in the following examples: + +### `generateObject` + + + +### `streamObject` + +