Skip to content

OpenAI

OpenAI

Bases: LLM

OpenAI wrapper that supports Structured Outputs + (optional) reasoning summaries via Responses API.

Source code in src/kibad_llm/llms/openai.py
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
class OpenAI(LLM):
    """OpenAI wrapper that supports Structured Outputs + (optional) reasoning summaries via Responses API."""

    def __init__(self, *args, **kwargs) -> None:
        # Thin delegation: all connection/model options are forwarded verbatim
        # to the underlying Responses-API client.
        self.model = OpenAIResponses(*args, **kwargs)

    def call_llm_chat_with_guided_decoding(
        self,
        messages: list[SimpleChatMessage],
        *,
        json_schema: dict[str, Any] | None = None,
        **request_kwargs,
    ) -> ChatResponse:
        """Send a chat request, optionally constraining output with Structured Outputs.

        Parameters
        ----------
        messages:
            Conversation history to send; converted to llama-index chat messages.
        json_schema:
            Optional JSON Schema to enforce via `text.format`. Ignored (with a
            warning) if the caller already supplied an explicit `text.format`.
        **request_kwargs:
            Extra keyword arguments forwarded to the Responses API. Recognized
            extras: `strict_json_schema` (default True) toggles OpenAI strict
            mode; `seed` is popped and ignored with a warning (unsupported by
            the Responses API).

        Returns
        -------
        ChatResponse
            The model's chat response.

        Raises
        ------
        ValueError
            If OpenAI rejects the request (BadRequestError), re-raised to align
            the error type with in-process LLM backends.
        """
        seed = request_kwargs.pop("seed", None)
        if seed is not None:
            warn_once(
                "`seed` parameter is not supported by OpenAI Responses API and will be ignored."
            )

        if json_schema is not None:
            # Work on the caller's `text` options (if any) so we can merge
            # rather than clobber them below.
            text_options = request_kwargs.get("text") or {}
            if "format" in text_options:
                warn_once(
                    "`json_schema` was provided but `text.format` is already set; "
                    "keeping the explicit `text.format`."
                )
            else:
                schema_name = _schema_name_from(json_schema)
                strict = request_kwargs.pop("strict_json_schema", True)
                if strict:
                    json_schema = make_openai_strict_json_schema(json_schema)
                else:
                    warn_once(
                        "Using non-strict JSON Schema with OpenAI Structured Outputs "
                        "(strict_json_schema=false) is not recommended; the model may emit "
                        "keys or types not declared in the schema."
                    )
                # Bug fix: previously the whole `text` dict was replaced here,
                # dropping any other `text` options the caller had set. Merge
                # the format entry into the existing options instead.
                text_options["format"] = {
                    "type": "json_schema",
                    "name": schema_name,
                    "strict": strict,
                    "schema": json_schema,
                }
                request_kwargs["text"] = text_options

        llama_index_messages = [
            LlamaIndexChatMessage(role=msg.role, content=msg.content) for msg in messages
        ]
        try:
            return self.model.chat(llama_index_messages, **request_kwargs)
        except BadRequestError as e:
            # align error type with in_process LLMs
            raise ValueError(e.message) from e

    def get_reasoning_from_chat_response(self, response: ChatResponse) -> str:
        """Return the OpenAI Responses API reasoning *summary* (not raw CoT)."""

        thinking_block_contents = [
            block.content
            for block in response.message.blocks
            if isinstance(block, ThinkingBlock) and block.content is not None
        ]
        if len(thinking_block_contents) == 0:
            raise ReasoningExtractionError(
                "Could not find any ThinkingBlock content in chat response. "
                "Did you enable reasoning summaries via OpenAI Responses API, e.g. "
                "`extractor.llm.reasoning_options.summary=auto`?"
            )

        return "\n\n".join(thinking_block_contents)

get_reasoning_from_chat_response(response)

Return the OpenAI Responses API reasoning summary (not raw CoT).

Source code in src/kibad_llm/llms/openai.py
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
def get_reasoning_from_chat_response(self, response: ChatResponse) -> str:
    """Return the OpenAI Responses API reasoning *summary* (not raw CoT)."""

    # Collect the text of every reasoning-summary block present in the response.
    summaries: list[str] = []
    for msg_block in response.message.blocks:
        if isinstance(msg_block, ThinkingBlock) and msg_block.content is not None:
            summaries.append(msg_block.content)

    if not summaries:
        raise ReasoningExtractionError(
            "Could not find any ThinkingBlock content in chat response. "
            "Did you enable reasoning summaries via OpenAI Responses API, e.g. "
            "`extractor.llm.reasoning_options.summary=auto`?"
        )

    return "\n\n".join(summaries)

make_openai_strict_json_schema(schema)

Convert a JSON Schema (often generated by Pydantic) into a form that is accepted by OpenAI Structured Outputs when using the Responses API with

text={"format": {"type": "json_schema", "strict": True, "schema": ...}}

(or the equivalent response_format shape in other clients).

Why this exists

OpenAI Structured Outputs validates schemas using a restricted subset of JSON Schema plus additional strict-mode constraints. A schema produced by BaseModel.model_json_schema() is typically valid JSON Schema, but can still be rejected by OpenAI for reasons such as:

1) Object strictness: - Every object schema must forbid undeclared keys via additionalProperties: false. - Every object schema must provide required, and in strict mode OpenAI expects required to include every key in properties (even if fields have defaults). Pydantic usually omits defaulted fields from required (e.g., default_factory=list).

2) $ref limitations: OpenAI rejects schemas where a $ref appears alongside sibling keywords (e.g. a property defined as {"$ref": "...", "description": "..."}), even though this is permitted in full JSON Schema drafts. This commonly happens in Pydantic output when a referenced definition is annotated with title/description.

3) Unsupported annotation keywords: In strict mode, OpenAI rejects default in the schema. In JSON Schema, default is an annotation (not used for validation), but OpenAI treats it as invalid input for strict Structured Outputs.

What the function does

This helper walks the entire schema (including nested objects and $defs) and applies the minimal transformations needed to satisfy OpenAI strict-mode requirements:

  • For every node that looks like an object schema ({"type": "object", "properties": {...}}):

    1. Set required = list(properties.keys())
    2. Set additionalProperties = false
  • Remove all default keys anywhere in the schema.

  • For any dict node that contains $ref and other keys, rewrite it so that $ref is placed inside an anyOf: {"$ref": "...", "description": "..."} -> {"anyOf": [{"$ref": "..."}], "description": "..."}. This preserves the validation meaning (a single-entry anyOf) while avoiding the OpenAI restriction on $ref siblings.

Parameters

schema: A JSON Schema dictionary (e.g., from BaseModel.model_json_schema()).

Returns

dict[str, Any] A patched schema dictionary. The returned schema is a deep copy of the input, so the original schema object is not mutated.

Notes

  • The required + additionalProperties: false changes do tighten validation compared to permissive JSON Schema, but match OpenAI strict-mode expectations and are usually aligned with "always emit all keys" extraction-style outputs.
  • If you want a field to be effectively optional while still being listed in required, model it as nullable (e.g., via anyOf: [{"type": "string"}, {"type": "null"}]) or use empty arrays as the "unknown" value for list fields.
  • This function does not guarantee that the resulting schema is accepted by every guided-decoding backend (e.g., vLLM/outlines/xgrammar). Consider keeping a separate "raw" Pydantic schema for self-hosted decoding and using this transformer only for OpenAI strict Structured Outputs.

Example

raw = MyModel.model_json_schema(by_alias=False)
strict_schema = make_openai_strict_json_schema(raw)
request_kwargs["text"] = {
    "format": {
        "type": "json_schema",
        "name": "my_model",
        "strict": True,
        "schema": strict_schema,
    }
}

Source code in src/kibad_llm/llms/openai.py
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
def make_openai_strict_json_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """
    Convert a JSON Schema (often generated by Pydantic) into a form that is accepted by
    OpenAI Structured Outputs when using the Responses API with

        text={"format": {"type": "json_schema", "strict": True, "schema": ...}}

    (or the equivalent `response_format` shape in other clients).

    Why this exists
    ---------------
    OpenAI Structured Outputs validates schemas using a restricted subset of JSON Schema
    plus additional strict-mode constraints. A schema produced by
    `BaseModel.model_json_schema()` is typically valid JSON Schema, but can still be rejected
    by OpenAI for reasons such as:

    1) Object strictness:
       - Every object schema must forbid undeclared keys via `additionalProperties: false`.
       - Every object schema must provide `required`, and in strict mode OpenAI expects
         `required` to include *every* key in `properties` (even if fields have defaults).
         Pydantic usually omits defaulted fields from `required` (e.g., `default_factory=list`).

    2) `$ref` limitations:
       OpenAI rejects schemas where a `$ref` appears alongside sibling keywords (e.g. a
       property defined as `{"$ref": "...", "description": "..."}`), even though this is
       permitted in full JSON Schema drafts. This commonly happens in Pydantic output when
       a referenced definition is annotated with `title`/`description`.

    3) Unsupported annotation keywords:
       In strict mode, OpenAI rejects `default` in the schema. In JSON Schema, `default`
       is an annotation (not used for validation), but OpenAI treats it as invalid input
       for strict Structured Outputs.

    What the function does
    ----------------------
    This helper walks the entire schema (including nested objects and `$defs`) and applies
    the minimal transformations needed to satisfy OpenAI strict-mode requirements:

    - For every node that looks like an object schema
      (`{"type": "object", "properties": {...}}`):
        1. Set `required = list(properties.keys())`
        2. Set `additionalProperties = false`

    - Remove all `default` keys anywhere in the schema.

    - For any dict node that contains `$ref` *and* other keys, rewrite it so that `$ref`
      is placed inside an `anyOf`:
        {"$ref": "...", "description": "..."}  ->  {"anyOf": [{"$ref": "..."}], "description": "..."}
      This preserves the validation meaning (a single-entry `anyOf`) while avoiding the
      OpenAI restriction on `$ref` siblings.

    Name->schema containers (`properties`, `$defs`, `definitions`, `patternProperties`)
    are traversed value-by-value: their *keys* are data, not schema keywords, so a
    property or definition literally named `"default"` or `"$ref"` is left intact.

    Parameters
    ----------
    schema:
        A JSON Schema dictionary (e.g., from `BaseModel.model_json_schema()`).

    Returns
    -------
    dict[str, Any]
        A patched schema dictionary. The returned schema is a deep copy of the input, so
        the original schema object is not mutated.

    Notes
    -----
    - The `required` + `additionalProperties: false` changes do tighten validation compared
      to permissive JSON Schema, but match OpenAI strict-mode expectations and are usually
      aligned with "always emit all keys" extraction-style outputs.
    - If you want a field to be effectively optional while still being listed in `required`,
      model it as nullable (e.g., via `anyOf: [{"type": "string"}, {"type": "null"}]`) or
      use empty arrays as the "unknown" value for list fields.
    - This function does not guarantee that the resulting schema is accepted by every
      guided-decoding backend (e.g., vLLM/outlines/xgrammar). Consider keeping a separate
      "raw" Pydantic schema for self-hosted decoding and using this transformer only for
      OpenAI strict Structured Outputs.

    Example
    -------
    >>> raw = MyModel.model_json_schema(by_alias=False)
    >>> strict_schema = make_openai_strict_json_schema(raw)
    >>> request_kwargs["text"] = {
    ...     "format": {
    ...         "type": "json_schema",
    ...         "name": "my_model",
    ...         "strict": True,
    ...         "schema": strict_schema,
    ...     }
    ... }
    """

    schema = copy.deepcopy(schema)

    # Keywords whose value is a mapping of *names* to schemas (not a schema
    # node itself). Recursing into these dicts as if they were schema nodes
    # would delete a property/definition named "default" and mangle one named
    # "$ref" — so we descend directly into their values instead.
    schema_map_keywords = ("properties", "$defs", "definitions", "patternProperties")

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            # OpenAI strict: `default` is not allowed
            node.pop("default", None)

            # OpenAI strict: `$ref` cannot have sibling keywords; move `$ref` into anyOf
            if "$ref" in node and len(node) > 1:
                ref = node.pop("$ref")
                anyof = node.get("anyOf")
                if isinstance(anyof, list):
                    anyof.insert(0, {"$ref": ref})
                else:
                    node["anyOf"] = [{"$ref": ref}]

            # Recurse. Name->schema containers are walked value-by-value so
            # their keys are treated as data, not as schema keywords.
            for key, value in node.items():
                if key in schema_map_keywords and isinstance(value, dict):
                    for sub_schema in value.values():
                        walk(sub_schema)
                else:
                    walk(value)

            # Patch object nodes
            if node.get("type") == "object" and isinstance(node.get("properties"), dict):
                props = node["properties"]
                node["required"] = list(props.keys())
                node["additionalProperties"] = False

        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema