
Utils

build_schema_description(schema, header='Feldhinweise und erlaubte Werte (getrennt durch Semikolons):', type_description_prefix='Beschreibung: ', cardinality_prefix='Kardinalität: ', type_prefix='Typ: ', choices_prefix='Zulässige Werte: ', choices_description_prefix='Hinweise zu den Werten: ', component_separator=' | ', choices_separator='; ', indent_step='  ', include_field_descriptions=True, include_type_descriptions=True, indent=0, root_schema=None)

Build a human‑readable summary for a JSON Schema.

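Output format:
- Optional first line: "<indent><type_description_prefix><schema.description>" if type_description_prefix is not None and the schema has a description (the deprecated include_type_descriptions=False also suppresses it)
- Optional header line (top level only, if header is not None)
- One line per property; which components appear depends on which prefix parameters are not None:
  "<indent>- <name>:[ <description>][<separator><cardinality_prefix><cardinality>][<separator><type_prefix><type>][<separator><choices_prefix><values>[<separator><choices_description_prefix><values description>]]"
- Nested objects are included recursively, with their properties indented one level further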
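To make the per-field format concrete, here is a minimal sketch that assembles a single line from values that have already been extracted. `format_field_line` is a hypothetical helper, not part of the module; it skips schema traversal and `$ref` resolution and only mirrors the documented component order and the rule that a None prefix omits its component.

```python
from collections.abc import Sequence
from typing import Optional


def format_field_line(
    name: str,
    *,
    description: Optional[str] = None,
    cardinality: Optional[str] = None,
    field_type: Optional[str] = None,
    choices: Optional[Sequence[str]] = None,
    choices_description: Optional[str] = None,
    indent: int = 0,
    indent_step: str = "  ",
    cardinality_prefix: Optional[str] = "Kardinalität: ",
    type_prefix: Optional[str] = "Typ: ",
    choices_prefix: Optional[str] = "Zulässige Werte: ",
    choices_description_prefix: Optional[str] = "Hinweise zu den Werten: ",
    component_separator: str = " | ",
    choices_separator: str = "; ",
) -> str:
    """Hypothetical helper that assembles one field line in the documented format."""
    # the colon after the name is always present, even without a description
    line = f"{indent_step * indent}- {name}:"
    if description:
        line += f" {description}"
    # each further component appears only if its value exists and its prefix is not None
    if cardinality and cardinality_prefix is not None:
        line += f"{component_separator}{cardinality_prefix}{cardinality}"
    if field_type and type_prefix is not None:
        line += f"{component_separator}{type_prefix}{field_type}"
    if choices and choices_prefix is not None:
        line += f"{component_separator}{choices_prefix}{choices_separator.join(choices)}"
        # the choices description is only emitted together with the choices
        if choices_description and choices_description_prefix is not None:
            line += f"{component_separator}{choices_description_prefix}{choices_description}"
    return line


print(
    format_field_line(
        "status",
        description="Current state",
        cardinality="1",
        field_type="string",
        choices=["open", "closed"],
    )
)
# - status: Current state | Kardinalität: 1 | Typ: string | Zulässige Werte: open; closed
```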

Cardinality rules:
- type=array ⇒ "0..*"
- non-array with "default" ⇒ "0..1"
- non-array without "default" ⇒ "1"
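The three cardinality rules translate directly into a small function. `cardinality` is a hypothetical name used only for illustration; as documented, the rules consult nothing but "type" and "default".

```python
from collections.abc import Mapping
from typing import Any


def cardinality(spec: Mapping[str, Any]) -> str:
    """Return the cardinality label for a property schema, per the documented rules."""
    if spec.get("type") == "array":
        return "0..*"  # arrays: any number of values
    # non-array: a "default" makes the field optional (at most one value)
    return "0..1" if "default" in spec else "1"


print(cardinality({"type": "array", "items": {"type": "string"}}))  # 0..*
print(cardinality({"type": "integer", "default": None}))            # 0..1
print(cardinality({"$ref": "#/$defs/Color"}))                       # 1
```

Because only "default" is checked, a non-array field that is absent from the parent's "required" list but has no default is still reported as "1"; the rule appears to assume Pydantic-style schemas, where optional fields usually carry a default (often null).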

Type extraction:
- Supports inline "type", direct "$ref", and compositions via "allOf"/"anyOf"/"oneOf"
- For arrays, the type is taken from "items"
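Type extraction can be sketched as a recursive lookup. This is an illustrative sketch under assumptions, not the module's own helper: `extract_type` and `resolve_local_ref` are hypothetical, only local `#/...` references are resolved, a composition reports the first branch whose type is not "null", and cyclic references are not guarded against.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_local_ref(root: Mapping[str, Any], ref: str) -> Any:
    """Resolve a local JSON Pointer reference such as '#/$defs/Color'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported in this sketch: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        # JSON Pointer unescaping (RFC 6901): '~1' -> '/', then '~0' -> '~'
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def extract_type(root: Mapping[str, Any], spec: Any) -> Optional[str]:
    """Return a display type for a property schema, or None if none is found."""
    if not isinstance(spec, Mapping):
        return None
    if spec.get("type") == "array":
        # for arrays, report the type of the items
        return extract_type(root, spec.get("items"))
    if isinstance(spec.get("type"), str):
        return spec["type"]
    if isinstance(spec.get("$ref"), str):
        return extract_type(root, resolve_local_ref(root, spec["$ref"]))
    for key in ("allOf", "anyOf", "oneOf"):
        for branch in spec.get(key) or []:
            branch_type = extract_type(root, branch)
            # assumption: skip "null" so nullable unions report the non-null type
            if branch_type not in (None, "null"):
                return branch_type
    return None


root = {"$defs": {"Color": {"type": "string", "enum": ["red", "green"]}}}
print(extract_type(root, {"anyOf": [{"$ref": "#/$defs/Color"}, {"type": "null"}]}))  # string
```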

Choices extraction:
- Supports inline enums, direct "$ref", and compositions via "allOf"/"anyOf"/"oneOf"
- For arrays, choices are taken from "items" (including "$ref" or compositions)
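Choice extraction follows the same traversal but collects `enum` values instead of a type. The sketch below is hypothetical (`extract_choices` is not the module's API): it only handles local `#/...` references without JSON Pointer unescaping, takes the first composition branch that yields values, and leaves out the value descriptions that the real function also returns.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_local_ref(root: Mapping[str, Any], ref: str) -> Any:
    """Follow a local reference such as '#/$defs/Color' (no ~0/~1 unescaping here)."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported in this sketch: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def extract_choices(root: Mapping[str, Any], spec: Any) -> Optional[list[str]]:
    """Collect allowed values from inline enums, refs, compositions and array items."""
    if not isinstance(spec, Mapping):
        return None
    if spec.get("type") == "array":
        # for arrays, the choices come from the item schema
        return extract_choices(root, spec.get("items"))
    if isinstance(spec.get("enum"), list):
        return [str(v) for v in spec["enum"]]
    if isinstance(spec.get("$ref"), str):
        return extract_choices(root, resolve_local_ref(root, spec["$ref"]))
    for key in ("allOf", "anyOf", "oneOf"):
        for branch in spec.get(key) or []:
            choices = extract_choices(root, branch)
            if choices:
                return choices  # assumption: the first branch with values wins
    return None


root = {"$defs": {"Color": {"type": "string", "enum": ["red", "green"]}}}
print(extract_choices(root, {"type": "array", "items": {"$ref": "#/$defs/Color"}}))
# ['red', 'green']
```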

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| schema | Mapping[str, Any] | The JSON Schema dictionary to process. | required |
| header | str \| None | Header text for the field list, shown only at the top level; None to omit. The default is German for "Field hints and allowed values (separated by semicolons):". | 'Feldhinweise und erlaubte Werte (getrennt durch Semikolons):' |
| type_description_prefix | str \| None | Prefix for type descriptions, i.e. descriptions of the overall schema and of nested schemas (not field descriptions); None to omit them. The default is German for "Description: ". | 'Beschreibung: ' |
| cardinality_prefix | str \| None | Prefix for cardinality information; None to omit cardinality. The default is German for "Cardinality: ". | 'Kardinalität: ' |
| type_prefix | str \| None | Prefix for type information; None to omit types. The default is German for "Type: ". | 'Typ: ' |
| choices_prefix | str \| None | Prefix for the list of allowed values; None to omit choices (their descriptions are then omitted as well). The default is German for "Permitted values: ". | 'Zulässige Werte: ' |
| choices_description_prefix | str \| None | Prefix for the descriptions of the allowed values; None to omit those descriptions. The default is German for "Notes on the values: ". | 'Hinweise zu den Werten: ' |
| component_separator | str | Separator placed before each component that follows the name and description (cardinality, type, choices, choices description). | ' \| ' |
| choices_separator | str | Separator between individual allowed values; also used to join their descriptions. | '; ' |
| indent_step | str | String used for each indentation level. | '  ' (two spaces) |
| include_field_descriptions | bool | Whether to include field (property) descriptions in the output. | True |
| include_type_descriptions | bool | Whether to include schema/type descriptions (top-level and nested) in the output. Deprecated: set type_description_prefix to None instead; passing False emits a deprecation warning. | True |
| indent | int | Current indentation level (internal, used for recursion). | 0 |
| root_schema | Mapping[str, Any] \| None | Root schema containing $defs (internal, used for recursion). | None |
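The internal `indent` and `root_schema` parameters exist for the recursion into nested objects: each nested call prefixes its lines with one more `indent_step`, and `root_schema` keeps the top-level schema (with its `$defs`) available so that `$ref`s met inside nested schemas can still be resolved. The stripped-down sketch below shows only that mechanism; `describe` is hypothetical, prints just names and types, and resolves only `#/$defs/<Name>` references.

```python
from collections.abc import Mapping
from typing import Any, Optional


def describe(
    schema: Mapping[str, Any],
    *,
    indent: int = 0,
    indent_step: str = "  ",
    root_schema: Optional[Mapping[str, Any]] = None,
) -> str:
    """Hypothetical, stripped-down recursion: field names and types only."""
    if root_schema is None:
        root_schema = schema  # top-level call: the schema itself holds $defs
    prefix = indent_step * indent
    lines: list[str] = []
    for name, spec in (schema.get("properties") or {}).items():
        target = spec
        ref = spec.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            # resolve against the root, since nested schemas carry no $defs of their own
            target = root_schema["$defs"][ref.split("/")[-1]]
        lines.append(f"{prefix}- {name}: {target.get('type', '?')}")
        if target.get("type") == "object" and target.get("properties"):
            # recurse one level deeper, passing the root along
            lines.append(
                describe(target, indent=indent + 1, indent_step=indent_step, root_schema=root_schema)
            )
    return "\n".join(lines)


schema = {
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}
print(describe(schema))
# - name: string
# - address: object
#   - city: string
```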

Returns:

| Type | Description |
|---|---|
| str | Multi-line string summarizing the schema structure, fields, and constraints. |

Source code in src/kibad_llm/schema/utils.py
def build_schema_description(
    schema: Mapping[str, Any],
    header: str | None = "Feldhinweise und erlaubte Werte (getrennt durch Semikolons):",
    type_description_prefix: str | None = "Beschreibung: ",
    cardinality_prefix: str | None = "Kardinalität: ",
    type_prefix: str | None = "Typ: ",
    choices_prefix: str | None = "Zulässige Werte: ",
    choices_description_prefix: str | None = "Hinweise zu den Werten: ",
    component_separator: str = " | ",
    choices_separator: str = "; ",
    indent_step: str = "  ",
    include_field_descriptions: bool = True,
    include_type_descriptions: bool = True,
    # internal args
    indent: int = 0,
    root_schema: Mapping[str, Any] | None = None,
) -> str:
    """
    Build a human‑readable summary for a JSON Schema.

    Output format:
    - Optional first line: "<indent><type_description_prefix><schema.description>" if type_description_prefix is not None and the description exists
    - Optional header line (only at top level if header is not None)
    - One line per property with format depending on which prefix parameters are not None:
      "<indent>- <name>:[ <description>][<separator><cardinality_prefix><cardinality>][<separator><type_prefix><type>][<separator><choices_prefix><values>[<separator><choices_description_prefix><values description>]]"
    - For nested objects, recursively includes their properties with increased indentation

    Cardinality rules:
    - type=array ⇒ "0..*"
    - non-array with "default" ⇒ "0..1"
    - non-array without "default" ⇒ "1"

    Type extraction:
    - Supports inline "type", direct "$ref", and compositions via "allOf"/"anyOf"/"oneOf"
    - For arrays, type is taken from "items"

    Choices extraction:
    - Supports inline enums, direct "$ref", and compositions via "allOf"/"anyOf"/"oneOf"
    - For arrays, choices are taken from "items" (including "$ref" or compositions)

    Args:
        schema: The JSON Schema dictionary to process
        header: Header text for field list (only shown at top level, None to omit)
        type_description_prefix: Prefix for type descriptions (descriptions for the overall schema and nested schemas, not field descriptions; None to omit them)
        cardinality_prefix: Prefix for cardinality information (None to omit cardinality)
        type_prefix: Prefix for type information (None to omit types)
        choices_prefix: Prefix for choices value lists (None to omit choices)
        choices_description_prefix: Prefix for choices descriptions (None to omit choices descriptions)
        component_separator: Separator placed before each component after the name/description (cardinality, type, choices, choices description)
        choices_separator: Separator between individual choices values
        indent_step: String used for each indentation level
        include_field_descriptions: Whether to include field/property descriptions in the output
        include_type_descriptions: Whether to include schema/type descriptions (top-level and nested) in the output.
            NOTE: This is deprecated; please set type_description_prefix to None to omit type descriptions instead.
        indent: Current indentation level (internal, for recursion)
        root_schema: Root schema containing $defs (internal, for recursion)

    Returns:
        Multi-line string summarizing schema structure, fields, and constraints
    """
    if not include_type_descriptions:
        type_description_prefix = None
        warn_once(
            "include_type_descriptions is deprecated; please set type_description_prefix to None instead "
            "of using include_type_descriptions=False."
        )

    if root_schema is None:
        root_schema = schema

    lines = []
    prefix = indent_step * indent

    # Add description
    if type_description_prefix is not None:
        # remove all newlines and extra spaces from the description
        schema_desc = _norm_desc(schema.get("description"))
        if schema_desc:
            lines.append(f"{prefix}{type_description_prefix}{schema_desc}")

    if header:
        lines.append(header)

    props = schema.get("properties", {}) or {}
    for name, spec in props.items():
        # Single check for array vs non-array handling
        is_array = spec.get("type") == "array"
        target = spec.get("items") if is_array else spec
        target_for_hints = _pick_preferred_branch(target, root_schema)

        # Determine cardinality
        has_default = "default" in spec
        cardinality = "0..*" if is_array else ("0..1" if has_default else "1")

        # Extract type and choices from target
        field_type = _extract_type(root_schema, target_for_hints)
        # use choices_separator also to join the enum *descriptions* if needed
        choices_with_description = _extract_choices_with_description(
            root_schema, target_for_hints, description_separator=choices_separator
        )

        # Build field line
        hint = f"{prefix}- {name}:"
        # optionally append the field description (only if it exists)
        if include_field_descriptions:
            # remove all newlines and extra spaces from the description
            desc = _norm_desc(spec.get("description"))
            if desc:
                hint += f" {desc}"
        if cardinality_prefix is not None:
            hint += f"{component_separator}{cardinality_prefix}{cardinality}"
        if field_type and type_prefix is not None:
            hint += f"{component_separator}{type_prefix}{field_type}"
        if choices_with_description and choices_prefix is not None:
            choices, choices_desc = choices_with_description
            hint += f"{component_separator}{choices_prefix}" + choices_separator.join(choices)
            # remove all newlines and extra spaces from the choices description
            choices_desc = _norm_desc(choices_desc)
            if choices_desc and choices_description_prefix is not None:
                hint += f"{component_separator}{choices_description_prefix}{choices_desc}"

        lines.append(hint)

        # Handle nested objects recursively:
        # - $ref objects
        # - inline object schemas with "properties" (needed for metadata wrappers)
        if field_type == "object" and isinstance(target_for_hints, ABCMapping):
            nested_schema: Mapping[str, Any] | None = None

            ref = target_for_hints.get("$ref")
            if isinstance(ref, str):
                nested_schema = _resolve_ref(root_schema, ref)
            elif isinstance(target_for_hints.get("properties"), ABCMapping):
                nested_schema = target_for_hints

            if nested_schema:
                nested_content = build_schema_description(
                    nested_schema,
                    indent=indent + 1,
                    root_schema=root_schema,
                    # no header for nested
                    header=None,
                    type_description_prefix=type_description_prefix,
                    cardinality_prefix=cardinality_prefix,
                    type_prefix=type_prefix,
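                    choices_prefix=choices_prefix,
                    # forward this too; otherwise nested fields fall back to the default prefix
                    choices_description_prefix=choices_description_prefix,
                    component_separator=component_separator,
                    choices_separator=choices_separator,
                    indent_step=indent_step,
                    include_field_descriptions=include_field_descriptions,
                )
                # avoid emitting a blank line for nested schemas that produce no output
                if nested_content:
                    lines.append(nested_content)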

    return "\n".join(lines)

wrap_terminals_with_metadata(schema, metadata_schema, *, content_key=WRAPPED_CONTENT_KEY, content_description=None)

Wrap every terminal field schema (scalars/enums/const, including nullable unions and refs) into an object containing:
- <content_key>: the original terminal schema
- plus the metadata fields from metadata_schema (default: evidence_anchor: string)
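Conceptually, wrapping one terminal turns a bare value schema into an object schema whose content-key property holds the original schema, next to the metadata properties. The sketch below is a hypothetical stand-in for the module's private wrapping helper: the content key "value" is an assumption (the actual value of `WRAPPED_CONTENT_KEY` is not shown on this page), and marking every key as required and ignoring `content_description` are simplifications.

```python
from collections.abc import Mapping
from copy import deepcopy
from typing import Any

# Assumption: "value" stands in for WRAPPED_CONTENT_KEY, whose real value is not shown here.
CONTENT_KEY = "value"


def wrap_value_schema(
    value_schema: Mapping[str, Any],
    metadata_properties: Mapping[str, Any],
    content_key: str = CONTENT_KEY,
) -> dict[str, Any]:
    """Hypothetical sketch: wrap one terminal schema into an object with metadata fields."""
    return {
        "type": "object",
        "properties": {
            # the original terminal schema, copied so the input is not shared or mutated
            content_key: deepcopy(dict(value_schema)),
            **deepcopy(dict(metadata_properties)),
        },
        # assumption: the content and every metadata field are required
        "required": [content_key, *metadata_properties],
    }


wrapped = wrap_value_schema(
    {"anyOf": [{"type": "integer"}, {"type": "null"}]},
    {"evidence_anchor": {"type": "string"}},
)
print(wrapped["required"])  # ['value', 'evidence_anchor']
```

Under these assumptions, a field that previously accepted a bare value such as `3` would now expect an object like `{"value": 3, "evidence_anchor": "..."}`.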

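Notes:
- Does NOT wrap the root of $defs entries (important for shared enum defs).
- DOES wrap terminal fields inside object definitions in $defs.
- Returns a deep-copied dict; the input is not mutated.
- Raises ValueError if a schema uses a list-valued "type"; normalize such unions to anyOf/oneOf first.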
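The two $defs rules can be illustrated with a simplified traversal. Entries under $defs are visited with wrapping disabled at their root, so a shared enum definition stays a bare enum while each field that references it is wrapped at the reference site; terminal fields inside object definitions are still wrapped. This is a hypothetical sketch, not the module's `transform`: it only follows `properties`, `items`, and `$defs`, resolves only `#/$defs/<Name>` references, does not guard against cyclic references, and hard-codes a `value` content key with an `evidence_anchor` string as the only metadata field.

```python
from collections.abc import Mapping
from typing import Any

SCALAR_TYPES = {"string", "integer", "number", "boolean", "null"}


def is_terminal(root: Mapping[str, Any], node: Mapping[str, Any]) -> bool:
    """Simplified test: scalar type, enum/const, ref to a terminal def, or union of terminals."""
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/$defs/"):
        return is_terminal(root, root["$defs"][ref.split("/")[-1]])
    node_type = node.get("type")
    if (isinstance(node_type, str) and node_type in SCALAR_TYPES) or "enum" in node or "const" in node:
        return True
    branches = node.get("anyOf") or node.get("oneOf")
    return bool(branches) and all(is_terminal(root, b) for b in branches)


def wrap_terminals(schema: Mapping[str, Any], content_key: str = "value") -> dict[str, Any]:
    """Hypothetical sketch: wrap terminal properties, never the root of a $defs entry."""

    def wrap(node: Mapping[str, Any]) -> dict[str, Any]:
        return {
            "type": "object",
            "properties": {content_key: dict(node), "evidence_anchor": {"type": "string"}},
        }

    def transform(node: Any, allow_wrap_here: bool) -> Any:
        if not isinstance(node, Mapping):
            return node
        if allow_wrap_here and is_terminal(schema, node):
            return wrap(node)
        out = dict(node)  # new dict per level, so the input is never mutated
        if isinstance(node.get("properties"), Mapping):
            out["properties"] = {k: transform(v, True) for k, v in node["properties"].items()}
        if isinstance(node.get("items"), Mapping):
            out["items"] = transform(node["items"], True)
        if isinstance(node.get("$defs"), Mapping):
            # def roots are not wrapped; only their inner fields can be
            out["$defs"] = {k: transform(v, False) for k, v in node["$defs"].items()}
        return out

    return transform(schema, True)


schema = {
    "properties": {
        "color": {"$ref": "#/$defs/Color"},
        "shop": {"$ref": "#/$defs/Shop"},
    },
    "$defs": {
        "Color": {"enum": ["red", "green"]},
        "Shop": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}
result = wrap_terminals(schema)
print(result["$defs"]["Color"])  # {'enum': ['red', 'green']}
```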

Source code in src/kibad_llm/schema/utils.py
def wrap_terminals_with_metadata(
    schema: Mapping[str, Any],
    metadata_schema: Mapping[str, Any],
    *,
    content_key: str = WRAPPED_CONTENT_KEY,
    content_description: str | None = None,
) -> dict[str, Any]:
    """
    Wrap every terminal field schema (scalars/enums/const, including nullable unions and refs)
    into an object containing:
      - <content_key>: the original terminal schema
      - plus metadata fields from `metadata_schema` (default: evidence_anchor: string)

    Notes:
    - Does NOT wrap the root of $defs entries (important for shared enum defs).
    - DOES wrap terminal fields inside object definitions in $defs.
    - Returns a deep-copied dict; input is not mutated.
    """
    from copy import deepcopy

    root: dict[str, Any] = deepcopy(dict(schema))
    metadata_obj_schema = _normalize_metadata_schema(metadata_schema)

    def transform(node: Any, *, allow_wrap_here: bool = True) -> Any:
        if isinstance(node, list):
            return [transform(x, allow_wrap_here=True) for x in node]
        if not isinstance(node, ABCMapping):
            return node

        node_dict: dict[str, Any] = dict(node)

        if _is_metadata_wrapper(
            node_dict, metadata_obj_schema=metadata_obj_schema, content_key=content_key
        ):
            return node_dict

        if isinstance(node_dict.get("type"), list):
            raise ValueError(
                "Encountered JSON Schema 'type' as a list. "
                "This code expects Pydantic-style unions via anyOf/oneOf. "
                "Please normalize type-lists to anyOf/oneOf first or update the wrapper."
            )

        if allow_wrap_here and _schema_should_be_wrapped(root, node_dict):
            return _wrap_value_schema_with_metadata(
                node_dict,
                metadata_obj_schema=metadata_obj_schema,
                content_key=content_key,
                content_description=content_description,
            )

        # combinators
        for k in ("anyOf", "oneOf", "allOf"):
            v = node_dict.get(k)
            if isinstance(v, list):
                node_dict[k] = [transform(s, allow_wrap_here=True) for s in v]

        # object keywords
        props = node_dict.get("properties")
        if isinstance(props, ABCMapping):
            node_dict["properties"] = {
                name: transform(spec, allow_wrap_here=True) for name, spec in props.items()
            }

        pat_props = node_dict.get("patternProperties")
        if isinstance(pat_props, ABCMapping):
            node_dict["patternProperties"] = {
                pat: transform(spec, allow_wrap_here=True) for pat, spec in pat_props.items()
            }

        add_props = node_dict.get("additionalProperties")
        if isinstance(add_props, ABCMapping):
            node_dict["additionalProperties"] = transform(add_props, allow_wrap_here=True)

        uneval_props = node_dict.get("unevaluatedProperties")
        if isinstance(uneval_props, ABCMapping):
            node_dict["unevaluatedProperties"] = transform(uneval_props, allow_wrap_here=True)

        # array keywords
        items = node_dict.get("items")
        if isinstance(items, ABCMapping):
            node_dict["items"] = transform(items, allow_wrap_here=True)

        prefix_items = node_dict.get("prefixItems")
        if isinstance(prefix_items, list):
            node_dict["prefixItems"] = [transform(s, allow_wrap_here=True) for s in prefix_items]

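        # other schema-bearing keywords (common)
        for k in ("not", "if", "then", "else", "contains", "propertyNames"):
            v = node_dict.get(k)
            if isinstance(v, ABCMapping):
                node_dict[k] = transform(v, allow_wrap_here=True)
            elif isinstance(v, list):
                node_dict[k] = [transform(x, allow_wrap_here=True) for x in v]

        # dependentSchemas maps property names to subschemas (like "properties"),
        # so transform each subschema rather than treating the mapping as a schema
        dep_schemas = node_dict.get("dependentSchemas")
        if isinstance(dep_schemas, ABCMapping):
            node_dict["dependentSchemas"] = {
                name: transform(spec, allow_wrap_here=True) for name, spec in dep_schemas.items()
            }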

        # defs: don't wrap the *root* of each def, but do transform inside it
        for defs_key in ("$defs", "definitions"):
            defs = node_dict.get(defs_key)
            if isinstance(defs, ABCMapping):
                node_dict[defs_key] = {
                    n: transform(s, allow_wrap_here=False) for n, s in defs.items()
                }

        return node_dict

    return transform(root, allow_wrap_here=True)
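`wrap_terminals_with_metadata` raises ValueError when it meets a list-valued "type" and asks for such unions to be rewritten as anyOf/oneOf first. A pre-processing pass along those lines might look like the following minimal sketch; `normalize_type_lists` is a hypothetical helper, not part of the module. It leaves instance-data keywords (default, const, enum, examples) untouched and treats the values of name maps such as properties and $defs as schemas, so a property that happens to be called "default" is still normalized.

```python
from collections.abc import Mapping
from typing import Any

# keywords whose values are instance data, not schemas
DATA_KEYWORDS = {"default", "const", "enum", "examples"}
# keywords whose values map names to subschemas
NAME_MAP_KEYWORDS = {"properties", "patternProperties", "$defs", "definitions", "dependentSchemas"}


def normalize_type_lists(node: Any) -> Any:
    """Rewrite {"type": [...]} into an anyOf of single-type branches; input is not mutated."""
    if isinstance(node, list):
        return [normalize_type_lists(x) for x in node]
    if not isinstance(node, Mapping):
        return node
    out: dict[str, Any] = {}
    for key, value in node.items():
        if key in DATA_KEYWORDS:
            out[key] = value  # leave instance data untouched
        elif key in NAME_MAP_KEYWORDS and isinstance(value, Mapping):
            # keys are names, values are schemas
            out[key] = {name: normalize_type_lists(sub) for name, sub in value.items()}
        else:
            out[key] = normalize_type_lists(value)
    types = out.get("type")
    if isinstance(types, list):
        del out["type"]
        if len(types) == 1:
            out["type"] = types[0]
        else:
            branches = [{"type": t} for t in types]
            if "anyOf" in out:
                # keep the existing anyOf and require the type union as well
                out["allOf"] = [*out.get("allOf", []), {"anyOf": branches}]
            else:
                out["anyOf"] = branches
    return out


print(normalize_type_lists({"type": ["string", "null"], "maxLength": 5}))
```

A list with a single entry collapses to a plain string; the other keywords stay on the outer schema, where they still apply to every branch.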