# Structured Outputs

Structured Outputs in Starfish let you generate data with Large Language Models (LLMs) in a predictable, type-safe format. You define the expected structure of the output with either a Pydantic model or a JSON schema, and the LLM response is parsed into that structure, so downstream code can rely on the shape and types of the synthetic data it receives.

## Overview of StructuredLLM

The `StructuredLLM` class is central to generating structured outputs. It takes a model name, a prompt, and an output schema, which can be either a Pydantic model or a JSON schema. Its `run` method executes the LLM with the rendered prompt and parses the output into the specified structure (src/starfish/llm/structured_llm.py:16-41).
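
The render, call, parse flow described above can be sketched without any network access. This is a minimal illustration under stated assumptions, not Starfish's implementation: `render_prompt`, `fake_llm`, and `run_structured` are hypothetical names, the model call is a stub, the model is assumed to return a JSON array, and a standard-library dataclass stands in for a Pydantic model.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class QnA:
    question: str
    answer: str


def render_prompt(template: str, **variables: str) -> str:
    # Replace each {{name}} placeholder with the matching keyword argument.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; always returns a JSON array.
    return json.dumps([{"question": f"Q about: {prompt}", "answer": "42"}])


def run_structured(template: str, schema: type, **variables: str) -> list:
    prompt = render_prompt(template, **variables)
    raw = fake_llm(prompt)
    # Parse the raw text and build one schema instance per JSON object.
    return [schema(**item) for item in json.loads(raw)]


records = run_structured("Generate facts about {{city}}", QnA, city="San Francisco")
print(records[0].question)  # Q about: Generate facts about San Francisco
```

Keeping rendering, generation, and parsing as separate steps makes each one easy to test in isolation; the stub can be swapped for a real client without touching the parsing logic.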

## Defining Output Schemas

Starfish supports two primary methods for defining output schemas: Pydantic models and JSON schemas.

### Pydantic Models

Pydantic models offer type safety and validation. By defining a Pydantic model, you can ensure that the generated data adheres to specific types and constraints. src/starfish/llm/parser/pydantic_parser.py:13-30
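```python
from pydantic import BaseModel


# Nested model; its fields match the address object used in the parser tests.
class Address(BaseModel):
    street: str
    city: str


class Person(BaseModel):
    name: str
    age: int
    address: Address  # nested models are validated recursively
```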
Sources: tests/llm/parser/test_pydantic_parser.py:16-20

### JSON Schemas

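JSON schemas provide a more flexible way to define the structure of the output data. This method is useful when you need a more dynamic or less strict schema definition, for example one assembled at runtime rather than declared as a Python class (src/starfish/llm/parser/json_parser.py:15-23). In the example notebook, the schema is written as a simplified list of field definitions, each with a `name` and a `type`:

```python
# Simplified field list: each entry gives a field name and its type.
json_schema = [
    {"name": "question", "type": "str"},
    {"name": "answer", "type": "str"},
]
```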
Sources: examples/structured_llm.ipynb

### Schema Conversion

The `JSONParser` can convert a simplified field list into a full JSON schema (src/starfish/llm/parser/json_parser.py:31-60), which offers a more concise way to define schemas programmatically.

```python
from starfish.llm.parser.json_parser import JSONParser

fields = [
    {"name": "name", "type": "str", "description": "Person's name"},
    {"name": "age", "type": "int", "description": "Person's age"},
]
schema = JSONParser.convert_to_schema(fields)
```
Sources: tests/llm/parser/test_json_parser.py:25-30
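
To make the conversion concrete, here is a hedged sketch of what turning a field list into a standard JSON Schema object can look like. The function `fields_to_json_schema`, the `TYPE_MAP` table, and the choice to mark every field as required are assumptions for illustration; the exact structure returned by `JSONParser.convert_to_schema` may differ.

```python
# Hypothetical mapping from the simplified type names to JSON Schema types.
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def fields_to_json_schema(fields: list[dict]) -> dict:
    properties = {}
    for field in fields:
        prop = {"type": TYPE_MAP[field["type"]]}
        if "description" in field:
            prop["description"] = field["description"]
        properties[field["name"]] = prop
    return {
        "type": "object",
        "properties": properties,
        # Treat every listed field as required (an assumption of this sketch).
        "required": [field["name"] for field in fields],
    }


fields = [
    {"name": "name", "type": "str", "description": "Person's name"},
    {"name": "age", "type": "int", "description": "Person's age"},
]
schema = fields_to_json_schema(fields)
print(schema["properties"]["age"])  # {'type': 'integer', 'description': "Person's age"}
```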

## Parsing LLM Outputs

The PydanticParser and JSONParser classes are responsible for parsing the output from LLMs and converting it into the specified structured format.

### Pydantic Parser

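The `PydanticParser` parses the LLM output into the given Pydantic model. It can recover the JSON even when the model wraps it in a markdown code block surrounded by conversational text, as in this test case (src/starfish/llm/parser/pydantic_parser.py:44-61):

````python
from starfish.llm.parser.pydantic_parser import PydanticParser

# LLM reply with the JSON wrapped in a markdown code block and extra prose.
text = """
Here's the information you requested:
```json
{
    "name": "John Smith",
    "age": 42,
    "address": {
        "street": "123 Main St",
        "city": "Anytown"
    }
}
```
Is there anything else you need?
"""

# Person is the model defined in the Pydantic Models section.
result = PydanticParser.parse_llm_output(text, Person)
````

Sources: tests/llm/parser/test_pydantic_parser.py:66-83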
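
One common way to pull JSON out of a reply like the one above is to look for a fenced block first and fall back to the outermost braces. The sketch below shows that approach; `extract_json` and `BLOCK_RE` are hypothetical names, and this is not necessarily how `PydanticParser` works internally. The fence string is built as `"`" * 3` only so the example nests cleanly inside Markdown.

```python
import json
import re

FENCE = "`" * 3  # built this way so the example nests cleanly inside Markdown

# Match an optional "json" tag after the opening fence, then capture the body.
BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str):
    match = BLOCK_RE.search(text)
    candidate = match.group(1) if match else text
    # Keep only the outermost {...} span, dropping any surrounding prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(candidate[start : end + 1])


reply = (
    "Here's the information you requested:\n"
    + FENCE + "json\n"
    + '{"name": "John Smith", "age": 42, "address": {"street": "123 Main St", "city": "Anytown"}}\n'
    + FENCE + "\nIs there anything else you need?"
)
data = extract_json(reply)
print(data["address"]["city"])  # Anytown
```

With Pydantic v2, the resulting dictionary could then be validated with `Person.model_validate(data)`, which raises a `ValidationError` if a field has the wrong type.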

### JSON Parser

The `JSONParser` parses the LLM output based on the provided JSON schema. It ensures that the output conforms to the schema's structure and data types (src/starfish/llm/parser/json_parser.py:63-72).
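
Checking that parsed output "conforms to the schema's structure and data types" amounts to verifying that each expected field is present and has the right type. The following is a minimal sketch of such a check against the simplified field list; `check_against_fields` and `PY_TYPES` are hypothetical, and the library's own validation rules may be stricter or looser.

```python
# Hypothetical mapping from simplified type names to Python types.
PY_TYPES = {"str": str, "int": int, "float": (int, float), "bool": bool}


def check_against_fields(data: dict, fields: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the data conforms."""
    problems = []
    for field in fields:
        name, expected = field["name"], PY_TYPES[field["type"]]
        if name not in data:
            problems.append(f"missing field: {name}")
        elif isinstance(data[name], bool) and field["type"] != "bool":
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"{name}: expected {field['type']}, got bool")
        elif not isinstance(data[name], expected):
            actual = type(data[name]).__name__
            problems.append(f"{name}: expected {field['type']}, got {actual}")
    return problems


fields = [{"name": "question", "type": "str"}, {"name": "answer", "type": "str"}]
print(check_against_fields({"question": "Capital of France?", "answer": "Paris"}, fields))  # []
print(check_against_fields({"question": "Capital of France?", "answer": 1}, fields))  # ['answer: expected str, got int']
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once, which is useful when deciding whether to retry the LLM call.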

## Format Instructions

Both `PydanticParser` and `JSONParser` provide methods to generate format instructions for LLMs. These instructions guide the LLM to produce output that is compatible with the specified schema.

### Pydantic Format Instructions

The `PydanticParser.get_format_instructions` method generates instructions based on the Pydantic model's fields and descriptions (src/starfish/llm/parser/pydantic_parser.py:86-100).
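
Generating instructions from a model means introspecting its fields, their types, and their descriptions. As a stand-in for Pydantic's `Field(description=...)`, the sketch below keeps descriptions in standard-library dataclass field metadata; `describe_fields` is a hypothetical helper, and the line format it produces is illustrative, not the wording `get_format_instructions` actually emits.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Person:
    # Descriptions live in field metadata here, standing in for Pydantic's Field(description=...).
    name: str = field(metadata={"description": "Person's name"})
    age: int = field(metadata={"description": "Person's age"})


def describe_fields(model: type) -> list[str]:
    lines = []
    for f in fields(model):
        # f.type is the annotation object (e.g. str) unless annotations are postponed.
        type_name = f.type if isinstance(f.type, str) else f.type.__name__
        description = f.metadata.get("description", "no description")
        lines.append(f"- {f.name} ({type_name}): {description}")
    return lines


print("\n".join(describe_fields(Person)))
# - name (str): Person's name
# - age (int): Person's age
```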

### JSON Format Instructions

The `JSONParser.get_format_instructions` method generates instructions based on the JSON schema, including field names, types, and descriptions (src/starfish/llm/parser/json_parser.py:75-104).

```python
from starfish.llm.parser.json_parser import JSONParser

fields = [
    {"name": "name", "type": "str", "description": "Person's name"},
    {"name": "age", "type": "int", "description": "Person's age"},
]
schema = JSONParser.convert_to_schema(fields)
instructions = JSONParser.get_format_instructions(schema)  # text to add to the prompt
```

Sources: tests/llm/parser/test_json_parser.py:25-32
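
A typical format instruction combines a skeleton of the expected output with a per-field description. The sketch below builds such text from the field list; `format_instructions` is a hypothetical helper, and both its wording and the assumption that the model should return a JSON array are illustrative rather than taken from Starfish.

```python
import json


def format_instructions(fields: list[dict]) -> str:
    # Build a placeholder skeleton such as {"name": "<str>"} to show the shape.
    skeleton = {field["name"]: f"<{field['type']}>" for field in fields}
    lines = [
        "Respond only with a JSON array of objects matching this structure:",
        json.dumps([skeleton], indent=2),
        "",
        "Field descriptions:",
    ]
    for field in fields:
        description = field.get("description", "no description")
        lines.append(f"- {field['name']} ({field['type']}): {description}")
    return "\n".join(lines)


fields = [
    {"name": "name", "type": "str", "description": "Person's name"},
    {"name": "age", "type": "int", "description": "Person's age"},
]
print(format_instructions(fields))
```

Showing the model a concrete skeleton alongside the descriptions tends to be easier for it to follow than a raw JSON Schema, at the cost of expressing fewer constraints.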

## Example Usage

Here’s an example of how to use StructuredLLM with a Pydantic model:
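```python
from pydantic import BaseModel
from starfish import StructuredLLM


class QnASchema(BaseModel):
    question: str
    answer: str


# {{city}} is a template variable filled in from the keyword arguments to run().
qna_llm = StructuredLLM(
    model_name="openai/gpt-4o-mini",
    prompt="Generate facts about {{city}}",
    output_schema=QnASchema,
)

# Top-level await works in a Jupyter notebook; in a plain script,
# wrap this call in an async function and run it with asyncio.run().
response = await qna_llm.run(city="San Francisco")
print(response.data)
```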
Sources: examples/structured_llm.ipynb

## Conclusion

Structured Outputs in Starfish provide a robust and flexible way to generate synthetic data using LLMs. By defining output schemas with Pydantic models or JSON schemas, developers can ensure that the generated data conforms to specific formats and constraints, enhancing the reliability and usability of the data.