Welcome to Starfishdata.ai Easy Scaling
Starfish provides easy scaling through the data_factory decorator. This allows any function to be transformed into a scalable data pipeline, enabling parallel processing across multiple inputs. The scaling is achieved through concurrent workers, making it suitable for both experimentation and production environments. Sources: README.md
The data_factory decorator simplifies the process of parallelizing data generation tasks. It handles the complexities of concurrency, error handling, and job resumption, allowing developers to focus on the core logic of their data generation functions. This approach supports various LLM providers and dynamic prompts, making it a flexible solution for diverse data generation needs. Sources: README.md
The @data_factory decorator is central to easy scaling in Starfish. It transforms a regular function into a parallel processing pipeline. Sources: src/starfish/data_factory/factory.py:15-20
The max_concurrency parameter controls the number of concurrent workers. Sources: src/starfish/data_factory/factory.py:43-46

When a function is decorated with @data_factory, input data is processed in parallel, and results are obtained after post-processing. Sources: examples/data_factory.ipynb
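As a minimal sketch of this pattern (the import path and the run() entry point on the decorated function are assumptions based on the examples notebook, not a definitive API reference):

```python
# Minimal sketch: decorating a simple async function so a pool of
# concurrent workers processes its inputs. The import path and the
# run() invocation style are assumptions for illustration.
from starfish import data_factory

@data_factory(max_concurrency=10)  # up to 10 workers at a time
async def process_item(item: str):
    # Core generation logic goes here; the decorator handles
    # scheduling, retries, and result collection around it.
    return {"item": item, "length": len(item)}

# Fan the input list out across the worker pool.
results = process_item.run(item=["alpha", "beta", "gamma"])
```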
The data_factory decorator can be applied to any function, regardless of its complexity. This includes simple functions and complex workflows involving pre-processing, multiple LLM calls, post-processing, and error handling. Sources: examples/data_factory.ipynb
The examples notebook demonstrates using the data_factory decorator to scale a question-answering workflow across multiple cities, as sketched below. Sources: examples/data_factory.ipynb
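A sketch of what that workflow might look like; StructuredLLM, the model name, the prompt, and the output schema here are illustrative assumptions rather than the notebook's exact code:

```python
# Sketch: scaling a question-answering workflow across cities.
# StructuredLLM usage, the model name, and the output schema are
# assumptions for illustration; the actual notebook may differ.
from starfish import StructuredLLM, data_factory

@data_factory(max_concurrency=20)
async def city_qna(city: str):
    qna_llm = StructuredLLM(
        model_name="openai/gpt-4o-mini",  # assumed provider/model
        prompt="Generate a question and answer about {{city}}.",
        output_schema=[
            {"name": "question", "type": "str"},
            {"name": "answer", "type": "str"},
        ],
    )
    response = await qna_llm.run(city=city)
    return response.data

# Each city becomes one unit of work, processed in parallel.
results = city_qna.run(city=["Berlin", "Tokyo", "Nairobi", "Lima"])
```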
To scale a function, decorate it with @data_factory, specifying the desired concurrency level. Sources: src/starfish/data_factory/factory.py:15-20

data_factory provides functionality to resume jobs from where they left off. This is useful for long-running data generation tasks that may be interrupted. Sources: README.md
An interrupted job can be resumed by calling the resume() method. Sources: examples/data_factory.ipynb
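Continuing the city_qna sketch above, the resumption flow might look as follows; whether resume() requires a job identifier is an assumption, since only the method name is confirmed by the notebook:

```python
# Sketch: a long-running job is interrupted partway through.
cities = ["Berlin", "Tokyo", "Nairobi", "Lima", "Oslo"]
results = city_qna.run(city=cities)  # suppose this run is interrupted

# Later, continue from where the job left off instead of restarting.
# Whether resume() needs a job identifier is an assumption here.
results = city_qna.resume()
```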
data_factory allows users to define hooks that modify the state of the workflow during runtime. These hooks can be used to implement custom logic for error handling, data validation, or other tasks. Sources: tests/data_factory/factory/test_run.py
In the tests, test_hook modifies the state of the workflow by updating the variable key with a new value. Sources: tests/data_factory/factory/test_run.py
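A hedged sketch of how such a hook might be wired up; the on_record_complete registration parameter and the state object's set() method are assumptions modeled on the behavior described in the test, not a confirmed interface:

```python
# Sketch of a state-modifying hook, modeled on the behavior described
# in test_run.py. The on_record_complete parameter and the state
# object's set() method are assumptions for illustration.
def test_hook(data, state):
    # Update the shared workflow state: set the "variable" key
    # to a new value as each record completes.
    state.set("variable", "updated_value")
    return data

@data_factory(max_concurrency=5, on_record_complete=[test_hook])
async def generate_record(topic: str):
    return {"topic": topic}
```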
The LocalStorage class provides methods for saving and retrieving data artifacts, such as request configurations and record data. This class is essential for managing the data generated during the scaling process. Sources: src/starfish/data_factory/storage/local/local_storage.py
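For illustration only, a sketch of direct LocalStorage use; the constructor argument and the method names below are hypothetical, since only the module path and the save/retrieve responsibilities are confirmed by the source:

```python
# Hypothetical sketch of direct LocalStorage use. The constructor
# argument and the save/get method names are assumptions; the real
# class lives in src/starfish/data_factory/storage/local/local_storage.py.
from starfish.data_factory.storage.local.local_storage import LocalStorage

storage = LocalStorage(db_path="./starfish_local.db")  # assumed argument
storage.save_request_config(job_id="job-123", config={"max_concurrency": 10})
records = storage.get_records(job_id="job-123")       # assumed method
```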
A .env.template file is provided to help users get started. Sources: README.md
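As an illustrative sketch, one common convention is to copy .env.template to .env and load it with python-dotenv; this and the OPENAI_API_KEY variable name are assumptions, not necessarily Starfish's mechanism:

```python
# Sketch: load variables from a .env file created from .env.template.
# python-dotenv is a common convention, and OPENAI_API_KEY is an
# assumed variable name; Starfish's actual setup may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env in the working directory
api_key = os.getenv("OPENAI_API_KEY")
```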
In summary, Starfish, through the data_factory decorator, simplifies the creation and execution of parallel data generation pipelines. It provides features like concurrency management, automatic retries, error handling, and job resumption, making it a powerful tool for both experimentation and production environments.