Welcome to Starfishdata.ai Data Generation Templates
Data generation templates can be combined with data_factory to create scalable data pipelines. src/starfish/data_template/templates/starfish/math_problem_gen_wf.py, src/starfish/data_template/template_gen.py
The data_gen_template object acts as a registry for data generation templates. It allows templates to be registered, listed, and retrieved. src/starfish/data_template/template_gen.py
The list() method of the data_gen_template object returns a list of available templates. Each template is identified by a name, which typically follows the format subfolder_name/template_name. src/starfish/data_template/template_gen.py
The @data_gen_template.register decorator registers a function as a data generation template. It takes several arguments, including the template name, input schema, output schema, description, author, Starfish version, and dependencies. src/starfish/data_template/template_gen.py, src/starfish/data_template/templates/community/topic_generator.py:20-26
For an example, see the registration of the community/topic_generator template. src/starfish/data_template/templates/community/topic_generator.py:20-32
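The registration step described above can be sketched with a small stand-in registry. This is not Starfish's implementation: the TemplateRegistry and TemplateRecord classes, their field names, and the example template body are hypothetical. Only the kinds of decorator arguments (name, schemas, description, author, Starfish version, dependencies) mirror the ones named in the text.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class TemplateRecord:
    """Metadata kept alongside a registered template function (hypothetical)."""
    name: str
    func: Callable[..., Any]
    input_schema: Optional[type] = None
    output_schema: Optional[type] = None
    description: str = ""
    author: str = ""
    starfish_version: str = ""
    dependencies: List[str] = field(default_factory=list)


class TemplateRegistry:
    """Toy registry: register templates by name and list what is available."""

    def __init__(self) -> None:
        self._templates: Dict[str, TemplateRecord] = {}

    def register(self, name: str, **metadata: Any) -> Callable:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._templates:
                raise ValueError(f"template {name!r} is already registered")
            self._templates[name] = TemplateRecord(name=name, func=func, **metadata)
            return func  # the decorated function stays directly callable
        return decorator

    def list(self) -> List[str]:
        # Names follow the subfolder_name/template_name convention.
        return sorted(self._templates)


registry = TemplateRegistry()


@registry.register(
    name="community/topic_generator",
    description="Generates topics for community discussions",
    author="example-author",   # placeholder value
    starfish_version="0.0.0",  # placeholder value
    dependencies=[],
)
def topic_generator(input_data: dict) -> list:
    # Placeholder body; the real template calls an AI model.
    return [f"{input_data['community_name']} topic {i}"
            for i in range(input_data["num_topics"])]


print(registry.list())  # ['community/topic_generator']
```

Storing the metadata next to the function is what lets a registry answer discovery queries without importing or running the template itself.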
The get() method of the data_gen_template object retrieves a registered template by its name. This method returns the registered function, which can then be executed with appropriate input data. src/starfish/data_template/template_gen.py
The example in examples.py retrieves the community/topic_generator template and executes it. src/starfish/data_template/examples.py:25-27
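Retrieval by name followed by execution can be sketched as follows. A plain dictionary stands in for the registry here, and get_template, the template body, and the input keys (community_name, num_topics) are hypothetical; the real get() method and the input expected by community/topic_generator may differ.

```python
from typing import Any, Callable, Dict, List

# Stand-in for the registry: template name -> registered function.
_TEMPLATES: Dict[str, Callable[[Dict[str, Any]], List[str]]] = {}


def _topic_generator(input_data: Dict[str, Any]) -> List[str]:
    # Placeholder body; the real template calls an AI model.
    return [f"{input_data['community_name']} topic {i + 1}"
            for i in range(input_data["num_topics"])]


_TEMPLATES["community/topic_generator"] = _topic_generator


def get_template(name: str) -> Callable[[Dict[str, Any]], List[str]]:
    """Return the registered function, or fail listing the available names."""
    try:
        return _TEMPLATES[name]
    except KeyError:
        available = ", ".join(sorted(_TEMPLATES)) or "<none>"
        raise KeyError(f"unknown template {name!r}; available: {available}") from None


template = get_template("community/topic_generator")
topics = template({"community_name": "gardening", "num_topics": 2})
print(topics)  # ['gardening topic 1', 'gardening topic 2']
```

Including the available names in the lookup error makes a mistyped subfolder_name/template_name easier to diagnose.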
Templates are integrated into data generation workflows using the data_factory decorator. This decorator allows the template function to be executed in parallel across multiple inputs, enabling scalable data generation. src/starfish/data_template/templates/starfish/get_city_info_wf.py, src/starfish/data_template/templates/community/topic_generator.py:33
The community/topic_generator template uses the data_factory decorator to create a parallel data processing function. src/starfish/data_template/templates/community/topic_generator.py:35-37
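The idea of turning a per-input function into a parallel batch function can be sketched with the standard library. This is not how data_factory is implemented: parallel_over_inputs, its max_workers parameter, the run attribute, and describe_city are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from typing import Any, Callable, Dict, Iterable, List


def parallel_over_inputs(max_workers: int = 4) -> Callable:
    """Hypothetical decorator: adds .run(inputs), which maps the function over inputs in parallel."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A direct call still processes a single input.
            return func(*args, **kwargs)

        def run(inputs: Iterable[Dict[str, Any]]) -> List[Any]:
            # Each input dict is unpacked as keyword arguments.
            # pool.map returns results in the same order as the inputs.
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                return list(pool.map(lambda kw: func(**kw), inputs))

        wrapper.run = run
        return wrapper

    return decorator


@parallel_over_inputs(max_workers=2)
def describe_city(city_name: str) -> Dict[str, Any]:
    # Placeholder for per-city work such as an API or model call.
    return {"city": city_name, "name_length": len(city_name)}


results = describe_city.run([{"city_name": "Paris"}, {"city_name": "Tokyo"}])
print(results)
# [{'city': 'Paris', 'name_length': 5}, {'city': 'Tokyo', 'name_length': 5}]
```

Threads are a reasonable choice in this sketch because template work is typically I/O-bound (network or model calls); CPU-bound work would call for a process pool instead.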
[Diagram: how data generation templates are integrated into data generation workflows using data_factory.] src/starfish/data_template/templates/community/topic_generator.py, src/starfish/data_template/templates/starfish/get_city_info_wf.py
The starfish repository includes several example data generation templates, including:
community/topic_generator: Generates relevant topics for community discussions using AI models. src/starfish/data_template/templates/community/topic_generator.py
starfish/math_problem_gen_wf: Generates math problem-solution pairs. src/starfish/data_template/templates/starfish/math_problem_gen_wf.py
starfish/get_city_info_wf: Retrieves information about cities. src/starfish/data_template/templates/starfish/get_city_info_wf.py
community/topic_generator_success: Generates relevant topics for community discussions using AI models. src/starfish/data_template/templates/community/topic_generator_success.py
With the data_factory decorator for workflow integration, developers can create and execute complex data generation processes. The template registry makes templates discoverable by name, which promotes reuse and simplifies the development of data generation pipelines. src/starfish/data_template/template_gen.py