Configuration
Related Pages
Related topics: Installation, Telemetry
Configuration within the Starfish project involves setting up the environment and managing various parameters that control the behavior of the application. This includes setting API keys, model configurations, and other runtime parameters. The project uses environment variables for configuration, providing flexibility and ease of setup. The configuration also extends to the storage layer, which is responsible for persisting metadata and data artifacts. This page outlines the different aspects of configuration within the Starfish project.
Environment Variables
The Starfish project is configured through environment variables. A .env.template file is provided to help users get started quickly; it includes settings for API keys, model configurations, and other runtime parameters. Users copy the template to .env (cp .env.template .env) and then edit it with their own values (for example, nano .env).
Sources: README.md
Setting Up Environment Variables
To configure the Starfish project, follow these steps:
- Copy the .env.template file to .env (cp .env.template .env). Sources: README.md
- Edit the .env file with your preferred editor (for example, nano .env) to set the necessary API keys and configurations. Sources: README.md
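The two steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names (ensure_env_file, parse_env_file, load_env) are hypothetical and are not part of Starfish, which may load its .env file differently. The parser handles only plain KEY=VALUE lines.

```python
import os
import shutil
from pathlib import Path


def ensure_env_file(template=".env.template", target=".env") -> Path:
    """Copy the template to .env, but never overwrite an existing .env."""
    target_path = Path(target)
    if not target_path.exists():
        shutil.copyfile(template, target_path)
    return target_path


def parse_env_file(path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env", override=False) -> dict:
    """Export parsed values into os.environ; existing variables win by default."""
    values = parse_env_file(path)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Unlike a plain cp, ensure_env_file leaves an existing .env untouched, so keys that have already been filled in are not lost when the script is run again.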
Telemetry Configuration
Starfish collects minimal, anonymous telemetry data to help improve the library. Participation is optional: users can opt out by setting TELEMETRY_ENABLED=false in their environment variables. Sources: README.md
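As an illustration, the opt-out can also be applied from Python for the current process. Only the variable name TELEMETRY_ENABLED and the value false come from the README; the helper functions, the set of accepted "off" spellings, and the enabled-by-default fallback are assumptions made for this sketch, not Starfish's actual parsing logic.

```python
import os

# Spellings treated as "off" in this sketch (an assumption; only "false" is documented).
_FALSEY = {"false", "0", "no", "off"}


def telemetry_enabled(environ=None) -> bool:
    """Return False only when TELEMETRY_ENABLED is explicitly set to an "off" value."""
    env = os.environ if environ is None else environ
    value = env.get("TELEMETRY_ENABLED", "true")  # assumed default: enabled
    return value.strip().lower() not in _FALSEY


def opt_out_of_telemetry() -> None:
    """Set the documented opt-out value for the current process."""
    os.environ["TELEMETRY_ENABLED"] = "false"
```

If this route is used, calling opt_out_of_telemetry() before importing the library is the safer ordering, since a library typically reads its environment at import or first use.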
Storage Layer Configuration
The storage layer is responsible for persisting metadata and data artifacts for synthetic data generation jobs. It provides a pluggable interface for different storage backends and a hybrid local implementation using SQLite for metadata and JSON files for data. Sources: tests/data_factory/storage/README.md
Local Storage Configuration
The local storage implementation uses SQLite for metadata and JSON files for data artifacts. The tests use separate test databases (by default in /tmp/starfish_test_* directories) to avoid interfering with production data. Sources: tests/data_factory/storage/README.md
Setting Up Local Storage
The LocalStorage class in src/starfish/data_factory/storage/local/local_storage.py handles the local storage implementation. Its setup method creates the necessary directories and database. Sources: tests/data_factory/storage/local/test_local_storage.py
Sources: tests/data_factory/storage/local/test_local_storage.py:17-35
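To make the hybrid idea concrete, here is a small stand-in built on the standard library's sqlite3 and json modules. It is not the LocalStorage class and does not reproduce its API: the class name TinyHybridStore, the metadata.db file name, and the records table are all hypothetical. It only illustrates a setup step that creates directories and a database, with JSON files holding the data and SQLite holding the metadata.

```python
import json
import sqlite3
from pathlib import Path


class TinyHybridStore:
    """Illustrative stand-in, NOT Starfish's LocalStorage class."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)
        self.db_path = self.base_dir / "metadata.db"  # hypothetical file name
        self.conn = None

    def setup(self) -> None:
        """Create the data directory and the SQLite database if missing."""
        (self.base_dir / "data").mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "record_uid TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )
        self.conn.commit()

    def save_record(self, record_uid, payload, status="completed") -> Path:
        """Write the payload as a JSON file and index it in SQLite."""
        path = self.base_dir / "data" / f"{record_uid}.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        self.conn.execute(
            "INSERT OR REPLACE INTO records (record_uid, status) VALUES (?, ?)",
            (record_uid, status),
        )
        self.conn.commit()
        return path

    def load_record(self, record_uid) -> dict:
        """Read a record's JSON payload back from disk."""
        path = self.base_dir / "data" / f"{record_uid}.json"
        return json.loads(path.read_text(encoding="utf-8"))

    def close(self) -> None:
        """Close the SQLite connection if it is open."""
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

Keeping payloads in files and only small, queryable fields in SQLite means status lookups stay cheap while large records never pass through the database.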
Configuration Paths
The local storage implementation uses the following directory structure:
- Configs: {storage_uri}/configs/{master_job_id}.request.json
- Record Data: {storage_uri}/data/{record_uid[:2]}/{record_uid[2:4]}/{record_uid}.json
Sources: src/starfish/data_factory/storage/models.py
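The two path templates above translate directly into string formatting. In this sketch the function names are hypothetical, and storage_uri is treated as a plain base path; how Starfish itself normalizes a URI scheme such as file:// is not covered here.

```python
def config_path(storage_uri: str, master_job_id: str) -> str:
    """Build {storage_uri}/configs/{master_job_id}.request.json."""
    base = storage_uri.rstrip("/")
    return f"{base}/configs/{master_job_id}.request.json"


def record_data_path(storage_uri: str, record_uid: str) -> str:
    """Build the two-level sharded path for a record's JSON file."""
    base = storage_uri.rstrip("/")
    # The first two and next two characters of the UID pick the subdirectories.
    return f"{base}/data/{record_uid[:2]}/{record_uid[2:4]}/{record_uid}.json"


print(record_data_path("/tmp/store", "ab12cdef"))
# prints /tmp/store/data/ab/12/ab12cdef.json
```

Sharding on the leading characters of record_uid spreads records across many subdirectories so that no single directory accumulates every file.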
Data Handler
The FileSystemDataHandler class in src/starfish/data_factory/storage/local/data_handler.py manages interactions with data and config files on the local filesystem. It ensures that all top-level data directories exist. Sources: src/starfish/data_factory/storage/local/data_handler.py
Sources: src/starfish/data_factory/storage/local/data_handler.py:14-48
Data Handler Directories
Directory | Description |
---|---|
CONFIGS_DIR | Directory where request configuration files are stored. |
DATA_DIR | Directory where record data files are stored. |
ASSOCIATIONS_DIR | Directory where association files are stored (currently not in use). |
Sources: src/starfish/data_factory/storage/local/data_handler.py
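A minimal sketch of "ensure the top-level directories exist" might look like the following. The constant names mirror the table, and the values "configs" and "data" match the documented paths, but the value "associations" and the function ensure_base_dirs are assumptions for illustration rather than the handler's actual code.

```python
from pathlib import Path

# "configs" and "data" match the documented paths; "associations" is assumed.
CONFIGS_DIR = "configs"
DATA_DIR = "data"
ASSOCIATIONS_DIR = "associations"


def ensure_base_dirs(base_path) -> list:
    """Create each top-level directory under base_path if it is missing."""
    base = Path(base_path)
    created = []
    for name in (CONFIGS_DIR, DATA_DIR, ASSOCIATIONS_DIR):
        directory = base / name
        # exist_ok=True makes the call safe to repeat on every startup.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```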
Configuration Workflow
The configuration workflow for the storage layer proceeds as follows: create a project, a master job, and an execution job; save the necessary data; then complete the jobs.
Sources: tests/data_factory/storage/local/test_local_storage.py, tests/data_factory/storage/test_storage_main.py
Test Configuration
The tests use a specific configuration for the storage layer. The TEST_DB_DIR and TEST_DB_URI variables define the location of the test database. The TEST_MODE variable determines whether to run a basic or full test. Sources: tests/data_factory/storage/test_storage_main.py
Sources: tests/data_factory/storage/test_storage_main.py:18-23
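For orientation, test settings of this kind are often declared as module-level constants. The sketch below is hypothetical: the variable names come from the text above, but the directory name, the file:// URI form, the default mode, and the resolve_test_mode helper are assumptions, and the real values live in tests/data_factory/storage/test_storage_main.py.

```python
import os
import tempfile

# Hypothetical values; the real ones are defined in test_storage_main.py.
TEST_DB_DIR = os.path.join(tempfile.gettempdir(), "starfish_test_example")
TEST_DB_URI = f"file://{TEST_DB_DIR}"  # assumed URI form
TEST_MODE = "basic"  # "basic" or "full"


def resolve_test_mode(value: str) -> str:
    """Normalize a mode string and reject anything other than basic or full."""
    mode = value.strip().lower()
    if mode not in {"basic", "full"}:
        raise ValueError(f"unsupported test mode: {value!r}")
    return mode
```

Pointing the test URI at a dedicated temporary directory keeps test runs from touching production data, which is the purpose the storage README describes.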
Conclusion
Configuration in the Starfish project involves setting up environment variables and configuring the storage layer. The environment variables are used to set API keys, model configurations, and other runtime parameters. The storage layer is responsible for persisting metadata and data artifacts for synthetic data generation jobs. The local storage implementation uses SQLite for metadata and JSON files for data artifacts. The configuration workflow involves creating a project, master job, and execution job, saving the necessary data, and completing the jobs.