Related Pages
Related topics: Installation, TelemetryConfiguration
Configuring Starfish means setting up the environment and the parameters that control the application's behavior: API keys, model configurations, and other runtime parameters. These are supplied through environment variables, which keeps setup flexible and simple. Configuration also extends to the storage layer, which persists metadata and data artifacts. This page covers each of these areas.
Environment Variables
Starfish reads its configuration from environment variables. A .env.template file is provided to help users get started quickly; it includes settings for API keys, model configurations, and other runtime parameters. Copy the template to .env (cp .env.template .env) and edit the copy with your own values (for example, nano .env). Sources: README.md
Setting Up Environment Variables
To configure the Starfish project, follow these steps:
- Copy the .env.template file to .env: cp .env.template .env
- Edit the .env file with your preferred editor to set the necessary API keys and configurations: nano .env
Sources: README.md
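A .env file is a plain list of KEY=VALUE lines. The sketch below parses such text with the Python standard library only, to show the expected shape of the file. The name EXAMPLE_API_KEY and its value are placeholders, and how Starfish itself loads the file is not specified on this page; it may well use a dedicated loader.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file contents; EXAMPLE_API_KEY is a placeholder name.
EXAMPLE_ENV = """
# API keys
EXAMPLE_API_KEY="replace-me"
TELEMETRY_ENABLED=false
"""

config = parse_env(EXAMPLE_ENV)
print(config)
```

Values parsed this way are always strings, so flags such as TELEMETRY_ENABLED need an explicit comparison rather than a truthiness check.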
Telemetry Configuration
Starfish collects minimal, anonymous telemetry data to help improve the library. Participation is optional: users can opt out by setting TELEMETRY_ENABLED=false in their environment variables. Sources: README.md
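The opt-out check can be pictured as a small helper that reads the variable from the process environment. This is a sketch under assumptions, not Starfish's actual code: the helper name telemetry_enabled is hypothetical, and treating an unset variable as enabled and comparing case-insensitively are guesses based on the opt-out wording.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when TELEMETRY_ENABLED is set to "false"."""
    env = os.environ if env is None else env
    # Assumption: telemetry stays on unless the user explicitly opts out.
    value = env.get("TELEMETRY_ENABLED", "true")
    # Assumption: the comparison ignores case and surrounding whitespace.
    return value.strip().lower() != "false"


print(telemetry_enabled({"TELEMETRY_ENABLED": "false"}))
```

Passing a mapping instead of relying on os.environ keeps the helper easy to test without modifying the real environment.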
Storage Layer Configuration
The storage layer is responsible for persisting metadata and data artifacts for synthetic data generation jobs. It provides a pluggable interface for different storage backends and a hybrid local implementation using SQLite for metadata and JSON files for data. Sources: tests/data_factory/storage/README.md
Local Storage Configuration
The local storage implementation uses SQLite for metadata and JSON files for data artifacts. The tests use separate test databases (by default in /tmp/starfish_test_* directories) to avoid interfering with production data. Sources: tests/data_factory/storage/README.md
Setting Up Local Storage
The LocalStorage class in src/starfish/data_factory/storage/local/local_storage.py handles the local storage implementation. Its setup method creates the necessary directories and database. Sources: tests/data_factory/storage/local/test_local_storage.py
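To illustrate what a setup step of this kind involves, here is a standalone sketch that creates a base directory and an SQLite metadata database. It does not call LocalStorage, whose constructor and method signatures are not documented on this page; the function name setup_storage, the file name metadata.db, and the jobs table are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def setup_storage(base_dir: str) -> Path:
    """Create the base directory and an SQLite metadata database inside it."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    db_path = base / "metadata.db"  # hypothetical file name
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical table; the real schema is not described here.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, status TEXT)"
        )
        conn.commit()
    finally:
        conn.close()
    return db_path


with tempfile.TemporaryDirectory(prefix="starfish_example_") as tmp:
    db_file = setup_storage(tmp)
    print(db_file.name, db_file.exists())
```

Using CREATE TABLE IF NOT EXISTS and exist_ok=True makes the step safe to run more than once, which is a common property for setup routines.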
Configuration Paths
The local storage implementation uses the following directory structure:
- Configs: {storage_uri}/configs/{master_job_id}.request.json
- Record Data: {storage_uri}/data/{record_uid[:2]}/{record_uid[2:4]}/{record_uid}.json
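The two path templates above can be expressed as small helper functions. The function names are hypothetical, and the sketch assumes storage_uri is a plain directory path; if it carries a scheme prefix such as file://, that prefix would need to be stripped first.

```python
from pathlib import PurePosixPath


def config_path(storage_uri: str, master_job_id: str) -> str:
    """Build {storage_uri}/configs/{master_job_id}.request.json."""
    return str(PurePosixPath(storage_uri) / "configs" / f"{master_job_id}.request.json")


def record_path(storage_uri: str, record_uid: str) -> str:
    """Build {storage_uri}/data/{uid[:2]}/{uid[2:4]}/{uid}.json."""
    return str(
        PurePosixPath(storage_uri)
        / "data"
        / record_uid[:2]   # first shard level: characters 0-1
        / record_uid[2:4]  # second shard level: characters 2-3
        / f"{record_uid}.json"
    )


print(config_path("/tmp/starfish", "job-001"))
print(record_path("/tmp/starfish", "abcd1234"))
```

Nesting record files under two levels derived from the UID prefix spreads them across many subdirectories, which presumably keeps any single directory from growing too large.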
Data Handler
The FileSystemDataHandler class in src/starfish/data_factory/storage/local/data_handler.py manages interactions with data and config files on the local filesystem. It ensures that all top-level data directories exist. Sources: src/starfish/data_factory/storage/local/data_handler.py
Data Handler Directories
Directory | Description |
---|---|
CONFIGS_DIR | Directory where request configuration files are stored. |
DATA_DIR | Directory where record data files are stored. |
ASSOCIATIONS_DIR | Directory where associations files are stored (currently not in use). |
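As a rough illustration of the handler's two responsibilities (making sure the top-level directories exist, then reading and writing files under them), consider the sketch below. It is not the FileSystemDataHandler code: the function names are hypothetical, "configs" and "data" are taken from the path templates on this page, and "associations" is an assumed directory name.

```python
import json
import os
import tempfile

CONFIGS_DIR = "configs"
DATA_DIR = "data"
ASSOCIATIONS_DIR = "associations"  # assumed name; not confirmed by the source


def ensure_base_dirs(base_path: str) -> None:
    """Create every top-level directory if it does not exist yet."""
    for name in (CONFIGS_DIR, DATA_DIR, ASSOCIATIONS_DIR):
        os.makedirs(os.path.join(base_path, name), exist_ok=True)


def save_request_config(base_path: str, master_job_id: str, config: dict) -> str:
    """Write a request config as JSON and return the file path."""
    ensure_base_dirs(base_path)
    path = os.path.join(base_path, CONFIGS_DIR, f"{master_job_id}.request.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return path


with tempfile.TemporaryDirectory() as base:
    saved = save_request_config(base, "job-001", {"example_setting": 1})
    print(sorted(os.listdir(base)))  # ['associations', 'configs', 'data']
```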
Configuration Workflow
[Diagram: configuration workflow for the storage layer] Sources: tests/data_factory/storage/local/test_local_storage.py, tests/data_factory/storage/test_storage_main.py
Test Configuration
The tests use a specific configuration for the storage layer. The TEST_DB_DIR and TEST_DB_URI variables define the location of the test database. The TEST_MODE variable determines whether to run a basic or full test. Sources: tests/data_factory/storage/test_storage_main.py
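A test configuration of this shape might be assembled as follows. This is a sketch rather than the contents of test_storage_main.py: the function make_test_config is hypothetical, the file:// form of TEST_DB_URI is an assumption, and the "basic" and "full" mode strings are inferred from the description above.

```python
import os
import shutil
import tempfile


def make_test_config(mode: str = "basic") -> dict:
    """Create an isolated test directory and return the three settings."""
    if mode not in ("basic", "full"):
        raise ValueError(f"unknown test mode: {mode!r}")
    # mkdtemp places the directory under the system temp dir (often /tmp).
    test_db_dir = tempfile.mkdtemp(prefix="starfish_test_")
    return {
        "TEST_DB_DIR": test_db_dir,
        "TEST_DB_URI": f"file://{test_db_dir}",  # assumed URI format
        "TEST_MODE": mode,
    }


cfg = make_test_config("basic")
print(os.path.basename(cfg["TEST_DB_DIR"]).startswith("starfish_test_"))
shutil.rmtree(cfg["TEST_DB_DIR"])  # clean up the temporary directory
```

Creating a fresh directory per run keeps test databases isolated from each other and from production data.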