Welcome to Starfishdata.ai Storage Layer
The storage layer is implemented by the LocalStorage class, which interacts with metadata and data handlers. The metadata handler manages the persistence of project, job, and record metadata using SQLite. The data handler stores and retrieves data artifacts, such as request configurations and record data, as JSON files. Sources: src/starfish/data_factory/storage/local/local_storage.py, tests/data_factory/storage/README.md
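The split between SQLite metadata and JSON data artifacts can be illustrated with a minimal standard-library sketch. The table name, column names, file names, and directory layout below are assumptions made for illustration; the text above only states that metadata lives in SQLite and artifacts live in JSON files.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: the real schema and directory structure are not
# described in the text; only the SQLite/JSON split is.
base_dir = Path(tempfile.mkdtemp())
db_path = base_dir / "metadata.db"
data_dir = base_dir / "data"
data_dir.mkdir()

# Metadata side: a small SQLite table holding project metadata.
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE projects (project_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO projects VALUES (?, ?)", ("proj-1", "demo project"))
conn.commit()

# Data side: a record payload written as a JSON file.
record_path = data_dir / "record-1.json"
record_path.write_text(
    json.dumps({"question": "2 + 2", "answer": 4}), encoding="utf-8"
)

# Reading back: metadata through SQL, the artifact through the file system.
project_row = conn.execute(
    "SELECT project_id, name FROM projects WHERE project_id = ?", ("proj-1",)
).fetchone()
record_payload = json.loads(record_path.read_text(encoding="utf-8"))
conn.close()
```

Keeping small, queryable metadata in a database and larger payloads in files is a common arrangement: listings and status filters stay cheap, while payloads remain easy to inspect by hand.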
LocalStorage orchestrates interactions between the metadata and data handlers, providing a unified interface for the rest of the system. Sources: src/starfish/data_factory/storage/local/local_storage.py
LocalStorage is the main class that implements the storage layer functionality. It provides methods for setting up the storage, saving and retrieving projects, logging master and execution jobs, and managing record data. It delegates metadata persistence to MetadataHandler and data artifact storage to DataHandler. Sources: src/starfish/data_factory/storage/local/local_storage.py
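The delegation described above resembles a facade over two collaborators. The sketch below is not the Starfish implementation: the class names SketchLocalStorage, SketchMetadataHandler, and SketchDataHandler, their simplified signatures, and the schema are all hypothetical, and the sketch is synchronous whereas the real class may not be.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class SketchMetadataHandler:
    """Hypothetical stand-in for the metadata handler (SQLite-backed)."""

    def __init__(self, db_path: Path) -> None:
        self._conn = sqlite3.connect(db_path)

    def initialize_schema(self) -> None:
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS projects "
            "(project_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        self._conn.commit()

    def save_project(self, project_id: str, name: str) -> None:
        # Insert the project, or overwrite the row if the id already exists.
        self._conn.execute(
            "INSERT OR REPLACE INTO projects (project_id, name) VALUES (?, ?)",
            (project_id, name),
        )
        self._conn.commit()

    def get_project(self, project_id: str) -> Optional[Dict[str, str]]:
        row = self._conn.execute(
            "SELECT project_id, name FROM projects WHERE project_id = ?",
            (project_id,),
        ).fetchone()
        return None if row is None else {"project_id": row[0], "name": row[1]}

    def close(self) -> None:
        self._conn.close()


class SketchDataHandler:
    """Hypothetical stand-in for the data handler (JSON files)."""

    def __init__(self, data_dir: Path) -> None:
        self._data_dir = data_dir

    def ensure_directories(self) -> None:
        self._data_dir.mkdir(parents=True, exist_ok=True)

    def save_record_data(self, record_uid: str, data: Dict[str, Any]) -> str:
        path = self._data_dir / f"{record_uid}.json"
        path.write_text(json.dumps(data), encoding="utf-8")
        return str(path)  # in this sketch the reference is simply the file path

    def get_record_data(self, output_ref: str) -> Dict[str, Any]:
        return json.loads(Path(output_ref).read_text(encoding="utf-8"))


class SketchLocalStorage:
    """Facade that forwards each call to the handler that owns the data."""

    def __init__(self, base_dir: Path) -> None:
        base_dir.mkdir(parents=True, exist_ok=True)
        self._metadata = SketchMetadataHandler(base_dir / "metadata.db")
        self._data = SketchDataHandler(base_dir / "data")

    def setup(self) -> None:
        self._metadata.initialize_schema()
        self._data.ensure_directories()

    def save_project(self, project_id: str, name: str) -> None:
        self._metadata.save_project(project_id, name)

    def get_project(self, project_id: str) -> Optional[Dict[str, str]]:
        return self._metadata.get_project(project_id)

    def save_record_data(self, record_uid: str, data: Dict[str, Any]) -> str:
        return self._data.save_record_data(record_uid, data)

    def get_record_data(self, output_ref: str) -> Dict[str, Any]:
        return self._data.get_record_data(output_ref)

    def close(self) -> None:
        self._metadata.close()


storage = SketchLocalStorage(Path(tempfile.mkdtemp()))
storage.setup()
storage.save_project("proj-1", "demo project")
output_ref = storage.save_record_data("rec-1", {"text": "hello"})
saved_project = storage.get_project("proj-1")
saved_record = storage.get_record_data(output_ref)
storage.close()
```

Callers only see the facade, so either handler could be swapped for another backend without changing calling code.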
Key features of LocalStorage:
LocalStorage, metadata handler, and data handler.
To save a project, the system calls the save_project method in LocalStorage. LocalStorage then delegates the task to the metadata handler, which interacts with the SQLite database to persist the project metadata. Sources: src/starfish/data_factory/storage/local/local_storage.py:104-105, src/starfish/data_factory/storage/local/local_storage.py:107-108
When a job starts, LocalStorage uses the metadata handler to log the job’s start information in the SQLite database. Similarly, when a job ends, the metadata handler updates the job’s status and summary information. Sources: src/starfish/data_factory/storage/local/local_storage.py:113-114, src/starfish/data_factory/storage/local/local_storage.py:116-118
To save record data, LocalStorage calls the data handler’s save_record_data method. The data handler then writes the record data to a JSON file and returns a reference to the file. Sources: src/starfish/data_factory/storage/local/local_storage.py:98-100
To retrieve data, LocalStorage uses the appropriate handler to read it from either the SQLite database (for metadata) or the JSON files (for data artifacts). Sources: src/starfish/data_factory/storage/local/local_storage.py:109-110, src/starfish/data_factory/storage/local/local_storage.py:101-102
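The job logging flow described above (insert a row when a job starts, update its status and summary when it ends) might look roughly like the following. This is a sketch under assumptions: the master_jobs table, its columns, the status strings "running" and "completed", and the simplified function signatures are hypothetical and not taken from the Starfish source.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE master_jobs (
        master_job_id TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        summary TEXT,          -- JSON-encoded summary, filled in at job end
        start_time TEXT NOT NULL,
        end_time TEXT
    )
    """
)


def log_job_start(master_job_id: str, start_time: datetime) -> None:
    """Record that a job has started (hypothetical helper)."""
    conn.execute(
        "INSERT INTO master_jobs (master_job_id, status, start_time) "
        "VALUES (?, ?, ?)",
        (master_job_id, "running", start_time.isoformat()),
    )
    conn.commit()


def log_job_end(
    master_job_id: str,
    final_status: str,
    summary: Optional[Dict[str, Any]],
    end_time: datetime,
) -> None:
    """Update status and summary once the job finishes (hypothetical helper)."""
    encoded_summary = json.dumps(summary) if summary is not None else None
    conn.execute(
        "UPDATE master_jobs SET status = ?, summary = ?, end_time = ? "
        "WHERE master_job_id = ?",
        (final_status, encoded_summary, end_time.isoformat(), master_job_id),
    )
    conn.commit()


def get_job(master_job_id: str) -> Optional[Dict[str, Any]]:
    """Read the job row back, decoding the summary if one was stored."""
    row = conn.execute(
        "SELECT status, summary FROM master_jobs WHERE master_job_id = ?",
        (master_job_id,),
    ).fetchone()
    if row is None:
        return None
    status, summary = row
    return {"status": status, "summary": json.loads(summary) if summary else None}


log_job_start("job-a", datetime.now(timezone.utc))
job_after_start = get_job("job-a")

log_job_end(
    "job-a",
    "completed",
    {"completed": 2, "failed": 1},
    datetime.now(timezone.utc),
)
job_after_end = get_job("job-a")
```

Storing the summary as a JSON string in a single column is one simple option for SQLite; the actual storage format used by the metadata handler is not specified in the text.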
The LocalStorage class provides the following key API methods, grouped by functionality:
setup(): Initializes metadata DB schema and base file directories
close(): Closes underlying connections/resources

save_request_config(config_ref: str, config_data: Dict[str, Any]) -> str: Saves request configuration
get_request_config(config_ref: str) -> Dict[str, Any]: Retrieves request configuration
generate_request_config_path(master_job_id: str) -> str: Generates path for request config
save_record_data(record_uid: str, master_job_id: str, job_id: str, data: Dict[str, Any]) -> str: Saves record data
get_record_data(output_ref: str) -> Dict[str, Any]: Retrieves record data

save_project(project_data: Project): Saves project metadata
get_project(project_id: str) -> Optional[Project]: Retrieves project metadata
list_projects(limit: Optional[int], offset: Optional[int]) -> List[Project]: Lists projects

log_master_job_start(job_data: GenerationMasterJob): Logs master job start
log_master_job_end(master_job_id: str, final_status: str, summary: Optional[Dict[str, Any]], end_time: datetime, update_time: datetime): Logs master job end
update_master_job_status(master_job_id: str, status: str, update_time: datetime): Updates master job status
get_master_job(master_job_id: str) -> Optional[GenerationMasterJob]: Retrieves master job
list_master_jobs(project_id: Optional[str], status_filter: Optional[List[str]], limit: Optional[int], offset: Optional[int]) -> List[GenerationMasterJob]: Lists master jobs

log_execution_job_start(job_data: GenerationJob): Logs execution job start
log_execution_job_end(job_id: str, final_status: str, counts: Dict[str, int], end_time: datetime, update_time: datetime, error_message: Optional[str]): Logs execution job end
get_execution_job(job_id: str) -> Optional[GenerationJob]: Retrieves execution job
list_execution_jobs(master_job_id: str, status_filter: Optional[List[str]], limit: Optional[int], offset: Optional[int]) -> List[GenerationJob]: Lists execution jobs

log_record_metadata(record_data: Record): Logs record metadata
get_record_metadata(record_uid: str) -> Optional[Record]: Retrieves record metadata
get_records_for_master_job(master_job_id: str, status_filter: Optional[List[StatusRecord]], limit: Optional[int], offset: Optional[int]) -> List[Record]: Gets records for master job
count_records_for_master_job(master_job_id: str, status_filter: Optional[List[StatusRecord]]) -> Dict[str, int]: Counts records for master job
list_record_metadata(master_job_uuid: str, job_uuid: str) -> List[Record]: Lists record metadata
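Several of the listing methods above take status_filter, limit, and offset parameters, and count_records_for_master_job returns a Dict[str, int]. One plausible way to express such queries against SQLite is sketched below. The records table, plain-string statuses (the signatures above use a StatusRecord type), and the helper names list_record_uids and count_records_by_status are assumptions for illustration only.

```python
import sqlite3
from typing import Dict, List, Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records "
    "(record_uid TEXT PRIMARY KEY, master_job_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("rec-1", "job-a", "completed"),
        ("rec-2", "job-a", "failed"),
        ("rec-3", "job-a", "completed"),
        ("rec-4", "job-b", "completed"),
    ],
)
conn.commit()


def list_record_uids(
    master_job_id: str,
    status_filter: Optional[List[str]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> List[str]:
    """List record ids for a job, optionally filtered and paginated."""
    query = "SELECT record_uid FROM records WHERE master_job_id = ?"
    params: list = [master_job_id]
    if status_filter:
        placeholders = ", ".join("?" for _ in status_filter)
        query += f" AND status IN ({placeholders})"
        params.extend(status_filter)
    query += " ORDER BY record_uid"  # stable order so pagination is repeatable
    if limit is not None or offset is not None:
        # SQLite requires LIMIT before OFFSET; a negative LIMIT means no bound.
        query += " LIMIT ? OFFSET ?"
        params.extend([limit if limit is not None else -1, offset or 0])
    return [row[0] for row in conn.execute(query, params)]


def count_records_by_status(
    master_job_id: str, status_filter: Optional[List[str]] = None
) -> Dict[str, int]:
    """Return a mapping from status to record count for one job."""
    query = "SELECT status, COUNT(*) FROM records WHERE master_job_id = ?"
    params: list = [master_job_id]
    if status_filter:
        placeholders = ", ".join("?" for _ in status_filter)
        query += f" AND status IN ({placeholders})"
        params.extend(status_filter)
    query += " GROUP BY status"
    return dict(conn.execute(query, params).fetchall())


all_job_a = list_record_uids("job-a")
second_completed = list_record_uids(
    "job-a", status_filter=["completed"], limit=1, offset=1
)
counts = count_records_by_status("job-a")
```

Only the placeholder list is interpolated into the SQL string; the filter values themselves are passed as bound parameters.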