Telemetry
Welcome to Starfishdata.ai Telemetry
The following files were used as context for generating this wiki page:
- src/starfish/telemetry/__init__.py
- src/starfish/telemetry/posthog_client.py
- src/starfish/data_factory/factory_.py
- src/starfish/data_factory/utils/data_class.py
- src/starfish/common/env_loader.py
- src/starfish/llm/prompt/prompt_template.py
Telemetry
Telemetry in Starfish collects minimal, anonymous data to help improve the library. It provides insight into how features are used and how they perform, which helps identify areas for optimization and bug fixing. Participation is optional: users can opt out via an environment variable.
The telemetry system uses posthog to send events. It collects data related to job execution, platform, and configuration to provide insights into the library’s usage and performance. src/starfish/telemetry/posthog_client.py, src/starfish/data_factory/factory_.py
Telemetry Configuration
The telemetry system is configured using environment variables and a configuration file stored in the application data directory. src/starfish/telemetry/posthog_client.py
Configuration Parameters
The AnalyticsConfig dataclass holds the configuration parameters for the telemetry service. src/starfish/telemetry/posthog_client.py
Parameter | Type | Description | Source |
---|---|---|---|
api_key | str | The API key for the analytics service (Posthog). | src/starfish/telemetry/posthog_client.py:41 |
active | bool | Flag to enable or disable telemetry. Defaults to True. | src/starfish/telemetry/posthog_client.py:42 |
verbose | bool | Flag for verbose logging. Defaults to False. | src/starfish/telemetry/posthog_client.py:43 |
endpoint | Optional[str] | Optional custom endpoint for the analytics service. | src/starfish/telemetry/posthog_client.py:44 |
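The table above maps naturally onto a small dataclass. The following is a minimal sketch based only on the names, types, and defaults listed in the table; the None default for endpoint is an assumption, and the API key shown is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalyticsConfig:
    """Sketch of the telemetry settings listed in the table above."""

    api_key: str                    # API key for the analytics service (Posthog)
    active: bool = True             # telemetry is on unless explicitly disabled
    verbose: bool = False           # verbose logging is off by default
    endpoint: Optional[str] = None  # optional custom endpoint (None default is an assumption)


# "phc_example" is a placeholder, not a real key.
config = AnalyticsConfig(api_key="phc_example", active=False)
```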
Opting Out
Users can disable telemetry by setting the TELEMETRY_ENABLED environment variable to false. README.md
Sources: README.md, src/starfish/common/env_loader.py
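As an illustration, the variable can be set at the top of a script, before Starfish is imported. The telemetry_enabled helper below is hypothetical: it only shows one plausible way such a flag could be parsed, and the library's own parsing may accept a different set of values.

```python
import os

# Set this before importing starfish so the value is visible when telemetry is configured.
os.environ["TELEMETRY_ENABLED"] = "false"


def telemetry_enabled(env=os.environ) -> bool:
    """Hypothetical helper: treat 'false', '0', 'no', and 'off' as disabled (an assumption)."""
    value = env.get("TELEMETRY_ENABLED", "true")
    return value.strip().lower() not in {"false", "0", "no", "off"}
```

The same effect can be had by exporting TELEMETRY_ENABLED=false in the shell or placing it in an environment file.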
Telemetry Data Collection
The telemetry system collects data related to data factory jobs and sends it to the analytics service. src/starfish/data_factory/factory_.py
Telemetry Events
Telemetry events are represented by the Event dataclass, which includes the event name, data, and a unique client ID. src/starfish/telemetry/posthog_client.py
Sources: src/starfish/telemetry/posthog_client.py
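A rough sketch of such an event record is shown below. The field names (name, data, client_id) and the event name "job_completed" are assumptions chosen to match the description, not the library's confirmed definitions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    """Sketch of a telemetry event: a name, a payload, and an anonymous client ID."""

    name: str
    data: Dict[str, Any] = field(default_factory=dict)  # fresh dict per instance
    client_id: str = ""


# A random UUID stands in for the stored client identifier.
event = Event(name="job_completed", data={"num_inputs": 10}, client_id=str(uuid.uuid4()))
```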
Data Factory Telemetry
The DataFactory class in src/starfish/data_factory/factory_.py sends telemetry events at the end of a job. This includes information about the job configuration, execution environment, and outcome. src/starfish/data_factory/factory_.py
The TelemetryData dataclass is used to structure the data sent with the telemetry event. src/starfish/data_factory/utils/data_class.py
Sources: src/starfish/data_factory/utils/data_class.py
Telemetry Data Attributes
Attribute | Type | Description | Source |
---|---|---|---|
job_id | str | Identifier for the job. | src/starfish/data_factory/utils/data_class.py |
target_reached | bool | Whether the target count was achieved. | src/starfish/data_factory/utils/data_class.py |
run_mode | str | Execution mode of the job. | src/starfish/data_factory/utils/data_class.py |
num_inputs | int | Number of input records processed. | src/starfish/data_factory/utils/data_class.py |
library_version | str | Version of the processing library. | src/starfish/data_factory/utils/data_class.py |
config | dict | Configuration parameters for the job. | src/starfish/data_factory/utils/data_class.py |
error_summary | dict | Summary of errors encountered during the job. | src/starfish/data_factory/utils/data_class.py |
count_summary | dict | Summary of record counts (completed, failed, filtered). | src/starfish/data_factory/utils/data_class.py, src/starfish/data_factory/factory_.py |
run_time_platform | str | The platform on which the job is run. | src/starfish/data_factory/utils/data_class.py, src/starfish/data_factory/factory_.py |
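The attributes in the table can be pictured as a dataclass that is flattened to a dictionary before sending. This is a sketch built from the table alone: the default values and the example job ID are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class TelemetryData:
    """Sketch of the job telemetry payload described in the table above."""

    job_id: str
    target_reached: bool = False
    run_mode: str = ""
    num_inputs: int = 0
    library_version: str = ""
    config: Dict[str, Any] = field(default_factory=dict)
    error_summary: Dict[str, Any] = field(default_factory=dict)
    count_summary: Dict[str, Any] = field(default_factory=dict)
    run_time_platform: str = ""


# asdict() turns the dataclass into a plain dict suitable for an event payload.
payload = asdict(
    TelemetryData(
        job_id="job-123",
        num_inputs=5,
        count_summary={"completed": 5, "failed": 0, "filtered": 0},
    )
)
```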
Sending Telemetry
The _send_telemetry_event method in the DataFactory class is responsible for sending the telemetry data to the analytics service. src/starfish/data_factory/factory_.py
Sources: src/starfish/data_factory/factory_.py
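The general shape of an end-of-job telemetry call might look like the sketch below. Everything here is illustrative rather than the library's code: the function name, the "job_finished" event name, the way target_reached is computed, and the choice to swallow errors so that telemetry never fails a job are all assumptions.

```python
import platform
from typing import Any, Callable, Dict


def send_telemetry_event(
    capture: Callable[[str, Dict[str, Any]], None],
    job_id: str,
    count_summary: Dict[str, int],
    target_count: int,
) -> bool:
    """Build a payload at the end of a job and hand it to `capture`.

    Returns True if the event was handed off, False if sending raised.
    """
    payload = {
        "job_id": job_id,
        # Assumption: the target is reached once enough records completed.
        "target_reached": count_summary.get("completed", 0) >= target_count,
        "count_summary": count_summary,
        "run_time_platform": platform.system(),
    }
    try:
        capture("job_finished", payload)
        return True
    except Exception:
        # Assumption: a telemetry failure should never fail the job itself.
        return False


sent = []
ok = send_telemetry_event(
    lambda name, data: sent.append((name, data)),
    "job-123",
    {"completed": 5, "failed": 0, "filtered": 0},
    target_count=5,
)
```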
Analytics Service
The AnalyticsService class handles the communication with the analytics backend (Posthog). src/starfish/telemetry/posthog_client.py
Service Setup
The _setup_client method configures the Posthog client. It checks whether telemetry is active and initializes the client with the API key and endpoint. If telemetry is disabled, it uses a NoOpPosthog client. src/starfish/telemetry/posthog_client.py
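Falling back to a do-nothing client is an instance of the null-object pattern: callers can invoke the client unconditionally without checking whether telemetry is on. The sketch below illustrates the idea. The setup_client function and its client_factory parameter are hypothetical, introduced so the example runs without the posthog package installed; in practice the factory would be the real client class.

```python
from typing import Any, Callable, Optional


class NoOpPosthog:
    """Stand-in client that accepts capture calls and silently drops them."""

    def capture(self, *args: Any, **kwargs: Any) -> None:
        pass


def setup_client(
    active: bool,
    api_key: str,
    endpoint: Optional[str],
    client_factory: Callable[..., Any],
) -> Any:
    """Return a real client when telemetry is active, otherwise a no-op one."""
    if not active:
        return NoOpPosthog()
    # Hypothetical injection point: the factory receives the key and the endpoint.
    return client_factory(api_key, host=endpoint)


# With telemetry disabled, the factory is never called.
client = setup_client(False, "phc_example", None, client_factory=dict)
client.capture("anonymous-id", "job_finished", {"num_inputs": 3})  # does nothing
```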
Capturing Events
The capture_event method sends an event to the analytics service. It ensures that the event data includes the client ID. src/starfish/telemetry/posthog_client.py
Sources: src/starfish/telemetry/posthog_client.py
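One way to guarantee that the client ID is present is to copy the payload and add the ID only when it is missing. The sketch below uses a hypothetical RecordingClient in place of the real analytics client, and its capture signature is an assumption, so the example can run on its own.

```python
from typing import Any, Dict, List, Tuple


class RecordingClient:
    """Hypothetical stand-in for the analytics client; it just records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Dict[str, Any]]] = []

    def capture(self, distinct_id: str, event: str, properties: Dict[str, Any]) -> None:
        self.calls.append((distinct_id, event, properties))


def capture_event(
    client: RecordingClient, name: str, data: Dict[str, Any], client_id: str
) -> None:
    """Send an event, making sure the payload carries the client ID."""
    properties = dict(data)                        # copy so the caller's dict is not mutated
    properties.setdefault("client_id", client_id)  # add the ID only if it is missing
    client.capture(client_id, name, properties)


recorder = RecordingClient()
original = {"num_inputs": 3}
capture_event(recorder, "job_finished", original, "abc")
```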
Client ID Generation
The TelemetryConfig class is responsible for generating and retrieving a unique client identifier. The identifier is stored in a file in the application data directory. src/starfish/telemetry/posthog_client.py
Sources: src/starfish/telemetry/posthog_client.py
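The generate-once, reuse-afterwards behaviour can be sketched as follows. The function name, the file name client_id, and the use of a random UUID are assumptions for illustration; the directory is passed in explicitly because the actual application data location is not specified here.

```python
import tempfile
import uuid
from pathlib import Path


def get_or_create_client_id(app_data_dir: Path, filename: str = "client_id") -> str:
    """Return the stored client ID, creating and saving a new UUID on first use."""
    id_file = app_data_dir / filename
    if id_file.exists():
        stored = id_file.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    # First run (or empty file): create the directory and persist a fresh ID.
    app_data_dir.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    id_file.write_text(client_id, encoding="utf-8")
    return client_id


# A temporary directory stands in for the application data directory.
with tempfile.TemporaryDirectory() as tmp:
    first = get_or_create_client_id(Path(tmp) / "starfish")
    second = get_or_create_client_id(Path(tmp) / "starfish")
```

Because the identifier is a random UUID rather than anything derived from the machine or user, it distinguishes installations without identifying them.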
Conclusion
The telemetry system gives Starfish developers insight into how the library is used and how it performs, which supports continuous improvement and helps identify areas for optimization. It is designed to be minimally invasive and respects user privacy by letting users opt out.