persist_managers
mainsequence.tdag.time_series.persist_managers
DataLakePersistManager
Bases: PersistManager
A class to manage data persistence in a local data lake.
This class handles the storage and retrieval of time series data in a local file system, organized by date ranges and table hashes.
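To make the "organized by date ranges and table hashes" idea concrete, here is a minimal sketch of one possible directory layout. The function name `partition_path`, the `.parquet` extension, and the `<root>/<table_hash>/<start>_<end>` scheme are all assumptions for illustration; the actual layout used by DataLakePersistManager is not documented here.

```python
from datetime import date
from pathlib import Path


def partition_path(root: Path, table_hash: str, start: date, end: date) -> Path:
    """Build a file path for one date-range partition of a table.

    Hypothetical layout (an assumption, not the library's scheme):
        <root>/<table_hash>/<start>_<end>.parquet
    """
    if end < start:
        raise ValueError("end must not precede start")
    file_name = f"{start.isoformat()}_{end.isoformat()}.parquet"
    return root / table_hash / file_name


# One partition covering January 2024 for a table identified by its hash.
example = partition_path(Path("lake"), "abc123", date(2024, 1, 1), date(2024, 1, 31))
```

Keying the directory on the table hash keeps every partition of one table together, while the date range in the file name lets a reader pick only the files that overlap a requested window.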
__init__(*args, **kwargs)
Initializes the DataLakePersistManager with configuration from environment variables.
set_already_run(already_run)
This method is critical: it controls the level of introspection and avoids recursion. Recursion happens, for example, when TimeSeries.update(,) calls TimeSeries.update(latest_value,,**), which in turn calls self.get_update_statistics(); with a local data lake, that last call incurs a circular reference.
Args: introspection:
Returns:
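The recursion guard described above can be illustrated with a toy model. The classes `ToyManager` and `ToySeries` are hypothetical stand-ins, not the real DataLakePersistManager or TimeSeries; the sketch only assumes that a boolean flag is checked before introspection re-enters the update path.

```python
class ToyManager:
    """Hypothetical stand-in that holds only the already-run flag."""

    def __init__(self):
        self.already_run = False

    def set_already_run(self, already_run: bool) -> None:
        self.already_run = already_run


class ToySeries:
    """Hypothetical stand-in for a time series whose statistics need an update."""

    def __init__(self):
        self.manager = ToyManager()
        self.update_calls = 0

    def update(self):
        self.update_calls += 1
        # Mark the run BEFORE introspecting, so the call below cannot re-enter update().
        self.manager.set_already_run(True)
        return self.get_update_statistics()

    def get_update_statistics(self):
        if not self.manager.already_run:
            # No data yet: build it once. Without the flag this would recurse forever.
            self.update()
        return {"update_calls": self.update_calls}


ts = ToySeries()
stats = ts.update()  # update() runs exactly once despite the mutual calls
```

In this toy version, update() and get_update_statistics() call each other, and the flag is what breaks the cycle.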
verify_if_already_run(ts)
This method handles all the configuration and setup necessary when running a detached local data lake.
:param ts:
:return:
PersistManager
build_update_details(source_class_name)
depends_on_connect(new_ts, is_api)
Connects a TimeSerie as a relationship in the DB.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_ts | TimeSerie | | required |
get_persisted_ts()
Full request of the persisted data; should always default to the DB. :return:
local_persist_exist_set_config(remote_table_hashed_name, local_configuration, remote_configuration, data_source, time_serie_source_code_git_hash, time_serie_source_code, remote_build_metadata)
This method runs on initialization of the TimeSerie class. We also use it to retrieve the table if it is already persisted.
:param config:
:return:
patch_build_configuration(local_configuration, remote_configuration, remote_build_metadata)
Args: local_configuration: remote_configuration:
Returns:
patch_update_details(local_hash_id=None, **kwargs)
Patch update details for related_table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hash_id | | | required |
kwargs | | | {} |
persist_updated_data(temp_df, historical_update_id, update_tracker=None, overwrite=False)
Main function for updating time series; it is called from the TimeSeries class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
temp_df | DataFrame | | required |
latest_value | | | required |
session | | | required |
synchronize_metadata(meta_data, local_metadata, set_last_index_value=False, class_name=None)
Forces a synchronization between the table and its metadata. :return:
update_source_informmation(git_hash_id, source_code)
Args: git_hash_id: source_code:
Returns:
TimeScaleLocalPersistManager
Bases: PersistManager
Main controller to interact with the TimeSerie ORM.
get_full_source_data(remote_table_hash_id, engine='pandas')
Returns the full stored data; uses multiprocessing to run several queries over row ranges in parallel for speed. :return:
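The idea of splitting a full read into several row-range queries can be sketched as follows. This is not the library's implementation: `TABLE`, `fetch_rows`, and `get_full_data` are hypothetical names, the "database" is an in-memory list, and a thread pool is swapped in for multiprocessing to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a stored table: (row_id, value) tuples. Purely illustrative data.
TABLE = [(i, i * 0.5) for i in range(10)]


def fetch_rows(offset: int, limit: int) -> list:
    """Hypothetical query returning `limit` rows starting at `offset`."""
    return TABLE[offset:offset + limit]


def get_full_data(total_rows: int, chunk_size: int, max_workers: int = 4) -> list:
    """Read the whole table as several row-range queries run concurrently."""
    offsets = range(0, total_rows, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `offsets`, so rows stay sorted.
        chunks = pool.map(lambda off: fetch_rows(off, chunk_size), offsets)
        return [row for chunk in chunks for row in chunk]


rows = get_full_data(total_rows=len(TABLE), chunk_size=3)
```

Because results come back in submission order, the chunks can be concatenated directly without re-sorting.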