persist_managers

mainsequence.tdag.time_series.persist_managers

DataLakePersistManager

Bases: PersistManager

A class to manage data persistence in a local data lake.

This class handles the storage and retrieval of time series data in a local file system, organized by date ranges and table hashes.
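As a rough illustration of what "organized by date ranges and table hashes" could look like on disk, here is a minimal sketch. The function `partition_path`, the directory layout, and the `.parquet` extension are all assumptions made for this example; the actual layout used by DataLakePersistManager is not documented here.

```python
from datetime import date
from pathlib import Path


def partition_path(root: Path, table_hash: str, start: date, end: date) -> Path:
    """Build a hypothetical file path for one date-range partition of a table.

    Assumed layout (not taken from the library): <root>/<table_hash>/<start>_<end>.parquet
    """
    file_name = f"{start.isoformat()}_{end.isoformat()}.parquet"
    return root / table_hash / file_name


# Example: the January 2024 partition of a table whose hash is "abc123".
path = partition_path(Path("data_lake"), "abc123", date(2024, 1, 1), date(2024, 1, 31))
print(path.as_posix())
```

Keying the directory on the table hash keeps each table's files together, while the date range in the file name lets a reader select partitions without opening them.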

__init__(*args, **kwargs)

Initializes the DataLakePersistManager with configuration from environment variables.

set_already_run(already_run)

This method is critical: it controls the level of introspection and avoids recursion. Recursion can happen, for example, when TimeSeries.update() calls self.get_update_statistics(), which would incur a circular reference when using the local data lake.

Args: introspection:

Returns:
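The recursion guard described for set_already_run can be sketched as a simple re-entrancy flag. The class `GuardedManager` and the bodies of `update` and `get_update_statistics` below are hypothetical stand-ins written for this example; they only mimic the circular call pattern and are not the library's implementation.

```python
class GuardedManager:
    """Hypothetical stand-in showing how an already-run flag breaks a call cycle."""

    def __init__(self):
        self.already_run = False
        self.trace = []  # records the order of calls for inspection

    def set_already_run(self, already_run: bool) -> None:
        self.already_run = already_run

    def update(self):
        self.trace.append("update")
        # Mark the manager before introspecting so the statistics call
        # does not re-enter update().
        self.set_already_run(True)
        try:
            return self.get_update_statistics()
        finally:
            self.set_already_run(False)

    def get_update_statistics(self):
        self.trace.append("get_update_statistics")
        if not self.already_run:
            # Without the flag this path would call update() again,
            # which calls back here: a circular reference.
            return self.update()
        return {"introspected": False}


manager = GuardedManager()
result = manager.update()
print(manager.trace)
```

Resetting the flag in a `finally` block keeps the manager usable for a later update even if the statistics call raises.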

verify_if_already_run(ts)

Handles all the configuration and setup necessary when running a detached local data lake.

PersistManager

build_update_details(source_class_name)

depends_on_connect(new_ts, is_api)

Connects a TimeSerie as a relationship in the DB.

Parameters:

Name    Type       Description  Default
new_ts  TimeSerie               required

get_persisted_ts()

A full request of the persisted data should always default to the DB.

local_persist_exist_set_config(remote_table_hashed_name, local_configuration, remote_configuration, data_source, time_serie_source_code_git_hash, time_serie_source_code, remote_build_metadata)

This method runs on initialization of the TimeSerie class. It is also used to retrieve the table if it is already persisted.

patch_build_configuration(local_configuration, remote_configuration, remote_build_metadata)

Args: local_configuration: remote_configuration:

Returns:

patch_update_details(local_hash_id=None, **kwargs)

Patch update details for related_table.

Parameters:

Name     Type  Description  Default
hash_id                     required
kwargs                      {}

persist_updated_data(temp_df, historical_update_id, update_tracker=None, overwrite=False)

Main function for updating time series. It is called from the TimeSeries class.

Parameters:

Name          Type       Description  Default
temp_df       DataFrame               required
latest_value                          required
session                               required

synchronize_metadata(meta_data, local_metadata, set_last_index_value=False, class_name=None)

Forces a synchronization between the table and its metadata.

update_source_informmation(git_hash_id, source_code)

Args: git_hash_id: source_code:

Returns:

TimeScaleLocalPersistManager

Bases: PersistManager

Main controller for interacting with the TimeSerie ORM.

get_full_source_data(remote_table_hash_id, engine='pandas')

Returns the full stored data. Uses multiprocessing to run several queries split by rows, which speeds up retrieval.
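The idea of splitting one large read into several row-range queries that run concurrently can be sketched as follows. This is an illustration under stated assumptions, not the method's implementation: `ROWS`, `row_ranges`, `fetch_range`, and `fetch_all` are hypothetical names, an in-memory list stands in for the database table, and a thread pool from `concurrent.futures` is used in place of multiprocessing to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a stored table; a real query would hit the database instead.
ROWS = list(range(10))


def row_ranges(total_rows: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_rows) into consecutive half-open (start, stop) ranges."""
    return [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]


def fetch_range(bounds: tuple[int, int]) -> list[int]:
    """Hypothetical per-chunk query: return the rows in one range."""
    start, stop = bounds
    return ROWS[start:stop]


def fetch_all(total_rows: int, chunk_size: int, workers: int = 4) -> list[int]:
    """Run the per-range queries concurrently and stitch the chunks together."""
    ranges = row_ranges(total_rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input ranges.
        chunks = list(pool.map(fetch_range, ranges))
    return [row for chunk in chunks for row in chunk]


full = fetch_all(len(ROWS), chunk_size=4)
print(full)
```

Because the results come back in range order, the stitched output preserves the original row order without an extra sort.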