local_data_lake

mainsequence.client.data_sources_interfaces.local_data_lake

DataLakeInterface

build_time_and_symbol_filter(start_date=None, great_or_equal=True, less_or_equal=True, end_date=None, unique_identifier_list=None, unique_identifier_range_map=None) staticmethod

Build hashable parquet filters based on the parameters.

Args:
    start_date (datetime.datetime, optional): Start date for filtering.
    great_or_equal (bool): Whether the start-date condition is >= rather than >.
    less_or_equal (bool): Whether the end-date condition is <= rather than <.
    end_date (datetime.datetime, optional): End date for filtering.
    unique_identifier_list (list, optional): Unique identifiers to filter on.
    unique_identifier_range_map (dict, optional): Per-identifier date ranges to filter on.

Returns:
    tuple: Hashable parquet filters for use with pandas or pyarrow.
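A minimal usage sketch, assuming the returned tuple is compatible with the `filters` argument of `pandas.read_parquet` / pyarrow; the file path is illustrative:

```python
import datetime

import pandas as pd

from mainsequence.client.data_sources_interfaces.local_data_lake import (
    DataLakeInterface,
)

# Filter: time >= 2024-01-01 UTC and unique_identifier in the given list.
filters = DataLakeInterface.build_time_and_symbol_filter(
    start_date=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
    great_or_equal=True,
    unique_identifier_list=["AAPL", "MSFT"],
)

# Assumed: the tuple converts to the list-of-tuples form pyarrow expects.
df = pd.read_parquet("data/prices.parquet", filters=list(filters))
```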

filter_by_assets_ranges(table_name, asset_ranges_map)

Args:
    table_name: Name of the table to filter.
    asset_ranges_map: Mapping of asset unique identifiers to the date range to load for each.

Returns:
    The rows matching each asset's range.
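A sketch of a call, assuming one (start, end) window per identifier; the exact structure expected by `asset_ranges_map` and the constructor arguments of `DataLakeInterface` are assumptions:

```python
import datetime

from mainsequence.client.data_sources_interfaces.local_data_lake import (
    DataLakeInterface,
)

interface = DataLakeInterface()  # constructor arguments, if any, omitted here

# Hypothetical shape: one date window per unique identifier.
asset_ranges_map = {
    "AAPL": {
        "start_date": datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
        "end_date": datetime.datetime(2024, 6, 30, tzinfo=datetime.timezone.utc),
    },
}

df = interface.filter_by_assets_ranges("prices", asset_ranges_map)
```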

persist_datalake(data, overwrite, table_name, time_index_name, index_names)

Persists data to the data lake, partitioned per week. Data is not partitioned per asset_symbol because the system only allows 1024 partitions.

Args:
    data: Frame to persist.
    overwrite: Whether to overwrite existing data for the table.
    table_name: Target table name.
    time_index_name: Name of the time index column.
    index_names: Names of the index columns.
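A minimal sketch of persisting a small frame; the column names, index layout, and the semantics of `overwrite` are assumptions inferred from the signature:

```python
import pandas as pd

from mainsequence.client.data_sources_interfaces.local_data_lake import (
    DataLakeInterface,
)

interface = DataLakeInterface()  # constructor arguments, if any, omitted here

# Small illustrative frame with a UTC time index and an identifier level.
data = pd.DataFrame(
    {
        "time_index": pd.date_range("2024-01-01", periods=5, freq="D", tz="UTC"),
        "unique_identifier": ["AAPL"] * 5,
        "close": [187.1, 188.0, 186.5, 189.2, 190.4],
    }
).set_index(["time_index", "unique_identifier"])

interface.persist_datalake(
    data=data,
    overwrite=True,  # assumed to replace existing data for the table
    table_name="prices",
    time_index_name="time_index",
    index_names=["time_index", "unique_identifier"],
)
```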

query_datalake(table_name, filters=None)

Queries the data lake for time series data.

Args:
    table_name: Name of the table to query.
    filters: Optional parquet filters, e.g. built with build_time_and_symbol_filter.

Returns:
    pd.DataFrame: The queried data.
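A usage sketch that combines the two methods above; the table name is illustrative:

```python
import datetime

from mainsequence.client.data_sources_interfaces.local_data_lake import (
    DataLakeInterface,
)

interface = DataLakeInterface()  # constructor arguments, if any, omitted here

# Restrict the query to Q1 2024 for a single identifier.
filters = DataLakeInterface.build_time_and_symbol_filter(
    start_date=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
    end_date=datetime.datetime(2024, 3, 31, tzinfo=datetime.timezone.utc),
    unique_identifier_list=["AAPL"],
)

df = interface.query_datalake("prices", filters=filters)
```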

memory_usage_exceeds_limit(max_usage_percentage)

Checks whether current memory usage exceeds the given percentage of total system memory.

Args:
    max_usage_percentage: Threshold expressed as a percentage of total memory (e.g. 80).

Returns:
    bool: True if current usage exceeds the threshold.
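A minimal sketch of an equivalent check using psutil; this is an illustration of the idea, not necessarily the library's own implementation:

```python
import psutil


def memory_usage_exceeds_limit(max_usage_percentage: float) -> bool:
    """Return True when system-wide memory usage is above the threshold."""
    return psutil.virtual_memory().percent > max_usage_percentage


if memory_usage_exceeds_limit(80):
    print("Memory pressure high; avoid loading the full file.")
```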

read_full_data(file_path, filters=None, use_s3_if_available=False, max_memory_usage=80) cached

Cached access to a static data lake file.

Args:
    file_path: Path to the file to read.
    filters: Optional parquet filters applied on read.
    use_s3_if_available: Whether to read from S3 when an S3 source is configured.
    max_memory_usage: Memory-usage percentage above which the read is guarded.
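A usage sketch; the path is illustrative and the argument semantics are inferred from the names. Because the call is cached, repeated reads with the same arguments reuse the result, which is why the filters built by build_time_and_symbol_filter are hashable:

```python
from mainsequence.client.data_sources_interfaces.local_data_lake import (
    DataLakeInterface,
)

interface = DataLakeInterface()  # constructor arguments, if any, omitted here

df = interface.read_full_data(
    "data/prices.parquet",      # illustrative path to a static file
    filters=None,               # or a filter tuple from build_time_and_symbol_filter
    use_s3_if_available=False,  # prefer the local copy
    max_memory_usage=80,        # guard threshold as a percentage of total memory
)
```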