local_data_lake
mainsequence.client.data_sources_interfaces.local_data_lake
DataLakeInterface
build_time_and_symbol_filter(start_date=None, great_or_equal=True, less_or_equal=True, end_date=None, unique_identifier_list=None, unique_identifier_range_map=None)
staticmethod
Build hashable parquet filters based on the parameters.
Args:
    start_date (datetime.datetime, optional): Start date for filtering.
    end_date (datetime.datetime, optional): End date for filtering.
    great_or_equal (bool): Whether the start-date condition is >= (True) or > (False).
    less_or_equal (bool): Whether the end-date condition is <= (True) or < (False).
    unique_identifier_list (list, optional): Unique identifiers to filter on.
    unique_identifier_range_map (dict, optional): Mapping of unique identifier to a per-identifier date range.

Returns:
    tuple: Hashable parquet filters for use with pandas or pyarrow.
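A minimal sketch of what such a hashable filter tuple can look like. The column names `time_index` and `unique_identifier`, and the helper name `build_filters`, are assumptions for illustration; pyarrow accepts filters as sequences of `(column, op, value)` triples.

```python
from datetime import datetime

def build_filters(start_date=None, end_date=None,
                  great_or_equal=True, less_or_equal=True,
                  unique_identifier_list=None):
    # Collect (column, op, value) triples; column names are assumptions.
    filters = []
    if start_date is not None:
        filters.append(("time_index", ">=" if great_or_equal else ">", start_date))
    if end_date is not None:
        filters.append(("time_index", "<=" if less_or_equal else "<", end_date))
    if unique_identifier_list is not None:
        # "in" filters need a hashable container so the whole tuple stays hashable
        filters.append(("unique_identifier", "in", tuple(unique_identifier_list)))
    return tuple(filters)

filters = build_filters(start_date=datetime(2024, 1, 1),
                        unique_identifier_list=["AAPL", "MSFT"])
hash(filters)  # hashable, so the filter set can serve as a cache key
```

Returning a tuple rather than a list is what makes the result hashable, which matters for the cached reads described under `read_full_data` below.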
filter_by_assets_ranges(table_name, asset_ranges_map)
Args:
    table_name: Name of the table to filter.
    asset_ranges_map: Mapping of asset unique identifier to its date range.

Returns:
    The data filtered to the requested per-asset ranges.
persist_datalake(data, overwrite, table_name, time_index_name, index_names)
Partitions the data per week; it does not partition per asset_symbol because the system only allows 1024 partitions.

Args:
    data: DataFrame to persist.
    overwrite: Whether to overwrite an existing table.
    table_name: Name of the table to write.
    time_index_name: Name of the time index column.
    index_names: Names of the index columns.
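The weekly-partitioning rationale can be sketched as follows: a key derived from the ISO year and week yields at most 53 partitions per year, comfortably under the 1024-partition limit, whereas a per-asset scheme would exceed it for any large universe. The helper name `weekly_partition_key` is hypothetical.

```python
from datetime import datetime

def weekly_partition_key(ts: datetime) -> str:
    # Hypothetical helper: derive a weekly partition key (ISO year + week).
    # ~53 partitions/year keeps a multi-year lake far below the 1024 cap.
    iso = ts.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"

weekly_partition_key(datetime(2024, 3, 15))  # → "2024-W11"
```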
query_datalake(table_name, filters=None)
Queries the data lake for time series data stored under the given table.

Args:
    table_name: Name of the table to query.
    filters: Optional parquet-style filters to apply to the read.

Returns:
    pd.DataFrame: The queried data.
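To illustrate how parquet-style filters select rows, the sketch below evaluates `(column, op, value)` triples against plain dict rows, mimicking what the parquet reader does when `query_datalake` passes filters down. The names and row shape are illustrative only.

```python
import operator

# Map filter operator strings to predicates; "in" takes a container of values.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq,
       "in": lambda v, vs: v in vs}

def apply_filters(rows, filters):
    # Keep a row only if it satisfies every (column, op, value) triple (AND semantics).
    return [r for r in rows
            if all(OPS[op](r[col], val) for col, op, val in filters)]

rows = [{"time_index": 1, "unique_identifier": "AAPL"},
        {"time_index": 2, "unique_identifier": "MSFT"},
        {"time_index": 3, "unique_identifier": "AAPL"}]
apply_filters(rows, (("time_index", ">=", 2), ("unique_identifier", "in", ("AAPL",))))
# → [{"time_index": 3, "unique_identifier": "AAPL"}]
```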
memory_usage_exceeds_limit(max_usage_percentage)
Checks whether current memory usage exceeds the given percentage of total memory.

Returns:
    bool: True if usage exceeds max_usage_percentage.
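One way such a check can be implemented is sketched below. This is an assumption, not the library's actual code, and it reads `/proc/meminfo`, so it is Linux-only; the real implementation may well use a cross-platform library such as psutil instead.

```python
def memory_usage_exceeds_limit(max_usage_percentage):
    # Linux-only sketch: parse /proc/meminfo for total and available memory (kB).
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, val = line.split(":", 1)
            info[key] = int(val.strip().split()[0])
    used_pct = 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]
    return used_pct > max_usage_percentage
```

A guard like this lets `read_full_data` refuse to load a large parquet file when the process is already close to the memory ceiling.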
read_full_data(file_path, filters=None, use_s3_if_available=False, max_memory_usage=80)
cached
Cached access to a static data lake file.
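The caching behavior can be sketched with `functools.lru_cache`, which is also why `build_time_and_symbol_filter` returns a *hashable* tuple: hashable arguments can be part of the cache key for repeated reads of the same static file. `read_parquet_stub` below is a stand-in for the real parquet reader, not the library's API.

```python
from functools import lru_cache

def read_parquet_stub(file_path, filters):
    # Stand-in for an expensive parquet read; returns a token result.
    return {"path": file_path, "filters": filters}

@lru_cache(maxsize=32)
def read_full_data(file_path, filters=None):
    # Both arguments must be hashable (e.g. a tuple of filter triples),
    # so identical (file_path, filters) pairs hit the cache.
    return read_parquet_stub(file_path, filters)

read_full_data("lake/prices.parquet", (("time_index", ">", 0),))
read_full_data("lake/prices.parquet", (("time_index", ">", 0),))  # served from cache
```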