Orchestration & Monitoring

Once your time series pipelines are built, TDAG offers multiple modes for executing and monitoring their updates efficiently. These modes support local development, debugging, and scalable production deployments.

Execution Modes

  1. Local Storage Mode: We run our pipeline locally against a local database. This mode is ideal for fast prototyping and parameter sweeps (e.g., hyperparameter tuning). It is the fastest mode because it avoids costly remote database writes.
  2. Debug Mode: We run our pipelines for one loop as a single process, persisting and reading from our remote database. This is helpful for debugging and development before moving to production.
  3. Live Mode: We run our pipelines as a separate distributed process via a Ray cluster. This mode is designed for production use.

Running Time Series in Local Storage Mode

For quick local development and testing of a new time series, we can use Local Storage Mode to run the time series:

SessionDataSource.set_local_db()
time_series = CryptoPortfolioTimeSerie()
result = time_series.run()

A classic use case is checking how a strategy performs with different parameters by running it in a loop. Here we have a long-short portfolio and want to observe the hyperspace of portfolios generated by several parameter combinations. In this case we don’t want to make time-consuming writes of the resulting data to the remote database; instead, we work with a local database directly on the host.

The host database is created automatically on your computer. On the platform, it appears under the name DUCK_DB_<HOST_IDENTIFIER>. The name is printed when you set the local database in your script, and it stays the same when the script is executed on the same device. The LocalTimeSeries object is still created and accessible on the Platform, identified by the host database name; you will, however, not be able to see the database table on the LocalTimeSeries page.
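To see why the name stays stable per device, consider how a host identifier could be derived. This is a hypothetical sketch (the actual TDAG scheme is not documented here); hashing the machine's hostname yields the same short identifier on every run on the same device:

```python
import hashlib
import socket

def host_identifier() -> str:
    """Derive a stable per-device identifier.

    NOTE: hypothetical illustration, not TDAG's actual implementation.
    Hashing the hostname gives the same value on every run on one machine.
    """
    return hashlib.sha256(socket.gethostname().encode()).hexdigest()[:8]

# The local database name would then look like DUCK_DB_<HOST_IDENTIFIER>:
print(f"DUCK_DB_{host_identifier()}")
```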

Let’s look at a code example to understand it better.

SessionDataSource.set_local_db()
total_return = []
for rolling_window in range(60, 30 * 24, 20):
    for lasso_alpha in [1, 1e-2, 1e-3, 1e-4, 1e-5]:
        long_short_portfolio = LongShortPortfolio(
            ticker_long="XLF",
            ticker_short="XLE",
            long_rolling_windows=[rolling_window],
            short_rolling_windows=[100, 200],
            lasso_alpha=lasso_alpha,
        )
        portfolio_df = long_short_portfolio.run()
        total_return.append(portfolio_df["portfolio"].iloc[-1] - 1)
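To inspect the sweep afterwards, it helps to record the parameter combination alongside each return and then read off the best one. A minimal, self-contained sketch (the dummy records stand in for values that the real loop would collect from `portfolio_df["portfolio"].iloc[-1] - 1`):

```python
import pandas as pd

# Dummy sweep records; in the real loop, append one dict per
# (rolling_window, lasso_alpha) combination with its total return.
records = [
    {"rolling_window": 60, "lasso_alpha": 1e-2, "total_return": 0.12},
    {"rolling_window": 60, "lasso_alpha": 1e-3, "total_return": 0.18},
    {"rolling_window": 80, "lasso_alpha": 1e-2, "total_return": 0.05},
]

results = pd.DataFrame(records)
# Row with the highest total return, i.e. the best parameter combination.
best = results.loc[results["total_return"].idxmax()]
print(best)
```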

Running Time Series in Live/Debug Mode

When we want to move our time series to production, we can execute it against the backend system so the run can be distributed and the data stored in the shared database for reusability. This is done using the .run() method.

time_series = CryptoPortfolioTimeSerie()
time_series.run(debug_mode=False)

We can pass additional parameters to specify how the time series should run.

  • debug_mode: Setting this to True runs the pipeline in Debug Mode; otherwise it runs in Live Mode.
  • update_tree: A boolean indicating whether to update all dependencies of the time series or only the called time series. Setting it to False is helpful when the time series has many dependencies and we are only interested in the final time series.
  • update_only_tree: A boolean indicating whether to update only the dependencies of the time series.
  • remote_scheduler: An optional custom scheduler for running the time series. If none is provided, a default scheduler is created automatically.
  • force_update: A scheduler decides at which times the time series runs; setting this boolean to True ignores the scheduler and runs the update immediately.

For example, to run this time series immediately in debug mode and only update the called time series, we can use:

time_series = CryptoPortfolioTimeSerie()
time_series.run(debug_mode=True, update_tree=False, force_update=True)
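When the same pipeline is launched from several scripts, it can help to keep these flags consistent in one place. A sketch of that pattern (the flag names come from the parameter list above; the helper itself and the environment names are our own illustration):

```python
def run_kwargs(env: str) -> dict:
    """Map an environment name to TimeSeries.run() keyword arguments.

    Hypothetical helper: only debug_mode, update_tree, and force_update
    (documented above) are assumed; the environment names are made up.
    """
    if env == "debug":
        # One loop, single process, remote database: Debug Mode,
        # updating only the called time series, ignoring the scheduler.
        return {"debug_mode": True, "update_tree": False, "force_update": True}
    if env == "production":
        # Distributed execution via the Ray cluster: Live Mode,
        # updating the full dependency tree.
        return {"debug_mode": False, "update_tree": True}
    raise ValueError(f"unknown environment: {env!r}")

# Usage: time_series.run(**run_kwargs("production"))
```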