Matching multi-stage measures in pre-aggregations

Use case

We use multi-stage measures — measures that aggregate over already-aggregated data — for things like year-over-year comparisons, period-over-period comparisons, percent of total, ranks, and nested aggregates. We want those queries to be accelerated by pre-aggregations instead of falling back to the upstream data source. Multi-stage measures don’t slot into rollups the same way as plain additive measures: the values themselves cannot be re-aggregated from a rollup. Cube matches them by including the underlying base measures in the rollup and recomputing the multi-stage CTEs on top of the pre-aggregation at query time. This recipe shows the rules for building a rollup that matches multi-stage measures, plus patterns for the most common cases.

Multi-stage measures are powered by Tesseract, the next-generation data modeling engine. Tesseract is currently in preview. Set CUBEJS_TESSERACT_SQL_PLANNER to true to enable it.

How matching works

When a query references a multi-stage measure, Cube looks for a rollup that satisfies three conditions at once:

Base measures are included. Every measure referenced inside the multi-stage sql expression (directly or transitively) must be a member of the rollup. The multi-stage measure itself is not included in measures — only its building blocks.
All driving dimensions are included. Anything the multi-stage parameter partitions by must be present in the rollup’s dimensions:
- For group_by / add_group_by: every listed dimension.
- For reduce_by: every listed dimension, plus the dimensions the query groups by.
- For time_shift: the shifted time_dimension and a granularity at least as fine as the shift interval and the query’s time granularity.
The base measures stay additive. The rollup can only be used if the inner aggregation can be recomputed from the rollup rows. Use additive types (count, sum, min, max, count_distinct_approx) for the base. Avoid avg and count_distinct as bases — decompose them into sum + count and count_distinct_approx respectively, the same way as for non-additive measures.

If any of these are missing, the rollup is skipped for the multi-stage query and Cube either picks another pre-aggregation or falls back to the data source.

Pattern 1: time shift (year-over-year)

A time-shift measure compares the current period against a shifted one:

cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: revenue
        sql: amount
        type: sum

      - name: revenue_prior_year
        multi_stage: true
        sql: "{revenue}"
        type: number
        time_shift:
          - time_dimension: created_at
            interval: 1 year
            type: prior

    dimensions:
      - name: status
        sql: status
        type: string

      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: revenue_by_status
        # Only the *base* measure goes in.
        measures:
          - revenue
        dimensions:
          - status
        # Shift dimension must be the time_dimension of the rollup,
        # and its granularity must be at least as fine as the shift interval
        # and the query's requested granularity.
        time_dimension: created_at
        granularity: day
        partition_granularity: month
        build_range_start:
          sql: "SELECT date_trunc('year', CURRENT_DATE - INTERVAL '2 year')"
        build_range_end:
          sql: "SELECT CURRENT_DATE"

A query like:

{
  "measures": ["orders.revenue", "orders.revenue_prior_year"],
  "dimensions": ["orders.status"],
  "timeDimensions": [
    { "dimension": "orders.created_at", "granularity": "month",
      "dateRange": ["2024-01-01", "2024-12-31"] }
  ]
}

matches revenue_by_status because:

the base revenue is in the rollup,
status is in the rollup, and
the rollup’s created_at granularity (day) is finer than the requested month.

Pattern 2: percent of total with `group_by`

Percent-of-total uses group_by to fix the inner aggregation to specific dimensions:

measures:
  - name: revenue
    sql: amount
    type: sum

  - name: country_revenue
    multi_stage: true
    sql: "{revenue}"
    type: sum
    group_by:
      - country

  - name: country_revenue_percentage
    multi_stage: true
    sql: "{revenue} / NULLIF({country_revenue}, 0)"
    type: number

pre_aggregations:
  - name: revenue_by_country_and_product
    measures:
      - revenue
    # Must include every dimension referenced in any group_by used by the
    # query's multi-stage measures, PLUS every dimension the query groups by.
    dimensions:
      - country
      - product
    time_dimension: created_at
    granularity: day
    partition_granularity: month
    build_range_start:
      sql: "SELECT date_trunc('year', CURRENT_DATE - INTERVAL '1 year')"
    build_range_end:
      sql: "SELECT CURRENT_DATE"

The rollup carries revenue grouped by country and product. Cube re-aggregates it to country for the inner CTE, then divides at the outer stage to produce the percentage. If the rollup omitted country, the inner group_by: [country] could not be computed and the match would fail.

Pattern 3: nested aggregates with `add_group_by`

add_group_by computes an aggregate of an aggregate (e.g., the average of per-customer averages). The rollup must include the add_group_by dimension:

measures:
  - name: order_total
    sql: amount
    type: sum

  - name: avg_customer_total
    multi_stage: true
    sql: "{order_total}"
    type: avg
    add_group_by:
      - customer_id

pre_aggregations:
  - name: orders_by_customer
    measures:
      - order_total
    dimensions:
      - customer_id
      - region
    time_dimension: created_at
    granularity: day
    partition_granularity: month
    build_range_start:
      sql: "SELECT date_trunc('year', CURRENT_DATE - INTERVAL '1 year')"
    build_range_end:
      sql: "SELECT CURRENT_DATE"

A query grouping by region and selecting avg_customer_total matches: Cube computes SUM(amount) per (region, customer_id) from the rollup, then averages over customer_id per region in the outer CTE.

Pattern 4: ranking with `reduce_by`

reduce_by ranks rows within a group. The rollup must include both the rank’s reduce_by dimension and the query’s grouping dimensions:

measures:
  - name: revenue
    sql: amount
    type: sum

  - name: product_rank_in_country
    multi_stage: true
    type: rank
    order_by:
      - sql: "{revenue}"
        dir: desc
    reduce_by:
      - product

pre_aggregations:
  - name: revenue_by_country_and_product
    measures:
      - revenue
    dimensions:
      - country
      - product
    time_dimension: created_at
    granularity: day
    partition_granularity: month
    build_range_start:
      sql: "SELECT date_trunc('year', CURRENT_DATE - INTERVAL '1 year')"
    build_range_end:
      sql: "SELECT CURRENT_DATE"

Both product (from reduce_by) and country (from the query’s GROUP BY) must be in the rollup so Cube can rank product within each country.

Common pitfalls

Don’t list the multi-stage measure in measures. Only its base measures belong in the rollup. Multi-stage measures are computed on top of the rollup, not stored in it.
Use additive bases. avg and count_distinct make the rollup non-additive and disqualify it from multi-stage matching. Replace them with sum + count (and compute the average at query time) and with count_distinct_approx. See the non-additive measures recipe for the full pattern.
Match granularity end-to-end. For time_shift, the rollup’s granularity must be at least as fine as both the query’s granularity and the shift interval.
Include every group_by / add_group_by / reduce_by dimension in the rollup, in addition to the dimensions the query itself groups by.

Topics

Matching multi-stage measures in pre-aggregations

Use case

How matching works

Pattern 1: time shift (year-over-year)

Pattern 2: percent of total with `group_by`

Pattern 3: nested aggregates with `add_group_by`

Pattern 4: ranking with `reduce_by`

Common pitfalls

See also

Topics

Documentation Index

​Use case

​How matching works

​Pattern 1: time shift (year-over-year)

​Pattern 2: percent of total with group_by

​Pattern 3: nested aggregates with add_group_by

​Pattern 4: ranking with reduce_by

​Common pitfalls

​See also

Use case

How matching works

Pattern 1: time shift (year-over-year)

Pattern 2: percent of total with `group_by`

Pattern 3: nested aggregates with `add_group_by`

Pattern 4: ranking with `reduce_by`

Common pitfalls

See also