evaluate.evaluate_forecast()

Evaluate forecast samples against ground truth for several metrics at once.

Usage

evaluate.evaluate_forecast(
    pred,
    truth,
    *,
    metrics=None,
)

This convenience function applies each metric in metrics to the same forecast samples and ground truth, then converts the results to host floats. It is the one-shot counterpart to backtest(), which also calls it internally to score each rolling window.
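The behaviour described above can be pictured with a minimal NumPy sketch. This is not the library's implementation: sketch_evaluate, sketch_mae and sketch_rmse are hypothetical stand-ins, and the choice of median and mean as point forecasts is an assumption made only for illustration.

```python
from collections.abc import Callable, Mapping

import numpy as np

# Hypothetical stand-in for the Metric type: (pred, truth) -> scalar array.
Metric = Callable[[np.ndarray, np.ndarray], np.ndarray]


def sketch_mae(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Mean absolute error of the sample median (illustrative only)."""
    return np.abs(np.median(pred, axis=0) - truth).mean()


def sketch_rmse(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Root mean squared error of the sample mean (illustrative only)."""
    return np.sqrt(((pred.mean(axis=0) - truth) ** 2).mean())


def sketch_evaluate(pred, truth, *, metrics: Mapping[str, Metric]) -> dict[str, float]:
    """Apply every metric to the same inputs, then convert to Python floats."""
    return {name: float(fn(pred, truth)) for name, fn in metrics.items()}


# Sample axis first: 3 samples over a batch of 2 targets.
pred = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
truth = np.array([3.0, 5.0])

scores = sketch_evaluate(pred, truth, metrics={"mae": sketch_mae, "rmse": sketch_rmse})
print(scores)
```

The result has one entry per key in metrics, and every value is a plain Python float rather than an array.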

Metric-specific parameters live with the metric in the metrics mapping, not on this function. To tune a metric, bind its keyword arguments with functools.partial(). For example, to score coverage at the 80% level:

from functools import partial

metrics = {**DEFAULT_METRICS, "coverage": partial(eval_coverage, alpha=0.8)}
evaluate_forecast(pred, truth, metrics=metrics)

Parameters

pred: Float[Array, " sample *batch"] | Float[np.ndarray, " sample *batch"]: Forecast samples with the sample axis first, shape (sample, *batch).
truth: Float[Array, " *batch"] | Float[np.ndarray, " *batch"]: Ground-truth values with shape (*batch).
metrics: Mapping[str, Metric] | None = None: Mapping of metric name to metric function. When None, defaults to DEFAULT_METRICS (mae, rmse, crps and coverage). Each function takes (pred, truth) and returns a scalar array (see numpyro_forecast.typing.Metric); bind any extra parameters with functools.partial() (see above).
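As a sketch of the metric contract, a user-defined metric only needs to accept (pred, truth) and return a scalar once its extra keywords are bound. The function interval_hit_rate and its level keyword below are hypothetical and written in plain NumPy; they are not the library's eval_coverage and make no claim about its parameters.

```python
from functools import partial

import numpy as np


def interval_hit_rate(pred, truth, *, level=0.9):
    """Fraction of truth values inside the central `level` interval of the samples.

    Hypothetical metric for illustration; not the library's eval_coverage.
    """
    tail = (1.0 - level) / 2.0
    lower = np.quantile(pred, tail, axis=0)        # reduce over the sample axis
    upper = np.quantile(pred, 1.0 - tail, axis=0)
    return np.mean((truth >= lower) & (truth <= upper))


# Bind the extra keyword so the callable has the plain (pred, truth) signature.
hit_rate_50 = partial(interval_hit_rate, level=0.5)

pred = np.arange(10.0).reshape(5, 2)  # 5 samples over a batch of 2 targets
truth = np.array([4.0, 9.0])

value = float(hit_rate_50(pred, truth))
print(value)
```

A callable bound this way could then be placed in the metrics mapping under any name, in the same manner as the coverage example above.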

Returns

dict[str, float]: Each metric name mapped to its value.