Measuring forecast accuracy: Keeping score on keeping score

  • March 15, 2022

Forecast accuracy (FA) is the degree to which an organization can accurately predict sales. High forecast accuracy leads to lower required inventory levels, fewer lost sales and optimized working capital. This blog post focuses on measuring forecast performance and answers common questions on the topic, including:

  • What measures should you use to calculate forecast accuracy?
  • How can you benchmark results?
  • At what level of the hierarchy should you measure forecast accuracy?
  • For what time horizon should you measure forecast accuracy?
  • What time period should you use to measure forecast accuracy?

What measures should you use to calculate forecast accuracy? 

First, let’s review common forecast metrics. You can categorize your metrics into unit measures and percentage measures. The most common unit measures are error (E) and mean absolute error (MAE). You define error as actual minus forecast (A-F); MAE is the average of the absolute errors across periods. Table A shows sample calculations.


Table A: Sample calculations
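
To make the unit measures concrete, here’s a minimal Python sketch with hypothetical numbers (not the Table A data):

    # Unit measures: error (E) and mean absolute error (MAE).
    # Hypothetical actuals and forecasts for four periods.
    actuals   = [100, 120, 90, 110]
    forecasts = [95, 130, 85, 115]

    errors = [a - f for a, f in zip(actuals, forecasts)]  # E = A - F
    mae = sum(abs(e) for e in errors) / len(errors)       # mean of |E|

    print(errors)  # [5, -10, 5, -5]
    print(mae)     # 6.25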

Common percent measures include percent error (PE), mean absolute percent error (MAPE) and weighted average percent error (WAPE).

The first forecast error metric to review is Bias. You can measure Bias in units, as error (E), or in percent, as percent error (PE). In Table B, Bias is -4 when measured in units and -8.3% when measured in percent. Bias is systematic over- or under-forecasting across a consecutive number of periods; once you’ve identified it, you can adjust the forecast by the appropriate amount to improve accuracy.


Table B: Measuring the Bias forecast error metric
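
As an illustration, here’s a minimal sketch using hypothetical data chosen to land on the figures quoted above; it assumes the common convention of reporting unit bias as the total error over the window and percent bias as total error divided by total actuals:

    # Bias in units (total error) and in percent (total error / total actuals).
    # Hypothetical series that consistently over-forecasts by one unit.
    actuals   = [10, 12, 14, 12]
    forecasts = [11, 13, 15, 13]

    errors = [a - f for a, f in zip(actuals, forecasts)]  # all -1
    bias_units = sum(errors)                              # total error
    bias_pct = 100 * sum(errors) / sum(actuals)           # -4 / 48

    print(bias_units)          # -4
    print(round(bias_pct, 1))  # -8.3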

After Bias, MAPE and WAPE are the most common and most easily interpreted forecast error metrics.


Table C: Measuring the MAPE and WAPE forecast error metrics

In Table C, MAPE and WAPE are identical when measured at the individual product level (A or B). However, they differ when aggregated and measured at the sum of the product level (A and B combined). The difference arises because MAPE averages the product-level percentages, while WAPE divides the sum of absolute errors by the sum of actuals, weighting each product by its volume. In our example, MAPE for product B has a greater influence on the total MAPE even though product B’s volumes are smaller, whereas the absolute error of product A carries the greater weight when calculating WAPE.
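
The divergence is easy to reproduce. Here’s a minimal sketch with two hypothetical products (not the Table C data):

    # MAPE averages item-level percentage errors; WAPE weights by volume.
    items = {
        "A": {"actual": 1000, "forecast": 900},  # high volume, APE = 10%
        "B": {"actual": 100, "forecast": 50},    # low volume,  APE = 50%
    }

    apes = [abs(d["actual"] - d["forecast"]) / d["actual"] for d in items.values()]
    mape = 100 * sum(apes) / len(apes)  # (10% + 50%) / 2

    abs_err = sum(abs(d["actual"] - d["forecast"]) for d in items.values())
    wape = 100 * abs_err / sum(d["actual"] for d in items.values())  # 150 / 1100

    print(mape)            # 30.0 (low-volume B dominates)
    print(round(wape, 1))  # 13.6 (high-volume A dominates)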

When using MAPE, consider two drawbacks: intermittent demand and low-volume demand. When actuals are zero, MAPE is undefined (division by zero), and when volumes are low, small unit errors produce outsized percentage errors that can dominate the average.

Remember, not all products are created equal. Should you only measure unit volume? What if you weight forecast error by revenue or gross profit? Measuring forecast error by gross profit is a best practice because it focuses attention on the products that matter most to the business. Finally, how do you know if you’re adding value to your forecast?

How can you benchmark your results?

Forecast value add (FVA) evaluates the effectiveness of the forecasting process. It measures whether adjustments made to a forecast make it more or less accurate. For example, did a change to the forecast based on specific market information make it more accurate than a forecast based on historical data alone?

Measuring FVA allows you to focus on each step of the process and whether that step had a positive or negative impact on the final forecast. One of the most common FVA analyses is to use a “naïve” forecast as a benchmark. The naïve forecast simply uses last month’s actuals as the forecast going forward. If your statistical forecast or your demand planner’s judgmental adjustments aren’t more accurate than using the naïve forecast, it’s time to revisit your forecasting process.
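
Here’s a minimal FVA sketch with hypothetical data, using WAPE as the scoring metric: the naïve forecast carries the previous period’s actual forward, and each process step adds value only if it beats the step before it.

    # FVA: compare the naive benchmark, the statistical forecast and the
    # planner-adjusted forecast on the same periods. Hypothetical data.
    actuals = [100, 110, 105, 115, 120]

    target = actuals[1:]          # periods 2-5, the ones we score
    naive = actuals[:-1]          # previous period's actual, carried forward
    stat = [102, 108, 108, 118]   # hypothetical statistical forecast
    final = [105, 112, 104, 116]  # hypothetical forecast after adjustments

    def wape(actual, forecast):
        return 100 * sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

    print(round(wape(target, naive), 1))  # 6.7: naive benchmark
    print(round(wape(target, stat), 1))   # 4.4: the model beats naive
    print(round(wape(target, final), 1))  # 3.8: adjustments beat the model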

When evaluating statistical models, consider using root mean square error (RMSE). RMSE is one of the most widely used measures for evaluating the quality of predictions. It shows how closely the data cluster around the line of best fit, and because errors are squared before they’re averaged, it penalizes large errors more heavily than MAE does.

Using RMSE is a common way to compare prediction errors of different models. It’s considered an excellent general-purpose error metric for numerical predictions.
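
A short sketch on hypothetical data, showing RMSE next to MAE so the effect of squaring is visible:

    # RMSE penalizes large errors more heavily than MAE. Hypothetical data.
    import math

    actuals   = [100, 110, 105, 115]
    forecasts = [98, 112, 100, 125]

    errors = [a - f for a, f in zip(actuals, forecasts)]  # [2, -2, 5, -10]
    rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)

    print(round(rmse, 2))  # 5.77 (the single 10-unit miss dominates)
    print(mae)             # 4.75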

At what level of the hierarchy should you measure forecast accuracy?

Next, let’s review at what level of the hierarchy an organization should measure forecast accuracy. Hierarchy here means the level of aggregation within the company, such as total company, product family, SKU or SKU by location. It’s beneficial to measure forecast accuracy at multiple levels of the hierarchy. Measuring forecast accuracy at the company level is a suitable gauge of whether your total forecast is accurate and whether you’re under- or over-forecasting. Forecasts are more accurate at higher levels of aggregation because random variations in demand cancel each other out; the most accurate forecasts occur at the highest levels of aggregation.

If you’ve aggregated your data, do you need to measure forecast error at a lower level of the hierarchy? Let’s look at products sold from two warehouses.


Table D: Absolute errors summed by warehouse


Table E: Error at the aggregate level (total warehouses)

In Table D, forecast error is 0% when aggregated across both warehouses. Unfortunately, this result isn’t the best representation of how our sample business is performing: the company has the right total amount of product but incorrectly assumes it’s in the right inventory location. Accuracy is also a key factor in calculating proper safety stock levels; the more accurate your forecast by inventory location, the less expedited freight and the fewer lost sales. Measuring forecast error by SKU/inventory location is a best practice because costs are controlled at the SKU/inventory-location level. Similarly, SKU/customer-level forecasts are important if each customer’s shipments differ dramatically by warehouse.

Understanding what’s driving the error also helps you determine whether you have a mix issue or a volume issue. A mix issue is when the total volume is roughly right but individual products are significantly over- or under-forecast in offsetting directions; a volume issue is when all products are under- or over-forecast in the same direction (the sketch below shows one quick diagnostic). In Table E, MAPE and WAPE give a more accurate picture of how your warehouses are performing.

If your organization doesn’t forecast or run statistical models at a certain level of the hierarchy, that doesn’t mean you haven’t created a forecast for that level, or that you can’t measure forecast accuracy at a more granular level. You may run your statistical models in monthly periods and reconcile the monthly forecast into weekly periods for production planning (shifts). You may forecast at the SKU level and reconcile the forecast to your warehouses based on proportional factors. This approach gains the benefit of a higher level of aggregation, where your forecasts are more accurate, while still being accountable at the more granular level of the hierarchy. You may also decide that measuring forecast accuracy in quarterly periods makes sense for your C-level (ABC code) products. Forecasts are always wrong; the goal is to be less wrong.
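
To make the mix-versus-volume diagnostic concrete, here’s a minimal sketch with hypothetical product-level errors: compare the signed total error against the total absolute error.

    # Mix vs. volume diagnosis from product-level errors (E = A - F).
    errors = {"A": -50, "B": 40, "C": 15}  # hypothetical

    total_error = sum(errors.values())                      # 5: total ~right
    total_abs_error = sum(abs(e) for e in errors.values())  # 105: big offsetting misses

    print(total_error, total_abs_error)  # 5 105
    # Small signed total with a large absolute total points to a mix issue;
    # a large signed total in one direction points to a volume issue.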

For what time horizon should you measure forecast accuracy?

Measuring lag accuracy is an additional step toward improving your forecasting process. Measuring forecast accuracy by lag is common practice: a lag is the number of periods between when a forecast is created and the period it predicts, so lag accuracy scores the forecast at a horizon that matches your lead times.

An organization may plan for manufacturing one month ahead — that is, a forecast created in January for February production is a 1-month lag. The same organization may have a 3-month lead-time for packaging materials and a 6-month lead-time for raw materials. In this scenario, you measure the forecast created in January for April (lag 3) and the forecast created in January for July (lag 6).

One caveat to lag accuracy: just because your measured lag is accurate doesn’t mean the forecast for the entire time horizon is accurate. For example, your lag-1 forecast may be accurate while the forecasts for periods three through 24 are not. The farther out in the time horizon, the less accurate your forecast; a 1-month-ahead forecast is usually more accurate than a 12-month-ahead forecast.
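
Here’s a minimal sketch of lag measurement, assuming hypothetical monthly forecast snapshots:

    # Lag accuracy: score each forecast snapshot against the actual for the
    # period it targeted, grouped by how many months ahead it was made.
    from collections import defaultdict

    # snapshots[month_created][target_month] = forecast (hypothetical)
    snapshots = {
        "Jan": {"Feb": 100, "Apr": 110, "Jul": 120},  # lags 1, 3 and 6
        "Feb": {"Mar": 105, "May": 112, "Aug": 118},
    }
    actuals = {"Feb": 95, "Mar": 108, "Apr": 120, "May": 110, "Jul": 140, "Aug": 125}

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
    index = {m: i for i, m in enumerate(months)}

    abs_errors_by_lag = defaultdict(list)
    for created, targets in snapshots.items():
        for target, forecast in targets.items():
            lag = index[target] - index[created]
            abs_errors_by_lag[lag].append(abs(actuals[target] - forecast))

    for lag in sorted(abs_errors_by_lag):
        errs = abs_errors_by_lag[lag]
        print(lag, sum(errs) / len(errs))  # MAE per lag: 1 -> 4.0, 3 -> 6.0, 6 -> 13.5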

For what time period should you measure forecast accuracy?

What time period should you forecast? Daily, weekly, monthly or quarterly are all options. When selecting a time period for forecasting, you might ask the following questions:

  • Is it necessary?
  • Is it manageable?
  • Will it be accurate?

These days, newer systems allow demand planners to forecast in multiple time periods at once: near-term periods in daily or weekly increments, followed by months and finally quarters. This eases the problem of forecasting two years’ worth of data in weekly periods; it may be neither necessary nor manageable to expect a demand planner to accurately forecast 104 weekly periods. Today’s systems also allow greater flexibility in reconciling from coarser time periods to more granular ones, such as months to weeks, as in the sketch below.
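
As a simple sketch of that reconciliation, assuming historical within-month shipment weights are available:

    # Split a monthly forecast into weekly buckets using assumed historical
    # weekly shipment weights (hypothetical values).
    monthly_forecast = 2000
    weekly_weights = [0.20, 0.25, 0.25, 0.30]  # must sum to 1.0

    weekly_forecast = [monthly_forecast * w for w in weekly_weights]
    print(weekly_forecast)  # [400.0, 500.0, 500.0, 600.0]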

How well is your demand planning process performing?

Benchmarking between organizations is challenging because not all demand patterns are equally predictable. The more variable the historical demand pattern, the more challenging it is to forecast.

— By Fabrizio Carle and Kevin Croteau
