Mathematical Formulations
This section provides the mathematical foundations and theoretical background for the statistical metrics implemented in Monet Stats. Understanding these formulations helps in proper interpretation and application of the metrics in atmospheric sciences research.
Mathematical Notation
- \(O\): Observed values
- \(M\): Modeled/predicted values
- \(N\): Number of observations
- \(\bar{O}\): Mean of observed values
- \(\bar{M}\): Mean of modeled values
- \(\sigma_O\): Standard deviation of observed values
- \(\sigma_M\): Standard deviation of modeled values
Error Metrics
Mean Absolute Error (MAE)
\[\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |M_i - O_i|\]
The MAE measures the average magnitude of errors without regard to their direction, applying a linear penalty to each error.
Root Mean Square Error (RMSE)
\[\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)^2}\]
RMSE gives a higher weight to larger errors due to the squaring operation, making it sensitive to outliers.
Mean Bias (MB)
\[\text{MB} = \frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)\]
MB quantifies systematic overestimation or underestimation by the model.
Normalized Mean Bias (NMB)
\[\text{NMB} = \frac{\sum_{i=1}^{N} (M_i - O_i)}{\sum_{i=1}^{N} O_i} \times 100\%\]
NMB expresses bias as a percentage of the observed total, allowing comparison across different scales.
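As a minimal NumPy sketch of these four formulas (illustrative only, not the Monet Stats API; the function and array names are hypothetical):

```python
import numpy as np

def basic_error_metrics(obs, mod):
    """MAE, RMSE, MB, and NMB (%) for paired 1-D arrays."""
    diff = mod - obs
    mae = np.mean(np.abs(diff))           # linear penalty per error
    rmse = np.sqrt(np.mean(diff ** 2))    # quadratic penalty: outlier-sensitive
    mb = np.mean(diff)                    # signed systematic bias
    nmb = 100.0 * diff.sum() / obs.sum()  # bias as % of observed total
    return mae, rmse, mb, nmb

obs = np.array([3.0, 5.0, 2.5, 7.0])
mod = np.array([2.5, 5.5, 3.0, 8.0])
print(basic_error_metrics(obs, mod))
```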
Mean Absolute Percentage Error (MAPE)
\[\text{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{M_i - O_i}{O_i} \right|\]
MAPE measures the average absolute percentage error, useful for scale-independent comparisons; it is undefined when any \(O_i = 0\).
Symmetric Mean Absolute Percentage Error (sMAPE)
\[\text{sMAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{|M_i - O_i|}{(|O_i| + |M_i|)/2}\]
sMAPE is a symmetric version of MAPE that is bounded between 0% and 200%.
Mean Absolute Scaled Error (MASE)
\[\text{MASE} = \frac{\frac{1}{N} \sum_{i=1}^{N} |M_i - O_i|}{\frac{1}{N-1} \sum_{i=2}^{N} |O_i - O_{i-1}|}\]
MASE scales the forecast error by the in-sample MAE of a naive persistence forecast, making it scale-independent; values below 1 indicate skill over the naive forecast.
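A short NumPy sketch of the percentage-based errors and the MASE naive-forecast denominator (a hedged illustration of the formulas above, not library code):

```python
import numpy as np

def scale_free_errors(obs, mod):
    """MAPE and sMAPE in percent, and MASE versus a naive lag-1 forecast."""
    err = np.abs(mod - obs)
    mape = 100.0 * np.mean(err / np.abs(obs))
    smape = 100.0 * np.mean(err / ((np.abs(obs) + np.abs(mod)) / 2.0))
    naive_mae = np.mean(np.abs(np.diff(obs)))  # mean |O_i - O_{i-1}| of persistence
    mase = np.mean(err) / naive_mae
    return mape, smape, mase
```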
Root Mean Square Percentage Error (RMSPE)
\[\text{RMSPE} = 100\% \times \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{M_i - O_i}{O_i} \right)^2}\]
RMSPE is the percentage analogue of RMSE, emphasizing larger relative errors.
Normalized Root Mean Square Error (NRMSE)
\[\text{NRMSE} = \frac{\text{RMSE}}{O_{\max} - O_{\min}}\]
NRMSE normalizes RMSE by the range of observed values, allowing comparison across different scales.
Median Absolute Error (MedAE)
\[\text{MedAE} = \text{median}\left( |M_i - O_i| \right)\]
MedAE is the median of the absolute errors, a robust measure less sensitive to outliers than MAE.
Mean Normalized Bias (MNB)
\[\text{MNB} = \frac{1}{N} \sum_{i=1}^{N} \frac{M_i - O_i}{O_i}\]
MNB averages the bias of each pair normalized by the corresponding observation (contrast with NMB, which normalizes the summed bias by the summed observations).
Normalized Mean Error (NME)
\[\text{NME} = \frac{\sum_{i=1}^{N} |M_i - O_i|}{\sum_{i=1}^{N} O_i} \times 100\%\]
NME normalizes the total absolute error by the sum of the observations.
Fractional Bias (FB)
\[\text{FB} = \frac{\bar{M} - \bar{O}}{(\bar{M} + \bar{O})/2}\]
FB measures the mean bias as a fraction of the average of the model and observed means; it is bounded between -2 and 2.
Fractional Error (FE)
\[\text{FE} = \frac{2}{N} \sum_{i=1}^{N} \frac{|M_i - O_i|}{M_i + O_i}\]
FE measures the average error as a fraction of the paired model and observed values; it is bounded between 0 and 2.
Index of Agreement (IOA)
\[\text{IOA} = 1 - \frac{\sum_{i=1}^{N} (M_i - O_i)^2}{\sum_{i=1}^{N} \left( |M_i - \bar{O}| + |O_i - \bar{O}| \right)^2}\]
IOA ranges from 0 to 1, with values closer to 1 indicating better agreement.
Modified Index of Agreement (d1)
\[d_1 = 1 - \frac{\sum_{i=1}^{N} |M_i - O_i|}{\sum_{i=1}^{N} \left( |M_i - \bar{O}| + |O_i - \bar{O}| \right)}\]
d1 is a modified version of IOA that uses absolute differences instead of squared differences.
Modified Coefficient of Efficiency (E1)
\[E_1 = 1 - \frac{\sum_{i=1}^{N} |O_i - M_i|}{\sum_{i=1}^{N} |O_i - \bar{O}|}\]
E1 is a robust version of the coefficient of efficiency that uses absolute differences.
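These three agreement measures share a common "potential error" structure, sketched here in plain NumPy (a hypothetical helper, not the library's implementation):

```python
import numpy as np

def agreement_metrics(obs, mod):
    """IOA, d1, and E1; all built on deviations from the observed mean."""
    obar = obs.mean()
    pot = np.abs(mod - obar) + np.abs(obs - obar)  # "potential error" term
    ioa = 1.0 - np.sum((mod - obs) ** 2) / np.sum(pot ** 2)
    d1 = 1.0 - np.sum(np.abs(mod - obs)) / np.sum(pot)
    e1 = 1.0 - np.sum(np.abs(obs - mod)) / np.sum(np.abs(obs - obar))
    return ioa, d1, e1
```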
Center of Mass Error (COE)
\[\text{COE} = \sqrt{(\bar{x}_m - \bar{x}_o)^2 + (\bar{y}_m - \bar{y}_o)^2}\]
Where \((\bar{x}_o, \bar{y}_o)\) and \((\bar{x}_m, \bar{y}_m)\) are the centers of mass of the observed and modeled fields, respectively.
Volumetric Error
\[\text{VE} = \frac{\sum_{i} M_i - \sum_{i} O_i}{\sum_{i} O_i}\]
Measures the relative difference in total volume (integrated amount) between the modeled and observed fields.
Normalized Mean Square Error (NMSE)
\[\text{NMSE} = \frac{\frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)^2}{\sigma_O^2}\]
NMSE normalizes the mean square error by the variance of the observations.
Logarithmic Error
\[\text{LE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \ln(M_i + \epsilon) - \ln(O_i + \epsilon) \right]\]
Where \(\epsilon\) is a small constant added to avoid \(\ln(0)\).
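A brief sketch of the \(\epsilon\) guard and the variance normalization (hypothetical NumPy helpers, assuming the definitions above):

```python
import numpy as np

def log_error(obs, mod, eps=1e-8):
    """Mean difference of log-transformed values; eps avoids ln(0)."""
    return np.mean(np.log(mod + eps) - np.log(obs + eps))

def nmse(obs, mod):
    """Mean square error normalized by the observed variance."""
    return np.mean((mod - obs) ** 2) / np.var(obs)
```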
Correlation Metrics
Coefficient of Determination (R²)
\[R^2 = 1 - \frac{\sum_{i=1}^{N} (O_i - M_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}\]
R² represents the proportion of variance in the observed data that is explained by the model.
Pearson Correlation Coefficient
\[r = \frac{\sum_{i=1}^{N} (O_i - \bar{O})(M_i - \bar{M})}{\sqrt{\sum_{i=1}^{N} (O_i - \bar{O})^2} \sqrt{\sum_{i=1}^{N} (M_i - \bar{M})^2}}\]
The Pearson correlation measures the strength of the linear relationship between observed and modeled values.
Spearman Rank Correlation
\[\rho = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}\]
Where \(d_i\) is the difference between the ranks of corresponding values, and \(N\) is the number of observations.
Kendall Rank Correlation
\[\tau = \frac{N_c - N_d}{\frac{1}{2} N (N - 1)}\]
Where \(N_c\) and \(N_d\) are the numbers of concordant and discordant pairs. Kendall's tau measures the ordinal association between two measured quantities.
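All three rank and linear correlations are available in SciPy; a short usage sketch (the array values are made up for illustration):

```python
import numpy as np
from scipy import stats

obs = np.array([3.0, 5.0, 2.5, 7.0, 4.0])
mod = np.array([2.5, 5.5, 3.0, 8.0, 4.5])

r, _ = stats.pearsonr(obs, mod)      # linear association
rho, _ = stats.spearmanr(obs, mod)   # monotonic association via ranks
tau, _ = stats.kendalltau(obs, mod)  # concordant vs. discordant pairs
print(r, rho, tau)
```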
Anomaly Correlation (AC)
\[\text{AC} = \frac{\sum_{i=1}^{N} (M_i - \bar{M})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{N} (M_i - \bar{M})^2 \sum_{i=1}^{N} (O_i - \bar{O})^2}}\]
AC measures the correlation between the anomalies (deviations from the mean) of the observations and model values.
Concordance Correlation Coefficient (CCC)
\[\text{CCC} = \frac{2 r \sigma_O \sigma_M}{\sigma_O^2 + \sigma_M^2 + (\bar{O} - \bar{M})^2}\]
CCC measures how far the data deviate from the line of perfect concordance (slope = 1, intercept = 0).
Taylor Skill Score (TSS)
\[\text{TSS} = \frac{4(1 + r)}{\left( \hat{\sigma} + 1/\hat{\sigma} \right)^2 (1 + r_0)}\]
Where \(r\) is the correlation coefficient, \(r_0\) is a reference (maximum attainable) correlation, and \(\hat{\sigma} = \sigma_M / \sigma_O\) is the normalized standard deviation.
Kling-Gupta Efficiency (KGE)
\[\text{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\]
Where:
- \(r\): Pearson correlation coefficient
- \(\alpha = \sigma_M / \sigma_O\): Ratio of standard deviations
- \(\beta = \bar{M} / \bar{O}\): Ratio of means
KGE provides a comprehensive evaluation of performance across correlation, variability, and bias dimensions.
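A minimal sketch assembling KGE from its three components (hypothetical NumPy code, not the Monet Stats API):

```python
import numpy as np

def kge(obs, mod):
    """Kling-Gupta Efficiency assembled from its three components."""
    r = np.corrcoef(obs, mod)[0, 1]   # correlation component
    alpha = mod.std() / obs.std()     # variability ratio
    beta = mod.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```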
Efficiency Metrics
Nash-Sutcliffe Efficiency (NSE)
\[\text{NSE} = 1 - \frac{\sum_{i=1}^{N} (O_i - M_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}\]
NSE compares model performance to a forecast of the observed mean; values > 0 indicate better performance than climatology.
Log Nash-Sutcliffe Efficiency (NSElog)
\[\text{NSE}_{log} = 1 - \frac{\sum_{i=1}^{N} \left[ \ln(O_i + \epsilon) - \ln(M_i + \epsilon) \right]^2}{\sum_{i=1}^{N} \left[ \ln(O_i + \epsilon) - \overline{\ln(O + \epsilon)} \right]^2}\]
NSElog applies the NSE formulation to log-transformed values, giving greater weight to performance at low values; \(\epsilon\) is a small constant that avoids \(\ln(0)\).
Modified Nash-Sutcliffe Efficiency (mNSE)
\[\text{mNSE} = 1 - \frac{\sum_{i=1}^{N} |O_i - M_i|}{\sum_{i=1}^{N} |O_i - \bar{O}|}\]
mNSE uses absolute differences instead of squared differences, making it more robust to outliers.
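The NSE family can be sketched compactly; the following is an illustrative NumPy version (function names are assumptions, not library identifiers):

```python
import numpy as np

def nse(obs, mod):
    """Nash-Sutcliffe Efficiency: skill relative to the observed mean."""
    return 1.0 - np.sum((obs - mod) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_log(obs, mod, eps=1e-8):
    """NSE on log-transformed values, weighting low values more heavily."""
    return nse(np.log(obs + eps), np.log(mod + eps))

def mnse(obs, mod):
    """Modified NSE with absolute differences (robust to outliers)."""
    return 1.0 - np.sum(np.abs(obs - mod)) / np.sum(np.abs(obs - obs.mean()))
```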
Relative Nash-Sutcliffe Efficiency (rNSE)
rNSE is similar to NSE but normalized by the range of observations.
Percent of Correct (PC)
\[\text{PC} = \frac{100\%}{N} \sum_{i=1}^{N} \mathbf{1}\left( |M_i - O_i| \le \delta \right)\]
Where \(\mathbf{1}(\cdot)\) is the indicator function and \(\delta\) is the specified tolerance. PC measures the percentage of predictions that fall within that tolerance of the observations.
Mean Squared Error (MSE)
\[\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)^2\]
MSE measures the average of the squared errors, giving more weight to larger errors.
Relative/Percentage Metrics
Normalized Median Bias (NMdnB)
\[\text{NMdnB} = \frac{\text{median}(M_i - O_i)}{\text{median}(O_i)} \times 100\%\]
NMdnB measures the normalized median bias and is robust to outliers.
Normalized Median Error (NMdnE)
\[\text{NMdnE} = \frac{\text{median}(|M_i - O_i|)}{\text{median}(O_i)} \times 100\%\]
NMdnE measures the normalized median error and is robust to outliers.
Unpaired Space/Unpaired Time Peak Bias (USUTPB)
\[\text{USUTPB} = \frac{\max(M) - \max(O)}{\max(O)} \times 100\%\]
USUTPB measures the bias in peak values regardless of spatial or temporal pairing.
Unpaired Space/Unpaired Time Peak Error (USUTPE)
\[\text{USUTPE} = \frac{|\max(M) - \max(O)|}{\max(O)} \times 100\%\]
USUTPE measures the error in peak values regardless of spatial or temporal pairing.
Mean Normalized Peak Bias (MNPB)
\[\text{MNPB} = \frac{1}{J} \sum_{j=1}^{J} \frac{\hat{M}_j - \hat{O}_j}{\hat{O}_j}\]
Where \(\hat{M}_j\) and \(\hat{O}_j\) are the peak modeled and observed values of series \(j\), and \(J\) is the number of series. MNPB measures the mean normalized bias in peak values across multiple series.
Mean Normalized Peak Error (MNPE)
\[\text{MNPE} = \frac{1}{J} \sum_{j=1}^{J} \frac{|\hat{M}_j - \hat{O}_j|}{\hat{O}_j}\]
MNPE measures the mean normalized error in peak values across multiple series.
Normalized Mean Peak Bias (NMPB)
\[\text{NMPB} = \frac{\sum_{j=1}^{J} (\hat{M}_j - \hat{O}_j)}{\sum_{j=1}^{J} \hat{O}_j}\]
NMPB normalizes the summed peak biases across multiple series by the summed observed peaks.
Normalized Mean Peak Error (NMPE)
\[\text{NMPE} = \frac{\sum_{j=1}^{J} |\hat{M}_j - \hat{O}_j|}{\sum_{j=1}^{J} \hat{O}_j}\]
NMPE normalizes the summed peak errors across multiple series by the summed observed peaks.
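The distinction between the "mean of ratios" (MNPB) and "ratio of sums" (NMPB) forms is easy to miss; this hypothetical sketch makes it explicit (argument names and helper are assumptions):

```python
import numpy as np

def peak_bias_metrics(obs_series, mod_series):
    """MNPB vs. NMPB over a list of paired 1-D series.

    MNPB averages per-series normalized peak biases; NMPB takes the
    ratio of summed peak differences to summed observed peaks.
    """
    peaks_o = np.array([np.max(s) for s in obs_series])
    peaks_m = np.array([np.max(s) for s in mod_series])
    mnpb = np.mean((peaks_m - peaks_o) / peaks_o)     # mean of ratios
    nmpb = (peaks_m - peaks_o).sum() / peaks_o.sum()  # ratio of sums
    return mnpb, nmpb
```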
Contingency Table Metrics
Contingency Table Structure
|  | Forecast Yes | Forecast No | Total |
|---|---|---|---|
| Observed Yes | A (Hits) | B (Misses) | A+B |
| Observed No | C (False Alarms) | D (Correct Negatives) | C+D |
| Total | A+C | B+D | N |
Probability of Detection (POD)
\[\text{POD} = \frac{A}{A + B}\]
POD measures the ability to correctly detect the occurrence of an event.
False Alarm Ratio (FAR)
\[\text{FAR} = \frac{C}{A + C}\]
FAR indicates the proportion of predicted events that did not actually occur.
Critical Success Index (CSI)
\[\text{CSI} = \frac{A}{A + B + C}\]
CSI measures the accuracy of event forecasts, excluding correct negatives.
Heidke Skill Score (HSS)
\[\text{HSS} = \frac{2(AD - BC)}{(A + B)(B + D) + (A + C)(C + D)}\]
HSS measures the improvement of the forecast over random chance.
Equitable Threat Score (ETS)
\[\text{ETS} = \frac{A - A_r}{A + B + C - A_r}\]
Where \(A_r = \frac{(A+B)(A+C)}{N}\) is the number of hits expected by random chance.
ETS is the threat score adjusted for random hits, making it useful for rare events.
Frequency Bias Index (FBI)
\[\text{FBI} = \frac{A + C}{A + B}\]
FBI measures the ratio of the number of forecast events to the number of observed events; values above 1 indicate overforecasting.
True Skill Statistic (TSS)
\[\text{TSS} = \text{POD} - \text{POFD} = \frac{A}{A + B} - \frac{C}{C + D}\]
Where POFD (Probability of False Detection) = \(\frac{C}{C+D}\).
TSS measures the ability to discriminate between events and non-events.
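All of these scores follow directly from the four table cells; a self-contained sketch (a hypothetical helper, using the A/B/C/D convention above):

```python
def contingency_scores(A, B, C, D):
    """Scores from a 2x2 table (A=hits, B=misses, C=false alarms, D=correct negatives)."""
    N = A + B + C + D
    pod = A / (A + B)                    # probability of detection
    far = C / (A + C)                    # false alarm ratio
    csi = A / (A + B + C)                # critical success index
    fbi = (A + C) / (A + B)              # frequency bias
    a_r = (A + B) * (A + C) / N          # hits expected by chance
    ets = (A - a_r) / (A + B + C - a_r)  # equitable threat score
    tss = pod - C / (C + D)              # POD minus POFD
    return pod, far, csi, fbi, ets, tss

print(contingency_scores(42.0, 23.0, 14.0, 222.0))
```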
Binary Brier Skill Score
\[\text{BSS} = 1 - \frac{\text{BS}}{\text{BS}_{ref}}\]
Where \(\text{BS} = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)^2\) is the Brier Score, and \(\text{BS}_{ref}\) is the reference Brier Score.
Spatial Verification Metrics
Fractions Skill Score (FSS)
\[\text{FSS} = 1 - \frac{\text{MSE}_{frac}}{\text{MSE}_{ref}}\]
Where \(\text{MSE}_{frac}\) is the mean squared error of the fractional event coverage within neighborhoods, and \(\text{MSE}_{ref}\) is the reference MSE (the largest attainable \(\text{MSE}_{frac}\)).
FSS evaluates the ability to predict spatial patterns of categorical events at a given neighborhood scale.
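A common way to compute the neighborhood fractions is a uniform (moving-average) filter; this is an illustrative sketch under that assumption, not the library's implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(obs_field, mod_field, threshold, window):
    """Fractions Skill Score over square neighborhoods of width `window`."""
    fo = uniform_filter((obs_field >= threshold).astype(float), size=window)
    fm = uniform_filter((mod_field >= threshold).astype(float), size=window)
    mse = np.mean((fm - fo) ** 2)
    mse_ref = np.mean(fm ** 2) + np.mean(fo ** 2)  # largest attainable MSE
    return 1.0 - mse / mse_ref
```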
Structure-Amplitude-Location (SAL)
SAL decomposes verification errors into three components:
- \(S\): Structure component (-2 to 2, 0 is best)
- \(A\): Amplitude component (-2 to 2, 0 is best)
- \(L\): Location component (0 to 2, 0 is best)
Extreme Dependency Score (EDS)
EDS measures skill in forecasting rare events. It is computed from the observed and forecast event frequencies \(p = \frac{N_{obs}}{N}\) and \(q = \frac{N_{mod}}{N}\).
Ensemble Metrics
Continuous Ranked Probability Score (CRPS)
\[\text{CRPS} = \int_{-\infty}^{\infty} \left[ F_m(x) - F_o(x) \right]^2 \, dx\]
Where \(F_m(x)\) is the cumulative distribution function (CDF) of the ensemble forecast and \(F_o(x)\) is the CDF of the observations (a step function for a single observed value).
CRPS measures the overall quality of probabilistic forecasts, rewarding both reliability and sharpness.
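For a finite ensemble the integral reduces to a well-known empirical identity; a minimal sketch (hypothetical function, not the Monet Stats API):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS for one observation and an ensemble sample.

    Uses the identity CRPS = E|X - o| - 0.5 * E|X - X'| for the
    empirical ensemble distribution.
    """
    x = np.asarray(members, dtype=float)
    return np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))

print(crps_ensemble([1.0, 2.0, 3.0], 2.5))
```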
Brier Score
\[\text{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2\]
Where \(p_i\) is the predicted probability and \(o_i\) is the observed outcome (0 or 1).
The Brier score measures the accuracy of probabilistic forecasts of binary events.
Brier Skill Score (BSS)
\[\text{BSS} = 1 - \frac{\text{BS}}{\text{BS}_{ref}}\]
Where BS is the Brier Score of the forecast and \(\text{BS}_{ref}\) is the Brier Score of a reference forecast (typically climatology).
Spread-Error Relationship
\[\text{Spread} = \sqrt{\frac{1}{K - 1} \sum_{k=1}^{K} (M_k - \bar{M})^2}\]
Where \(\bar{M}\) is the ensemble mean and \(K\) is the number of ensemble members. Comparing the mean spread to the error of the ensemble mean measures the relationship between ensemble uncertainty and forecast error; for a reliable ensemble the two match on average.
Rank Histogram
The rank histogram (Talagrand diagram) assesses ensemble reliability by plotting the frequency of the observation rank among ensemble members. A flat histogram indicates reliable ensemble forecasts.
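Both diagnostics can be sketched with a few lines of NumPy (illustrative only; array shapes and function names are assumptions):

```python
import numpy as np

def rank_histogram(ens, obs):
    """Histogram of observation ranks among members (flat = reliable).

    ens: (n_cases, n_members) array; obs: (n_cases,) array.
    """
    ranks = np.sum(ens < obs[:, None], axis=1)  # members below each observation
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

def spread_vs_error(ens, obs):
    """Mean ensemble spread compared with RMSE of the ensemble mean."""
    spread = np.mean(np.std(ens, axis=1, ddof=1))
    rmse = np.sqrt(np.mean((ens.mean(axis=1) - obs) ** 2))
    return spread, rmse
```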
Circular Statistics
For wind direction and other circular variables:
Circular Mean
\[\bar{\theta} = \operatorname{atan2}\left( \frac{1}{N} \sum_{i=1}^{N} \sin \theta_i, \; \frac{1}{N} \sum_{i=1}^{N} \cos \theta_i \right)\]
Circular Variance
\[V = 1 - R, \qquad R = \sqrt{\left( \frac{1}{N} \sum_{i=1}^{N} \sin \theta_i \right)^2 + \left( \frac{1}{N} \sum_{i=1}^{N} \cos \theta_i \right)^2}\]
Where \(R\) is the mean resultant length; \(V\) ranges from 0 (all angles identical) to 1.
Circular statistics properly handle the periodic nature of angular measurements like wind direction.
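A minimal NumPy sketch of both quantities in degrees (hypothetical helpers, assuming the formulas above):

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Circular mean (degrees) via the mean resultant vector."""
    rad = np.deg2rad(angles_deg)
    return np.rad2deg(np.arctan2(np.mean(np.sin(rad)), np.mean(np.cos(rad)))) % 360.0

def circular_variance(angles_deg):
    """1 minus the mean resultant length; ranges from 0 to 1."""
    rad = np.deg2rad(angles_deg)
    return 1.0 - np.hypot(np.mean(np.sin(rad)), np.mean(np.cos(rad)))

print(circular_mean_deg([350.0, 10.0]))  # 0.0, not the arithmetic mean 180.0
```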
Wind Direction Mean Bias (WDMB)
\[\text{WDMB} = \frac{1}{N} \sum_{i=1}^{N} \text{circlebias}(M_i - O_i)\]
Where \(\text{circlebias}\) wraps wind-direction differences to \([-180°, 180°)\), handling their circular nature.
Wind Direction Root Mean Square Error (WDRMSE)
\[\text{WDRMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \text{circlebias}(M_i - O_i)^2}\]
WDRMSE applies the RMSE formulation to wrapped wind-direction differences.
Wind Direction Index of Agreement (WDIOA)
\[\text{WDIOA} = 1 - \frac{\sum_{i=1}^{N} \text{circlebias}(M_i - O_i)^2}{\sum_{i=1}^{N} \left( |\text{circlebias}(M_i - \bar{O}_c)| + |\text{circlebias}(O_i - \bar{O}_c)| \right)^2}\]
Where \(\bar{O}_c\) is the circular mean of the observations.
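The wrapping step is the only part that differs from the linear metrics; a hedged sketch (the `circle_diff_deg` helper is a hypothetical stand-in for \(\text{circlebias}\)):

```python
import numpy as np

def circle_diff_deg(a, b):
    """Smallest signed angular difference a - b, wrapped to [-180, 180)."""
    return (np.asarray(a) - np.asarray(b) + 180.0) % 360.0 - 180.0

def wdmb(obs_deg, mod_deg):
    """Wind direction mean bias with wrapped differences."""
    return np.mean(circle_diff_deg(mod_deg, obs_deg))

def wdrmse(obs_deg, mod_deg):
    """Wind direction RMSE with wrapped differences."""
    return np.sqrt(np.mean(circle_diff_deg(mod_deg, obs_deg) ** 2))

print(wdmb([350.0, 10.0], [10.0, 350.0]))  # 0.0: +20° and -20° cancel
```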
References
- Willmott, C.J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE). Climate Research, 30(1), 79-82.
- Nash, J.E., & Sutcliffe, J.V. (1970). River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology, 10(3), 282-290.
- Gupta, H.V., et al. (2009). Decomposition of the mean squared error and NSE criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1-2), 80-91.
- Potts, J.M., et al. (1996). A simple, objective method for partitioning variance in model performance evaluation. American Meteorological Society, 29(2), 202-215.
- Hersbach, H. (2000). Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5), 559-570.