Contingency Metrics
The contingency metrics module provides comprehensive metrics for evaluating categorical forecasts, particularly useful for binary event prediction in atmospheric sciences. These metrics are based on contingency table analysis and are commonly used for precipitation verification, air quality exceedance predictions, and other categorical forecast evaluations.
Overview
Contingency metrics analyze the relationship between observed and predicted categorical events by organizing outcomes into a 2×2 contingency table:
|  | Predicted Event | Predicted No Event |
|---|---|---|
| Observed Event | Hits (A) | Misses (B) |
| Observed No Event | False Alarms (C) | Correct Rejections (D) |
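These counts can be built directly from paired observation and model arrays. A minimal sketch (the helper name and the convention that values greater than or equal to the threshold count as events are assumptions, not part of the library API):
import numpy as np

def contingency_counts(obs, mod, minval):
    # Classify each sample as event / no event, then cross-tabulate
    obs_event = np.asarray(obs) >= minval
    mod_event = np.asarray(mod) >= minval
    a = np.sum(obs_event & mod_event)    # hits
    b = np.sum(obs_event & ~mod_event)   # misses
    c = np.sum(~obs_event & mod_event)   # false alarms
    d = np.sum(~obs_event & ~mod_event)  # correct rejections
    return a, b, c, d
The sketches under each function below express the metric in terms of these counts.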
Core Functions
HSS(obs, mod, minval, maxval=None)
Calculate the Heidke Skill Score for categorical forecast verification.
import numpy as np
from monet_stats import HSS, ETS, CSI, POD, FAR, FBI, TSS, scores, BSS_binary  # imports reused by the snippets below
# Example: Precipitation verification
obs_precip = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0, 0]) # 1 = rain, 0 = no rain
mod_precip = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 1]) # Model predictions
hss_score = HSS(obs_precip, mod_precip, minval=0.1)
print(f"Heidke Skill Score: {hss_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional, for range-based events)
Returns:
float: Heidke Skill Score (-∞ to 1, where 1 = perfect, 0 = no skill)
Typical Use Cases:
- Evaluating categorical forecast skill (e.g., precipitation, air quality events)
- Used in meteorology and environmental modeling for binary event prediction accuracy
Typical Values and Range:
- Range: -∞ to 1
- 1: Perfect forecast
- 0: No skill (random forecast)
- Negative values: Worse than random
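In terms of the counts A-D above, the standard Heidke formula is compact. The helper below is illustrative, not part of the API, and is assumed to match the library's implementation:
def hss_from_counts(a, b, c, d):
    # Fraction of correct forecasts beyond what random chance would give
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    return 2.0 * (a * d - b * c) / denom if denom else 0.0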
ETS(obs, mod, minval, maxval=None)
Calculate the Equitable Threat Score (ETS), also known as the Gilbert Skill Score.
ets_score = ETS(obs_precip, mod_precip, minval=0.1)
print(f"Equitable Threat Score: {ets_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: Equitable Threat Score (-1/3 to 1; 0 = no skill, 1 = perfect)
Use Cases:
- Fair evaluation of rare events
- Accounting for random hits in skill calculation
- Commonly used for precipitation verification
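ETS adjusts the threat score for hits expected by chance. A sketch of the standard formula (hypothetical helper, assumed to match the library):
def ets_from_counts(a, b, c, d):
    # Hits a random forecast with the same event frequencies would score
    n = a + b + c + d
    a_random = (a + b) * (a + c) / n
    denom = a + b + c - a_random
    return (a - a_random) / denom if denom else 0.0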
CSI(obs, mod, minval, maxval=None)
Calculate the Critical Success Index (CSI), also known as the Threat Score.
csi_score = CSI(obs_precip, mod_precip, minval=0.1)
print(f"Critical Success Index: {csi_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: Critical Success Index (0 to 1)
Use Cases:
- High-impact event verification
- Areas where false alarms and misses are equally costly
- Common in severe weather forecasting
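CSI ignores correct rejections entirely, which is why it penalizes rare-event forecasts. In count form (illustrative helper):
def csi_from_counts(a, b, c, d):
    # Hits as a fraction of all occasions the event was observed or forecast
    denom = a + b + c
    return a / denom if denom else 0.0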
POD(obs, mod, minval, maxval=None)
Calculate the Probability of Detection (POD), also known as the Hit Rate.
pod_score = POD(obs_precip, mod_precip, minval=0.1)
print(f"Probability of Detection: {pod_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: Probability of Detection (0 to 1)
Use Cases:
- Assessing forecast sensitivity to detect events
- Important for public safety applications
- Used in severe weather warnings
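POD is the fraction of observed events the model captured. In count form (illustrative helper):
def pod_from_counts(a, b, c, d):
    # Hits divided by all observed events (hits + misses)
    return a / (a + b) if (a + b) else 0.0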
FAR(obs, mod, minval, maxval=None)
Calculate the False Alarm Ratio (FAR): the fraction of forecast events that were not observed. (This is the false alarm ratio, not the probability of false detection, which is sometimes also called the false alarm rate.)
far_score = FAR(obs_precip, mod_precip, minval=0.1)
print(f"False Alarm Ratio: {far_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: False Alarm Ratio (0 to 1)
Use Cases:
- Evaluating false positive rates
- Important for resource allocation decisions
- Used in operational forecasting
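In count form, matching the guarded implementation shown under Common Issues and Solutions below (illustrative helper):
def far_from_counts(a, b, c, d):
    # False alarms divided by all forecast events (hits + false alarms)
    return c / (a + c) if (a + c) else 0.0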
FBI(obs, mod, minval, maxval=None)
Calculate the Frequency Bias Index (FBI), also known as the bias score: the ratio of the number of forecast events to the number of observed events.
fbi_score = FBI(obs_precip, mod_precip, minval=0.1)
print(f"Frequency Bias Index: {fbi_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: Frequency Bias Index (0 to ∞; 1 = unbiased, >1 = overforecasting, <1 = underforecasting)
Use Cases:
- Assessing whether events are forecast more or less often than observed
- Comparing model tendencies to overpredict or underpredict events
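In count form, using the standard bias-score definition (illustrative helper, assumed to match the library):
def fbi_from_counts(a, b, c, d):
    # Number of forecast events divided by number of observed events
    return (a + c) / (a + b) if (a + b) else 0.0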
TSS(obs, mod, minval, maxval=None)
Calculate the True Skill Statistic (TSS), also known as the Hanssen-Kuipers Discriminant.
tss_score = TSS(obs_precip, mod_precip, minval=0.1)
print(f"True Skill Statistic: {tss_score:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
float: True Skill Statistic (-1 to 1)
Use Cases:
- Balanced evaluation of hits and misses
- Comparing forecast systems
- Assessing discrimination skill
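TSS is the hit rate minus the probability of false detection. In count form (illustrative helper):
def tss_from_counts(a, b, c, d):
    # POD minus the false-detection rate over non-events
    pod = a / (a + b) if (a + b) else 0.0
    pofd = c / (c + d) if (c + d) else 0.0
    return pod - pofd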
scores(obs, mod, minval, maxval=None)
Calculate a comprehensive set of contingency table metrics.
# Get all contingency metrics at once
contingency_results = scores(obs_precip, mod_precip, minval=0.1)
print("Contingency Metrics:")
for metric, value in contingency_results.items():
print(f" {metric}: {value:.3f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- minval: Threshold value for event definition
- maxval: Maximum threshold (optional)
Returns:
dict: Dictionary containing all contingency metrics:
- hits: Number of hits (A)
- misses: Number of misses (B)
- false_alarms: Number of false alarms (C)
- correct_negatives: Number of correct negatives (D)
- POD: Probability of Detection
- FAR: False Alarm Ratio
- CSI: Critical Success Index
- HSS: Heidke Skill Score
- ETS: Equitable Threat Score
- TSS: True Skill Statistic
BSS_binary(obs, mod, threshold)
Calculate the Binary Brier Skill Score for categorical forecasts.
# Define the decision threshold and convert raw values to binary outcomes
threshold = 0.1
obs_binary = (obs_precip >= threshold).astype(int)
mod_binary = (mod_precip >= threshold).astype(int)
bss_score = BSS_binary(obs_binary, mod_binary, threshold)
print(f"Binary Brier Skill Score: {bss_score:.3f}")
Parameters:
- obs: Observed binary outcomes (0 or 1)
- mod: Forecast binary predictions (0 or 1)
- threshold: Decision threshold
Returns:
float: Binary Brier Skill Score
Use Cases:
- Evaluating binary forecast skill
- Comparing against reference forecasts
- Skill score calculations for categorical predictions
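For reference, the Brier skill score compares the forecast's Brier score against a reference forecast. A minimal sketch assuming the reference is the observed base rate (climatology), which is a common but here unverified choice:
import numpy as np

def bss_binary_sketch(obs, mod):
    # Brier score of the forecast vs. a climatological reference forecast
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    bs = np.mean((mod - obs) ** 2)
    bs_ref = np.mean((obs.mean() - obs) ** 2)
    return 1.0 - bs / bs_ref if bs_ref else 0.0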
Advanced Functions
HSS_max_threshold(obs, mod, threshold_range)
Find the optimal threshold that maximizes the Heidke Skill Score.
from monet_stats import HSS_max_threshold, POD_max_threshold, FAR_min_threshold
# A threshold search is meaningful for continuous fields, e.g. precipitation amounts in mm
obs_amounts = np.array([0.0, 2.5, 6.0, 0.0, 0.3, 12.0, 8.5, 4.2, 0.0, 0.0])
mod_amounts = np.array([0.2, 3.1, 1.0, 0.0, 0.1, 9.5, 7.8, 0.4, 0.0, 2.2])
thresholds = np.arange(0.1, 10.0, 0.5)
optimal_threshold = HSS_max_threshold(obs_amounts, mod_amounts, thresholds)
print(f"Optimal threshold for HSS: {optimal_threshold:.1f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- threshold_range: Array of threshold values to test
Returns:
float: Threshold that maximizes HSS
Use Cases:
- Optimizing forecast thresholds
- Comparing model performance across different event definitions
- Calibration analysis
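An optimizer of this kind can be sketched as a plain grid search (illustrative only; the library's actual search strategy is assumed to be equivalent, and HSS here is the function imported earlier):
def hss_max_threshold_sketch(obs, mod, threshold_range):
    # Score every candidate threshold and return the argmax
    hss_vals = [HSS(obs, mod, t) for t in threshold_range]
    return threshold_range[int(np.argmax(hss_vals))]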
POD_max_threshold(obs, mod, threshold_range)
Find the optimal threshold that maximizes the Probability of Detection.
optimal_pod_threshold = POD_max_threshold(obs_amounts, mod_amounts, thresholds)
print(f"Optimal threshold for POD: {optimal_pod_threshold:.1f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- threshold_range: Array of threshold values to test
Returns:
float: Threshold that maximizes POD
Use Cases:
- Maximizing event detection sensitivity
- Optimizing warning systems
- High-impact weather verification

Note that maximizing POD in isolation rewards overforecasting; balance it against FAR or a bias measure when choosing operational thresholds.
FAR_min_threshold(obs, mod, threshold_range)
Find the optimal threshold that minimizes the False Alarm Rate.
optimal_far_threshold = FAR_min_threshold(obs_amounts, mod_amounts, thresholds)
print(f"Optimal threshold for FAR minimization: {optimal_far_threshold:.1f}")
Parameters:
- obs: Observed values (array-like)
- mod: Modeled values (array-like)
- threshold_range: Array of threshold values to test
Returns:
float: Threshold that minimizes FAR
Use Cases:
- Minimizing false positive rates
- Optimizing resource allocation
- Cost-sensitive forecasting

Likewise, minimizing FAR in isolation favors a model that rarely forecasts events; combine it with POD or a skill score.
Examples
Basic Precipitation Verification
import numpy as np
from monet_stats import scores, POD, FAR, CSI
# Sample precipitation amounts in mm (zeros = no rain), so the mm thresholds below are meaningful
observed = np.array([0.0, 2.3, 5.1, 0.0, 0.0, 7.4, 12.0, 3.2, 0.0, 0.0, 9.8, 0.0, 0.0, 6.1, 0.0])
modeled = np.array([0.0, 1.8, 0.4, 0.0, 0.0, 8.0, 10.5, 0.2, 0.0, 1.2, 11.0, 0.0, 2.4, 5.5, 0.0])
# Multiple threshold analysis
thresholds = [0.1, 1.0, 5.0, 10.0]
print("Precipitation Verification Results:")
print("-" * 50)
for threshold in thresholds:
print(f"\nThreshold: {threshold} mm")
results = scores(observed, modeled, threshold)
print(f" Events: {results['hits'] + results['misses']} observed, "
f"{results['hits'] + results['false_alarms']} predicted")
print(f" POD: {results['POD']:.3f}")
print(f" FAR: {results['FAR']:.3f}")
print(f" CSI: {results['CSI']:.3f}")
print(f" HSS: {results['HSS']:.3f}")
Air Quality Exceedance Analysis
import numpy as np
from monet_stats import HSS, ETS, POD, HSS_max_threshold
# Air quality concentration data (µg/m³)
observed_conc = np.array([35, 45, 55, 65, 75, 85, 95, 105, 115, 125])
modeled_conc = np.array([40, 48, 52, 68, 72, 88, 92, 108, 112, 130])
# Exceedance threshold for PM2.5 (75 µg/m³)
threshold = 75
# The metric functions threshold the raw values themselves, so the
# concentrations can be passed directly with the exceedance threshold
hss = HSS(observed_conc, modeled_conc, threshold)
ets = ETS(observed_conc, modeled_conc, threshold)
pod = POD(observed_conc, modeled_conc, threshold)
print(f"Air Quality Exceedance Analysis (threshold = {threshold} µg/m³):")
print(f" Heidke Skill Score: {hss:.3f}")
print(f" Equitable Threat Score: {ets:.3f}")
print(f" Probability of Detection: {pod:.3f}")
# Search a range of candidate thresholds for the one that maximizes HSS
thresholds = np.arange(50, 120, 5)
optimal_hss_threshold = HSS_max_threshold(observed_conc, modeled_conc, thresholds)
print(f" Optimal threshold for HSS: {optimal_hss_threshold} µg/m³")
Multi-Event Contingency Analysis
import pandas as pd
from monet_stats import scores
# Multiple event types
events_data = pd.DataFrame({
'observed': [0, 1, 1, 0, 1, 0, 1, 1, 0, 0], # Binary events
'modeled': [0, 1, 0, 0, 1, 1, 1, 0, 0, 1],
'event_type': ['light', 'moderate', 'heavy', 'extreme'] * 2 + ['light', 'moderate']
})
# Analyze by event type (groups here are tiny, purely illustrative)
event_analysis = events_data.groupby('event_type').apply(
    lambda x: pd.Series(scores(x['observed'].values, x['modeled'].values, 0.5))
)
print("Multi-Event Contingency Analysis:")
print(event_analysis[['POD', 'FAR', 'CSI', 'HSS']].round(3))
Threshold Optimization
import numpy as np
import matplotlib.pyplot as plt
from monet_stats import HSS, POD, FAR, HSS_max_threshold, POD_max_threshold, FAR_min_threshold
# Generate synthetic verification data
np.random.seed(42)
n_samples = 1000
# Observed binary events (rare, ~10% base rate)
observed = np.random.binomial(1, 0.1, n_samples)
# Skillful but noisy continuous forecast in [0, 1]:
# higher values when the event occurred, with overlap between classes
modeled = np.clip(0.5 * observed + np.random.normal(0.15, 0.15, n_samples), 0.0, 1.0)
# Candidate decision thresholds for the continuous forecast
threshold_range = np.linspace(0.05, 0.95, 19)
# Calculate metrics at each threshold (the threshold is applied to the raw values)
hss_values = [HSS(observed, modeled, t) for t in threshold_range]
pod_values = [POD(observed, modeled, t) for t in threshold_range]
far_values = [FAR(observed, modeled, t) for t in threshold_range]
# Find optimal thresholds
optimal_hss = HSS_max_threshold(observed, modeled, threshold_range)
optimal_pod = POD_max_threshold(observed, modeled, threshold_range)
optimal_far = FAR_min_threshold(observed, modeled, threshold_range)
# Plot results
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.plot(threshold_range, hss_values, 'b-', label='HSS')
plt.axvline(optimal_hss, color='r', linestyle='--', label=f'Best threshold: {optimal_hss:.2f}')
plt.xlabel('Threshold')
plt.ylabel('HSS')
plt.title('Heidke Skill Score')
plt.legend()
plt.subplot(1, 3, 2)
plt.plot(threshold_range, pod_values, 'g-', label='POD')
plt.axvline(optimal_pod, color='r', linestyle='--', label=f'Best threshold: {optimal_pod:.2f}')
plt.xlabel('Threshold')
plt.ylabel('POD')
plt.title('Probability of Detection')
plt.legend()
plt.subplot(1, 3, 3)
plt.plot(threshold_range, far_values, 'orange', label='FAR')
plt.axvline(optimal_far, color='r', linestyle='--', label=f'Best threshold: {optimal_far:.2f}')
plt.xlabel('Threshold')
plt.ylabel('FAR')
plt.title('False Alarm Ratio')
plt.legend()
plt.tight_layout()
plt.show()
Best Practices
1. Rare Event Handling
For rare events (<5% occurrence rate):
- Use Equitable Threat Score (ETS) instead of CSI
- Consider alternative formulations for skill scores
- Be cautious with small sample sizes
2. Threshold Selection
- Use physical thresholds when available
- Optimize thresholds for specific applications
- Consider the costs of false alarms vs. misses
3. Sample Size Considerations
- Minimum recommended: 50-100 events for stable statistics
- Use bootstrapping for uncertainty estimation with small samples (see the sketch after this list)
- Report confidence intervals for skill scores
4. Multiple Threshold Analysis
- Analyze performance across a range of thresholds
- Use contingency tables for detailed breakdown
- Consider ROC curve analysis for comprehensive evaluation
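A minimal bootstrap sketch for a skill-score confidence interval (the resampling scheme, helper name, and default metric are illustrative assumptions):
import numpy as np
from monet_stats import HSS

def bootstrap_ci(obs, mod, minval, metric=HSS, n_boot=1000, alpha=0.05):
    # Resample (obs, mod) pairs with replacement and recompute the metric
    obs, mod = np.asarray(obs), np.asarray(mod)
    rng = np.random.default_rng(0)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(obs), len(obs))
        stats.append(metric(obs[idx], mod[idx], minval))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi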
Common Issues and Solutions
Issue: NaN Values in Contingency Metrics
# Solution: drop any pair where either value is missing
mask = ~np.isnan(obs) & ~np.isnan(mod)
metrics = scores(obs[mask], mod[mask], threshold)
Issue: Zero Division in FAR Calculation
# Solution: guard the denominator explicitly; this computes the false alarm ratio
def safe_far(obs, mod, threshold):
    obs_event = np.asarray(obs) >= threshold
    mod_event = np.asarray(mod) >= threshold
    hits = np.sum(obs_event & mod_event)
    false_alarms = np.sum(~obs_event & mod_event)
    if hits + false_alarms == 0:
        return 0.0
    return false_alarms / (hits + false_alarms)
far = safe_far(obs, mod, threshold)
Issue: Imbalanced Classes
# Solution: Use skill scores that account for random chance
# Prefer ETS over CSI for rare events
ets_score = ETS(obs, mod, threshold)
hss_score = HSS(obs, mod, threshold)