FAQ and Troubleshooting Guide
This FAQ addresses common questions and issues encountered when using Monet Stats for atmospheric sciences applications. If you don't find your question here, please check the GitHub Issues or submit a new issue.
Installation and Setup
Q: I'm getting ImportError when trying to import monet_stats
A: This typically indicates that Monet Stats is not properly installed. Try these steps:
# Check if package is installed
pip show monet-stats
# If not installed
pip install monet-stats
# Reinstall if needed
pip install --force-reinstall monet-stats
# For development installation
pip install -e .
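Once installed, a quick check from Python confirms the package is importable and shows which version you have (the same version attribute is used in the issue-reporting section below):
# Verify the installation
import monet_stats
print(monet_stats.__version__)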
Q: I need additional dependencies like xarray or pandas
A: Install the optional dependencies:
# Install with xarray support
pip install "monet-stats[xarray]"
# Install with all optional dependencies
pip install "monet-stats[dev,test]"
# Install specific dependencies
pip install xarray pandas scipy matplotlib
Q: How do I set up a development environment?
A: Follow these steps for development setup:
# Clone the repository
git clone https://github.com/noaa-oar-arl/monet-stats.git
cd monet-stats
# Create virtual environment
python -m venv monet-stats-env
source monet-stats-env/bin/activate # Windows: monet-stats-env\Scripts\activate
# Install development dependencies
pip install -e ".[dev,test]"
# Install pre-commit hooks
pre-commit install
Data Format Issues
Q: Passing my xarray DataArrays to metric functions fails
A: Ensure you have xarray installed and your DataArrays have compatible dimensions:
import xarray as xr
import monet_stats as ms
# Ensure DataArrays have same coordinates
obs_da = xr.DataArray(obs_data, dims=['time'])
mod_da = xr.DataArray(mod_data, dims=['time'])
# This works
result = ms.R2(obs_da, mod_da)
# This will fail due to dimension mismatch
# mod_da = xr.DataArray(mod_data, dims=['space']) # Error!
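If the two DataArrays share a dimension but their coordinates only partially overlap, align them before computing metrics. A minimal sketch using xarray's built-in alignment:
# Keep only the coordinates present in both DataArrays
obs_aligned, mod_aligned = xr.align(obs_da, mod_da, join='inner')
result = ms.R2(obs_aligned, mod_aligned)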
Q: How do I handle NaN values in my data?
A: Monet Stats automatically handles NaN values by using pairwise deletion:
import numpy as np
import monet_stats as ms
# Data with NaN values
obs_with_nan = np.array([1, 2, np.nan, 4, 5])
mod_with_nan = np.array([1.1, 2.1, 3.1, np.nan, 5.1])
# Functions automatically use valid pairs only
rmse = ms.RMSE(obs_with_nan, mod_with_nan) # Uses (1,1.1), (2,2.1), (5,5.1)
Q: My data shapes don't match
A: Ensure your observed and modeled arrays have compatible shapes:
# Correct: Same shape
obs = np.array([1, 2, 3, 4, 5])
mod = np.array([1.1, 2.1, 2.9, 4.1, 4.8])
# Error: Different shapes
# obs = np.array([1, 2, 3]) # This will raise ValueError
# mod = np.array([1.1, 2.1, 2.9, 4.1, 4.8])
# Solution: align the arrays to a common length first
n = min(len(obs), len(mod))
obs, mod = obs[:n], mod[:n]  # Truncate the longer array
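Truncation assumes the two arrays are already in the same order. When each series carries timestamps, it is safer to align on time; a sketch assuming pandas Series obs_series and mod_series indexed by time (hypothetical names):
import pandas as pd
# Inner join keeps only timestamps present in both series
df = pd.concat({'obs': obs_series, 'mod': mod_series}, axis=1, join='inner')
rmse = ms.RMSE(df['obs'].to_numpy(), df['mod'].to_numpy())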
Metric Calculation Issues
Q: Why do I get NaN values for my metrics?
A: NaN results typically occur when:
- Insufficient valid data pairs:
# No overlapping valid pairs
obs = np.array([1, np.nan, np.nan])
mod = np.array([np.nan, 2, np.nan])
# Result: NaN (there is no index where both obs and mod are valid)
rmse = ms.RMSE(obs, mod)
- Division by zero:
# Zero variance in observed data
obs = np.array([1, 1, 1, 1])
mod = np.array([1.1, 1.1, 1.1, 1.1])
# Result: NaN (division by zero in R² calculation)
r2 = ms.R2(obs, mod)
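A small diagnostic that checks both conditions before you compute anything (a minimal sketch):
import numpy as np

def diagnose_inputs(obs, mod, min_pairs=2):
    """Report the conditions that commonly produce NaN metrics."""
    valid = ~np.isnan(obs) & ~np.isnan(mod)
    n_valid = int(np.sum(valid))
    if n_valid < min_pairs:
        print(f"Only {n_valid} valid pair(s); too few for most metrics")
    elif np.std(obs[valid]) == 0:
        print("Zero variance in observed data; R2 will be NaN")
    return n_valid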
Q: My contingency metrics return weird values
A: Check your threshold and event definition:
# For precipitation data
obs_precip = np.array([0, 1, 1, 0, 0]) # 0 = no rain, 1 = rain
mod_precip = np.array([0, 1, 0, 0, 1])
# Correct threshold for binary events
pod = ms.POD(obs_precip, mod_precip, threshold=0.5)
# Wrong threshold (should be 0.5 for binary data)
# pod = ms.POD(obs_precip, mod_precip, threshold=10.0) # Wrong!
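To sanity-check contingency metrics, build the 2x2 table yourself. A sketch using the standard definitions (POD = hits / (hits + misses)); note that defining events with >= rather than > the threshold is an assumption here:
import numpy as np

def contingency_counts(obs, mod, threshold=0.5):
    """Count hits, misses, false alarms, and correct negatives."""
    obs_event = obs >= threshold   # event convention assumed: >= threshold
    mod_event = mod >= threshold
    hits = np.sum(obs_event & mod_event)
    misses = np.sum(obs_event & ~mod_event)
    false_alarms = np.sum(~obs_event & mod_event)
    correct_negatives = np.sum(~obs_event & ~mod_event)
    return hits, misses, false_alarms, correct_negatives

hits, misses, fa, cn = contingency_counts(obs_precip, mod_precip)
print(f"POD check: {hits / (hits + misses):.3f}")  # 0.5 for the data above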
Q: Wind direction metrics give strange results
A: Wind direction requires special circular statistics:
import monet_stats as ms
import numpy as np
# Wind directions in degrees (0-360)
obs_wind = np.array([10, 20, 350]) # Note the circular nature
mod_wind = np.array([15, 25, 5]) # 350° is close to 5°
# Use circular bias for wind direction
circular_bias = ms.circlebias(mod_wind - obs_wind)
# Standard RMSE treats 350° and 5° as 345° apart, so it can be badly misleading
wind_rmse = ms.RMSE(obs_wind, mod_wind)
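If you want an RMSE-style number for wind direction, wrap the differences into [-180, 180) first; this is the standard circular-difference trick, shown here as a sketch:
# Wrap direction differences into [-180, 180) before squaring
diff = (mod_wind - obs_wind + 180) % 360 - 180
circular_rmse = np.sqrt(np.mean(diff ** 2))
print(f"Circular RMSE: {circular_rmse:.1f} degrees")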
Performance Issues
Q: My calculations are too slow for large datasets
A: Optimize performance with these techniques:
import monet_stats as ms
import numpy as np
# For very large arrays (>1M elements), process in chunks
def process_large_data(obs, mod, chunk_size=100000):
    """RMSE over chunks, combined through squared errors so the
    result matches the RMSE of the full array (a plain mean of
    per-chunk RMSE values would not)."""
    sq_sum, n_total = 0.0, 0
    for i in range(0, len(obs), chunk_size):
        obs_chunk = obs[i:i+chunk_size]
        mod_chunk = mod[i:i+chunk_size]
        sq_sum += ms.RMSE(obs_chunk, mod_chunk) ** 2 * len(obs_chunk)
        n_total += len(obs_chunk)
    return np.sqrt(sq_sum / n_total)
# Use NumPy arrays for best performance
obs_np = np.array(obs_data) # Convert to NumPy array
mod_np = np.array(mod_data)
# Avoid Python-level loops over elements
# Slow: repeated metric calls on tiny slices
# results = [ms.RMSE(obs_np[i:i+1], mod_np[i:i+1]) for i in range(len(obs_np))]
# Fast: one vectorized call over the full arrays
result = ms.RMSE(obs_np, mod_np)
Q: How can I reduce memory usage?
A: Use memory-efficient data types and processing:
import numpy as np
# Use float32 instead of float64 for large datasets
obs_float32 = obs.astype(np.float32)
mod_float32 = mod.astype(np.float32)
# Process data in chunks
def memory_efficient_analysis(obs, mod, chunk_size=50000):
total_results = []
for i in range(0, len(obs), chunk_size):
obs_chunk = obs[i:i+chunk_size]
mod_chunk = mod[i:i+chunk_size]
# Calculate metrics for chunk
chunk_results = {
'RMSE': ms.RMSE(obs_chunk, mod_chunk),
'R2': ms.R2(obs_chunk, mod_chunk),
'MAE': ms.MAE(obs_chunk, mod_chunk)
}
total_results.append(chunk_results)
return total_results
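Per-chunk values then need to be combined with care: MAE averages linearly (weighted by chunk size), RMSE must be recombined through squared errors, and R² generally cannot be reconstructed from chunk values at all. A sketch for equal-sized chunks:
# Combine per-chunk metrics (equal-sized chunks assumed)
chunk_results = memory_efficient_analysis(obs_float32, mod_float32)
rmse_all = np.sqrt(np.mean([r['RMSE'] ** 2 for r in chunk_results]))
mae_all = np.mean([r['MAE'] for r in chunk_results])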
Statistical Interpretation
Q: What's the difference between RMSE and MAE?
A: RMSE and MAE measure different aspects of error:
import numpy as np
import monet_stats as ms
# Example with different error distributions
obs = np.array([10, 10, 10, 10, 10])
# Case 1: Small errors uniformly
mod1 = np.array([10.1, 10.2, 9.9, 10.0, 10.1])
# Case 2: Large error on one point, small on others
mod2 = np.array([10.0, 10.0, 10.0, 10.0, 15.0])
print("Case 1 - Uniform errors:")
print(f" RMSE: {ms.RMSE(obs, mod1):.3f}")
print(f" MAE: {ms.MAE(obs, mod1):.3f}")
print("\nCase 2 - One large error:")
print(f" RMSE: {ms.RMSE(obs, mod2):.3f}")
print(f" MAE: {ms.MAE(obs, mod2):.3f}")
Key Differences:
- RMSE squares errors, so large errors are heavily weighted
- MAE treats all errors equally
- RMSE is more sensitive to outliers
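The definitions make the contrast explicit; a from-scratch sketch (the library functions should agree, up to NaN handling):
err = mod2 - obs
rmse_manual = np.sqrt(np.mean(err ** 2))  # the single 5-unit error dominates: ~2.24
mae_manual = np.mean(np.abs(err))         # each error contributes linearly: 1.0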
Q: When should I use skill scores vs. raw error metrics?
A: Use both for comprehensive evaluation:
import monet_stats as ms
import numpy as np
# Model vs. climatology reference
obs = np.array([15, 16, 14, 17, 18, 16, 15])
mod = np.array([15.5, 15.8, 14.2, 16.9, 17.2, 15.5, 14.8])
climatology = np.mean(obs) # Simple climatology
# Raw error metrics
rmse_model = ms.RMSE(obs, mod)
rmse_climo = ms.RMSE(obs, np.full(len(obs), climatology))
# Skill score
skill_score = ms.NSE(obs, mod)
print(f"Model RMSE: {rmse_model:.3f}")
print(f"Climatology RMSE: {rmse_climo:.3f}")
print(f"Model Skill Score: {skill_score:.3f}")
# Interpretation:
# - Raw errors show absolute performance
# - Skill scores show performance relative to reference
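NSE makes the comparison to climatology explicit: it is one minus the ratio of the model's squared error to that of the observed mean, so 0 means "no better than climatology". A from-scratch sketch of the standard formula:
# NSE = 1 - SSE(model) / SSE(observed mean)
nse_manual = 1 - np.sum((mod - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)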
Q: How do I interpret negative skill scores?
A: Negative skill scores indicate the model performs worse than the reference:
import monet_stats as ms
import numpy as np
# Model that performs worse than climatology
obs = np.array([1, 2, 3, 4, 5])
bad_model = np.array([10, 10, 10, 10, 10]) # Constant prediction
# Compare to climatology reference
climatology = np.mean(obs)
bad_skill = ms.NSE(obs, bad_model)
clim_skill = ms.NSE(obs, np.full(len(obs), climatology))
print(f"Bad model skill: {bad_skill:.3f}")
print(f"Climatology skill: {clim_skill:.3f}")
# Negative skill means model is worse than climatology
# This often happens with poor forecasts or non-stationary data
Spatial and Ensemble Issues
Q: My spatial verification metrics fail
A: Ensure your spatial data has the correct dimensions:
import numpy as np
import monet_stats as ms
# 2D spatial data (lat, lon)
obs_2d = np.random.normal(20, 2, (10, 10)) # 10x10 grid
mod_2d = obs_2d + np.random.normal(0, 1, (10, 10))
# This works with spatial metrics
fss = ms.FSS(obs_2d, mod_2d, window=5)
# This fails (1D data)
# fss = ms.FSS(obs_1d, mod_1d, window=5) # Error!
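As a cross-check on what FSS measures, it can be computed from neighborhood event fractions; a sketch of the standard Roberts-and-Lean formulation using scipy.ndimage (the threshold of 20 is an assumption for this synthetic data; prefer ms.FSS in practice):
from scipy.ndimage import uniform_filter

def fss_sketch(obs2d, mod2d, threshold, window):
    """Fractions Skill Score from neighborhood event fractions."""
    obs_frac = uniform_filter((obs2d >= threshold).astype(float), size=window)
    mod_frac = uniform_filter((mod2d >= threshold).astype(float), size=window)
    mse = np.mean((mod_frac - obs_frac) ** 2)
    mse_ref = np.mean(mod_frac ** 2) + np.mean(obs_frac ** 2)
    return 1 - mse / mse_ref

print(f"FSS sketch: {fss_sketch(obs_2d, mod_2d, threshold=20, window=5):.3f}")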
Q: How do I verify ensemble forecasts?
A: Use ensemble-specific metrics and proper formatting:
import numpy as np
import monet_stats as ms
# Ensemble data: (n_members, n_times)
n_members = 50
n_times = 100
# Generate ensemble forecasts
ensemble = np.random.normal(20, 2, (n_members, n_times))
observed = np.random.normal(20, 1.5, n_times)
# Calculate ensemble statistics
ensemble_mean = np.mean(ensemble, axis=0)
ensemble_std = np.std(ensemble, axis=0)
# Ensemble metrics
crps = ms.CRPS(ensemble, observed)  # Continuous Ranked Probability Score
# Use the ensemble exceedance probability, not a single binary forecast
prob_exceed = np.mean(ensemble > 20, axis=0)
bss = ms.BSS(observed > 20, prob_exceed, threshold=0.5)
print(f"Ensemble CRPS: {crps:.3f}")
print(f"Ensemble BSS: {bss:.3f}")
# Spread-skill relationship: spread should track the ensemble-mean error
abs_err = np.abs(ensemble_mean - observed)
spread_skill_corr = np.corrcoef(ensemble_std, abs_err)[0, 1]
print(f"Spread-Skill Correlation: {spread_skill_corr:.3f}")
Common Error Messages
Error: "ValueError: cannot convert float NaN to integer"
Cause: A NaN produced upstream (for example by a division by zero) is later cast to an integer.
Solution:
import numpy as np
import monet_stats as ms
# Check for zero variance
obs = np.array([1, 1, 1, 1])
mod = np.array([1.1, 1.1, 1.1, 1.1])
# Check variance before calculation
if np.std(obs) == 0:
print("Warning: Zero variance in observed data")
r2 = 0.0 # Handle appropriately
else:
r2 = ms.R2(obs, mod)
Error: "RuntimeWarning: invalid value encountered in divide"
Cause: Division by very small numbers or zero in calculations.
Solution:
import numpy as np
from monet_stats import NMB
def safe_nmb(obs, mod):
"""Safe NMB calculation with error handling"""
obs_sum = np.sum(obs)
mod_sum = np.sum(mod)
if abs(obs_sum) < 1e-10: # Very small denominator
if abs(mod_sum) < 1e-10:
return 0.0 # Both sums are effectively zero
else:
return np.sign(mod_sum) * np.inf # Infinite bias
return (mod_sum - obs_sum) / obs_sum
# Usage
nmb = safe_nmb(obs, mod)
Error: "MemoryError: Unable to allocate array"
Cause: Trying to process very large arrays in memory.
Solution:
import monet_stats as ms
import numpy as np
def process_large_dataset(obs_file, mod_file, chunk_size=100000):
    """Process large text files in fixed-size chunks."""
    # Count rows once without loading the data
    with open(obs_file) as f:
        n_rows = sum(1 for _ in f)
    results = []
    for offset in range(0, n_rows, chunk_size):
        # Read only the current chunk from each file
        obs_chunk = np.genfromtxt(obs_file, skip_header=offset, max_rows=chunk_size)
        mod_chunk = np.genfromtxt(mod_file, skip_header=offset, max_rows=chunk_size)
        results.append({
            'RMSE': ms.RMSE(obs_chunk, mod_chunk),
            'R2': ms.R2(obs_chunk, mod_chunk),
            'MAE': ms.MAE(obs_chunk, mod_chunk),
        })
    return results
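If the raw data are already in flat binary form, np.memmap lets NumPy index the files without loading them into RAM; a sketch assuming hypothetical float32 files obs.dat and mod.dat (the dtype must match how the files were written):
# Map binary files into memory; only the accessed slices are read
obs_mm = np.memmap('obs.dat', dtype=np.float32, mode='r')
mod_mm = np.memmap('mod.dat', dtype=np.float32, mode='r')
rmse_first_chunk = ms.RMSE(obs_mm[:100000], mod_mm[:100000])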
Best Practices
1. Data Preparation
import numpy as np
import monet_stats as ms
def prepare_data(obs, mod):
"""Clean and prepare data for analysis"""
# Remove NaN values using pairwise deletion
valid_mask = ~np.isnan(obs) & ~np.isnan(mod)
obs_clean = obs[valid_mask]
mod_clean = mod[valid_mask]
# Check for sufficient data
if len(obs_clean) < 10:
raise ValueError("Insufficient valid data pairs")
return obs_clean, mod_clean
# Usage
obs_clean, mod_clean = prepare_data(observed, modeled)
results = ms.RMSE(obs_clean, mod_clean)
2. Error Handling
def safe_metric_calculation(obs, mod, metric_func, **kwargs):
"""Safely calculate metrics with error handling"""
try:
result = metric_func(obs, mod, **kwargs)
# Check for NaN or infinite results
if not np.isfinite(result):
print(f"Warning: Non-finite result for {metric_func.__name__}")
return np.nan
return result
except Exception as e:
print(f"Error in {metric_func.__name__}: {e}")
return np.nan
# Usage
rmse = safe_metric_calculation(obs, mod, ms.RMSE)
r2 = safe_metric_calculation(obs, mod, ms.R2)
3. Comprehensive Analysis
def comprehensive_verification(obs, mod):
"""Perform comprehensive model verification"""
results = {}
# Error metrics
results['RMSE'] = ms.RMSE(obs, mod)
results['MAE'] = ms.MAE(obs, mod)
results['MB'] = ms.MB(obs, mod)
results['NMB'] = ms.NMB(obs, mod)
# Skill scores
results['R2'] = ms.R2(obs, mod)
results['NSE'] = ms.NSE(obs, mod)
results['KGE'] = ms.KGE(obs, mod)
# Relative metrics
results['MPE'] = ms.MPE(obs, mod)
results['NME'] = ms.NME(obs, mod)
return results
# Usage
verification_results = comprehensive_verification(observed, modeled)
for metric, value in verification_results.items():
print(f"{metric}: {value:.4f}")
Getting Help
Where to Find Help
- Documentation: Full Documentation
- GitHub Issues: Report bugs or request features
- Community Discussions: GitHub Discussions
- Email Support: arl.webmaster@noaa.gov
How to Report Issues
When reporting issues, please include:
- Environment Information:
import monet_stats
print(f"Monet Stats version: {monet_stats.__version__}")
import sys
print(f"Python version: {sys.version}")
- Minimal Reproducible Example:
import numpy as np
import monet_stats as ms
# Your problematic code here
obs = np.array([1, 2, 3])
mod = np.array([1.1, 2.1, 2.9])
# This causes the error
result = ms.YourMetric(obs, mod) # Replace with actual call
- Expected vs. Actual Behavior:
- What you expected to happen
- What actually happened
- Any error messages received
Contributing
If you'd like to contribute to Monet Stats:
- Fork the repository on GitHub
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Submit a pull request
See the Contributing Guide for detailed instructions.