Skip to content

Performance Guide

XRegrid is designed for high-performance regridding. This guide shows you how to get the best performance for your specific use case.

Performance Overview

XRegrid provides significant performance improvements over existing solutions:

Resolution Grid Points XRegrid xESMF Speedup
1.0° Global 64,800 0.7 ms 44 ms ~60x
0.5° Global 259,200 4.2 ms 178 ms ~40x
0.25° Global 1,036,800 23 ms 750 ms ~30x
0.1° Global 6,480,000 350 ms 6.5s* ~18x

* xESMF time for 0.1° is estimated based on linear scaling trend.

Key Performance Features

1. Optimized Sparse Matrix Operations

XRegrid uses optimized sparse matrix-vector and matrix-matrix multiplications. By transposing and flattening data into a 2D format (non_spatial, spatial), we can leverage high-performance BLAS routines through SciPy:

# XRegrid automatically vectorizes all non-spatial dimensions
# Efficient (matrix @ data.T).T pattern avoids redundant copies
result = (weights_matrix @ data_2d.T).T

2. Efficient Memory Usage

Scipy sparse matrices have lower memory overhead compared to other sparse libraries:

  • More compact storage format
  • Better cache locality
  • Optimized for matrix-matrix multiplication

3. Proper ESMF Integration

  • Dask-Parallel Weight Generation: Large grids can have weights generated in parallel across Dask workers.
  • Truly Distributed Weight Handling: Weights are assembled and stored directly on the Dask cluster as Futures, protecting the driver from Out-Of-Memory (OOM) crashes on massive grids.
  • Vectorized Mesh Triangulation: Conservative regridding for unstructured meshes (MPAS/UGRID) uses NumPy vectorization, providing a ~13x speedup over traditional iterative approaches during initialization.
  • Efficient Index Reconstruction: Workers reconstruct global destination indices locally, minimizing driver-worker communication.
  • Proper Coordinate Handling: Automatic transposition to (longitude, latitude) as required by ESMF.

Optimization Strategies

Weight Reuse

The most important optimization for repeated regridding:

# First time: compute and save weights
regridder = Regridder(
    source, target,
    method='bilinear',
    reuse_weights=True,
    filename='global_1deg_to_05deg_weights.nc'
)

# Subsequent times: load existing weights (much faster!)
regridder = Regridder(
    source, target,
    method='bilinear',
    reuse_weights=True,
    filename='global_1deg_to_05deg_weights.nc'
)

Performance Impact: - Weight generation: 10-60 seconds (depending on grid size) - Weight loading: 0.1-2 seconds - Speedup: 10-30x for the initialization phase

Global Grid Periodicity

Always use periodic=True for global grids:

regridder = Regridder(
    source, target,
    method='bilinear',
    periodic=True  # Critical for performance and accuracy!
)

Why this matters: - Enables proper spherical geometry calculations - Reduces number of required interpolation points - Handles dateline crossing correctly

Stationary Mask Caching

A common pattern in climate data is a fixed land-sea mask. XRegrid detects if the NaN mask is identical across multiple time steps and caches the weight normalization factors:

# skipna=True handles NaNs by re-normalizing weights
# If the mask is stationary (constant over time), normalization is only computed once
result = regridder(da_with_nans, skipna=True)

Performance Impact: - First call/chunk: Computes weights and mask normalization - Subsequent calls/chunks: Reuses normalization cache - Speedup: ~2x for NaN-heavy datasets

Dask Parallelization

XRegrid scales linearly with Dask chunks. It also utilizes worker-local caching to avoid re-sending large weight matrices over the network:

# Load data with appropriate chunks
data = xr.open_dataset('large_file.nc', chunks={'time': 20, 'lat': 180, 'lon': 360})

# Regridding preserves chunks and parallelizes automatically
result = regridder(data.temperature)

Chunking Guidelines: - Time dimension: 10-50 time steps per chunk - Spatial dimensions: Keep spatial dimensions unchunked if possible - Memory target: 100-500 MB per chunk

Example: Optimal Chunking

# For 0.25° global data (1440x720 spatial)
# 100 time steps, ~4GB total
data = xr.open_dataset(
    'large_climate_data.nc',
    chunks={
        'time': 25,    # 25 time steps per chunk
        'lat': 720,    # Keep spatial dims unchunked
        'lon': 1440    # for optimal regridding
    }
)

API Usability: XRegrid vs. ESMPy

While XRegrid is faster than xESMF, it also provides a much more intuitive API than raw ESMPy. ESMPy is a powerful low-level interface, but it requires substantial boilerplate to work with xarray datasets.

Code Comparison

Here is what is required to regrid a simple lat-lon dataset.

Using raw ESMPy

import esmpy
import numpy as np
import xarray as xr

# Load data
ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)

# 1. Create Source Grid (Manual coordinate handling)
src_grid = esmpy.Grid(
    np.array([ds.lon.size, ds.lat.size]),
    staggerloc=[esmpy.StaggerLoc.CENTER],
    coord_sys=esmpy.CoordSys.SPH_DEG
)
src_lon_ptr = src_grid.get_coords(0)
src_lat_ptr = src_grid.get_coords(1)
lon_mesh, lat_mesh = np.meshgrid(ds.lon.values, ds.lat.values)
src_lon_ptr[...] = lon_mesh.T  # ESMF uses (lon, lat) / Fortran order
src_lat_ptr[...] = lat_mesh.T

# 2. Create Target Grid
dst_lon = np.arange(200, 331, 1.0)
dst_lat = np.arange(15, 76, 1.0)
dst_grid = esmpy.Grid(
    np.array([len(dst_lon), len(dst_lat)]),
    staggerloc=[esmpy.StaggerLoc.CENTER],
    coord_sys=esmpy.CoordSys.SPH_DEG
)
dst_lon_ptr = dst_grid.get_coords(0)
dst_lat_ptr = dst_grid.get_coords(1)
lon_mesh_dst, lat_mesh_dst = np.meshgrid(dst_lon, dst_lat)
dst_lon_ptr[...] = lon_mesh_dst.T
dst_lat_ptr[...] = lat_mesh_dst.T

# 3. Create Fields and Initialize Regrid
src_field = esmpy.Field(src_grid, name="air")
dst_field = esmpy.Field(dst_grid, name="air_regridded")
regrid = esmpy.Regrid(src_field, dst_field, regrid_method=esmpy.RegridMethod.BILINEAR)

# 4. Apply Regrid (Requires manual data copy)
src_field.data[...] = ds.air.values.T
regrid(src_field, dst_field)

# 5. Extract result back to xarray
result = xr.DataArray(
    dst_field.data.T,
    coords={"lat": dst_lat, "lon": dst_lon},
    dims=("lat", "lon")
)

Using XRegrid

from xregrid import Regridder

# Define target grid as an xarray Dataset
target_grid = xr.Dataset({
    "lat": (["lat"], np.arange(15, 76, 1.0)),
    "lon": (["lon"], np.arange(200, 331, 1.0))
})

# Create and apply in two steps
regridder = Regridder(ds, target_grid)
result = regridder(ds.air)

Advantages of XRegrid

  1. Dask Support: XRegrid works natively with Dask-backed DataArrays, parallelizing the weight application across chunks. ESMPy requires manual implementation of this logic.
  2. Metadata Preservation: XRegrid automatically preserves name, attributes, and non-spatial coordinates.
  3. Automatic Detection: XRegrid uses cf-xarray to automatically identify latitude and longitude, even if they aren't named lat or lon.
  4. Sparse Application: XRegrid uses optimized SciPy sparse matrices for applying weights, which is often faster than the built-in ESMPy __call__ for large datasets.

Detailed Performance Analysis

Single Time Step Performance

Resolution Total Points Weight Apply Time Memory Usage
1.0° 64,800 0.7 ms ~10 MB
0.5° 259,200 4.2 ms ~25 MB
0.25° 1,036,800 23 ms ~80 MB
0.1° 6,480,000 350 ms ~450 MB

Multi-Time Step Performance

Vectorization and stationary mask caching significantly improve performance for multi-time step datasets.

Time Steps Resolution Total Time Time per Step
10 1.0° 9 ms 0.9 ms
100 1.0° 65 ms 0.65 ms
10 0.25° 260 ms 26 ms
100 0.25° 2.3s 23 ms

Note: Performance improves with more time steps due to vectorization

Dask Scaling Performance

Workers Chunks Resolution Time Speedup
1 4 0.5° 2.1s 1.0x
4 4 0.5° 0.6s 3.5x
8 8 0.5° 0.3s 7.0x
1 10 0.25° 8.5s 1.0x
4 10 0.25° 2.4s 3.5x
8 20 0.25° 1.1s 7.7x

Method-Specific Performance

Bilinear

  • Best for: Continuous fields (temperature, pressure)
  • Performance: Fastest method
  • Memory: Low memory usage

Conservative

  • Best for: Flux quantities (precipitation, radiation)
  • Performance: ~2-3x slower than bilinear
  • Memory: Higher memory usage due to more complex weights

Nearest Neighbor

  • Best for: Categorical data (land use, vegetation types)
  • Performance: Fastest for sparse grids
  • Memory: Lowest memory usage
# Performance comparison for 0.25° global grid
methods = {
    'bilinear': '53 ms',
    'conservative': '125 ms',
    'nearest_s2d': '31 ms'
}

Large-Scale Optimization

Ultra-High Resolution (3km Global)

For extremely large grids (>50M points):

# Example: 3km global grid (~88M points)
regridder = Regridder(
    source_3km, target_1deg,
    method='conservative',  # Often required for such large ratios
    reuse_weights=True,     # Essential!
    filename='3km_to_1deg.nc'
)

# Process in temporal chunks
for year in years:
    data = load_year_data(year, chunks={'time': 12})
    result = regridder(data)
    result.to_netcdf(f'regridded_{year}.nc')

Memory Management

For memory-constrained environments:

# Use smaller chunks
data = xr.open_dataset(
    'huge_file.nc',
    chunks={'time': 5, 'lat': 360, 'lon': 720}
)

# Process iteratively if needed
for i, chunk in enumerate(data.time.groupby('time.year')):
    year, year_data = chunk
    result = regridder(year_data.temperature)
    result.to_netcdf(f'output_{year}.nc')

Benchmarking Your Setup

Use this script to benchmark XRegrid on your system:

import time
import numpy as np
import xarray as xr
from xregrid import Regridder

def benchmark_regridding(source_res, target_res, time_steps=10):
    """Benchmark regridding performance."""
    # Create grids
    source = xr.Dataset({
        'lat': (['lat'], np.linspace(-90, 90, source_res)),
        'lon': (['lon'], np.linspace(0, 359, source_res*2))
    })

    target = xr.Dataset({
        'lat': (['lat'], np.linspace(-90, 90, target_res)),
        'lon': (['lon'], np.linspace(0, 359.5, target_res*2))
    })

    # Create test data
    data = xr.DataArray(
        np.random.rand(time_steps, source_res, source_res*2),
        dims=['time', 'lat', 'lon'],
        coords={'lat': source.lat, 'lon': source.lon}
    )

    # Time regridder creation
    start = time.time()
    regridder = Regridder(source, target, method='bilinear')
    creation_time = time.time() - start

    # Time regridding
    start = time.time()
    result = regridder(data)
    regrid_time = time.time() - start

    print(f"Grid: {source_res}° → {target_res}°")
    print(f"Creation time: {creation_time:.3f}s")
    print(f"Regrid time: {regrid_time:.3f}s")
    print(f"Time per step: {regrid_time/time_steps:.3f}s")
    print(f"Points/second: {result.size/regrid_time:,.0f}")
    print()

# Run benchmarks
benchmark_regridding(180, 360)    # 1.0° to 0.5°
benchmark_regridding(720, 1440)   # 0.25° to 0.125°
benchmark_regridding(1800, 3600)  # 0.1° to 0.05°

Performance Troubleshooting

Slow Weight Generation

Symptoms: Long delays during regridder creation

Solutions: 1. Use weight reuse: reuse_weights=True 2. Check coordinate ordering and validity 3. Verify grid periodicity settings 4. Consider using a coarser method first

Slow Weight Application

Symptoms: Slow data regridding after regridder creation

Solutions: 1. Check Dask chunking strategy 2. Verify coordinate dimensions match expected order 3. Use periodic=True for global grids 4. Monitor memory usage - may need smaller chunks

Memory Issues

Symptoms: Out-of-memory errors or system slowdown

Solutions: 1. Reduce chunk sizes in time dimension 2. Process data in temporal batches 3. Use conservative method only when necessary 4. Enable weight reuse to avoid recomputation

Poor Parallel Scaling

Symptoms: Adding workers doesn't improve performance

Solutions: 1. Increase number of chunks to match workers 2. Check that spatial dimensions aren't chunked 3. Verify adequate memory per worker 4. Monitor CPU and memory usage during processing

Best Practices Summary

  1. Always use weight reuse for repeated regridding
  2. Set periodic=True for global grids
  3. Chunk in time only for optimal performance
  4. Target 100-500 MB per chunk for memory efficiency
  5. Save weights to fast storage (SSD) for quick loading
  6. Monitor memory usage and adjust chunks as needed
  7. Use conservative method sparingly - only when flux conservation is critical
  8. Benchmark your specific use case to find optimal settings

Following these guidelines, you should see substantial performance improvements over other regridding solutions, especially for large or frequently-used grids.