Performance Guide¶

XRegrid is designed for high-performance regridding. This guide shows you how to get the best performance for your specific use case.

Performance Overview¶

In the benchmarks below, XRegrid applies regridding weights roughly 18-60x faster than xESMF on global grids:

Resolution	Grid Points	XRegrid	xESMF	Speedup
1.0° Global	64,800	0.7 ms	44 ms	~60x
0.5° Global	259,200	4.2 ms	178 ms	~40x
0.25° Global	1,036,800	23 ms	750 ms	~30x
0.1° Global	6,480,000	350 ms	6.5 s*	~18x

* The xESMF time for 0.1° was not measured; it is estimated from the linear scaling trend of the coarser resolutions.

Key Performance Features¶

1. Optimized Sparse Matrix Operations¶

XRegrid uses optimized sparse matrix-vector and matrix-matrix multiplications. By transposing and flattening data into a 2D format (non_spatial, spatial), it regrids every non-spatial slice in a single sparse-dense product handled by SciPy's compiled sparse routines:

# XRegrid automatically vectorizes all non-spatial dimensions
# Efficient (matrix @ data.T).T pattern avoids redundant copies
result = (weights_matrix @ data_2d.T).T
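To make the reshaping concrete, here is a minimal sketch of the same pattern with a toy SciPy CSR matrix. The shapes, the random weights, and the helper name `apply_weights` are assumptions for illustration, not XRegrid internals.

```python
import numpy as np
import scipy.sparse as sp

def apply_weights(weights, data, n_spatial_dims=2):
    """Apply (n_dst, n_src) sparse weights to data whose trailing dims are spatial."""
    lead_shape = data.shape[:-n_spatial_dims]
    # Flatten to (non_spatial, spatial): one row per time step, level, etc.
    data_2d = data.reshape(-1, weights.shape[1])
    # One sparse-dense product regrids every row at once.
    out_2d = (weights @ data_2d.T).T
    return out_2d.reshape(*lead_shape, weights.shape[0])

rng = np.random.default_rng(0)
n_time, n_lat, n_lon, n_dst = 5, 4, 8, 6
# Toy random weights standing in for real regridding weights
weights = sp.random(n_dst, n_lat * n_lon, density=0.2, format="csr", random_state=0)
data = rng.random((n_time, n_lat, n_lon))

regridded = apply_weights(weights, data)
print(regridded.shape)  # (5, 6)
```

The result matches what a Python loop over time steps would produce, but the loop runs inside compiled code instead of the interpreter.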

2. Efficient Memory Usage¶

SciPy sparse matrices have lower memory overhead than other sparse libraries:

More compact storage format
Better cache locality
Optimized for matrix-matrix multiplication

3. Proper ESMF Integration¶

Dask-Parallel Weight Generation: Large grids can have weights generated in parallel across Dask workers.
Truly Distributed Weight Handling: Weights are assembled and stored directly on the Dask cluster as Futures, protecting the driver from Out-Of-Memory (OOM) crashes on massive grids.
Vectorized Mesh Triangulation: Conservative regridding for unstructured meshes (MPAS/UGRID) uses NumPy vectorization, providing a ~13x speedup over traditional iterative approaches during initialization.
Efficient Index Reconstruction: Workers reconstruct global destination indices locally, minimizing driver-worker communication.
Proper Coordinate Handling: Automatic transposition to (longitude, latitude) as required by ESMF.
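The list above does not spell out how the mesh triangulation is vectorized. One common approach, shown here purely as an assumption-laden sketch, is fan triangulation of fixed-size polygon cells using array slicing instead of a per-cell loop. The function name `fan_triangulate` and the connectivity layout are hypothetical.

```python
import numpy as np

def fan_triangulate(conn):
    """Split each polygon (row of vertex indices) into a fan of triangles.

    conn has shape (n_cells, n_verts); the result has shape
    (n_cells * (n_verts - 2), 3). No Python loop over cells is needed.
    """
    conn = np.asarray(conn)
    n_verts = conn.shape[1]
    # Every triangle in a cell shares that cell's first vertex.
    first = np.repeat(conn[:, :1], n_verts - 2, axis=1)
    tris = np.stack([first, conn[:, 1:-1], conn[:, 2:]], axis=-1)
    return tris.reshape(-1, 3)

quads = np.array([[0, 1, 2, 3],
                  [3, 2, 4, 5]])
print(fan_triangulate(quads))
```

Each quadrilateral yields two triangles; a hexagonal cell would yield four.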

Optimization Strategies¶

Weight Reuse¶

Reusing saved weights is the most important optimization for repeated regridding:

# First time: compute and save weights
regridder = Regridder(
    source, target,
    method='bilinear',
    reuse_weights=True,
    filename='global_1deg_to_05deg_weights.nc'
)

# Subsequent times: load existing weights (much faster!)
regridder = Regridder(
    source, target,
    method='bilinear',
    reuse_weights=True,
    filename='global_1deg_to_05deg_weights.nc'
)

Performance Impact:

Weight generation: 10-60 seconds (depending on grid size)
Weight loading: 0.1-2 seconds
Speedup: 10-30x for the initialization phase
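Conceptually, weight reuse is a compute-once, load-afterwards cache. The sketch below shows that pattern in isolation, using SciPy's `save_npz` and `load_npz` as a stand-in for XRegrid's NetCDF weights file. The helper `load_or_compute_weights` and the toy weight function are hypothetical.

```python
import os
import tempfile
import scipy.sparse as sp

def load_or_compute_weights(path, compute):
    """Return (weights, cache_hit), computing and saving weights on a miss."""
    if os.path.exists(path):
        return sp.load_npz(path), True
    weights = compute()
    sp.save_npz(path, weights)
    return weights, False

def toy_weights():
    # Stand-in for expensive ESMF weight generation
    return sp.identity(4, format="csr")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.npz")
    first, first_hit = load_or_compute_weights(path, toy_weights)
    second, second_hit = load_or_compute_weights(path, toy_weights)

print(first_hit, second_hit)  # False True
```

A cache like this is only valid while the source grid, target grid, and method stay the same, so it helps to encode them in the filename, as the example filename above does.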

Global Grid Periodicity¶

Always use periodic=True for global grids:

regridder = Regridder(
    source, target,
    method='bilinear',
    periodic=True  # Critical for performance and accuracy!
)

Why this matters:

Enables proper spherical geometry calculations
Reduces number of required interpolation points
Handles dateline crossing correctly
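A one-dimensional analogy shows what goes wrong at the wrap-around point without periodicity. This sketch uses NumPy's `np.interp`, not XRegrid, and interpolates a periodic field at 359.5°, which lies between the last cell centre (359°) and the first (0°).

```python
import numpy as np

lon = np.arange(0.0, 360.0, 1.0)        # cell centres 0°, 1°, ..., 359°
field = np.sin(np.radians(lon))         # smooth periodic test field
target_lon = 359.5

# Non-periodic: the target is outside the coordinate range, so the value
# at 359° is simply repeated.
clamped = np.interp(target_lon, lon, field)

# Periodic: 0° is treated as 360°, so the target is blended from both sides.
wrapped = np.interp(target_lon, lon, field, period=360)

exact = np.sin(np.radians(target_lon))
print(f"clamped error: {abs(clamped - exact):.2e}")
print(f"wrapped error: {abs(wrapped - exact):.2e}")
```

The periodic result is far closer to the exact value because it uses neighbours on both sides of the seam.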

Stationary Mask Caching¶

A common pattern in climate data is a fixed land-sea mask. XRegrid detects if the NaN mask is identical across multiple time steps and caches the weight normalization factors:

# skipna=True handles NaNs by re-normalizing weights
# If the mask is stationary (constant over time), normalization is only computed once
result = regridder(da_with_nans, skipna=True)

Performance Impact:

First call/chunk: Computes weights and mask normalization
Subsequent calls/chunks: Reuses normalization cache
Speedup: ~2x for NaN-heavy datasets
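The idea behind the cache can be sketched with plain NumPy. The class below is an illustrative assumption, not XRegrid's implementation: it renormalizes by the sum of weights over valid source points and recomputes that sum only when the NaN mask changes. A small dense matrix stands in for the sparse weights.

```python
import numpy as np

class CachedMaskApplier:
    """Apply weights with NaN-aware renormalization, cached per mask."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)  # (n_dst, n_src)
        self._mask = None
        self._norm = None
        self.norm_computations = 0

    def __call__(self, field):
        valid = ~np.isnan(field)
        # Recompute the normalization only if the mask differs from the cached one.
        if self._mask is None or not np.array_equal(valid, self._mask):
            self._mask = valid
            self._norm = self.weights @ valid.astype(float)
            self.norm_computations += 1
        weighted_sum = self.weights @ np.where(valid, field, 0.0)
        # Destinations with no valid source points stay NaN.
        out = np.full(self._norm.shape, np.nan)
        np.divide(weighted_sum, self._norm, out=out, where=self._norm > 0)
        return out

weights = np.array([[0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5]])
apply = CachedMaskApplier(weights)
step_1 = apply(np.array([1.0, np.nan, 3.0]))
step_2 = apply(np.array([2.0, np.nan, 4.0]))  # same mask, cache reused
print(step_1, step_2, apply.norm_computations)
```

With a stationary mask, every step after the first skips one of the two matrix products, which is consistent with the roughly 2x speedup quoted above.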

Dask Parallelization¶

XRegrid scales close to linearly with the number of Dask workers, provided there are enough chunks to keep them busy. It also caches weights on each worker to avoid re-sending large weight matrices over the network:

# Load data with appropriate chunks
data = xr.open_dataset('large_file.nc', chunks={'time': 20, 'lat': 180, 'lon': 360})

# Regridding preserves chunks and parallelizes automatically
result = regridder(data.temperature)

Chunking Guidelines:

Time dimension: 10-50 time steps per chunk
Spatial dimensions: Keep spatial dimensions unchunked if possible
Memory target: 100-500 MB per chunk
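Chunking along time works well because each time step is regridded independently, so blocks can be processed in parallel with no communication and then concatenated. The sketch below demonstrates this with a thread pool standing in for Dask workers and a dense random matrix standing in for the weights; both are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
n_time, n_src, n_dst = 40, 50, 20
weights = rng.random((n_dst, n_src))   # dense stand-in for sparse weights
data = rng.random((n_time, n_src))     # (time, flattened space)

def regrid_block(block):
    """Regrid a block of time steps; the spatial axis stays whole."""
    return block @ weights.T

# Split along time only, as the guidelines recommend.
time_chunks = np.array_split(data, 4, axis=0)
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(regrid_block, time_chunks))

chunked = np.concatenate(parts, axis=0)
full = regrid_block(data)
print(np.allclose(chunked, full))  # True
```

Splitting the spatial axis instead would require every block to see neighbouring source points, which is why spatial chunks are best avoided.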

Example: Optimal Chunking¶

# For 0.25° global data (1440x720 spatial)
# 100 time steps: ~0.8 GB per float64 variable, ~207 MB per 25-step chunk
data = xr.open_dataset(
    'large_climate_data.nc',
    chunks={
        'time': 25,    # 25 time steps per chunk
        'lat': 720,    # Keep spatial dims unchunked
        'lon': 1440    # for optimal regridding
    }
)
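To check a chunking choice against the 100-500 MB target, the chunk size can be worked out directly from the dimensions. The helper names below are hypothetical, and the default of 8 bytes per value assumes float64 data.

```python
def chunk_megabytes(n_time, n_lat, n_lon, itemsize=8):
    """Size of one chunk in MB (10**6 bytes)."""
    return n_time * n_lat * n_lon * itemsize / 1e6

def max_time_chunk(n_lat, n_lon, target_mb, itemsize=8):
    """Largest number of time steps per chunk that stays within target_mb."""
    per_step_mb = chunk_megabytes(1, n_lat, n_lon, itemsize)
    return max(1, int(target_mb // per_step_mb))

# 0.25° global grid with 25 time steps per chunk
print(f"{chunk_megabytes(25, 720, 1440):.1f} MB")   # 207.4 MB
print(max_time_chunk(720, 1440, target_mb=500))     # 60
```

For float32 data, pass `itemsize=4`; the same chunk then takes half the memory and the time chunk can be twice as long.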

API Usability: XRegrid vs. ESMPy¶

Besides being faster than xESMF, XRegrid provides a much more intuitive API than raw ESMPy. ESMPy is a powerful low-level interface, but it requires substantial boilerplate to work with xarray datasets.

Code Comparison¶

Here is what is required to regrid a simple lat-lon dataset.

Using raw ESMPy¶

import esmpy
import numpy as np
import xarray as xr

# Load data
ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)

# 1. Create Source Grid (Manual coordinate handling)
src_grid = esmpy.Grid(
    np.array([ds.lon.size, ds.lat.size]),
    staggerloc=[esmpy.StaggerLoc.CENTER],
    coord_sys=esmpy.CoordSys.SPH_DEG
)
src_lon_ptr = src_grid.get_coords(0)
src_lat_ptr = src_grid.get_coords(1)
lon_mesh, lat_mesh = np.meshgrid(ds.lon.values, ds.lat.values)
src_lon_ptr[...] = lon_mesh.T  # ESMF uses (lon, lat) / Fortran order
src_lat_ptr[...] = lat_mesh.T

# 2. Create Target Grid
dst_lon = np.arange(200, 331, 1.0)
dst_lat = np.arange(15, 76, 1.0)
dst_grid = esmpy.Grid(
    np.array([len(dst_lon), len(dst_lat)]),
    staggerloc=[esmpy.StaggerLoc.CENTER],
    coord_sys=esmpy.CoordSys.SPH_DEG
)
dst_lon_ptr = dst_grid.get_coords(0)
dst_lat_ptr = dst_grid.get_coords(1)
lon_mesh_dst, lat_mesh_dst = np.meshgrid(dst_lon, dst_lat)
dst_lon_ptr[...] = lon_mesh_dst.T
dst_lat_ptr[...] = lat_mesh_dst.T

# 3. Create Fields and Initialize Regrid
src_field = esmpy.Field(src_grid, name="air")
dst_field = esmpy.Field(dst_grid, name="air_regridded")
regrid = esmpy.Regrid(src_field, dst_field, regrid_method=esmpy.RegridMethod.BILINEAR)

# 4. Apply Regrid (Requires manual data copy)
src_field.data[...] = ds.air.values.T
regrid(src_field, dst_field)

# 5. Extract result back to xarray
result = xr.DataArray(
    dst_field.data.T,
    coords={"lat": dst_lat, "lon": dst_lon},
    dims=("lat", "lon")
)

Using XRegrid¶

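import numpy as np
import xarray as xr
from xregrid import Regridder

# Load the same data as in the ESMPy example
ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)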

# Define target grid as an xarray Dataset
target_grid = xr.Dataset({
    "lat": (["lat"], np.arange(15, 76, 1.0)),
    "lon": (["lon"], np.arange(200, 331, 1.0))
})

# Create and apply in two steps
regridder = Regridder(ds, target_grid)
result = regridder(ds.air)

Advantages of XRegrid¶

Dask Support: XRegrid works natively with Dask-backed DataArrays, parallelizing the weight application across chunks. ESMPy requires manual implementation of this logic.
Metadata Preservation: XRegrid automatically preserves name, attributes, and non-spatial coordinates.
Automatic Detection: XRegrid uses cf-xarray to automatically identify latitude and longitude, even if they aren't named lat or lon.
Sparse Application: XRegrid uses optimized SciPy sparse matrices for applying weights, which is often faster than the built-in ESMPy __call__ for large datasets.

Detailed Performance Analysis¶

Single Time Step Performance¶

Resolution	Total Points	Weight Apply Time	Memory Usage
1.0°	64,800	0.7 ms	~10 MB
0.5°	259,200	4.2 ms	~25 MB
0.25°	1,036,800	23 ms	~80 MB
0.1°	6,480,000	350 ms	~450 MB

Multi-Time Step Performance¶

Vectorization and stationary mask caching significantly improve performance for multi-time step datasets.

Time Steps	Resolution	Total Time	Time per Step
10	1.0°	9 ms	0.9 ms
100	1.0°	65 ms	0.65 ms
10	0.25°	260 ms	26 ms
100	0.25°	2.3 s	23 ms

Note: The time per step falls as the number of time steps grows, because vectorization spreads fixed overhead across more steps.

Dask Scaling Performance¶

Workers	Chunks	Resolution	Time	Speedup
1	4	0.5°	2.1 s	1.0x
4	4	0.5°	0.6 s	3.5x
8	8	0.5°	0.3 s	7.0x
1	10	0.25°	8.5 s	1.0x
4	10	0.25°	2.4 s	3.5x
8	20	0.25°	1.1 s	7.7x
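Speedup alone hides how well each worker is used. A short calculation of parallel efficiency (speedup divided by worker count) from the table above makes this explicit. The timings are copied from the table; the function name is illustrative.

```python
runs = [
    # (workers, resolution, seconds), taken from the table above
    (1, "0.5°", 2.1), (4, "0.5°", 0.6), (8, "0.5°", 0.3),
    (1, "0.25°", 8.5), (4, "0.25°", 2.4), (8, "0.25°", 1.1),
]
# Single-worker time for each resolution is the baseline
baseline = {res: secs for workers, res, secs in runs if workers == 1}

def parallel_efficiency(workers, res, secs):
    """Fraction of ideal linear speedup achieved (1.0 is perfect)."""
    speedup = baseline[res] / secs
    return speedup / workers

for workers, res, secs in runs:
    eff = parallel_efficiency(workers, res, secs)
    print(f"{res:>6}  {workers} workers  efficiency {eff:.1%}")
```

Efficiency stays between roughly 87% and 97% in every multi-worker row, which is what near-linear scaling looks like in practice.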

Method-Specific Performance¶

Bilinear¶

Best for: Continuous fields (temperature, pressure)
Performance: Fastest method
Memory: Low memory usage

Conservative¶

Best for: Flux quantities (precipitation, radiation)
Performance: ~2-3x slower than bilinear
Memory: Higher memory usage due to more complex weights

Nearest Neighbor¶

Best for: Categorical data (land use, vegetation types)
Performance: Fastest for sparse grids
Memory: Lowest memory usage

# Performance comparison for 0.25° global grid
methods = {
    'bilinear': '53 ms',
    'conservative': '125 ms',
    'nearest_s2d': '31 ms'
}

Large-Scale Optimization¶

Ultra-High Resolution (3km Global)¶

For extremely large grids (>50M points):

# Example: 3km global grid (~88M points)
regridder = Regridder(
    source_3km, target_1deg,
    method='conservative',  # Often required for such large ratios
    reuse_weights=True,     # Essential!
    filename='3km_to_1deg.nc'
)

# Process in temporal chunks
# (years and load_year_data are placeholders for your own year list and loader)
for year in years:
    data = load_year_data(year, chunks={'time': 12})
    result = regridder(data)
    result.to_netcdf(f'regridded_{year}.nc')

Memory Management¶

For memory-constrained environments:

# Use smaller chunks
data = xr.open_dataset(
    'huge_file.nc',
    chunks={'time': 5, 'lat': 360, 'lon': 720}
)

# Process iteratively if needed, one year at a time
# (group the dataset itself so each group still contains the data variables)
for year, year_data in data.groupby('time.year'):
    result = regridder(year_data.temperature)
    result.to_netcdf(f'output_{year}.nc')

Benchmarking Your Setup¶

Use this script to benchmark XRegrid on your system:

import time
import numpy as np
import xarray as xr
from xregrid import Regridder

def benchmark_regridding(source_res, target_res, time_steps=10):
    """Benchmark regridding performance."""
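    # Create grids; source_res and target_res are the number of latitude points
    source = xr.Dataset({
        'lat': (['lat'], np.linspace(-90, 90, source_res)),
        # endpoint=False keeps the spacing uniform without repeating 0°/360°
        'lon': (['lon'], np.linspace(0, 360, source_res*2, endpoint=False))
    })

    target = xr.Dataset({
        'lat': (['lat'], np.linspace(-90, 90, target_res)),
        'lon': (['lon'], np.linspace(0, 360, target_res*2, endpoint=False))
    })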

    # Create test data
    data = xr.DataArray(
        np.random.rand(time_steps, source_res, source_res*2),
        dims=['time', 'lat', 'lon'],
        coords={'lat': source.lat, 'lon': source.lon}
    )

    # Time regridder creation (perf_counter is a monotonic, high-resolution clock)
    start = time.perf_counter()
    regridder = Regridder(source, target, method='bilinear')
    creation_time = time.perf_counter() - start

    # Time regridding
    start = time.perf_counter()
    result = regridder(data)
    regrid_time = time.perf_counter() - start

    # Convert latitude point counts to nominal grid spacing in degrees
    print(f"Grid: {180/source_res:g}° → {180/target_res:g}°")
    print(f"Creation time: {creation_time:.3f}s")
    print(f"Regrid time: {regrid_time:.3f}s")
    print(f"Time per step: {regrid_time/time_steps*1e3:.2f} ms")
    print(f"Points/second: {result.size/regrid_time:,.0f}")
    print()

# Run benchmarks
benchmark_regridding(180, 360)    # 1.0° to 0.5°
benchmark_regridding(720, 1440)   # 0.25° to 0.125°
benchmark_regridding(1800, 3600)  # 0.1° to 0.05°
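Single-run timings are noisy, especially at the millisecond scale of the smaller grids. A small best-of-N helper, sketched here with only the standard library, gives more stable numbers. The name `best_of` and its defaults are illustrative choices, not part of XRegrid.

```python
import time

def best_of(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time in seconds over several calls to fn."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; replace with the call you want to measure
elapsed = best_of(lambda: sum(range(100_000)))
print(f"{elapsed * 1e3:.3f} ms")
```

To time regridding, pass something like `lambda: regridder(data)`. For Dask-backed data, force evaluation inside the lambda (for example with `.compute()`), otherwise only graph construction is measured.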

Performance Troubleshooting¶

Slow Weight Generation¶

Symptoms: Long delays during regridder creation

Solutions:

1. Use weight reuse: reuse_weights=True
2. Check coordinate ordering and validity
3. Verify grid periodicity settings
4. Consider using a cheaper method first (for example, bilinear before conservative)

Slow Weight Application¶

Symptoms: Slow data regridding after regridder creation

Solutions:

1. Check Dask chunking strategy
2. Verify coordinate dimensions match expected order
3. Use periodic=True for global grids
4. Monitor memory usage; smaller chunks may be needed

Memory Issues¶

Symptoms: Out-of-memory errors or system slowdown

Solutions:

1. Reduce chunk sizes in the time dimension
2. Process data in temporal batches
3. Use the conservative method only when necessary
4. Enable weight reuse to avoid recomputation

Poor Parallel Scaling¶

Symptoms: Adding workers doesn't improve performance

Solutions:

1. Increase the number of chunks to at least the number of workers
2. Check that spatial dimensions aren't chunked
3. Verify adequate memory per worker
4. Monitor CPU and memory usage during processing
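The first solution is easy to check with arithmetic: if there are fewer chunks than workers, some workers have nothing to do. The helper names below are hypothetical and assume chunking along time only.

```python
import math

def n_time_chunks(n_time, chunk_size):
    """Number of chunks produced by splitting n_time steps into chunk_size pieces."""
    return math.ceil(n_time / chunk_size)

def busy_fraction(n_time, chunk_size, workers):
    """Upper bound on the fraction of workers that can hold a chunk at once."""
    return min(1.0, n_time_chunks(n_time, chunk_size) / workers)

# 100 time steps on 8 workers
print(busy_fraction(100, 25, 8))  # 0.5  (4 chunks, half the workers idle)
print(busy_fraction(100, 12, 8))  # 1.0  (9 chunks, every worker has work)
```

Smaller chunks keep more workers busy but add scheduling overhead, so it is worth balancing this against the per-chunk memory target.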

Best Practices Summary¶

Always use weight reuse for repeated regridding
Set periodic=True for global grids
Chunk in time only for optimal performance
Target 100-500 MB per chunk for memory efficiency
Save weights to fast storage (SSD) for quick loading
Monitor memory usage and adjust chunks as needed
Use conservative method sparingly - only when flux conservation is critical
Benchmark your specific use case to find optimal settings

Following these guidelines, you should see substantial performance improvements over other regridding solutions, especially for large or frequently-used grids.