Welcome to the climatrix Tutorial
This tutorial will walk you through some typical use cases showing how climatrix makes managing climate data easier.
We'll simulate sparse meteorological observations spread across Europe.
Step A: Configure Access to CDS
Warning
If you already have a ~/.cdsapirc file, you can skip this step.
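To check whether you can skip this step, you can look for the file from Python. The sketch below uses only the standard library. It assumes the usual cdsapi layout of one "name: value" entry per line (typically url and key), and the helpers has_cdsapirc and read_cdsapirc are hypothetical, not part of climatrix.

```python
from __future__ import annotations

from pathlib import Path


def has_cdsapirc(home: Path | None = None) -> bool:
    """Return True if a .cdsapirc file exists in `home` (default: the current user's home)."""
    home = Path.home() if home is None else home
    return (home / ".cdsapirc").is_file()


def read_cdsapirc(path: Path) -> dict[str, str]:
    """Parse 'name: value' lines (assumed cdsapi layout) into a dict."""
    config: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, because URLs contain colons too.
        name, _, value = line.partition(":")
        config[name.strip()] = value.strip()
    return config


if has_cdsapirc():
    print("Found ~/.cdsapirc: you can skip Step A.")
else:
    print("No ~/.cdsapirc yet: run 'cm dataset config cds'.")
```

Splitting on the first colon only keeps values such as https://... intact.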
To configure access to the CDS (Climate Data Store), run:
cm dataset config cds
Step B: Download ERA5-Land Reanalysis Data
We'll use the ERA5-Land global reanalysis product. To download it, run:
cm dataset download era5-land --year 2018 --month 10 --day 12 --target ./era5-land.nc
Note
Downloading data can take a few minutes, so please be patient.
Step C: Open the Dataset
We'll open the dataset using the climatrix accessor:
import xarray as xr
import climatrix as cm # (1)!
dset = xr.open_dataset("./era5-land.nc").cm
- Even though we don't call climatrix directly here, we must import it to register the climatrix xarray accessor (.cm).
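Importing climatrix is what makes .cm available: xarray lets third-party packages attach a custom namespace to every Dataset via xarray.register_dataset_accessor, and that registration runs when the package is imported. Below is a minimal sketch of the mechanism with a hypothetical demo accessor; it is not climatrix's code, which registers cm and provides much more.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")  # hypothetical name; climatrix registers "cm"
class DemoAccessor:
    """Toy accessor: xarray creates it with the Dataset on first attribute access."""

    def __init__(self, xarray_obj: xr.Dataset):
        self._obj = xarray_obj

    def variable_names(self) -> list:
        # Names of the data variables in the wrapped Dataset.
        return list(self._obj.data_vars)


ds = xr.Dataset({"t2m": (("latitude", "longitude"), np.zeros((2, 3)))})
print(ds.demo.variable_names())  # ['t2m']
```

Because the decorator runs at import time, any module that never imports the defining package will not see the accessor, which is why the import in Step C is required.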
Step D: Shift to Signed Longitude Convention
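ERA5-Land uses the positive longitude convention (\(\lambda \in [0, 360)\)). Europe straddles the prime meridian, so in this convention its western edge (e.g., 24°W, stored as 336°) lies at the far end of the longitude axis, separated from the rest of the continent. To keep the region contiguous, we'll convert the dataset to the signed convention (\(\lambda \in [-180, 180)\)).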
dset = dset.to_signed_longitude()
Warning
Changing longitude convention on a large dataset can be time and memory intensive.
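Per grid point, the conversion is \(\lambda' = ((\lambda + 180) \bmod 360) - 180\), after which the longitude axis has to be re-sorted, because values above 180 wrap around to the negative end. A plain-Python sketch of that arithmetic, as a generic illustration rather than climatrix's actual implementation:

```python
def to_signed(lon: float) -> float:
    """Map a longitude from [0, 360) to [-180, 180) (generic sketch, not climatrix's code)."""
    return (lon + 180.0) % 360.0 - 180.0


# Positive-convention longitudes, including 336 (= 24 W, Europe's western edge).
positive = [0.0, 21.0, 179.9, 180.0, 336.0, 359.9]
signed = [to_signed(lon) for lon in positive]
print(signed)

# After conversion the axis is no longer monotonic, so it must be re-sorted.
print(sorted(signed))
```

The re-sort (and the data reordering that goes with it on a real grid) is what makes the operation costly on large datasets.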
Step E: Subset to Europe
We'll now extract a region covering Europe:
europe = dset.subset(north=71, south=36, west=-24, east=35)
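For comparison, a similar bounding-box selection in plain xarray is a .sel call with slices. One catch worth knowing: ERA5 grids typically store latitude in descending order (north to south), so the latitude slice must also run from north to south; reversing it returns an empty selection. A sketch on a synthetic grid, assuming the coordinates are named latitude and longitude as in the ERA5-Land file:

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree grid; latitude descends (90 -> -90) as in ERA5 files.
lat = np.arange(90.0, -90.5, -1.0)
lon = np.arange(-180.0, 180.0, 1.0)  # already in the signed convention
ds = xr.Dataset(
    {"t2m": (("latitude", "longitude"), np.zeros((lat.size, lon.size)))},
    coords={"latitude": lat, "longitude": lon},
)

north, south, west, east = 71, 36, -24, 35
# Latitude decreases along the axis, so the slice goes north -> south.
europe = ds.sel(latitude=slice(north, south), longitude=slice(west, east))
print(europe.sizes)
```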
Step F: Select a Single Time Instant
Note
climatrix currently doesn't support plotting dynamic (time-varying) datasets, so let's select a single timestamp.
To select a single time instant, let's use:
europe = europe.time("2018-10-12T04:00:00")
europe.plot()
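In plain xarray, the same idea is a label-based .sel on the time coordinate. The sketch below uses a synthetic hourly series (an assumption; the real file's time steps depend on what was downloaded) and also shows method="nearest", which helps when the requested instant is not exactly on the time grid:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series for 2018-10-12 (24 steps); values equal the hour.
times = pd.date_range("2018-10-12T00:00", periods=24, freq="60min")
ds = xr.Dataset({"t2m": ("time", np.arange(24.0))}, coords={"time": times})

# Exact label selection returns a single instant and drops the time dimension.
snapshot = ds.sel(time="2018-10-12T04:00:00")
# method="nearest" snaps an off-grid timestamp to the closest available step.
nearest = ds.sel(time=pd.Timestamp("2018-10-12T04:20"), method="nearest")
print(snapshot.t2m.item(), nearest.t2m.item())
```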

Step G: Sample Data Around Warsaw
We'll create a sparse sample of data points around Warsaw, using a normal distribution:
WARSAW = (21.017532, 52.237049)  # (longitude, latitude)
sparse = europe.sample_normal(number=500, center_point=WARSAW, sigma=1.5)
Tip
You can use the portion argument instead of number to sample a fraction of the dataset (e.g., 50%).
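Conceptually, sampling "normally around a center point" means drawing longitudes and latitudes from Gaussians centered on that point and keeping the draws that fall inside the domain. A NumPy sketch of that idea; the function name, the rejection of out-of-domain points, and treating sigma as degrees are all assumptions, and climatrix's own sample_normal may differ (for example, by snapping samples to grid cells):

```python
from __future__ import annotations

import numpy as np

WARSAW = (21.017532, 52.237049)  # (longitude, latitude)


def sample_normal_points(n, center, sigma, bounds, seed=None):
    """Hypothetical helper: draw n (lon, lat) points from N(center, sigma^2) inside bounds.

    bounds = (west, east, south, north) in degrees; sigma is assumed to be in degrees.
    """
    rng = np.random.default_rng(seed)
    west, east, south, north = bounds
    accepted = np.empty((0, 2))
    while len(accepted) < n:
        # Independent Gaussian draws for longitude (column 0) and latitude (column 1).
        draws = rng.normal(loc=center, scale=sigma, size=(n, 2))
        inside = (
            (draws[:, 0] >= west) & (draws[:, 0] <= east)
            & (draws[:, 1] >= south) & (draws[:, 1] <= north)
        )
        accepted = np.vstack([accepted, draws[inside]])
    return accepted[:n]


points = sample_normal_points(500, WARSAW, sigma=1.5, bounds=(-24, 35, 36, 71), seed=0)
print(points.shape)  # (500, 2)
```

With sigma=1.5, roughly 95% of the draws fall within about 3 degrees of Warsaw, which is why the later reconstruction is only trustworthy near the city.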
Step H: Plot the Sparse Observations
Now we can plot the output:
sparse.plot()
Warning
Plotting requires downloading coastline and border data, so it may take longer the first time.
Step H.1: Creating Custom Domains (Optional)
You can create custom domains with multiple axis types using the builder pattern. This is especially useful when working with vertical levels or custom time series:
from climatrix.dataset.domain import Domain
# Create a domain with vertical levels around Warsaw
custom_domain = (Domain.from_axes()
.vertical(pressure=[1000, 850, 700, 500]) # (1)!
.lat(latitude=[51.5, 52.0, 52.5])
.lon(longitude=[20.5, 21.0, 21.5])
.time(time=['2018-10-12T00:00', '2018-10-12T06:00'])
.sparse()) # (2)!
print(f"Custom domain size: {custom_domain.size}")
print(f"Has vertical axis: {custom_domain.has_axis('vertical')}")
- Define pressure levels as the vertical coordinate
- Create a sparse domain; the vertical and time axes remain independent dimensions
Domain Builder Features
The from_axes() builder supports:
- Vertical axes: pressure, depth, level, etc.
- Time axes: various time coordinate names
- Custom names: any parameter name for each axis type
- Flexible inputs: slices, lists, or NumPy arrays
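To sanity-check the printed custom_domain.size, you can count points by hand. The sketch assumes that in a sparse domain the latitude and longitude lists are paired into individual locations (three here) while the vertical and time axes stay independent, and that size counts every axis; both are assumptions about climatrix's semantics, so compare the result with the actual output. A dense domain, by contrast, crosses every axis with every other.

```python
from math import prod

# Axis lengths taken from the custom domain above.
n_vertical = len([1000, 850, 700, 500])
n_lat = len([51.5, 52.0, 52.5])
n_lon = len([20.5, 21.0, 21.5])
n_time = len(["2018-10-12T00:00", "2018-10-12T06:00"])

# Assumed sparse layout: lat[i] pairs with lon[i] to form one point location.
assert n_lat == n_lon, "paired sparse points need equally long lat/lon lists"
sparse_size = n_lat * n_vertical * n_time
# Dense layout: full Cartesian product of all axes.
dense_size = prod([n_vertical, n_lat, n_lon, n_time])
print(sparse_size, dense_size)  # 24 72
```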
Step I: Reconstruct Using IDW
We'll reconstruct a dense field from the sparse data using Inverse Distance Weighting (IDW):
idw = sparse.reconstruct(europe.domain, method="idw") # (1)!
idw.plot()
- We want to reconstruct data for all of Europe (europe.domain).
Note
We reconstructed the data for the entire Europe domain, but all samples are concentrated around Warsaw. The visible artifacts come from having too few samples far from Warsaw, so the reconstruction there is not representative of Europe as a whole.
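IDW estimates each target value as a weighted mean of the observations, \(\hat{z}(x) = \sum_i w_i z_i / \sum_i w_i\), with weights \(w_i = 1 / d_i^p\) that decay with the distance \(d_i\) to observation \(i\) (power \(p\), commonly 2). A NumPy sketch of this basic technique, using plain Euclidean distance in degrees for simplicity; climatrix's implementation (distance metric, neighbour limits, defaults) may differ:

```python
import numpy as np


def idw(obs_xy, obs_val, target_xy, power=2.0, eps=1e-12):
    """Basic IDW: z(x) = sum(w_i * z_i) / sum(w_i), with w_i = 1 / d_i**power."""
    # Pairwise distances, shape (n_targets, n_obs).
    d = np.linalg.norm(target_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    exact = d < eps  # targets that coincide with an observation
    w = 1.0 / np.maximum(d, eps) ** power
    est = (w @ obs_val) / w.sum(axis=1)
    # Coinciding targets take the observed value directly.
    hit_rows = exact.any(axis=1)
    est[hit_rows] = obs_val[exact[hit_rows].argmax(axis=1)]
    return est


obs_xy = np.array([[0.0, 0.0], [2.0, 0.0]])
obs_val = np.array([10.0, 20.0])
targets = np.array([[1.0, 0.0], [0.0, 0.0], [0.5, 0.0]])
print(idw(obs_xy, obs_val, targets))
```

The midpoint gets the plain average (15), while the point at x=0.5 is pulled towards the nearer observation (11). Because the weights never reach zero, every target, however distant, is a blend of the Warsaw-area samples, which is what produces the artifacts described in the note.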
Step J: Compare with Original Data
We'll use a Comparison object to visualize the differences between the original data and the reconstruction.
import matplotlib.pyplot as plt # (1)!
cmp = cm.Comparison(europe, idw)
cmp.plot_diff()
cmp.plot_signed_diff_hist()
plt.show()
- We explicitly import matplotlib so that we can call plt.show() and display the figures.
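At its core, a signed-difference histogram means subtracting the two fields cell by cell and binning the result. A NumPy sketch with made-up numbers; the sign convention (reconstructed minus reference) and dropping NaN cells are assumptions, and Comparison may handle either differently:

```python
import numpy as np

# Toy 2x3 fields standing in for the reference (europe) and the reconstruction (idw).
reference = np.array([[280.0, 281.0, 282.0], [283.0, 284.0, np.nan]])
reconstructed = np.array([[281.0, 280.5, 282.0], [285.0, 283.0, 290.0]])

# Positive values mean the reconstruction overestimates.
signed_diff = reconstructed - reference
# Drop cells missing in either field (e.g. ocean cells, which ERA5-Land leaves empty).
valid = signed_diff[~np.isnan(signed_diff)]
counts, edges = np.histogram(valid, bins=4, range=(-2.0, 2.0))
print(counts, edges)
```

A histogram centered on zero indicates an unbiased reconstruction; a shifted peak points to systematic over- or underestimation.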
Step K: Optimize Hyperparameters
To improve reconstruction quality, let's optimize the IDW hyperparameters. We'll draw two separate samples around Warsaw, one for training and one for validation:
from climatrix.optim import HParamFinder
# Create training and validation datasets
train_sparse = europe.sample_normal(number=300, center_point=WARSAW, sigma=1.5) # (1)!
val_sparse = europe.sample_normal(number=200, center_point=WARSAW, sigma=1.5) # (2)!
# Find optimal hyperparameters
finder = HParamFinder(train_sparse, val_sparse, method="idw", n_iters=20) # (3)!
result = finder.optimize()
print(f"Best parameters: {result['best_params']}")
print(f"Best MAE score: {result['best_score']}")
- Training data - used to fit the hyperparameter optimization
- Validation data - used to evaluate parameter combinations
- Using fewer iterations for this tutorial example
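The underlying idea is to reconstruct from the training points onto the validation locations for each candidate setting and keep the setting with the lowest validation error. The self-contained sketch below uses a plain grid search over the IDW power p on a synthetic field; this technique is swapped in for illustration, and HParamFinder's actual search strategy and searched parameters may differ.

```python
import numpy as np


def idw(obs_xy, obs_val, target_xy, power):
    """Basic IDW estimate at target_xy from the observations (obs_xy, obs_val)."""
    d = np.linalg.norm(target_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w @ obs_val) / w.sum(axis=1)


def field(xy):
    """Synthetic 'true' field standing in for the real data."""
    return np.sin(xy[:, 0]) + np.cos(xy[:, 1])


rng = np.random.default_rng(42)
train_xy = rng.normal(loc=(21.0, 52.2), scale=1.5, size=(300, 2))
val_xy = rng.normal(loc=(21.0, 52.2), scale=1.5, size=(200, 2))
train_val, val_true = field(train_xy), field(val_xy)

# Score each candidate power by mean absolute error on the validation points.
scores = {}
for power in (0.5, 1.0, 2.0, 3.0, 4.0):
    pred = idw(train_xy, train_val, val_xy, power)
    scores[power] = float(np.mean(np.abs(pred - val_true)))

best_power = min(scores, key=scores.get)
print(f"best power={best_power}, MAE={scores[best_power]:.4f}")
```

Keeping validation points separate from the training points is what prevents the search from simply rewarding settings that memorize the observations.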
Step L: Apply Optimized Parameters
Now let's reconstruct using the optimized parameters:
# Reconstruct with optimized parameters
optimized_idw = sparse.reconstruct(
europe.domain,
method="idw",
**result['best_params'] # (1)!
)
# Compare optimized vs default reconstruction
optimized_cmp = cm.Comparison(europe, optimized_idw)
default_cmp = cm.Comparison(europe, idw)
print(f"Default IDW RMSE: {default_cmp.compute_rmse():.4f}")
print(f"Optimized IDW RMSE: {optimized_cmp.compute_rmse():.4f}")
optimized_idw.plot(title="Optimized IDW Reconstruction")
- Apply the best parameters found by the optimizer
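Step K reports MAE, while this step reports RMSE. Both summarize the cell-wise errors \(e_i\): \(\text{MAE} = \frac{1}{n}\sum_i |e_i|\) and \(\text{RMSE} = \sqrt{\frac{1}{n}\sum_i e_i^2}\). RMSE weights large errors more heavily, so it is always at least as large as MAE. A small NumPy sketch; ignoring NaN cells mirrors the land-only ERA5-Land grid and is an assumption about how Comparison treats missing values:

```python
import numpy as np


def mae(reference, predicted):
    """Mean absolute error over cells that are valid in both arrays."""
    return float(np.nanmean(np.abs(predicted - reference)))


def rmse(reference, predicted):
    """Root-mean-square error over cells that are valid in both arrays."""
    return float(np.sqrt(np.nanmean((predicted - reference) ** 2)))


reference = np.array([1.0, 2.0, 3.0, np.nan])
predicted = np.array([1.0, 2.0, 7.0, 5.0])
print(mae(reference, predicted), rmse(reference, predicted))
```

Here a single large error (4) gives MAE = 4/3 but RMSE = sqrt(16/3), about 2.31, so the two metrics can rank reconstructions differently when errors are unevenly distributed.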
Note
For hyperparameter optimization, make sure to install climatrix with: pip install climatrix[optim]
Step M: Interactive Plotting
You can explore the resulting dataset interactively. For example, to explore the optimized reconstruction from Step L, run:
cm.plot.Plot(dataset=optimized_idw).show(port=5000)
This starts a local server so you can conveniently explore the dataset in your web browser.
Tip
If the preview does not open, open http://localhost:5000/ in your favourite web browser (replace 5000 with the port you selected).
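If port 5000 is already taken by another process, you can pick a different one. A standard-library sketch that asks the operating system for an unused port by binding to port 0 (the helper name find_free_port is hypothetical):

```python
import socket


def find_free_port() -> int:
    """Hypothetical helper: ask the OS for an unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


port = find_free_port()
print(f"Pass port={port} to show() and open http://localhost:{port}/")
```

The socket is closed before the port is used, so another process could in principle grab it in between; for a local tutorial session this is rarely a problem.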
Note
For interactive plotting, make sure to install climatrix with: pip install climatrix[plot]