
🚀 Getting Started with climatrix

Welcome to climatrix, a Python library designed for efficient sampling and reconstruction of climate datasets. This guide will help you set up and start using climatrix effectively.


📦 Installation

🔧 Prerequisites

Ensure you have the following installed:

  • Python 3.12 or higher
  • pip (Python package installer)

🛠️ Installing climatrix

You can install climatrix directly from GitHub:

pip install git+https://github.com/jamesWalczak/climatrix.git
Climatrix is already available on PyPI

The package can be installed with pip install climatrix.

🧪 Verifying the Installation

To confirm that climatrix is installed correctly, run the following in your Python environment:

import climatrix as cm

print(cm.__version__)
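
If the import itself fails, it can help to check whether the package is installed at all. The sketch below reads the version from the package metadata using only the standard library. The helper name `installed_version` is hypothetical and is not part of climatrix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("climatrix")
    print(found if found is not None else "climatrix is not installed")
```

If this prints a version but `import climatrix` still fails, the package is probably installed in a different Python environment than the one you are running.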

🔍 Exploring climatrix

The core functionality of climatrix revolves around the BaseClimatrixDataset and Domain classes, which provide the methods described in the sections below.

Creating BaseClimatrixDataset

You can create a BaseClimatrixDataset directly by passing an xarray.DataArray or xarray.Dataset to the initializer:

Note

In the current version, climatrix supports only static (single-element or no time dimension) and single-variable datasets.

This means that a BaseClimatrixDataset can be created from an xarray.DataArray or a single-variable xarray.Dataset.

import climatrix as cm

dset = cm.BaseClimatrixDataset(xarray_dataset)

However, climatrix is implemented as an xarray accessor, so there is a more convenient way to create a BaseClimatrixDataset:

import climatrix as cm # (1)!

dset = xarray_dataset.cm
  1. Even though we don't use climatrix explicitly, we need to import it to make the xarray accessor visible.
Warning

When using climatrix as an accessor, remember to import climatrix first!

Accessing spatio-temporal axes

With Climatrix, you can easily access the spatio-temporal axes.

Info

You don't need to know the name of an axis (lat, latitude, or anything else). Climatrix automatically finds the proper axis by matching regular expressions.

All predefined axes are available via the AxisType enum class.
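
To illustrate the idea of finding an axis by regular expression, here is a minimal standard-library sketch. The patterns and the `match_axis` helper are hypothetical and chosen only for illustration; the expressions climatrix actually uses may differ.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not climatrix's real expressions.
AXIS_PATTERNS = {
    "latitude": re.compile(r"^(lat|latitude)$", re.IGNORECASE),
    "longitude": re.compile(r"^(lon|long|longitude)$", re.IGNORECASE),
    "time": re.compile(r"^(time|valid_time)$", re.IGNORECASE),
}


def match_axis(coord_name: str) -> Optional[str]:
    """Return the axis label whose pattern matches `coord_name`, or None."""
    for axis, pattern in AXIS_PATTERNS.items():
        if pattern.match(coord_name):
            return axis
    return None


print(match_axis("lat"))        # latitude
print(match_axis("Longitude"))  # longitude
print(match_axis("level"))      # None
```

Matching on patterns rather than fixed names is what lets the same code work with datasets that label their coordinates differently.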

To access the name of the latitude axis, use:

xarray_dataset.cm.latitude.name

and to access values, use:

xarray_dataset.cm.latitude.values

Below, you can find available attributes:

| Attribute | Meaning |
| --- | --- |
| latitude | Axis corresponding to AxisType.LATITUDE for the dataset |
| longitude | Axis corresponding to AxisType.LONGITUDE for the dataset |
| time | Axis corresponding to AxisType.TIME for the dataset |
| point | Axis corresponding to AxisType.POINT for the dataset |
| vertical | Axis corresponding to AxisType.VERTICAL for the dataset |

Subsetting dataset by geographical coordinates

Climatrix supports subsetting a region by bounding box. To select Europe, use the following command:

europe = xarray_dataset.cm.subset(north=71, south=36, west=-24, east=35)
Warning

If you attempt to select a region that is not aligned with the dataset's longitude convention, Climatrix will inform you about it and ask you to update the convention explicitly.
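
To see why the longitude convention matters for a bounding box, consider the plain-Python sketch below. It does not use climatrix. The `in_bbox` helper is hypothetical, and the Reykjavik coordinates are approximate. The same location falls inside or outside the Europe box depending on how its longitude is written.

```python
def in_bbox(lat, lon, *, north, south, west, east):
    """Return True if the point lies inside the (inclusive) bounding box."""
    return south <= lat <= north and west <= lon <= east


# The Europe bounding box from the example above (signed longitudes).
EUROPE = dict(north=71, south=36, west=-24, east=35)

# Reykjavik (approximate), expressed in both longitude conventions.
reykjavik_signed = (64.1, -21.9)
reykjavik_positive = (64.1, 338.1)

print(in_bbox(*reykjavik_signed, **EUROPE))    # True
print(in_bbox(*reykjavik_positive, **EUROPE))  # False
```

This simple check does not handle boxes that cross the antimeridian. It only shows that the box and the data must share one convention.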

Tip

With Climatrix, changing the convention is easy!

To switch to the signed longitude convention (\(\lambda \in [-180, 180]\)), use the to_signed_longitude method.

signed = xarray_dataset.cm.to_signed_longitude()

To switch to the positive-only longitude convention (\(\lambda \in [0, 360]\)), use the to_positive_longitude method.

positive = xarray_dataset.cm.to_positive_longitude()
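
The arithmetic behind the two conventions can be sketched in a few lines of plain Python. The function names `signed_lon` and `positive_lon` are hypothetical, and this is not necessarily how climatrix converts longitudes internally. The sketch maps to the half-open ranges [-180, 180) and [0, 360), so 180 becomes -180 and 360 becomes 0.

```python
def signed_lon(lon: float) -> float:
    """Map a longitude in degrees to the signed range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def positive_lon(lon: float) -> float:
    """Map a longitude in degrees to the positive-only range [0, 360)."""
    return lon % 360.0


print(signed_lon(336.0))    # -24.0
print(positive_lon(-24.0))  # 336.0
print(signed_lon(35.0))     # 35.0
```

Longitudes that are valid in both conventions, such as 35, are left unchanged by either function.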

Selecting time

You can select time instants by integer index along the time axis:

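import xarray as xr
import climatrix as cm  # makes the .cm accessor available

xr_dset = xr.tutorial.open_dataset("air_temperature")
single_time_instant = xr_dset.cm.itime(0)  # a single index
several_time_instants = xr_dset.cm.itime([0, 100])  # a list of indices
several_time_instants = xr_dset.cm.itime(slice(5, 200))  # a slice of indices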

or by date:

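import xarray as xr
import climatrix as cm  # makes the .cm accessor available

xr_dset = xr.tutorial.open_dataset("air_temperature")
single_time_instant = xr_dset.cm.time("2013-02-10")  # a single date
several_time_instants = xr_dset.cm.time(["2013-02-10", "2013-02-12"])  # a list of dates
several_time_instants = xr_dset.cm.time(slice("2013-02-10", "2013-02-12"))  # a date range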
Tip

You can also use the sel or isel methods with AxisType.TIME.

Sampling data

Climatrix implements the following sampling methods:

| Sampling | Description |
| --- | --- |
| uniform | data are sampled uniformly at random from the entire spatial domain |
| normal | data are sampled at random, following a normal distribution, around a defined center point (location) |

To sample \(10\%\) (\(0.1\)) of spatial points, use:

import xarray as xr
import climatrix as cm

xr_dset = xr.tutorial.open_dataset("air_temperature") # (1)!
dset = xr_dset.cm.itime(0) # (2)!

sparse = dset.sample_uniform(portion=0.1)

sparse.plot(title="Uniform Sampling (10%)")
  1. We use a tutorial dataset from xarray. Loading it might require some extra packages.
  2. We select only the first time instant (here, 2013-01-01T00:00).
Tip

If you need an exact number of resulting points, use the number parameter. It is also valid for sample_normal.
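
The difference between requesting a fraction of points and an exact count can be sketched with the standard library alone. The `pick_uniform` helper and its `seed` argument are hypothetical, and rounding the fraction to a count is an assumption of this sketch. It illustrates uniform sampling without replacement and is not climatrix's implementation.

```python
import random


def pick_uniform(points, portion=None, number=None, seed=None):
    """Pick spatial points uniformly at random, without replacement."""
    if (portion is None) == (number is None):
        raise ValueError("pass exactly one of `portion` or `number`")
    if number is None:
        # Turn the requested fraction into a point count.
        number = round(len(points) * portion)
    rng = random.Random(seed)
    return rng.sample(points, number)


# A toy 10 x 10 grid of (lat, lon) pairs standing in for a spatial domain.
grid = [(lat, lon) for lat in range(10) for lon in range(10)]

by_portion = pick_uniform(grid, portion=0.1, seed=0)
by_number = pick_uniform(grid, number=7, seed=0)
print(len(by_portion), len(by_number))  # 10 7
```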

Note

For each sampling method, you can specify a NaN policy (nan parameter). There are three options:

  • ignore - NaN values will be sampled,
  • raise - an error is raised if any NaN value is found,
  • resample - attempts to return only non-NaN values.
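
The three policies can be pictured with a small standard-library sketch. The `sample_values` helper is hypothetical. It assumes that raise checks the drawn sample and that resample draws only from non-NaN values. The actual behavior in climatrix may differ in these details.

```python
import math
import random


def sample_values(values, number, nan="ignore", seed=None):
    """Draw `number` values under one of three illustrative NaN policies."""
    rng = random.Random(seed)
    if nan == "resample":
        # Assumption: draw only from values that are not NaN.
        valid = [v for v in values if not math.isnan(v)]
        return rng.sample(valid, min(number, len(valid)))
    if nan not in ("ignore", "raise"):
        raise ValueError(f"unknown NaN policy: {nan!r}")
    picked = rng.sample(values, number)
    if nan == "raise" and any(math.isnan(v) for v in picked):
        # Assumption: the check applies to the drawn sample.
        raise ValueError("sample contains NaN values")
    return picked


values = [1.0, float("nan"), 2.0, float("nan"), 3.0]
print(sorted(sample_values(values, 3, nan="resample", seed=0)))  # [1.0, 2.0, 3.0]
```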

Reconstructing

The main purpose of the accessor is to simplify data reconstruction. You can reconstruct a dense domain from a sparse one, or a sparse domain from another sparse one.

import xarray as xr
import climatrix as cm

xr_dset = xr.tutorial.open_dataset("air_temperature") 
dset = xr_dset.cm.itime(0)
sparse = dset.sample_uniform(portion=0.1) # (1)!

dense = sparse.reconstruct(dset.domain, method="idw") # (2)! 
dense.plot(title="Reconstructed dataset")
  1. First, we need to sample a sparse dataset.
  2. Note that we use the domain of dset, not of xr_dset.cm, because we want to reconstruct to the original domain after time subsetting.
Note

You can pass extra reconstructor-specific arguments as the last (recon_kwargs) argument of the reconstruct method. To find definitions of these extra arguments, refer to the Reconstruction section in the API reference.
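
The method name idw presumably stands for inverse distance weighting. As a rough picture of that technique, here is a minimal pure-Python sketch. It uses planar Euclidean distance and a hypothetical `power` parameter, and it is not climatrix's reconstructor. Each known point contributes to the estimate with a weight of one over its distance raised to `power`.

```python
import math


def idw(known, target, power=2.0):
    """Estimate the value at `target` from `known` [((x, y), value), ...]."""
    if not known:
        raise ValueError("at least one known point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The target coincides with a known point: return its value.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


known = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw(known, (1.0, 0.0)))  # 15.0
print(idw(known, (0.0, 0.0)))  # 10.0
```

Points closer to the target dominate the estimate, and a larger `power` makes that effect stronger.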

Plotting

To plot a dataset (with either a dense or a sparse domain), use the plot method:

xr_dset.cm.itime(0).plot()
Warning

At the moment, plotting is enabled only for static datasets. Remember to select a single time instant before plotting.