Getting Started with climatrix
Welcome to climatrix, a Python library designed for efficient sampling and reconstruction of climate datasets. This guide will help you set up and start using climatrix effectively.
Installation
Prerequisites
Ensure you have the following installed:
- Python 3.12 or higher
- pip (Python package installer)
Installing climatrix
You can install climatrix directly from GitHub:
pip install git+https://github.com/jamesWalczak/climatrix.git
Climatrix is also available on PyPI and can be installed with:
pip install climatrix
Verifying the Installation
To confirm that climatrix is installed correctly, run the following in your Python environment:
import climatrix as cm
print(cm.__version__)
Exploring climatrix
The core functionality of climatrix revolves around the BaseClimatrixDataset and Domain classes, which provide methods for:
- Accessing spatio-temporal axes,
- Subsetting datasets based on geographic bounds,
- Selecting time,
- Sampling data using uniform or normal distributions,
- Reconstructing datasets from samples,
- Plotting data for visualization.
Creating BaseClimatrixDataset
You can create a BaseClimatrixDataset directly by passing an xarray.DataArray or an xarray.Dataset to the initializer:
Note
In the current version, climatrix supports only static (single-element or no time dimension) and single-variable datasets. This means that a BaseClimatrixDataset can be created from an xarray.DataArray or a single-variable xarray.Dataset.
import climatrix as cm
# xarray_dataset: an existing xarray.DataArray or single-variable xarray.Dataset
dset = cm.BaseClimatrixDataset(xarray_dataset)
However, climatrix is implemented as an xarray accessor, so there is a more convenient way to create a BaseClimatrixDataset:
import climatrix as cm # (1)!
dset = xarray_dataset.cm
- Even though we don't use climatrix explicitly, we need to import it to make the xarray accessor visible.
Warning
When using climatrix as an accessor, remember to import climatrix first!
Accessing spatio-temporal axes
With climatrix, you can easily access spatio-temporal axes.
Info
You don't need to know the name of an axis (lat, latitude, or anything else); Climatrix automatically finds the proper axis by matching regular expressions. All predefined axes are available via the Axis enum class.
To access the name of the latitude axis, use:
xarray_dataset.cm.latitude.name
and to access its values, use:
xarray_dataset.cm.latitude.values
Below, you can find the available attributes:
Attribute | Meaning |
---|---|
latitude | Axis corresponding to AxisType.LATITUDE for the dataset |
longitude | Axis corresponding to AxisType.LONGITUDE for the dataset |
time | Axis corresponding to AxisType.TIME for the dataset |
point | Axis corresponding to AxisType.POINT for the dataset |
vertical | Axis corresponding to AxisType.VERTICAL for the dataset |
Subsetting a dataset by geographical coordinates
Climatrix makes it easy to subset a region with a bounding box. To select Europe, use the following command:
europe = xarray_dataset.cm.subset(north=71, south=36, west=-24, east=35)
Warning
If you attempt to select a region that does not match the dataset's longitude convention, Climatrix will report it and ask you to change the convention explicitly.
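The bounding-box selection and the convention check from the warning can be sketched on a plain list of points. This is a minimal illustration, not how climatrix implements subset: Point and subset_points are hypothetical names, the convention is guessed with a simple heuristic (any longitude above 180 means [0, 360]), and regions crossing the antimeridian (west greater than east) are not handled.

```python
from dataclasses import dataclass


@dataclass
class Point:
    lat: float
    lon: float
    value: float


def subset_points(points, north, south, west, east):
    """Keep points with south <= lat <= north and west <= lon <= east (hypothetical helper)."""
    # Heuristic: any longitude above 180 means the data use the [0, 360] convention.
    positive = max(p.lon for p in points) > 180
    if positive and min(west, east) < 0:
        raise ValueError("bounds are signed, but the data use [0, 360]; convert the data first")
    if not positive and max(west, east) > 180:
        raise ValueError("bounds use [0, 360], but the data are signed; convert the data first")
    return [p for p in points if south <= p.lat <= north and west <= p.lon <= east]


grid = [Point(lat, lon, 0.0) for lat in (30.0, 50.0, 75.0) for lon in (-30.0, 0.0, 20.0)]
europe = subset_points(grid, north=71, south=36, west=-24, east=35)
print([(p.lat, p.lon) for p in europe])  # [(50.0, 0.0), (50.0, 20.0)]
```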
Tip
With Climatrix, changing the convention is easy!
To switch to the signed longitude convention (\(\lambda \in [-180, 180]\)), use the to_signed_longitude method.
signed_dset = xarray_dataset.cm.to_signed_longitude()
To switch to the positive-only longitude convention (\(\lambda \in [0, 360]\)), use the to_positive_longitude method.
positive_dset = xarray_dataset.cm.to_positive_longitude()
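Switching between the two conventions is modular arithmetic on the longitude values. The sketch below uses hypothetical helper names that mirror the method names; it maps single values onto half-open intervals, so 180 maps to -180 in the signed convention. It is an illustration of the arithmetic, not the climatrix implementation.

```python
def to_signed_longitude(lon: float) -> float:
    """Map a longitude in degrees to the signed range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_positive_longitude(lon: float) -> float:
    """Map a longitude in degrees to the positive range [0, 360)."""
    return lon % 360.0


print(to_signed_longitude(350.0))    # -10.0
print(to_positive_longitude(-24.0))  # 336.0

# Converting a whole axis changes the order of the values, so re-sort to keep it monotonic.
axis = [0.0, 90.0, 180.0, 270.0]
print(sorted(to_signed_longitude(lon) for lon in axis))  # [-180.0, -90.0, 0.0, 90.0]
```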
Selecting time
You can select time instants by integer index along the time axis:
import xarray as xr
import climatrix as cm
xr_dset = xr.tutorial.open_dataset("air_temperature")
single_time_instant = xr_dset.cm.itime(0)
several_time_instants = xr_dset.cm.itime([0, 100])
several_time_instants = xr_dset.cm.itime(slice(5, 200))
or by date:
xr_dset = xr.tutorial.open_dataset("air_temperature")
single_time_instant = xr_dset.cm.time("2013-02-10")
several_time_instants = xr_dset.cm.time(["2013-02-10", "2013-02-12"])
several_time_instants = xr_dset.cm.time(slice("2013-02-10", "2013-02-12"))
Sampling data
Climatrix implements the following sampling methods:
Sampling | Description |
---|---|
uniform | data are randomly (uniformly) sampled from the entire spatial domain |
normal | data are randomly sampled around a defined center point (location), following a normal distribution |
To sample \(10\%\) (\(0.1\)) of the spatial points, use:
import xarray as xr
import climatrix as cm
xr_dset = xr.tutorial.open_dataset("air_temperature") # (1)!
dset = xr_dset.cm.itime(0) # (2)!
sparse = dset.sample_uniform(portion=0.1)
sparse.plot(title="Uniform Sampling (10%)")
- We use the tutorial dataset from xarray. Opening it may require some extra packages (e.g. pooch).
- We select just the first time instant (here, 2013-01-01T00:00).
Tip
If you need an exact number of resulting points, use the number parameter. It also works with sample_normal.
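The two sampling strategies, and the choice between a portion and an exact number of points, can be sketched with the standard library. This is an illustration under stated assumptions, not climatrix's implementation: sample_uniform and sample_normal are hypothetical helpers working on (lat, lon) tuples, exactly one of portion or number is expected, and "normal" sampling is modeled here as drawing distinct points with Gaussian weights based on their distance from the center.

```python
import math
import random


def sample_uniform(points, portion=None, number=None, seed=None):
    """Draw distinct points with equal probability; pass exactly one of portion or number."""
    if (portion is None) == (number is None):
        raise ValueError("pass exactly one of portion or number")
    k = number if number is not None else round(portion * len(points))
    return random.Random(seed).sample(points, k)


def sample_normal(points, center, sigma, number, seed=None):
    """Draw distinct points, favoring those close to `center` via Gaussian weights."""
    rng = random.Random(seed)
    pool = list(points)
    chosen = []
    for _ in range(number):
        weights = [
            math.exp(-((lat - center[0]) ** 2 + (lon - center[1]) ** 2) / (2 * sigma**2))
            for lat, lon in pool
        ]
        # Pick one index proportionally to its weight, then remove it so it cannot repeat.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
    return chosen


grid = [(lat, lon) for lat in range(10) for lon in range(10)]  # 100 points
print(len(sample_uniform(grid, portion=0.1, seed=0)))  # 10
print(len(sample_uniform(grid, number=7, seed=0)))     # 7
print(len(sample_normal(grid, center=(5, 5), sigma=2.0, number=10, seed=0)))  # 10
```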
Note
For sampling methods, you can specify a NaN policy (the nan parameter). There are three options:
- ignore: NaN values may be sampled like any other value,
- raise: an error is raised if any NaN value is found,
- resample: attempts to return non-NaN values.
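One possible way to realize the three NaN policies is sketched below. The helper name and the exact semantics are assumptions for illustration: here raise checks only the drawn sample, and resample draws from the non-NaN values only, returning fewer points if not enough valid values exist.

```python
import math
import random


def sample_with_nan_policy(values, k, nan="ignore", seed=None):
    """Draw k values under one of three NaN policies (hypothetical helper)."""
    rng = random.Random(seed)
    if nan == "ignore":
        return rng.sample(values, k)  # NaNs are treated like any other value
    if nan == "raise":
        sample = rng.sample(values, k)
        if any(math.isnan(v) for v in sample):
            raise ValueError("NaN value found in the sample")
        return sample
    if nan == "resample":
        valid = [v for v in values if not math.isnan(v)]
        # Best effort: return fewer than k values if there are not enough valid ones.
        return rng.sample(valid, min(k, len(valid)))
    raise ValueError(f"unknown NaN policy: {nan!r}")


data = [1.0, float("nan"), 3.0, 4.0]
print(sample_with_nan_policy(data, 3, nan="resample", seed=1))  # the three non-NaN values, in random order
```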
Reconstructing
The main purpose of the accessor is to simplify data reconstruction. You can reconstruct a dense domain from a sparse one, or a sparse domain from another sparse one.
import xarray as xr
import climatrix as cm
xr_dset = xr.tutorial.open_dataset("air_temperature")
dset = xr_dset.cm.itime(0)
sparse = dset.sample_uniform(portion=0.1) # (1)!
dense = sparse.reconstruct(dset.domain, method="idw") # (2)!
dense.plot(title="Reconstructed dataset")
- First, we sample a sparse dataset.
- Note that we use the domain of dset, not xr_dset.cm, because we want to reconstruct onto the original domain after time subsetting.
Note
You can pass extra reconstructor-specific arguments as the last argument (recon_kwargs) of the reconstruct method. For the definitions of these extra arguments, refer to the Reconstruction section of the API reference.
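Inverse distance weighting (IDW), requested with method="idw" in the example above, estimates each target value as a weighted mean of the known samples, \(\hat{v} = \sum_i w_i v_i / \sum_i w_i\) with \(w_i = 1 / d_i^p\). The sketch below is a stand-alone illustration, not climatrix's reconstructor: it uses plain Euclidean distance in degrees and all known points, whereas a production implementation would likely use great-circle distances and a limited neighborhood. The power parameter is the kind of reconstructor-specific option the note mentions; check the API reference for the names climatrix actually accepts.

```python
import math


def idw(known, targets, power=2.0):
    """Estimate a value at each (lat, lon) target from known (lat, lon, value) samples.

    Weights are 1 / d**power with Euclidean distance in degrees, a simplification.
    """
    estimates = []
    for tlat, tlon in targets:
        num = den = 0.0
        exact = None
        for lat, lon, value in known:
            d = math.hypot(lat - tlat, lon - tlon)
            if d == 0.0:
                exact = value  # target coincides with a sample: reuse its value
                break
            w = 1.0 / d**power
            num += w * value
            den += w
        estimates.append(exact if exact is not None else num / den)
    return estimates


samples = [(0.0, 0.0, 10.0), (0.0, 2.0, 20.0)]
print(idw(samples, [(0.0, 1.0), (0.0, 0.0)]))  # [15.0, 10.0]
```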
Plotting
To plot a dataset (on either a dense or a sparse domain), use the plot method:
xr_dset.cm.itime(0).plot()  # select a single time instant, then plot it
Warning
At the moment, plotting is enabled only for static datasets. Remember to select a single time instant before plotting.