API Reference
Welcome to the climatrix API reference. Below you'll find details on key modules, classes, and methods, with examples and usage tips to help you integrate it smoothly into your climate data workflows.
Abstract
The main module climatrix provides tools that extend xarray datasets with climate-oriented subsetting, sampling, and reconstruction. These tools are available through the cm accessor on xarray objects.
The library contains a few public classes:
Class name | Description |
---|---|
AxisType | Enumerator class for types of spatio-temporal axes |
Axis | Class managing spatio-temporal axes |
BaseClimatrixDataset | Base class for managing xarray data |
Domain | Base class for domain-specific operations |
SparseDomain | Subclass of Domain aimed at managing sparse representations |
DenseDomain | Subclass of Domain aimed at managing dense representations |
Plot | Interactive plotting utility for climate datasets |
Axes
climatrix.dataset.axis.AxisType
Bases: StrEnum
Enum for axis types.
Attributes:
Name | Type | Description |
---|---|---|
LATITUDE | str | Latitude axis type. |
LONGITUDE | str | Longitude axis type. |
TIME | str | Time axis type. |
VERTICAL | str | Vertical axis type. |
POINT | str | Point axis type. |
get(value)
classmethod
Get the AxisType corresponding to value.
If value is an instance of AxisType, it is returned as is.
If value is a string, the corresponding AxisType is returned.
If value is neither an instance of AxisType nor a string, a ValueError is raised.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | str or AxisType | The axis type | required |
Returns:
Type | Description |
---|---|
AxisType | The axis type. |
Raises:
Type | Description |
---|---|
ValueError | If value is neither an AxisType instance nor a string. |
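The lookup rules of get can be sketched with a small stand-alone enum. This is not the library's source: the class name AxisTypeSketch, the member string values, the case-insensitive matching, and the error raised for an unknown string are all assumptions made for illustration.

```python
from enum import Enum


class AxisTypeSketch(str, Enum):
    """Hypothetical stand-in for AxisType; the member values are assumed."""

    LATITUDE = "latitude"
    LONGITUDE = "longitude"
    TIME = "time"
    VERTICAL = "vertical"
    POINT = "point"

    @classmethod
    def get(cls, value):
        # An existing member is returned unchanged.
        if isinstance(value, cls):
            return value
        # A string is resolved to the member carrying that value (assumed rule).
        if isinstance(value, str):
            try:
                return cls(value.lower())
            except ValueError:
                raise ValueError(f"Unknown axis type: {value!r}") from None
        # Anything else is rejected, as the documentation describes.
        raise ValueError(
            f"Expected a str or an axis type, got {type(value).__name__}"
        )
```

The member check comes first because a string-based enum member is itself a str, so the order of the two isinstance tests matters.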
climatrix.dataset.axis.Axis
Base class for axis types.
Attributes:
Name | Type | Description |
---|---|---|
type |
ClassVar[AxisType]
|
The type of the axis. |
dtype |
ClassVar[dtype]
|
The data type of the axis values. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
name |
str
|
The name of the axis. |
values |
ndarray
|
The values of the axis. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of the axis. |
required |
values
|
ndarray
|
The values of the axis. |
required |
is_dimension
|
bool
|
Whether the axis is a dimension or not (default is True). |
True
|
Examples:
Axis is a factory class for all axis types. To create an axis (by matching the name), use:
>>> import numpy as np
>>> from climatrix.dataset.axis import Axis, Latitude
>>> axis = Axis(name="latitude", values=np.array([1, 2, 3]))
To create a Latitude axis explicitly, use:
>>> axis = Latitude(name="latitude", values=np.array([1, 2, 3]))
>>> axis = Latitude(
... name="latitude",
... values=np.array([1, 2, 3]),
... is_dimension=True)
Notes
- The Axis class is a factory class for all axis types.
- If the given axis has an unusual name, create it explicitly using the corresponding class (e.g. Latitude).
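The factory behaviour described in the notes above can be sketched as follows. Everything here is hypothetical: the names AxisSketch, LatitudeSketch and LongitudeSketch, the regular expressions, and the registration mechanism are assumptions, and values is kept as a plain list instead of an ndarray so the sketch needs no third-party packages.

```python
import re


class AxisSketch:
    """Hypothetical factory: AxisSketch(...) returns the subclass matching the name."""

    _pattern = None  # compiled regex, set by each subclass (assumed mechanism)
    _registry = []   # subclasses, filled automatically on definition

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        AxisSketch._registry.append(cls)

    def __new__(cls, name, values, is_dimension=True):
        if cls is AxisSketch:
            # Factory path: pick the first subclass that recognises the name.
            for subclass in AxisSketch._registry:
                if subclass.matches(name):
                    return super().__new__(subclass)
            raise ValueError(f"No axis type matches the name {name!r}")
        # Explicit path: LatitudeSketch(...) always builds a LatitudeSketch.
        return super().__new__(cls)

    def __init__(self, name, values, is_dimension=True):
        self.name = name
        self.values = list(values)
        self.is_dimension = is_dimension

    @classmethod
    def matches(cls, name):
        """Return True if the name fits this axis type's pattern."""
        return cls._pattern is not None and cls._pattern.fullmatch(name) is not None

    @property
    def size(self):
        return len(self.values)


class LatitudeSketch(AxisSketch):
    _pattern = re.compile(r"lat(itude)?", re.IGNORECASE)


class LongitudeSketch(AxisSketch):
    _pattern = re.compile(r"lon(gitude)?", re.IGNORECASE)
```

With this layout, an axis called "y_center" cannot be built through the factory, but LatitudeSketch(name="y_center", values=[...]) still works, which mirrors the second note.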
size
property
Get the size of the axis.
Returns:
Type | Description |
---|---|
int
|
The size of the axis. |
matches(name)
classmethod
Check if the axis matches the given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name to check. |
required |
Returns:
Type | Description |
---|---|
bool
|
True if the axis matches the name, False otherwise. |
climatrix.dataset.axis.Latitude
Bases: Axis
Latitude axis.
Attributes:
Name | Type | Description |
---|---|---|
name |
str
|
The name of the latitude axis. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
climatrix.dataset.axis.Longitude
Bases: Axis
Longitude axis.
Attributes:
Name | Type | Description |
---|---|---|
name |
str
|
The name of the longitude axis. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
climatrix.dataset.axis.Time
Bases: Axis
Time axis.
Attributes:
Name | Type | Description |
---|---|---|
name |
str
|
The name of the time axis. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
__eq__(other)
Check if two axes are equal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other
|
object
|
The other object to compare with. |
required |
Returns:
Type | Description |
---|---|
bool
|
True if the axes are equal, False otherwise. |
climatrix.dataset.axis.Point
Bases: Axis
Point axis.
Attributes:
Name | Type | Description |
---|---|---|
name |
str
|
The name of the point axis. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
climatrix.dataset.axis.Vertical
Bases: Axis
Vertical axis.
Attributes:
Name | Type | Description |
---|---|---|
name |
str
|
The name of the vertical axis. |
is_dimension |
bool
|
Whether the axis is a dimension or not. |
Data
climatrix.dataset.base.BaseClimatrixDataset
Base class for Climatrix workflows.
This class provides a set of methods for manipulating xarray datasets. It is designed to be used as an xarray accessor, allowing you to call its methods directly on xarray datasets.
The class supports basic arithmetic operations, including: addition, subtraction, multiplication, and division.
Attributes:
Name | Type | Description |
---|---|---|
da |
DataArray
|
The underlying |
domain |
Domain
|
The domain object representing the spatial
and temporal dimensions of the dataset.
See |
domain = Domain(xarray_obj)
instance-attribute
subset(north=None, south=None, west=None, east=None)
Subset data with the specified bounding box.
If an argument is not provided, no bound is applied in that direction. For example, if north is not provided, the maximum latitude of the dataset is used.
If north and south are both provided, the dataset is subset to the area between these two latitudes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
north
|
float
|
North latitude of the bounding box. |
None
|
south
|
float
|
South latitude of the bounding box. |
None
|
west
|
float
|
West longitude of the bounding box. |
None
|
east
|
float
|
East longitude of the bounding box. |
None
|
Returns:
Type | Description |
---|---|
Self
|
The subsetted dataset. |
Raises:
Type | Description |
---|---|
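LongitudeConventionMismatch | If the requested longitude bounds do not follow the dataset's longitude convention (for example, negative bounds for a dataset whose longitudes run from 0 to 360). |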
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> globe_dset = xr.open_dataset("path/to/dataset.nc")
>>> globe_dset
<xarray.Dataset>
Dimensions: (time: 1, latitude: 180, longitude: 360)
Coordinates:
* time (time) datetime64[ns] 2020-01-01
* latitude (latitude) float64 -90.0 -89.0 -88.0 ... 88.0 89.0
* longitude (longitude) float64 0.0 1.0 2.0 ... 357.0 358.0 359.0
Data variables:
temperature (time, latitude, longitude) float64 ...
>>> dset2 = globe_dset.cm.subset(
... north=10.0,
... south=5.0,
... west=20.0,
... east=25.0,
... )
>>> dset2 = globe_dset.cm.subset(
... north=10.0,
... south=5.0,
... west=-50.0,
... east=25.0,
... )
LongitudeConventionMismatch: The dataset is in positive-only convention
(longitude goes from 0 to 360) while you are
requesting negative values (longitude goes from -180 to 180).
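The convention check that produces the error in the example above can be sketched in plain Python. The function name, the exception class, and the detection rule (a dataset maximum above 180 implies the 0 to 360 convention, a minimum below 0 implies the -180 to 180 convention) are assumptions for illustration; the library may detect the convention differently.

```python
class LongitudeConventionMismatchSketch(ValueError):
    """Hypothetical stand-in for LongitudeConventionMismatch."""


def check_bbox_convention(longitudes, west=None, east=None):
    """Raise if a requested bound cannot belong to the dataset's convention."""
    bounds = [bound for bound in (west, east) if bound is not None]
    # Values above 180 only occur in the positive-only (0 to 360) convention.
    dataset_is_positive = max(longitudes) > 180.0
    # Negative values only occur in the signed (-180 to 180) convention.
    dataset_is_signed = min(longitudes) < 0.0
    if dataset_is_positive and any(bound < 0.0 for bound in bounds):
        raise LongitudeConventionMismatchSketch(
            "Dataset longitudes run from 0 to 360, but a negative bound was requested."
        )
    if dataset_is_signed and any(bound > 180.0 for bound in bounds):
        raise LongitudeConventionMismatchSketch(
            "Dataset longitudes run from -180 to 180, but a bound above 180 was requested."
        )
```

A dataset whose longitudes all fall between 0 and 180 is ambiguous under this rule, so the sketch accepts any bounds for it.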
to_signed_longitude()
Convert the dataset to signed longitude convention.
The longitude values are converted to be in the range (-180 to 180 degrees).
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc").cm
>>> dset.da
<xarray.DataArray 'temperature' (time: 1, latitude: 180, longitude: 360)>
...
Dimensions: (time: 1, latitude: 180, longitude: 360)
Coordinates:
* time (time) datetime64[ns] 2020-01-01
* latitude (latitude) float64 -90.0 -89.0 -88.0 ... 88.0 89.0
* longitude (longitude) float64 0.0 1.0 2.0 ... 357.0 358.0 359.0
Data variables:
temperature (time, latitude, longitude) float64 ...
>>> dset2 = dset.to_signed_longitude()
>>> dset2.da
<xarray.DataArray 'temperature' (time: 1, latitude: 180, longitude: 360)>
...
Dimensions: (time: 1, latitude: 180, longitude: 360)
Coordinates:
* time (time) datetime64[ns] 2020-01-01
* latitude (latitude) float64 -90.0 -89.0 -88.0 ... 88.0 89.0
* longitude (longitude) float64 -180.0 -179.0 -178.0 ... 177.0 178.0 179.0
References
[1] Mancini, M., Walczak, J., Stojiljkovic, M., geokube: A Python library for geospatial data processing, 2024, https://doi.org/10.5281/zenodo.10597965, https://github.com/CMCC-Foundation/geokube
to_positive_longitude()
Convert the dataset to positive longitude convention.
The longitude values are converted to be in the range (0 to 360 degrees).
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc").cm
>>> dset.da
<xarray.DataArray 'temperature' (time: 1, latitude: 180,
longitude: 360)>
...
Dimensions: (time: 1, latitude: 180, longitude: 360)
Coordinates:
* time (time) datetime64[ns] 2020-01-01
* latitude (latitude) float64 -90.0 -89.0 -88.0 ... 88.0 89.0
* longitude (longitude) float64 -180.0 ... 178.0 179.0
Data variables:
temperature (time, latitude, longitude) float64 ...
>>> dset2 = dset.to_positive_longitude()
>>> dset2.da
<xarray.DataArray 'temperature' (time: 1, latitude: 180,
longitude: 360)>
...
Dimensions: (time: 1, latitude: 180, longitude: 360)
Coordinates:
* time (time) datetime64[ns] 2020-01-01
* latitude (latitude) float64 -90.0 -89.0 -88.0 ... 88.0 89.0
* longitude (longitude) float64 0.0 1.0 ... 357.0 358.0 359.0
Data variables:
temperature (time, latitude, longitude) float64 ...
References
[1] Mancini, M., Walczak, J., Stojiljkovic, M., geokube: A Python library for geospatial data processing, 2024, https://doi.org/10.5281/zenodo.10597965, https://github.com/CMCC-Foundation/geokube
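The arithmetic behind the two longitude conventions can be sketched without xarray. The function names are hypothetical, and the re-sorting step is an assumption suggested by the example outputs, where the converted longitudes remain in ascending order.

```python
def to_signed(lon):
    """Map a longitude in degrees to the half-open range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_positive(lon):
    """Map a longitude in degrees to the half-open range [0, 360)."""
    return lon % 360.0


def convert_and_sort(longitudes, values, convert):
    """Convert longitudes and reorder the values so longitudes stay ascending."""
    pairs = sorted(
        zip((convert(lon) for lon in longitudes), values),
        key=lambda pair: pair[0],
    )
    new_longitudes = [lon for lon, _ in pairs]
    new_values = [value for _, value in pairs]
    return new_longitudes, new_values
```

For instance, to_signed(359.0) gives -1.0 and to_positive(-1.0) gives 359.0, so the two functions undo each other on their respective ranges.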
squeeze()
Squeeze the dataset to remove dimensions of size 1.
Returns:
Type | Description |
---|---|
Self
|
The squeezed dataset. |
profile_along_axes(*axes)
Generate profiles along the specified axes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*axes
|
AxisType | str
|
The axes along which to generate profiles. |
()
|
Yields:
Type | Description |
---|---|
BaseClimatrixDataset
|
A dataset containing the profile along the specified axes. |
mask_nan(source)
Apply NaN values from another dataset to the current one.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source
|
BaseClimatrixDataset
|
Dataset whose NaN values will be applied to the current one. |
required |
Returns:
Type | Description |
---|---|
BaseClimatrixDataset
|
A new dataset with NaN values applied. |
Raises:
Type | Description |
---|---|
TypeError
|
If the |
ValueError
|
If the domain of the |
DomainMismatchError
|
If the domains of the |
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset1 = xr.open_dataset("path/to/dataset1.nc").cm
>>> dset2 = xr.open_dataset("path/to/dataset2.nc").cm
>>> dset1.mask_nan(dset2)
time(time)
Select data at a specific time or times.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time
|
datetime, np.datetime64, slice, list, or np.ndarray
|
Time or times to be selected. |
required |
Returns:
Type | Description |
---|---|
Self
|
The dataset with the selected time or times. |
Examples:
>>> from datetime import datetime
>>> import numpy as np
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc")
Selecting by datetime object:
>>> dset.cm.time(datetime(2020, 1, 1))
Selecting by np.datetime64 object:
>>> dset.cm.time(np.datetime64("2020-01-01"))
Selecting by a slice with a str bound (slice("2020-01-01") is equivalent to slice(None, "2020-01-01")):
>>> dset.cm.time(slice("2020-01-01"))
Selecting by list of any of the above:
>>> dset.cm.time([datetime(2020, 1, 1), np.datetime64("2020-01-02")])
Selecting by slice object:
>>> dset.cm.time(slice(datetime(2020, 1, 1), datetime(2020, 1, 2)))
itime(time)
Select time value by index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time
|
int, list[int], np.ndarray, or slice
|
Time index or indices to be selected. |
required |
Returns:
Type | Description |
---|---|
Self
|
The dataset with the selected time or times. |
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc")
Selecting by int object:
>>> dset.cm.itime(0)
Selecting by list of ints:
>>> dset.cm.itime([0, 1])
Selecting by slice object:
>>> dset.cm.itime(slice(0, 2))
sample_uniform(portion=None, number=None, nan='ignore')
Sample the dataset using a uniform distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
portion
|
float
|
Portion of the dataset to be sampled. |
None
|
number
|
int
|
Number of points to be sampled. |
None
|
nan
|
SamplingNaNPolicy | str
|
Policy for handling NaN values. |
'ignore'
|
Notes
Exactly one of portion or number must be provided; passing both at the same time is not allowed.
Warns:
Type | Description |
---|---|
TooLargeSamplePortionWarning
|
If the portion exceeds 1.0 or number of points exceeds the number of spatial points in the Domain |
Raises:
Type | Description |
---|---|
ValueError
|
If the dataset contains NaN values and |
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc")
>>> sparse_dset = dset.cm.sample_uniform(portion=0.1)
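The argument rules and the uniform draw can be sketched with the standard library only. The helper names are hypothetical, a plain UserWarning stands in for TooLargeSamplePortionWarning, and both the rounding of portion and the clipping of an oversized request are assumptions about behaviour the documentation does not spell out.

```python
import random
import warnings


def resolve_sample_size(n_points, portion=None, number=None):
    """Turn `portion` or `number` into a point count, validating the arguments."""
    if (portion is None) == (number is None):
        raise ValueError("Provide exactly one of `portion` or `number`.")
    requested = number if number is not None else int(round(portion * n_points))
    if requested > n_points:
        # Stand-in for TooLargeSamplePortionWarning; clipping is an assumption.
        warnings.warn("Requested sample exceeds the number of points; clipping.")
        requested = n_points
    return requested


def sample_uniform_indices(n_points, portion=None, number=None, seed=None):
    """Draw distinct point indices, each point being equally likely."""
    size = resolve_sample_size(n_points, portion=portion, number=number)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), size))
```

Sampling indices rather than values keeps the sketch independent of how the spatial points are stored.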
sample_normal(portion=None, number=None, center_point=None, sigma=10.0, nan='ignore')
Sample the dataset using a normal distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
portion
|
float
|
Portion of the dataset to be sampled. |
None
|
number
|
int
|
Number of points to be sampled. |
None
|
center_point
|
tuple[Longitude, Latitude]
|
Center point for the normal distribution. |
None
|
sigma
|
float
|
Standard deviation for the normal distribution. |
10.0
|
nan
|
SamplingNaNPolicy | str
|
Policy for handling NaN values. |
'ignore'
|
Notes
Exactly one of portion or number must be provided; passing both at the same time is not allowed.
Warns:
Type | Description |
---|---|
TooLargeSamplePortionWarning
|
If the portion exceeds 1.0 or number of points exceeds the number of spatial points in the Domain |
Raises:
Type | Description |
---|---|
ValueError
|
If the dataset contains NaN values and |
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc")
>>> sparse_dset = dset.cm.sample_normal(
... number=1_000,
... center_point=(10.0, 20.0),
... sigma=5.0,
... )
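One way to sample points with a preference for the neighbourhood of center_point is to weight each point by a Gaussian of its distance to the center and then draw without replacement. The sketch below uses the Efraimidis-Spirakis weighted sampling scheme; that choice, the helper names, and the use of plain Euclidean distance in degrees are assumptions, not a description of the library's internals.

```python
import math
import random


def gaussian_weights(points, center_point, sigma):
    """Weight each (lon, lat) point by a Gaussian of its distance to the center."""
    center_lon, center_lat = center_point
    return [
        math.exp(-((lon - center_lon) ** 2 + (lat - center_lat) ** 2) / (2.0 * sigma**2))
        for lon, lat in points
    ]


def sample_normal_indices(points, number, center_point, sigma=10.0, seed=None):
    """Draw `number` distinct indices with probability proportional to the weights.

    Each point receives the key log(u) / weight with u uniform in (0, 1];
    the points with the largest keys are selected.
    """
    rng = random.Random(seed)
    weights = gaussian_weights(points, center_point, sigma)
    keys = []
    for index, weight in enumerate(weights):
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is always defined
        key = math.log(u) / weight if weight > 0.0 else -math.inf
        keys.append((key, index))
    keys.sort(reverse=True)
    return sorted(index for _, index in keys[:number])
```

A smaller sigma concentrates the weights, and therefore the sample, more tightly around the center.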
reconstruct(target, *, method, **recon_kwargs)
Reconstruct the dataset to a target domain.
If the target domain is sparse, the reconstruction is sparse too; if the target domain is dense, the reconstruction is dense.
The reconstruction uses the method given in the method argument, which can be one of the following:
Inverse Distance Weighting (idw),
Ordinary Kriging (ok).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target
|
Domain
|
The target domain to reconstruct the dataset to. |
required |
method
|
ReconstructionType | str
|
The method to use for reconstruction. Can be one of the following: 'idw', 'ok'. |
required |
recon_kwargs
|
dict
|
Additional keyword arguments to pass to the reconstruction method. |
{}
|
Returns:
Type | Description |
---|---|
Self
|
The reconstructed dataset. |
plot(title=None, target=None, show=True, **kwargs)
Plot the dataset on a map.
The dataset is plotted using Cartopy and Matplotlib.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
title
|
str
|
Title of the plot. If not provided, the name of the dataset will be used. If the dataset has no name, "Climatrix Dataset" will be used. |
None
|
target
|
str, os.PathLike, Path, or None
|
Path to save the plot. If not provided, the plot will not be saved. |
None
|
show
|
bool
|
Whether to show the plot. Default is True. |
True
|
**kwargs
|
dict
|
Additional keyword arguments to pass to the plotting function.
|
{}
|
Returns:
Type | Description |
---|---|
Axes
|
The axes object containing the plot. |
Raises:
Type | Description |
---|---|
NotImplementedError
|
If the dataset is dynamic (contains time dimension with more than one value). |
transpose(*axes)
Transpose the dataset along the specified dimensions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*axes
|
AxisType or str
|
The axes along which to transpose the dataset. |
()
|
Returns:
Type | Description |
---|---|
Self
|
The transposed dataset. |
Examples:
>>> import xarray as xr
>>> import climatrix as cm
>>> dset = xr.open_dataset("path/to/dataset.nc").cm
>>> dset2 = dset.transpose("longitude", "latitude")
Domain
climatrix.dataset.domain.Domain
Base class for domain objects.
Attributes:
Name | Type | Description |
---|---|---|
is_sparse |
ClassVar[bool]
|
Indicates if the domain is sparse or dense. |
_axes |
dict[AxisType, Axis]
|
Mapping of |
dims
property
Get the dimensions of the dataset.
Returns:
Type | Description |
---|---|
tuple[AxisType, ...]
|
A tuple of |
Notes
The dimensions are determined by the axes that are marked as dimensions in the domain. For example, if the underlying dataset has shape (5, 10, 20), there are three dimensional axes.
latitude
property
Latitude axis
longitude
property
Longitude axis
time
property
Time axis
point
property
Point axis
vertical
property
Vertical axis
is_dynamic
property
If the domain is dynamic.
is_sparse
class-attribute
size
property
Domain size.
all_axes_types
property
All axis types in the domain.
from_lat_lon(lat=slice(-90, 90, _DEFAULT_LAT_RESOLUTION), lon=slice(-180, 180, _DEFAULT_LON_RESOLUTION), kind='dense')
classmethod
Create a domain from latitude and longitude coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lat
|
slice or ndarray
|
Latitude coordinates. If a slice is provided, it will be converted to a numpy array using the specified step. |
slice(-90, 90, _DEFAULT_LAT_RESOLUTION)
|
lon
|
slice or ndarray
|
Longitude coordinates. If a slice is provided, it will be converted to a numpy array using the specified step. |
slice(-180, 180, _DEFAULT_LON_RESOLUTION)
|
kind
|
str
|
Type of domain to create. Can be either "dense" or "sparse". Default is "dense". |
'dense'
|
Returns:
Type | Description |
---|---|
Domain
|
An instance of the Domain class with the specified latitude and longitude coordinates. |
from_axes()
classmethod
Create a domain builder for configuring domains with multiple axes.
Returns:
Type | Description |
---|---|
DomainBuilder
|
A builder instance for creating domains with various axes. |
Examples:
>>> domain = (Domain.from_axes()
... .vertical(depth=slice(10, 100, 1))
... .lat(latitude=[1,2,3,4])
... .lon(longitude=[1,2,3,4])
... .sparse())
>>> domain = (Domain.from_axes()
... .lat(lat=slice(-90, 90, 1))
... .lon(lon=slice(-180, 180, 1))
... .time(time=['2020-01-01', '2020-01-02'])
... .dense())
get_size(axis)
Get the size of the specified axis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis
|
AxisType
|
The axis for which to get the size. |
required |
Returns:
Type | Description |
---|---|
int
|
The size of the specified axis. |
has_axis(axis)
Check if the specified axis exists in the domain.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis
|
AxisType
|
The axis type to check. |
required |
Returns:
Type | Description |
---|---|
bool
|
True if the axis exists, False otherwise. |
get_axis(axis)
get_all_spatial_points()
abstractmethod
to_xarray(values, name=None)
abstractmethod
climatrix.dataset.domain.SparseDomain
Bases: Domain
Sparse domain class.
Supports operations on sparse spatial domain.
to_xarray(values, name=None)
Convert domain to sparse xarray.DataArray.
The method applies values
and (optionally) name
to
create a new xarray.DataArray object based on the domain.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
values
|
ndarray
|
The values to be assigned to the DataArray variable. |
required |
name
|
str
|
The name of the DataArray variable. |
None
|
Returns:
Type | Description |
---|---|
DataArray
|
The xarray.DataArray single variable object. |
Raises:
Type | Description |
---|---|
ValueError
|
If the shape of |
Examples:
>>> import numpy as np
>>> import xarray as xr
>>> from climatrix.dataset.domain import Domain
>>> domain = Domain.from_lat_lon()
>>> values = np.random.rand(5, 5)
>>> da = domain.to_xarray(values, name="example")
>>> isinstance(da, xr.DataArray)
True
>>> da.name
'example'
get_all_spatial_points()
Get all spatial points in the domain.
Returns:
Type | Description |
---|---|
ndarray
|
An array of shape (n_points, 2) containing the latitude and longitude coordinates of all points in the domain. |
Examples:
>>> points = domain.get_all_spatial_points()
>>> points
array([[ 0. , -0.1],
[ 0. , 0. ],
[ 0. , 0.1],
...
climatrix.dataset.domain.DenseDomain
Bases: Domain
Dense domain class.
Supports operations on dense spatial domain.
to_xarray(values, name=None)
Convert domain to dense xarray.DataArray.
The method applies values
and (optionally) name
to
create a new xarray.DataArray object based on the domain.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
values
|
ndarray
|
The values to be assigned to the DataArray variable. |
required |
name
|
str
|
The name of the DataArray variable. |
None
|
Returns:
Type | Description |
---|---|
DataArray
|
The xarray.DataArray single variable object. |
Raises:
Type | Description |
---|---|
ValueError
|
If the shape of |
Examples:
>>> import numpy as np
>>> import xarray as xr
>>> from climatrix.dataset.domain import Domain
>>> domain = Domain.from_lat_lon()
>>> values = np.random.rand(5, 5)
>>> da = domain.to_xarray(values, name="example")
>>> isinstance(da, xr.DataArray)
True
>>> da.name
'example'
get_all_spatial_points()
Get all spatial points in the domain.
Returns:
Type | Description |
---|---|
ndarray
|
An array of shape (n_points, 2) containing the latitude and longitude coordinates of all points in the domain. |
Examples:
>>> points = domain.get_all_spatial_points()
>>> points
array([[ 0. , -0.1],
[ 0. , 0. ],
[ 0. , 0.1],
...
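The (n_points, 2) layout returned by get_all_spatial_points can be illustrated for both domain kinds. The function names are hypothetical and nested lists replace ndarrays; the row order of the dense case (latitude fixed while longitude varies fastest) is inferred from the example output above, and the element-wise pairing in the sparse case is an assumption.

```python
from itertools import product


def dense_spatial_points(latitudes, longitudes):
    """All (lat, lon) pairs of a regular grid, longitude varying fastest."""
    return [[lat, lon] for lat, lon in product(latitudes, longitudes)]


def sparse_spatial_points(latitudes, longitudes):
    """One (lat, lon) pair per point: coordinates are paired element-wise."""
    if len(latitudes) != len(longitudes):
        raise ValueError("Sparse domains need one latitude per longitude.")
    return [[lat, lon] for lat, lon in zip(latitudes, longitudes)]
```

A dense domain with n latitudes and m longitudes therefore yields n * m rows, while a sparse domain with n points yields n rows.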
Interactive Plotting
Reconstructors
climatrix.reconstruct.base.BaseReconstructor
Bases: ABC
Base class for all dataset reconstruction methods.
Attributes:
Name | Type | Description |
---|---|---|
dataset |
BaseClimatrixDataset
|
The dataset to be reconstructed. |
target_domain |
Domain
|
The target domain for the reconstruction. |
__init_subclass__(**kwargs)
Register subclasses automatically.
get(method)
classmethod
Get a reconstruction class by method name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
str
|
The reconstruction method name (e.g., 'idw', 'ok', 'sinet', 'siren'). |
required |
Returns:
Type | Description |
---|---|
type[BaseReconstructor]
|
The reconstruction class. |
Raises:
Type | Description |
---|---|
ValueError
|
If the method is not supported. |
Notes
The method
parameter should reflect the NAME
class attribute
of the selected reconstructor class.
get_available_methods()
classmethod
Get a list of available reconstruction methods.
Returns:
Type | Description |
---|---|
list[str]
|
List of method names (e.g., 'idw', 'ok', 'sinet', 'siren'). |
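The combination of __init_subclass__, get and get_available_methods amounts to a registry keyed by the NAME class attribute. A minimal sketch of that pattern follows; the class names, the dictionary used as the registry, and the lower-casing of the method name are assumptions rather than the library's actual code.

```python
from abc import ABC, abstractmethod


class ReconstructorSketch(ABC):
    """Hypothetical base class that registers subclasses under their NAME."""

    NAME = None
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass that defines NAME becomes available by that name.
        if cls.NAME is not None:
            ReconstructorSketch._registry[cls.NAME] = cls

    @classmethod
    def get(cls, method):
        """Return the class registered under `method`, or raise ValueError."""
        try:
            return cls._registry[method.lower()]
        except KeyError:
            raise ValueError(
                f"Unsupported method {method!r}; "
                f"available: {cls.get_available_methods()}"
            ) from None

    @classmethod
    def get_available_methods(cls):
        return sorted(cls._registry)

    @abstractmethod
    def reconstruct(self):
        """Subclasses implement the actual reconstruction."""


class IDWSketch(ReconstructorSketch):
    NAME = "idw"

    def reconstruct(self):
        return "idw result"


class KrigingSketch(ReconstructorSketch):
    NAME = "ok"

    def reconstruct(self):
        return "ok result"
```

New methods plug in by subclassing and setting NAME; no central list has to be edited.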
get_hparams()
classmethod
Get hyperparameter definitions from Hyperparameter descriptors.
Returns:
Type | Description |
---|---|
dict[str, dict[str, any]]
|
Dictionary mapping parameter names to their definitions. Each parameter definition contains: - 'type': the parameter type - 'bounds': tuple of (min, max) for numeric parameters (if defined) - 'values': list of valid values for categorical parameters (if defined) - 'default': default value (if defined) |
reconstruct()
abstractmethod
Reconstruct the dataset using the specified method.
This is an abstract method that must be implemented by subclasses.
The data are reconstructed for the target domain, passed in the initializer.
Returns:
Type | Description |
---|---|
BaseClimatrixDataset
|
The reconstructed dataset. |
update_bounds(bounds=None, values=None)
classmethod
Update the bounds of hyperparameters in the class.
If a bound is given as a tuple, it represents a range (min, max); if it is given as a list, it represents a set of valid values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**bounds
|
dict[str, tuple]
|
Keyword arguments where keys are hyperparameter names and values are tuples defining new bounds. |
None
|
climatrix.reconstruct.idw.IDWReconstructor
Bases: BaseReconstructor
Inverse Distance Weighting Reconstructor
This class performs spatial interpolation using inverse distance weighting, where the influence of each known data point on the interpolated value decreases with distance according to a power function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset
|
BaseClimatrixDataset
|
The input dataset to reconstruct. |
required |
target_domain
|
Domain
|
The target domain for reconstruction. |
required |
power
|
float
|
The power to raise the distance to (default is 2.0).
Controls the rate of decrease of influence with distance.
Type: float, bounds: |
None
|
k
|
int
|
The number of nearest neighbors to consider (default is 5). Type: int, bounds: (1, ...), default: 5 |
None
|
k_min
|
int
|
The minimum number of nearest neighbors required; if fewer are available, NaN values are assigned (default is 2). Type: int, bounds: (1, ...), default: 2 |
None
|
Raises:
Type | Description |
---|---|
NotImplementedError
|
If the input dataset is dynamic, as IDW reconstruction is not yet supported for dynamic datasets. |
ValueError
|
If k_min is greater than k or if k is less than 1. |
Notes
Hyperparameters for optimization:
- power: float in (1e-10, 5.0), default=2.0
- k: int in (1, 50), default=5
- k_min: int in (1, 40), default=2
reconstruct()
Perform Inverse Distance Weighting (IDW) reconstruction.
This method reconstructs the sparse dataset using IDW, taking into account the specified number of nearest neighbors and the power to which distances are raised. The reconstructed data is returned as a dense dataset, either static or dynamic based on the input dataset.
Returns:
Type | Description |
---|---|
BaseClimatrixDataset
|
The reconstructed dataset on the target domain. |
Notes
- If fewer than self.k_min neighbors are available, NaN values are assigned to the corresponding points in the output.
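The roles of power, k and k_min can be illustrated with a single-point interpolation in plain Python. This is a sketch under assumptions: the function name is hypothetical, distances are Euclidean in (lat, lon) degrees, and a neighbor counts as unavailable when its value is NaN.

```python
import math


def idw_point(target, sources, values, power=2.0, k=5, k_min=2):
    """Interpolate one target (lat, lon) from scattered source points."""
    if k < 1 or k_min > k:
        raise ValueError("Require k >= 1 and k_min <= k.")
    # Rank the valid (non-NaN) sources by distance and keep the k nearest.
    ranked = sorted(
        (math.dist(target, point), value)
        for point, value in zip(sources, values)
        if not math.isnan(value)
    )[:k]
    # Too few usable neighbors: the output point is left undefined.
    if len(ranked) < k_min:
        return math.nan
    # An exact hit would give an infinite weight, so return that value directly.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [1.0 / distance**power for distance, _ in ranked]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, ranked))
    return weighted_sum / sum(weights)
```

A larger power makes the nearest neighbors dominate the estimate, while a power close to zero approaches a plain average of the k neighbors.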
climatrix.reconstruct.kriging.OrdinaryKrigingReconstructor
Bases: BaseReconstructor
Reconstruct a sparse dataset using Ordinary Kriging.
This class performs spatial interpolation using ordinary kriging, a geostatistical method that provides optimal linear unbiased estimation by modeling spatial correlation through variograms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset
|
SparseDataset
|
The sparse dataset to reconstruct. |
required |
target_domain
|
Domain
|
The target domain for reconstruction. |
required |
backend
|
Literal['vectorized', 'loop'] | None
|
The backend to use for kriging (default is None). |
None
|
nlags
|
int | None
|
Number of lags for variogram computation (default is 6). Type: int, bounds: (0, ...), default: 6 |
None
|
anisotropy_scaling
|
float | None
|
Anisotropy scaling factor (default is 1e-6).
Type: float, bounds: |
None
|
coordinates_type
|
str | None
|
Type of coordinate system (default is "euclidean"). Type: str, values: ["euclidean", "geographic"], default: "euclidean" |
None
|
variogram_model
|
str | None
|
Variogram model to use (default is "linear"). Type: str, values: ["linear", "power", "gaussian", "spherical", "exponential"], default: "linear" |
None
|
pseudo_inv
|
bool
|
Whether to use pseudo-inverse for matrix operations (default is False). |
False
|
Attributes:
Name | Type | Description |
---|---|---|
dataset |
SparseDataset
|
The sparse dataset to reconstruct. |
domain |
Domain
|
The target domain for reconstruction. |
pykrige_kwargs |
dict
|
Additional keyword arguments to pass to pykrige. |
backend |
Literal['vectorized', 'loop'] | None
|
The backend to use for kriging. |
_MAX_VECTORIZED_SIZE |
ClassVar[int]
|
The maximum size for vectorized kriging.
If the dataset is larger than this size, loop kriging
will be used (if |
Notes
Hyperparameters for optimization:
- nlags: int in (4, 20), default=6
- anisotropy_scaling: float in (1e-6, 5.0), default=1e-6
- coordinates_type: str in ["euclidean", "geographic"], default="euclidean"
- variogram_model: str in ["linear", "power", "gaussian", "spherical", "exponential"], default="linear"
reconstruct()
Perform Ordinary Kriging reconstruction of the dataset.
Returns:
Type | Description |
---|---|
BaseClimatrixDataset
|
The dataset reconstructed on the target domain. |
Notes
- The backend is chosen based on the size of the dataset. If the dataset is larger than the maximum size, the loop backend is used.
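The size-based backend choice can be sketched as a small helper. The threshold value, the function name, and the rule that an explicitly requested backend takes precedence are all assumptions; the real limit is the class attribute _MAX_VECTORIZED_SIZE, whose value is not given here.

```python
MAX_VECTORIZED_SIZE_SKETCH = 500_000  # hypothetical threshold, not the library's value


def choose_backend(n_points, backend=None, max_vectorized_size=MAX_VECTORIZED_SIZE_SKETCH):
    """Pick a kriging backend; an explicit choice is assumed to take precedence."""
    if backend is not None:
        if backend not in ("vectorized", "loop"):
            raise ValueError(f"Unknown backend: {backend!r}")
        return backend
    # Large problems fall back to the loop backend, which needs less memory.
    return "loop" if n_points > max_vectorized_size else "vectorized"
```

The trade-off is the usual one for kriging implementations: the vectorized backend is faster but holds large intermediate arrays in memory, while the loop backend processes target points one at a time.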
climatrix.reconstruct.siren.siren.SIRENReconstructor
Bases: BaseReconstructor
A reconstructor that uses SIREN to reconstruct fields.
SIREN (Sinusoidal Representation Networks) uses sinusoidal activation functions to learn continuous implicit neural representations of spatial fields from sparse observations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset
|
BaseClimatrixDataset
|
Source dataset to reconstruct from. |
required |
target_domain
|
Domain
|
Target domain to reconstruct onto. |
required |
on_surface_points
|
int
|
Number of points to sample on the surface for training. |
1024
|
hidden_features
|
int
|
Number of features in each hidden layer. |
256
|
hidden_layers
|
int
|
Number of hidden layers in the SIREN model. |
4
|
omega_0
|
float
|
Frequency multiplier for the first layer. |
30.0
|
omega_hidden
|
float
|
Frequency multiplier for hidden layers. |
30.0
|
lr
|
float
|
Learning rate for the optimizer.
Type: float, bounds: |
1e-4
|
batch_size
|
int
|
Batch size for training.
Type: int, bounds: |
256
|
num_epochs
|
int
|
Number of epochs to train for.
Type: int, bounds: |
100
|
hidden_dim
|
int
|
Hidden layer dimensions.
Type: int, bounds: |
256
|
num_layers
|
int
|
Number of hidden layers.
Type: int, bounds: |
4
|
num_workers
|
int
|
Number of worker processes for the dataloader. |
0
|
device
|
str
|
Device to run the model on ("cuda" or "cpu"). |
"cuda"
|
gradient_clipping_value
|
float or None
|
Value for gradient clipping (None to disable).
Type: float, bounds: |
None
|
checkpoint
|
str or PathLike or Path or None
|
Path to save/load model checkpoint from. |
None
|
sdf_loss_weight
|
float
|
Weight for the SDF constraint loss. |
3000.0
|
inter_loss_weight
|
float
|
Weight for the interpolation consistency loss. |
100.0
|
normal_loss_weight
|
float
|
Weight for the surface normal loss. |
100.0
|
grad_loss_weight
|
float
|
Weight for the gradient regularization loss. |
50.0
|
Raises:
Type | Description |
---|---|
NotImplementedError
|
If trying to use SIREN with a dynamic dataset. |
Notes
Hyperparameters for optimization:
- lr: float in (1e-5, 1e-2), default=1e-3
- batch_size: int in (64, 1024), default=256
- num_epochs: int in (100, 10_000), default=5_000
- hidden_dim: int in (128, 512), default=256
- num_layers: int in (3, 8), default=4
- gradient_clipping_value: float in (0.1, 10.0), default=1.0
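To make the roles of omega_0 and omega_hidden concrete, here is a dependency-free forward pass through sine layers. It is only a sketch: the function names are hypothetical, the weight initialisation follows the scheme proposed in the SIREN paper (Sitzmann et al., 2020), and the zero biases and the final linear layer are simplifying assumptions. The reconstructor itself builds a trainable model, which this sketch does not attempt to reproduce.

```python
import math
import random


def init_sine_layer(n_in, n_out, omega, is_first, rng):
    """Draw weights uniformly within the bounds proposed in the SIREN paper."""
    bound = 1.0 / n_in if is_first else math.sqrt(6.0 / n_in) / omega
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out  # zero biases are a simplification
    return weights, biases


def sine_layer(x, weights, biases, omega):
    """Compute y_j = sin(omega * (sum_i W_ji * x_i + b_j))."""
    return [
        math.sin(omega * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


def siren_forward(x, layers, omega_0=30.0, omega_hidden=30.0):
    """Apply the sine layers, then a final linear layer without activation."""
    *hidden, (out_weights, out_biases) = layers
    for index, (weights, biases) in enumerate(hidden):
        x = sine_layer(x, weights, biases, omega_0 if index == 0 else omega_hidden)
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(out_weights, out_biases)
    ]
```

The frequency multipliers scale the pre-activations, so larger values let the network represent finer spatial detail at the cost of harder optimisation.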
configure_optimizer(model)
Configure the optimizer for the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
Module
|
The model to optimize. |
required |
Returns:
Type | Description |
---|---|
Optimizer
|
Configured Adam optimizer. |
init_model()
Initialize the 3D SIREN model.
Returns:
Type | Description |
---|---|
Module
|
Initialized SIREN model on the appropriate device. |
reconstruct()
Train (if necessary) and use a SIREN model to reconstruct the field.
This method is the main entry point for using the SIREN reconstructor. It will train a new model if no checkpoint was loaded, and then use the model to reconstruct the field on the target domain.
Returns:
Type | Description |
---|---|
BaseClimatrixDataset
|
A dataset containing the reconstructed field. |
Raises:
Type | Description |
---|---|
ImportError
|
If required dependencies are not installed. |
Evaluation
climatrix.comparison.Comparison
Class for comparing two datasets (dense or sparse).
For sparse domains, uses nearest neighbor matching with optional distance thresholds to find corresponding observations.
Attributes:
Name | Type | Description |
---|---|---|
predicted_dataset |
BaseClimatrixDataset
|
The predicted/source dataset. |
true_dataset |
BaseClimatrixDataset
|
The true/target dataset. |
diff |
BaseClimatrixDataset
|
The difference between the predicted and true datasets. |
distance_threshold |
(float, optional)
|
Maximum distance for point correspondence in sparse domains. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted_dataset
|
BaseClimatrixDataset
|
The predicted/source dataset. |
required |
true_dataset
|
BaseClimatrixDataset
|
The true/target dataset. |
required |
map_nan_from_source
|
bool
|
If True, the NaN values from the source dataset will be
mapped to the target dataset. If False, the NaN values
from the target dataset will be used. Default is None,
which means |
None
|
distance_threshold
|
float
|
For sparse domains, maximum distance threshold for considering points as corresponding. If None, closest points are always matched. Only used when both datasets have sparse domains. |
None
|
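The role of `distance_threshold` in sparse-domain matching can be sketched as follows. This is an illustration, not the library's implementation: `match_nearest` is a hypothetical function, and plain Euclidean distance on coordinate pairs is an assumption (the library may measure distance differently).

```python
import math


def match_nearest(pred_points, true_points, distance_threshold=None):
    """Pair each predicted point with its closest true point.

    Returns a list of (pred_index, true_index) pairs. Pairs farther apart
    than ``distance_threshold`` are dropped; with ``None`` the closest
    point is always accepted.
    """
    pairs = []
    for i, p in enumerate(pred_points):
        # Assumption: Euclidean distance on raw coordinate tuples.
        j, dist = min(
            ((k, math.dist(p, q)) for k, q in enumerate(true_points)),
            key=lambda item: item[1],
        )
        if distance_threshold is None or dist <= distance_threshold:
            pairs.append((i, j))
    return pairs


pred = [(50.0, 10.0), (52.0, 14.0)]
true = [(50.1, 10.0), (60.0, 20.0)]
all_pairs = match_nearest(pred, true)                            # [(0, 0), (1, 0)]
close_pairs = match_nearest(pred, true, distance_threshold=0.5)  # [(0, 0)]
```

Without a threshold the second predicted point is still matched, even though its nearest neighbor is more than four units away. With the threshold set, that pair is discarded.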
plot_diff(title=None, target=None, show=False, ax=None, **kwargs)
¶
Plot the difference between the source and target datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
title
|
str
|
Title of the plot. If not provided, the name of the dataset will be used. If the dataset has no name, "Climatrix Dataset" will be used. |
None
|
target
|
str, os.PathLike, Path, or None
|
Path to save the plot. If not provided, the plot will not be saved. |
None
|
show
|
bool
|
Whether to show the plot. Default is False. |
False
|
ax
|
Axes
|
Axes to plot on. If not provided, a new figure and axes will be created. |
None
|
**kwargs
|
dict
|
Additional keyword arguments to pass to the plotting function.
|
{}
|
Returns:
Type | Description |
---|---|
Axes
|
The matplotlib axes containing the plot of the difference. |
plot_signed_diff_hist(ax=None, n_bins=50, limits=None, label=None, alpha=1.0)
¶
Plot the histogram of signed difference between datasets.
The signed difference is positive where the source dataset is larger than the target dataset and negative where it is smaller.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ax
|
Axes
|
The matplotlib axes on which to plot the histogram. If None, a new set of axes will be created. |
None
|
n_bins
|
int
|
The number of bins to use in the histogram (default is 50). |
50
|
limits
|
tuple[float]
|
The limits of values to include in the histogram (default is None). |
None
|
Returns:
Type | Description |
---|---|
Axes
|
The matplotlib axes containing the plot of the signed difference. |
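A minimal sketch of what the histogram counts, assuming the signed difference is taken as source minus target and that values outside `limits` are ignored. The function `signed_diff_histogram` is hypothetical, and the actual method draws on matplotlib axes rather than returning counts.

```python
def signed_diff_histogram(predicted, true, n_bins=50, limits=None):
    """Return (counts, edges) for the histogram of predicted - true."""
    diffs = [p - t for p, t in zip(predicted, true)]
    low, high = limits if limits is not None else (min(diffs), max(diffs))
    if high <= low:
        raise ValueError("histogram limits must satisfy low < high")
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for d in diffs:
        if low <= d <= high:
            # The last bin is closed on the right, as in numpy.histogram.
            index = min(int((d - low) / width), n_bins - 1)
            counts[index] += 1
    return counts, edges


counts, edges = signed_diff_histogram(
    predicted=[1.0, 2.5, 3.0, 4.0],
    true=[1.5, 2.0, 3.0, 2.0],
    n_bins=4,
    limits=(-1.0, 3.0),
)
# counts == [1, 2, 0, 1]; only the first bin holds a negative difference
```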
compute_rmse()
¶
Compute the RMSE between the source and target datasets.
Returns:
Type | Description |
---|---|
float
|
The RMSE between the source and target datasets. |
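As a reference for the definition, RMSE on two plain sequences can be computed as below. The sketch assumes equally long inputs without NaN values; how `compute_rmse` treats NaNs is not covered here.

```python
import math


def rmse(predicted, true):
    """Root mean squared error of two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, true)]
    return math.sqrt(sum(squared) / len(squared))


error = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])  # sqrt(5/3), roughly 1.291
```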
compute_mae()
¶
Compute the MAE between the source and target datasets.
Returns:
Type | Description |
---|---|
float
|
The mean absolute error between the source and target datasets. |
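Written out, the standard definition of the mean absolute error is given below. The symbols are assumptions introduced here for illustration: $\hat{y}_i$ is the predicted (source) value, $y_i$ the true (target) value, and $N$ the number of compared points.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|
```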
compute_r2()
¶
Compute the R^2 between the source and target datasets.
Returns:
Type | Description |
---|---|
float
|
The R^2 between the source and target datasets. |
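The sketch below uses the usual coefficient-of-determination formula, 1 - SS_res / SS_tot. Whether `compute_r2` uses exactly this definition is an assumption, and `r2_score` here is a local helper, not a climatrix function.

```python
def r2_score(predicted, true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(predicted, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the true values are constant")
    return 1.0 - ss_res / ss_tot


perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # 1.0
mean_only = r2_score([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])  # 0.0
```

A prediction that always returns the mean of the true values scores 0, and predictions worse than that give negative values.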
compute_max_abs_error()
¶
Compute the maximum absolute error between datasets.
Returns:
Type | Description |
---|---|
float
|
The maximum absolute error between the source and target datasets. |
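Unlike the averaged metrics, the maximum absolute error is determined by the single worst point. A minimal sketch on plain sequences, assuming no NaN values (`max_abs_error` is a hypothetical helper):

```python
def max_abs_error(predicted, true):
    """Largest absolute pointwise difference between two sequences."""
    return max(abs(p - t) for p, t in zip(predicted, true))


worst = max_abs_error([2.0, 4.0, 6.0], [1.0, 4.0, 8.5])  # 2.5
```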
compute_report()
¶
save_report(target_dir)
¶
Save a report of the comparison between passed datasets.
This method creates a directory at the specified path and saves a report of the comparison between the source and target datasets in it. The report includes plots of the difference and of the signed difference between the datasets, as well as a CSV file with metrics such as the RMSE, MAE, and maximum absolute error.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_dir
|
str | PathLike | Path
|
The path to the directory where the report should be saved. |
required |
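The CSV part of such a report can be sketched with the standard library. The file name `metrics.csv`, the column names, and the function `save_metrics_csv` are hypothetical; the layout that `save_report` actually writes is not specified here.

```python
import csv
import tempfile
from pathlib import Path


def save_metrics_csv(metrics: dict, target_dir) -> Path:
    """Create ``target_dir`` if needed and write the metrics as a two-column CSV."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    csv_path = target_dir / "metrics.csv"  # hypothetical file name
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["metric", "value"])
        for name, value in metrics.items():
            writer.writerow([name, value])
    return csv_path


# Write into a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = save_metrics_csv(
        {"RMSE": 1.29, "MAE": 1.0, "Max Abs Error": 2.0}, Path(tmp) / "report"
    )
    written = path.read_text().splitlines()
```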
Hyperparameter Optimization¶
Climatrix provides automated hyperparameter optimization for all reconstruction methods using Bayesian optimization.
Installation¶
To use hyperparameter optimization, install climatrix with the optimization extras:
pip install climatrix[optim]
This installs the required bayesian-optimization package dependency.
HParamFinder¶
climatrix.optim.HParamFinder
¶
Bayesian hyperparameter optimization for reconstruction methods.
This class uses Bayesian optimization to find optimal hyperparameters for various reconstruction methods.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method
|
str
|
Reconstruction method to optimize. |
required |
train_dset
|
BaseClimatrixDataset
|
Training dataset used for optimization. |
required |
val_dset
|
BaseClimatrixDataset
|
Validation dataset used for optimization. |
required |
metric
|
str
|
Evaluation metric to optimize. Default is "mae". Supported metrics: "mae", "mse", "rmse". |
'mae'
|
exclude
|
str or Collection[str]
|
Parameter(s) to exclude from optimization. |
None
|
include
|
str or Collection[str]
|
Parameter(s) to include in optimization. If specified, only these parameters will be optimized. |
None
|
n_iters
|
int
|
Total number of optimization iterations. Default is 100. |
100
|
bounds
|
dict
|
Custom parameter bounds. Overrides default bounds for the method. |
None
|
random_seed
|
int
|
Random seed for reproducible optimization. Default is 42. |
42
|
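How `include`, `exclude` and `bounds` might interact can be sketched as follows. This is an illustration under stated assumptions, not the `HParamFinder` implementation: `select_bounds` is a hypothetical helper, the bounds shown are illustrative, and rejecting `include` together with `exclude` is an assumed policy.

```python
def _as_set(names):
    """Accept None, a single name, or a collection of names."""
    if names is None:
        return set()
    return {names} if isinstance(names, str) else set(names)


def select_bounds(default_bounds, include=None, exclude=None, bounds=None):
    """Return the bounds of the parameters that should be optimized."""
    included, excluded = _as_set(include), _as_set(exclude)
    if included and excluded:
        # Assumption: combining both options is treated as a user error.
        raise ValueError("use either include or exclude, not both")
    merged = {**default_bounds, **(bounds or {})}  # custom bounds override defaults
    unknown = (included | excluded) - set(merged)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if included:
        return {k: v for k, v in merged.items() if k in included}
    return {k: v for k, v in merged.items() if k not in excluded}


# Illustrative default bounds for three parameters.
default_bounds = {"lr": (1e-5, 1e-2), "num_layers": (3, 8), "hidden_dim": (128, 512)}
selected = select_bounds(
    default_bounds, exclude="hidden_dim", bounds={"lr": (1e-4, 1e-3)}
)
# selected == {"lr": (1e-4, 1e-3), "num_layers": (3, 8)}
```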
Attributes:
Name | Type | Description |
---|---|---|
train_dset |
BaseClimatrixDataset
|
Training dataset. |
val_dset |
BaseClimatrixDataset
|
Validation dataset. |
metric |
MetricType
|
Evaluation metric. |
method |
str
|
Reconstruction method. |
bounds |
dict
|
Parameter bounds for optimization. |
n_iter |
int
|
Number of optimization iterations. |
random_seed |
int
|
Random seed for optimization. |
verbose |
int
|
Verbosity level for logging (0 - silent, 1 - info, 2 - debug). |
n_startup_trials |
int
|
Number of startup trials for the optimizer. |
n_warmup_steps |
int
|
Number of warmup steps before starting optimization. |
result |
dict
|
Dictionary containing optimization results:
- 'best_params': Best hyperparameters found (with correct types)
- 'best_score': Best score achieved (negative metric value)
- 'metric_name': Name of the optimized metric
- 'method': Reconstruction method used
- 'n_trials': Total number of trials performed |
optimize()
¶
Run Bayesian optimization to find optimal hyperparameters.
Returns:
Type | Description |
---|---|
dict[str, Any]
|
Dictionary containing:
- 'best_params': Best hyperparameters found (with correct types)
- 'best_score': Best score achieved (negative metric value)
- 'history': Optimization history
- 'metric_name': Name of the optimized metric
- 'method': Reconstruction method used |
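The "negative metric value" in `best_score` follows from handing an error metric to a maximizer: the error is negated, so a higher score means a lower error. The sketch below shows the idea with plain random search as a stand-in for Bayesian optimization. The function `run_search` and the toy objective are hypothetical, and only a subset of the result keys is reproduced.

```python
import random


def run_search(objective, bounds, n_iters=100, random_seed=42):
    """Maximize ``-objective`` over ``bounds`` with plain random search.

    Random search stands in for Bayesian optimization in this sketch.
    """
    rng = random.Random(random_seed)
    history = []
    for _ in range(n_iters):
        params = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        # Errors are minimized, so the score handed to a maximizer is negated.
        history.append({"params": params, "score": -objective(**params)})
    best = max(history, key=lambda trial: trial["score"])
    return {
        "best_params": best["params"],
        "best_score": best["score"],
        "history": history,
    }


# Toy "validation error" with its minimum at power = 2.
result = run_search(lambda power: abs(power - 2.0), {"power": (0.5, 5.0)})
```

To recover the error itself, negate `best_score`. With a fixed `random_seed`, repeated runs return the same result.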
Supported Methods and Parameters¶
The hyperparameter optimizer supports all reconstruction methods available in Climatrix. For detailed information about each method's hyperparameters, including their types, bounds, and default values, see the Reference documentation for each reconstruction class:
- IDWReconstructor - Inverse Distance Weighting
- OrdinaryKrigingReconstructor - Ordinary Kriging
- SiNETReconstructor - Spatial Interpolation NET
- SIRENReconstructor - Sinusoidal INR