Web sources internal API
These modules are abstracted by kadlu.load, which should generally be used instead for fetching and loading from web sources. These docs are provided for contributors wishing to add new web data source modules that can be accessed by kadlu.load.
kadlu.load works by mapping strings to function calls in kadlu.geospatial.data_sources.source_map as a convenience: the caller passes the source and variable name. New data source modules should have a corresponding entry within the fetch and load maps there.
A general pattern for adding fetch/load modules is that fetching is an implicit action of the load function. Each fetch and load function should accept a set of boundary arguments equal to or subsetting the following function signature, such that keyword args can be passed as a dict by fetch_handler:
(south=-90, west=-180, north=90, east=180, top=0, bottom=5000, start=datetime(2000,1,1), end=datetime(2000,1,1,1))
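To illustrate the pattern, here is a minimal sketch of what a new data source module's load function could look like. The module and variable names are hypothetical, not part of kadlu; the point is only that the keyword arguments subset the boundary signature above, so that fetch_handler can pass boundaries as a dict.

```python
from datetime import datetime

# Hypothetical load function for a new data source module. Its keyword
# arguments subset the standard boundary signature, so fetch_handler can
# pass boundaries as a dict via **kwargs. (Sketch only; not kadlu code.)
def load_example(var, *, south=-90, west=-180, north=90, east=180,
                 top=0, bottom=5000,
                 start=datetime(2000, 1, 1), end=datetime(2000, 1, 1, 1),
                 **_):
    # A real module would fetch (if needed) and query the local database
    # here; this sketch just echoes the boundaries it received.
    return dict(var=var, south=south, west=west, north=north, east=east,
                top=top, bottom=bottom, start=start, end=end)

# fetch_handler-style invocation: boundaries passed as a dict
bounds = dict(south=44, west=-64, north=45, east=-63,
              start=datetime(2015, 1, 9), end=datetime(2015, 1, 9, 12))
result = load_example('temperature', **bounds)
```

Unspecified boundaries (here top and bottom) fall back to the defaults in the signature above.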
Fetch Handler
Used for generating fetch requests for web data sources.
Source Map
Data fetching utils, function maps, and constants used across the data source modules.
CHS
ERA5
API for Era5 dataset from Copernicus Climate Datastore.
Note: Initial release data (ERA5T) are available about 5 days behind real time.
- Dataset summary:
https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview
- Detailed information about the dataset:
- API documentation:
- class kadlu.geospatial.data_sources.era5.Era5[source]
Bases:
object
Collection of module functions for fetching and loading.
The functions return (values, lat, lon, epoch) numpy arrays with shape (num_points, 4) where epoch is the number of hours since 2000-01-01.
- kadlu.geospatial.data_sources.era5.clear_cache_era5()[source]
Removes all files with the filename pattern ERA5_*.grb2 in the Kadlu storage directory
- kadlu.geospatial.data_sources.era5.fetch_era5(var, *, west, east, south, north, start, **_)[source]
Fetch global ERA5 data for specified variable, geographic region, and time range.
Downloads 24 hours of global data on the specified day and saves these data to a *.grb2 file in the kadlu data storage directory, using the recommended spatial resolution of 0.25 x 0.25 degrees.
The *.grb2 file can be deleted manually by calling the clear_cache_era5 function to save disk space, if necessary.
Only data within the specified geographic boundaries (west, east, south, north) are inserted into the kadlu geospatial.db database.
- Args:
- var: string
The variable short name of desired wave parameter according to ERA5 docs. The complete list can be found here (table 7 for wave params): https://confluence.ecmwf.int/display/CKB/ERA5+data+documentation#ERA5datadocumentation-Temporalfrequency
- west,east,south,north: float
Geographic boundaries of the data request
- start: datetime.datetime
UTC date of the data request. 24 hours of data will be fetched.
- Returns:
True if new data was fetched, else False
- kadlu.geospatial.data_sources.era5.initdb()[source]
Create tables in kadlu’s geospatial.db database for storing ERA5 data
- kadlu.geospatial.data_sources.era5.load_era5(var, *, west, east, south, north, start, end, fetch=True, **_)[source]
Load ERA5 data from local geospatial.db database
- Args:
- var: str
Variable to be fetched
- west,east,south,north: float
Geographic boundaries of the data request
- start: datetime.datetime
UTC start time for the data request.
- end: datetime.datetime
UTC end time for the data request.
- fetch: bool
If the data have not already been downloaded and inserted into Kadlu’s local geospatial database, fetch data from the Copernicus Climate Data Store (CDS) automatically using the CDS API. Default is True.
- Returns:
- values:
values of the fetched var
- lat:
y grid coordinates
- lon:
x grid coordinates
- epoch:
Timestamps in epoch hours since 2000-01-01
GEBCO
HYCOM
- data source:
- web interface for manual hycom data retrieval:
- GLBy0.08/expt_93.0
Time range: Dec 4th, 2018 to present
Longitude convention: 0 to +360 degrees
Depth coordinates are positive below the sea surface
Times are in epoch hours since 2000-01-01 00:00 UTC
- API limitations:
Do not subset more than 1 day at a time using ncss.hycom.org
Do not use more than 2 concurrent connections per IP address when downloading data from ncss.hycom.org
- class kadlu.geospatial.data_sources.hycom.Hycom[source]
Bases:
object
Collection of module functions for fetching and loading HYCOM data.
- Attributes:
- lat, lon: arrays
Lat/lon coordinates.
- epoch: array
Time coordinates. Times are formatted as epoch hours since 2000-01-01 00:00
- depth: array
Depth coordinates.
- callback(var, max_attempts=3, **kwargs)[source]
Builds indices for query, fetches data from HYCOM, and inserts into local database.
Note: Null/NaN values are removed before the data is inserted into the local database. Null/NaN values occur when the grid overlaps with land or extends below the seafloor.
- TODO: Add download progress bar, e.g., using the approach described here:
- Args:
- var: string
Variable to be fetched. A complete list of variables is available at https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html
- max_attempts: int
Maximum number of request attempts. Default is 3. Each request has a timeout of 120 s.
- kwargs: dict
boundaries as keyword arguments
- fetch_hycom(var, kwargs, max_attempts=3)[source]
Fetch data from the HYCOM server.
Kadlu’s index class is used for ‘binning’ the data requests into requests that span 1 degree in lat/lon, 24 hours in time (1 day), and 0-5000 m in depth.
A data request that spans multiple lat/lon degrees, multiple days, or includes depths greater than 5000m is split into multiple such ‘binned’ requests.
Conversely, smaller data requests are ‘inflated’ to the size of one ‘bin’.
- Args:
- var: string
Variable to be fetched. A complete list of variables is available at https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html
- kwargs: dict
boundaries as keyword arguments
- max_attempts: int
Maximum number of request attempts. Default is 3. Each request has a timeout of 120 s.
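The 'binning' described above can be sketched in a few lines. This is a hypothetical helper for the lat/lon dimensions only, not kadlu's actual index class: a request spanning multiple degrees splits into 1-degree bins, and a sub-degree request 'inflates' to a single bin.

```python
from math import floor, ceil

# Sketch of the 'binning' idea: split a lat/lon request into bins that
# each span 1 degree. Hypothetical helper, not kadlu's index class.
def bin_1deg(south, north, west, east):
    bins = []
    for lat in range(floor(south), ceil(north)):
        for lon in range(floor(west), ceil(east)):
            bins.append(dict(south=lat, north=lat + 1,
                             west=lon, east=lon + 1))
    return bins

# a request spanning 2 degrees of latitude and 3 of longitude
# becomes six one-degree bins
bins = bin_1deg(south=44.2, north=45.8, west=-64.9, east=-62.1)
```

In kadlu the same idea is applied to the time and depth dimensions as well (24-hour and 0-5000 m bins).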
- load_hycom(var, kwargs)[source]
Load HYCOM data from local database.
If data is not present, attempts to fetch it from the HYCOM server.
Although HYCOM uses the 0 to +360 degree longitude convention, the longitude coordinates returned by this method adhere to the -180 to +180 convention used everywhere else in Kadlu.
- Args:
- var: string
Variable to be fetched. A complete list of variables is available at https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html
- kwargs: dict (boundaries as keyword arguments)
- south, north: float
ymin, ymax coordinate values. range: -90, 90
- west, east: float
xmin, xmax coordinate values. range: -180, 180
- start, end: datetime
temporal boundaries in datetime format
- Returns:
- values: array
values of the fetched variable
- lat: array
y grid coordinates
- lon: array
x grid coordinates
- epoch: array
Timestamps in epoch hours since 2000-01-01
- depth: array
measured in meters
- load_water_uv(**kwargs)[source]
Load water speed, computed as sqrt(vu^2 + vv^2)
If data is not present, attempts to fetch it from the HYCOM server.
- Args:
- kwargs: dict (boundaries as keyword arguments)
- south, north: float
ymin, ymax coordinate values. range: -90, 90
- west, east: float
xmin, xmax coordinate values. range: -180, 180
- start, end: datetime
temporal boundaries in datetime format
- Returns:
- values: array
Water speed values, in m/s
- lat: array
y grid coordinates
- lon: array
x grid coordinates
- epoch: array
Timestamps in epoch hours since 2000-01-01
- depth: array
measured in meters
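The water-speed computation above, sqrt(vu^2 + vv^2), is simply the magnitude of the horizontal current vector formed by the u and v velocity components:

```python
from math import hypot

# Sketch of the water-speed computation described above: the magnitude
# of the horizontal current vector from its u and v components (m/s).
def water_speed(vu, vv):
    return hypot(vu, vv)  # equivalent to sqrt(vu**2 + vv**2)

# e.g. a 3 m/s eastward and 4 m/s northward current gives a 5 m/s speed
```

In kadlu this is applied element-wise to the loaded u and v arrays.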
- kadlu.geospatial.data_sources.hycom.fetch_grid(**_)[source]
Download HYCOM lat/lon/time/depth arrays for grid indexing.
Times are formatted as epoch hours since 2000-01-01 00:00.
- Returns:
- lat, lon, epoch, depth: numpy array
The coordinate arrays
- kadlu.geospatial.data_sources.hycom.initdb()[source]
Create tables in kadlu’s geospatial.db database for storing HYCOM data
- kadlu.geospatial.data_sources.hycom.slices_str(var, slices, steps=(1, 1, 1, 1))
IFREMER
WWIII
Kadlu API for the NOAA WaveWatch III Datastore
- User guides:
https://github.com/NOAA-EMC/WW3/wiki/WAVEWATCH-III-User-Guide
- Data model description (boundary definitions, map visualizations, etc)
- class kadlu.geospatial.data_sources.wwiii.Wwiii[source]
Bases:
object
Collection of module functions for fetching and loading WWIII data.
- kadlu.geospatial.data_sources.wwiii.fetch_wwiii(var, **kwargs)[source]
Download WWIII data and return associated filepaths.
- Args:
- var: string
The variable name of the desired parameter according to the WWIII docs. The complete list of variables can be found under 'model output' at https://polar.ncep.noaa.gov/waves/implementations.php
- south, north: float
ymin, ymax coordinate boundaries (latitude). range: -90, 90
- west, east: float
xmin, xmax coordinate boundaries (longitude). range: -180, 180
- start: datetime
the start of the desired time range
- end: datetime
the end of the desired time range
- Returns:
True if new data was fetched, else False
- kadlu.geospatial.data_sources.wwiii.insert(table, agg)[source]
Insert parsed data into the local database.
- kadlu.geospatial.data_sources.wwiii.load_wwiii(var, kwargs)[source]
Return downloaded WWIII data for the specified wave variable according to the given time, lat, and lon boundaries.
- Args:
- var: string
The variable short name of the desired wave parameter according to the WWIII docs. The complete list of variable short names can be found under 'model output' at https://polar.ncep.noaa.gov/waves/implementations.php
- south, north: float
ymin, ymax coordinate boundaries (latitude). range: -90, 90
- west, east: float
xmin, xmax coordinate boundaries (longitude). range: -180, 180
- start: datetime
the start of the desired time range
- end: datetime
the end of the desired time range
- Returns:
val, lat, lon, epoch as np arrays of floats
Data Utils
Additional tools and utilities used across the data loading modules.
- class kadlu.geospatial.data_sources.data_util.Boundary(south, north, west, east, fetchvar='', **_)[source]
Bases:
object
Compute intersecting boundaries with the separating axis theorem.
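For axis-aligned boundary rectangles, the separating axis theorem reduces to an interval-overlap check on each axis: two boundaries intersect unless some axis (latitude or longitude) separates them. A minimal sketch with a hypothetical helper, not kadlu's Boundary class:

```python
# Minimal sketch of the separating-axis intersection test for two
# axis-aligned boundary rectangles: they intersect unless some axis
# separates them. Hypothetical helper, not kadlu's Boundary class.
def intersects(a, b):
    return not (a['north'] <= b['south'] or b['north'] <= a['south'] or
                a['east'] <= b['west'] or b['east'] <= a['west'])

box1 = dict(south=40, north=50, west=-70, east=-60)
box2 = dict(south=45, north=55, west=-65, east=-55)  # overlaps box1
box3 = dict(south=51, north=55, west=-65, east=-55)  # separated in latitude
```

kadlu uses this kind of test, e.g., to decide which data source regions a query overlaps.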
- kadlu.geospatial.data_sources.data_util.database_cfg()[source]
Configure and connect to the SQLite database.
Time is stored as an integer in the database, where each value is epoch hours since 2000-01-01 00:00.
- Returns:
- conn:
Database connection object
- db:
Connection cursor object
- kadlu.geospatial.data_sources.data_util.dt_2_epoch(dt_arr, t0=datetime.datetime(2000, 1, 1, 0, 0))[source]
Convert datetimes to epoch hours.
- kadlu.geospatial.data_sources.data_util.epoch_2_dt(ep_arr, t0=datetime.datetime(2000, 1, 1, 0, 0), unit='hours')[source]
Convert epoch hours to datetimes.
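The two conversions above are inverses of each other, relative to the kadlu epoch of 2000-01-01 00:00. A scalar sketch (the actual functions operate on arrays):

```python
from datetime import datetime, timedelta

# Sketch of the epoch-hour conversions described above, relative to the
# kadlu epoch of 2000-01-01 00:00. Scalar versions for illustration;
# the actual kadlu functions accept arrays.
T0 = datetime(2000, 1, 1)

def dt_to_epoch_hours(dt, t0=T0):
    return (dt - t0).total_seconds() / 3600

def epoch_hours_to_dt(hours, t0=T0):
    return t0 + timedelta(hours=hours)

# e.g. 2000-01-02 00:00 is epoch hour 24
```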
- kadlu.geospatial.data_sources.data_util.ext(filepath, extensions)
- kadlu.geospatial.data_sources.data_util.flatten(cols, frames)[source]
Dimensional reduction by taking the average over time frames.
- kadlu.geospatial.data_sources.data_util.fmt_coords(kwargs)[source]
Formats spatial coordinates as a human readable character string
- Args:
- kwargs: dict
Must have keys south, north, east, west and optionally also top and bottom
- Returns:
- : str
Nicely formatted string
- kadlu.geospatial.data_sources.data_util.fmt_time(kwargs)[source]
Formats time window as a human readable character string
- Args:
- kwargs: dict
Must have keys start and end
- Returns:
- : str
Nicely formatted string
- kadlu.geospatial.data_sources.data_util.index_arr(val, sorted_arr)[source]
Convert a value in a coordinate array to a grid index.
- kadlu.geospatial.data_sources.data_util.ll_2_regionstr(south, north, west, east, regions, default=[])[source]
Convert input bounds to region strings using the separating axis theorem.
- kadlu.geospatial.data_sources.data_util.logmsg(source, var, ntup=(), **kwargs)[source]
Log message informing that data was inserted into the kadlu database
The message includes
the name of the data source
the name of the variable
the number of data points added to the database
the geographical region
the time period
- Args:
- source: str
Data source name
- var: str
Variable name
- ntup: tuple
… If None, the message will state that no data was found.
- Keyword args:
- south,north,east,west: float
Geographic region
- top,bottom: float
Depth range
- start,end: datetime.datetime
Time window
- kadlu.geospatial.data_sources.data_util.logmsg_nodata(source, var, **kwargs)[source]
Log message informing that no data was found for the request.
- kadlu.geospatial.data_sources.data_util.reshape_3D_gridded(cols)[source]
Prepare loaded data for interpolation.
TODO: review and cleanup this function
- Args:
- cols: flattened numpy array of shape (4, n)
cols[0]: values; cols[1]: latitude; cols[2]: longitude; cols[3]: depth
- Returns: gridded
dict(values=gridspace, lats=ygrid, lons=xgrid, depths=zgrid)
- kadlu.geospatial.data_sources.data_util.reshape_4D(cols)[source]
Prepare loaded data for interpolation.
- TODO: review, validate and cleanup this function
especially the replacement of nan values!
- Args:
- cols: flattened numpy array of shape (5, n)
cols[0]: values; cols[1]: latitude; cols[2]: longitude; cols[3]: depth; cols[4]: time (hours since 2000-01-01)
- Returns: gridded
dict(values=gridspace, lats=ygrid, lons=xgrid, depths=zgrid, times=tgrid)
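The reshaping idea behind these functions can be sketched for the 3-D case: map flattened (values, lat, lon, depth) columns onto a regular grid by indexing each point into the sorted unique coordinate arrays. This hypothetical helper assumes the points cover a full regular grid and is not kadlu's actual implementation:

```python
import numpy as np

# Sketch of the reshaping idea: map flattened (values, lat, lon, depth)
# columns onto a regular 3-D grid. Assumes the points cover a complete
# regular grid; hypothetical helper, not kadlu's implementation.
def reshape_3d(cols):
    vals, lats, lons, depths = cols
    ygrid = np.unique(lats)   # sorted unique latitudes
    xgrid = np.unique(lons)   # sorted unique longitudes
    zgrid = np.unique(depths) # sorted unique depths
    gridspace = np.full((len(ygrid), len(xgrid), len(zgrid)), np.nan)
    # locate each point's position on the grid and place its value
    yi = np.searchsorted(ygrid, lats)
    xi = np.searchsorted(xgrid, lons)
    zi = np.searchsorted(zgrid, depths)
    gridspace[yi, xi, zi] = vals
    return dict(values=gridspace, lats=ygrid, lons=xgrid, depths=zgrid)

# two latitudes x one longitude x two depths = four flattened points
cols = np.array([
    [1.0, 2.0, 3.0, 4.0],          # values
    [44.0, 44.0, 45.0, 45.0],      # latitude
    [-63.0, -63.0, -63.0, -63.0],  # longitude
    [0.0, 10.0, 0.0, 10.0],        # depth
])
gridded = reshape_3d(cols)
```

The 4-D case adds a time axis built the same way from the fifth column.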
- class kadlu.geospatial.data_sources.data_util.storage_cfg(setdir=None)[source]
Bases:
PathLike
Return the filepath containing the storage configuration string.
First checks the config.ini file in the kadlu root folder, then defaults to kadlu/storage.
- cfg = <configparser.ConfigParser object>