Web sources internal API

These modules are abstracted by kadlu.load, which should generally be used for fetching and loading data from web sources. These docs are provided for contributors who wish to add new web data source modules that can be accessed through kadlu.load.

kadlu.load works by mapping the source and variable names to function calls in kadlu.geospatial.data_sources.source_map; new data source modules should have a corresponding entry in the fetch and load maps there.

The general pattern for fetch/load modules is that fetching is an implicit action of the load function. Each fetch and load function should accept a set of boundary arguments equal to, or a subset of, the following function signature, so that keyword arguments can be passed as a dict by fetch_handler:

(south=-90, west=-180, north=90, east=180, top=0, bottom=5000, start=datetime(2000,1,1), end=datetime(2000,1,1,1))
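For illustration, a toy load function with this signature shows how fetch_handler can forward boundary keyword arguments as a dict (the function name here is hypothetical, not part of kadlu):

```python
from datetime import datetime

# Hypothetical load function following the boundary signature above.
# Unrecognized keywords are absorbed by **_, as in kadlu's modules.
def load_example(south=-90, west=-180, north=90, east=180,
                 top=0, bottom=5000,
                 start=datetime(2000, 1, 1), end=datetime(2000, 1, 1, 1),
                 **_):
    return dict(south=south, west=west, north=north, east=east,
                top=top, bottom=bottom, start=start, end=end)

# A caller such as fetch_handler can forward a subset of boundaries as a dict;
# unspecified boundaries fall back to the defaults:
kwargs = dict(south=44.0, west=-64.0, north=45.0, east=-63.0,
              start=datetime(2015, 1, 9), end=datetime(2015, 1, 10))
result = load_example(**kwargs)
```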


Fetch Handler

used for generating fetch requests for web data sources

Source Map

maps source and variable name strings to the fetch and load functions of each data source module

data fetching utilities, function maps, and collections of constant variables

class kadlu.geospatial.data_sources.source_map.SourceMap[source]

Bases: object

default_val()[source]
load_map()[source]
source_map()[source]
var3d()[source]
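The string-to-function dispatch that kadlu.load performs via these maps can be sketched as follows; the stub functions and map keys below are illustrative, not kadlu's actual entries:

```python
# Illustrative stubs standing in for data source module functions.
def load_bathymetry_stub(**kwargs):
    return 'bathymetry data'

def load_temperature_stub(**kwargs):
    return 'temperature data'

# A load map pairs a "source_variable" key with the module function
# responsible for loading that variable.
load_map = {
    'gebco_bathymetry': load_bathymetry_stub,
    'hycom_temp': load_temperature_stub,
}

def load(source, var, **kwargs):
    # dispatch the source/variable strings to the mapped function,
    # forwarding boundary keyword arguments
    return load_map[f'{source}_{var}'](**kwargs)

data = load('hycom', 'temp', south=44, north=45)
```

A new data source module becomes reachable from this interface simply by registering its load function under a new key.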

CHS

ERA5

API for Era5 dataset from Copernicus Climate Datastore.

Note: Initial release data (ERA5T) are available about 5 days behind real time.

Dataset summary:

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview

Detailed information about the dataset:

https://confluence.ecmwf.int/display/CKB/ERA5

API documentation:

https://cds.climate.copernicus.eu/toolbox/doc/api.html

class kadlu.geospatial.data_sources.era5.Era5[source]

Bases: object

Collection of module functions for fetching and loading.

The functions return (values, lat, lon, epoch) numpy arrays with shape (num_points, 4) where epoch is the number of hours since 2000-01-01.

load_flux_ocean(**kwargs)[source]
load_flux_waves(**kwargs)[source]
load_insolation(**kwargs)[source]
load_irradiance(**kwargs)[source]
load_precip_type(**kwargs)[source]
load_precipitation(**kwargs)[source]
load_snowfall(**kwargs)[source]
load_stress_ocean(**kwargs)[source]
load_wavedirection(**kwargs)[source]
load_waveperiod(**kwargs)[source]
load_wind_u(**kwargs)[source]
load_wind_uv(fetch=True, **kwargs)[source]

Loads wind speed computed as sqrt(wind_u^2 + wind_v^2)
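The uv-magnitude computation can be illustrated elementwise with plain Python (kadlu applies the same formula to numpy arrays):

```python
import math

# speed = sqrt(u^2 + v^2), applied per grid point
u = [3.0, 0.0, -6.0]
v = [4.0, 5.0, 8.0]
speed = [math.hypot(ui, vi) for ui, vi in zip(u, v)]
# speed == [5.0, 5.0, 10.0]
```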

load_wind_v(**kwargs)[source]
load_windwaveswellheight(**kwargs)[source]
kadlu.geospatial.data_sources.era5.clear_cache_era5()[source]

Removes all files with the filename pattern ERA5_*.grb2 in the Kadlu storage directory

kadlu.geospatial.data_sources.era5.fetch_era5(var, *, west, east, south, north, start, **_)[source]

Fetch global ERA5 data for specified variable, geographic region, and time range.

Downloads 24 hours of global data on the specified day and saves these data to a *.grb2 file in the kadlu data storage directory, using the recommended spatial resolution of 0.25 x 0.25 degrees.

The *.grb2 files can be deleted by calling the clear_cache_era5 function to save disk space, if necessary.

Only data within the specified geographic boundaries (west, east, south, north) are inserted into the kadlu geospatial.db database.

args:
var: string

The short name of the desired wave parameter, as listed in the ERA5 docs. The complete list can be found here (Table 7 for wave parameters): https://confluence.ecmwf.int/display/CKB/ERA5+data+documentation#ERA5datadocumentation-Temporalfrequency

west,east,south,north: float

Geographic boundaries of the data request

start: datetime.datetime

UTC date of the data request. 24 hours of data will be fetched.

return:

True if new data was fetched, else False

kadlu.geospatial.data_sources.era5.initdb()[source]

Create tables in kadlu’s geospatial.db database for storing ERA5 data

kadlu.geospatial.data_sources.era5.load_era5(var, *, west, east, south, north, start, end, fetch=True, **_)[source]

Load ERA5 data from local geospatial.db database

Args:
var: str

Variable to be fetched

west,east,south,north: float

Geographic boundaries of the data request

start: datetime.datetime

UTC start time for the data request.

end: datetime.datetime

UTC end time for the data request.

fetch: bool

If the data have not already been downloaded and inserted into Kadlu’s local geospatial database, fetch data from the Copernicus Climate Data Store (CDS) automatically using the CDS API. Default is True.

Returns:
values:

values of the fetched var

lat:

y grid coordinates

lon:

x grid coordinates

epoch:

timestamps in epoch hours since 2000-01-01

GEBCO

class kadlu.geospatial.data_sources.gebco.Gebco[source]

Bases: object

fetch_bathymetry_grid()[source]

download netcdf archive and extract it

fetch_callback(south, north, west, east, top=None, bottom=None, start=None, end=None)[source]

build data grid indexes from .nc file and insert into database

load_bathymetry(south, north, west, east, **_)[source]

load gebco bathymetry data

kadlu.geospatial.data_sources.gebco.initdb()[source]

HYCOM

data source:

https://www.hycom.org/dataserver/gofs-3pt1/analysis

web interface for manual hycom data retrieval:

https://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0.html

GLBy0.08/expt_93.0

Time range: Dec 4th, 2018 to present
Longitude convention: 0 to +360 degrees
Depth convention: positive below the sea surface
Time convention: epoch hours since 2000-01-01 00:00 UTC

API limitations:
  • Do not subset more than 1 day at a time using ncss.hycom.org

  • Do not use more than 2 concurrent connections per IP address when downloading data from ncss.hycom.org
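One way to respect the two-connections-per-IP limit is to gate downloads behind a semaphore. The sketch below is purely illustrative (it is not how kadlu enforces the limit internally, and the download body is a placeholder):

```python
import threading

MAX_CONNECTIONS = 2
gate = threading.BoundedSemaphore(MAX_CONNECTIONS)

active = []   # chunks currently "downloading"
peak = [0]    # peak observed concurrency, for demonstration
lock = threading.Lock()

def download(chunk):
    with gate:                       # at most 2 threads inside this block
        with lock:
            active.append(chunk)
            peak[0] = max(peak[0], len(active))
        # ... perform the HTTP request against ncss.hycom.org here ...
        with lock:
            active.remove(chunk)

threads = [threading.Thread(target=download, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```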

class kadlu.geospatial.data_sources.hycom.Hycom[source]

Bases: object

Collection of module functions for fetching and loading HYCOM data.

Attributes:
lat, lon: arrays

Lat/lon coordinates.

epoch: array

Time coordinates. Times are formatted as epoch hours since 2000-01-01 00:00

depth: array

Depth coordinates.

callback(var, max_attempts=3, **kwargs)[source]

Builds indices for query, fetches data from HYCOM, and inserts into local database.

Note: Null/NaN values are removed before the data is inserted into the local database. Null/NaN values occur when the grid overlaps with land or extends below the seafloor.

TODO: Add download progress bar, e.g., using the approach described here:

https://stackoverflow.com/questions/37573483/progress-bar-while-download-file-over-http-with-requests

Args:
var: string

Variable to be fetched. A complete list of variables is available here: https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html

max_attempts: int

Maximum number of request attempts. Default is 3. Each request has a timeout of 120 s.

kwargs: dict

boundaries as keyword arguments

fetch_hycom(var, kwargs, max_attempts=3)[source]

Fetch data from the HYCOM server.

Kadlu’s index class is used for ‘binning’ the data requests into requests that span 1 degree in lat/lon, 24 hours in time (1 day), and 0-5000 m in depth.

A data request that spans multiple lat/lon degrees, multiple days, or includes depths greater than 5000m is split into multiple such ‘binned’ requests.

Conversely, smaller data requests are ‘inflated’ to the size of one ‘bin’.

Args:
var: string

Variable to be fetched. A complete list of variables is available here: https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html

kwargs: dict

boundaries as keyword arguments

max_attempts: int

Maximum number of request attempts. Default is 3. Each request has a timeout of 120 s.
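The binning behaviour described above can be sketched for the latitude axis; the helper below is illustrative only, not kadlu's index class:

```python
import math

# Split a latitude request into 1-degree bins, inflating sub-degree
# requests to a full bin (illustrative sketch of the 'binning' idea).
def lat_bins(south, north):
    lo, hi = math.floor(south), math.ceil(north)
    if lo == hi:                 # degenerate request: inflate to one bin
        hi = lo + 1
    return [(d, d + 1) for d in range(lo, hi)]

# a request spanning 44.2 to 46.5 degrees splits into three 1-degree bins
bins = lat_bins(44.2, 46.5)   # [(44, 45), (45, 46), (46, 47)]
```

The same idea extends to the time axis (24-hour bins) and the depth axis (one 0-5000 m bin).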

load_hycom(var, kwargs)[source]

Load HYCOM data from local database.

If data is not present, attempts to fetch it from the HYCOM server.

Although HYCOM uses the 0 to +360 degree longitude convention, the longitude coordinates returned by this method adhere to the -180 to +180 convention used everywhere else in Kadlu.

Args:
var:

Variable to be fetched. A complete list of variables is available here: https://tds.hycom.org/thredds/dodsC/GLBv0.08/expt_53.X/data/2015.html

kwargs: dict (boundaries as keyword arguments)

south, north: float

ymin, ymax coordinate values. range: -90, 90

west, east: float

xmin, xmax coordinate values. range: -180, 180

start, end: datetime

temporal boundaries in datetime format

Returns:
values: array

values of the fetched variable

lat: array

y grid coordinates

lon: array

x grid coordinates

epoch: array

timestamps in epoch hours since 2000-01-01

depth: array

measured in meters
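The longitude conversion performed by load_hycom (HYCOM's 0 to +360 convention to Kadlu's -180 to +180 convention) amounts to the following per-coordinate transform; the helper name is illustrative:

```python
# Map a longitude from the 0..+360 convention to -180..+180.
def lon_360_to_180(lon):
    return (lon + 180.0) % 360.0 - 180.0

converted = [lon_360_to_180(x) for x in (0.0, 90.0, 180.0, 270.0, 359.75)]
# e.g. 270 degrees east becomes -90 in the -180..+180 convention
```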

load_salinity(**kwargs)[source]
load_temp(**kwargs)[source]
load_water_u(**kwargs)[source]
load_water_uv(**kwargs)[source]

Load water speed, computed as sqrt(water_u^2 + water_v^2)

If data is not present, attempts to fetch it from the HYCOM server.

Args:
kwargs: dict (boundaries as keyword arguments)

south, north: float

ymin, ymax coordinate values. range: -90, 90

west, east: float

xmin, xmax coordinate values. range: -180, 180

start, end: datetime

temporal boundaries in datetime format

Returns:
values: array

Water speed values, in m/s

lat: array

y grid coordinates

lon: array

x grid coordinates

epoch: array

timestamps in epoch hours since 2000-01-01

depth: array

measured in meters

load_water_v(**kwargs)[source]
kadlu.geospatial.data_sources.hycom.fetch_grid(**_)[source]

Download HYCOM lat/lon/time/depth arrays for grid indexing.

Times are formatted as epoch hours since 2000-01-01 00:00.

Returns:
lat, lon, epoch, depth: numpy array

The coordinate arrays

kadlu.geospatial.data_sources.hycom.initdb()[source]

Create tables in kadlu’s geospatial.db database for storing HYCOM data

kadlu.geospatial.data_sources.hycom.load_grid()[source]

Put spatial grid into memory

kadlu.geospatial.data_sources.hycom.slices_str(var, slices, steps=(1, 1, 1, 1))

IFREMER

WWIII

Kadlu API for the NOAA WaveWatch III Datastore

User guides:

https://github.com/NOAA-EMC/WW3/wiki/WAVEWATCH-III-User-Guide

Data model description (boundary definitions, map visualizations, etc.):

https://polar.ncep.noaa.gov/waves/implementations.php

class kadlu.geospatial.data_sources.wwiii.Wwiii[source]

Bases: object

collection of module functions for fetching and loading

load_wavedirection(**kwargs)[source]
load_waveperiod(**kwargs)[source]
load_wind_u(**kwargs)[source]
load_wind_uv(**kwargs)[source]
load_wind_v(**kwargs)[source]
load_windwaveheight(**kwargs)[source]
kadlu.geospatial.data_sources.wwiii.fetch_wwiii(var, **kwargs)[source]

download wwiii data and return associated filepaths

args:
var: string

the variable name of the desired parameter according to the WWIII docs. The complete list of variables can be found at the following URL under ‘model output’: https://polar.ncep.noaa.gov/waves/implementations.php

south, north: float

ymin, ymax coordinate boundaries (latitude). range: -90, 90

west, east: float

xmin, xmax coordinate boundaries (longitude). range: -180, 180

start: datetime

the start of the desired time range

end: datetime

the end of the desired time range

return:

True if new data was fetched, else False

kadlu.geospatial.data_sources.wwiii.initdb()[source]
kadlu.geospatial.data_sources.wwiii.insert(table, agg)[source]

insert parsed data into local database

kadlu.geospatial.data_sources.wwiii.load_wwiii(var, kwargs)[source]

return downloaded wwiii data for specified wavevar according to given time, lat, lon boundaries

args:
var: string

the variable short name of the desired wave parameter according to the WWIII docs. The complete list of variable short names can be found here (under ‘model output’): https://polar.ncep.noaa.gov/waves/implementations.php

south, north: float

ymin, ymax coordinate boundaries (latitude). range: -90, 90

west, east: float

xmin, xmax coordinate boundaries (longitude). range: -180, 180

start: datetime

the start of the desired time range

end: datetime

the end of the desired time range

return:

val, lat, lon, epoch as np arrays of floats


Data Utils

additional tools and utils used across data loading modules

class kadlu.geospatial.data_sources.data_util.Boundary(south, north, west, east, fetchvar='', **_)[source]

Bases: object

compute intersecting boundaries with separating axis theorem

intersects(other)[source]
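For axis-aligned lat/lon boxes, the separating axis test reduces to checking whether one box lies entirely east/west or north/south of the other. A minimal sketch (an illustrative reimplementation, not kadlu's Boundary class):

```python
# Two axis-aligned boxes intersect unless a separating axis exists,
# i.e. unless one box is wholly beyond the other on lat or lon.
class Box:
    def __init__(self, south, north, west, east):
        self.south, self.north = south, north
        self.west, self.east = west, east

    def intersects(self, other):
        return not (self.east < other.west or self.west > other.east or
                    self.north < other.south or self.south > other.north)

a = Box(40, 50, -70, -60)
b = Box(45, 55, -65, -55)   # overlaps a
c = Box(0, 10, 0, 10)       # disjoint from a
```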
kadlu.geospatial.data_sources.data_util.database_cfg()[source]

configure and connect to sqlite database

time is stored as an integer in the database, where each value is epoch hours since 2000-01-01 00:00

returns:
conn:

database connection object

db:

connection cursor object

kadlu.geospatial.data_sources.data_util.dt_2_epoch(dt_arr, t0=datetime.datetime(2000, 1, 1, 0, 0))[source]

convert datetimes to epoch hours

kadlu.geospatial.data_sources.data_util.epoch_2_dt(ep_arr, t0=datetime.datetime(2000, 1, 1, 0, 0), unit='hours')[source]

convert epoch hours to datetimes
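The two conversions can be sketched with the standard library; t0 defaults to 2000-01-01, matching the signatures above:

```python
from datetime import datetime, timedelta

T0 = datetime(2000, 1, 1)

# datetime -> integer epoch hours since t0
def dt_2_epoch(dt, t0=T0):
    return int((dt - t0).total_seconds() / 3600)

# epoch hours since t0 -> datetime
def epoch_2_dt(hours, t0=T0):
    return t0 + timedelta(hours=hours)

ep = dt_2_epoch(datetime(2000, 1, 2, 12))   # 36 hours after t0
back = epoch_2_dt(ep)
```

Storing time as integer epoch hours keeps the database time column compact and directly comparable.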

kadlu.geospatial.data_sources.data_util.ext(filepath, extensions)
kadlu.geospatial.data_sources.data_util.flatten(cols, frames)[source]

dimensional reduction by taking average of time frames

kadlu.geospatial.data_sources.data_util.fmt_coords(kwargs)[source]

Formats spatial coordinates as a human readable character string

Args:
kwargs: dict

Must have keys south, north, east, west and optionally also top and bottom

Returns:
: str

Nicely formatted string

kadlu.geospatial.data_sources.data_util.fmt_time(kwargs)[source]

Formats time window as a human readable character string

Args:
kwargs: dict

Must have keys start and end

Returns:
: str

Nicely formatted string

kadlu.geospatial.data_sources.data_util.index_arr(val, sorted_arr)[source]

converts value in coordinate array to grid index
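A sketch of nearest-index lookup in a sorted coordinate array (illustrative; kadlu's implementation may differ):

```python
import bisect

# Return the index of the grid point in sorted_arr nearest to val.
def index_arr(val, sorted_arr):
    i = bisect.bisect_left(sorted_arr, val)
    if i == len(sorted_arr):                 # beyond the last grid point
        return i - 1
    if i > 0 and val - sorted_arr[i - 1] < sorted_arr[i] - val:
        return i - 1                         # left neighbour is closer
    return i

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
idx = index_arr(0.6, grid)   # nearest grid point is 0.5, at index 2
```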

kadlu.geospatial.data_sources.data_util.ll_2_regionstr(south, north, west, east, regions, default=[])[source]

convert input bounds to region strings with separating axis theorem

kadlu.geospatial.data_sources.data_util.logmsg(source, var, ntup=(), **kwargs)[source]

Log message informing that data was inserted into the kadlu database

The message includes

  • the name of the data source

  • the name of the variable

  • the number of data points added to the database

  • the geographical region

  • the time period

Args:
source: str

Data source name

var: str

Variable name

ntup: tuple

… If None, the message will state that no data was found.

Keyword args:
south,north,east,west: float

Geographic region

top,bottom: float

Depth range

start,end: datetime.datetime

Time window

kadlu.geospatial.data_sources.data_util.logmsg_nodata(source, var, **kwargs)[source]

Log message informing that no data was found for the given query

kadlu.geospatial.data_sources.data_util.reshape_2D(cols)[source]
kadlu.geospatial.data_sources.data_util.reshape_3D(cols)[source]
kadlu.geospatial.data_sources.data_util.reshape_3D_gridded(cols)[source]

prepare loaded data for interpolation

TODO: review and cleanup this function

args:
cols: flattened numpy array of shape (4, n)

cols[0]: values
cols[1]: latitude
cols[2]: longitude
cols[3]: depth

return: gridded

dict(values=gridspace, lats=ygrid, lons=xgrid, depths=zgrid)

kadlu.geospatial.data_sources.data_util.reshape_4D(cols)[source]

prepare loaded data for interpolation

TODO: review, validate and cleanup this function

especially the replacement of nan values!

args:
cols: flattened numpy array of shape (5, n)

cols[0]: values
cols[1]: latitude
cols[2]: longitude
cols[3]: depth
cols[4]: time (epoch hours since 2000-01-01)

return: gridded

dict(values=gridspace, lats=ygrid, lons=xgrid, depths=zgrid, times=tgrid)

class kadlu.geospatial.data_sources.data_util.storage_cfg(setdir=None)[source]

Bases: PathLike

return filepath containing storage configuration string

first checks the config.ini file in kadlu root folder, then defaults to kadlu/storage

cfg = <configparser.ConfigParser object>
default_storage(msg)[source]

helper function for storage_cfg()

kadlu.geospatial.data_sources.data_util.str_def(self, info, args)[source]

builds string definition for data source class objects

kadlu.geospatial.data_sources.data_util.verbosity(set_verbosity=None)[source]
