The pdp_util module

pdp_util.get_session(dsn)

Function which provides module-level database sessions.

This function creates an SQLAlchemy database engine and session factory for a given DSN. If the session factory has not yet been created for this invocation of the program, the function creates it and stores it at the module level. Subsequent invocations with the same arguments return the stored session factory, so every session it produces shares the same engine. The engine handles connection pooling; see the SQLAlchemy documentation for details.

Example:

from pdp_util import get_session
conn_params = {'database': 'crmp', 'user': 'hiebert', 'host': 'monsoon.pcic.uvic.ca'}
session_factory = get_session(conn_params)
session = session_factory()
query = session.execute("SELECT sum(count) FROM climo_obs_count_mv NATURAL JOIN meta_history WHERE ARRAY[station_id] <@ :stns", {'stns': [100, 203]})
query.fetchone()

next_session_factory = get_session(conn_params)
next_session_factory == session_factory # Returns True
next_session_factory() == session # Returns False
Parameters:dsn – dict or SQLAlchemy-style DSN string
Return type:session factory
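
The module-level caching described above can be illustrated without a database. This is a minimal sketch, not the module's actual code: FakeEngine, FakeSession, _factories and get_session_sketch are hypothetical names standing in for the SQLAlchemy engine, session and session factory.

```python
# Hypothetical stand-ins; the real function builds an SQLAlchemy engine and session factory.
_factories = {}  # module-level cache: one session factory per distinct dsn


class FakeEngine:
    def __init__(self, dsn):
        self.dsn = dsn


class FakeSession:
    def __init__(self, engine):
        self.engine = engine


def _cache_key(dsn):
    # dicts are unhashable, so turn them into a sorted tuple of items
    if isinstance(dsn, dict):
        return tuple(sorted(dsn.items()))
    return dsn


def get_session_sketch(dsn):
    key = _cache_key(dsn)
    if key not in _factories:
        engine = FakeEngine(dsn)  # created once per dsn
        _factories[key] = lambda: FakeSession(engine)
    return _factories[key]
```

Calling get_session_sketch twice with equal parameters returns the same factory object, while each call of that factory produces a distinct session bound to the shared engine.
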
pdp_util.ping_connection(dbapi_connection, connection_record, connection_proxy)

This function is an event listener that “pings” (runs an inexpensive query and discards the results) each time a connection is checked out from the connection pool. If the ping fails, it raises a DisconnectionError, which forces the current connection to be disposed. See http://docs.sqlalchemy.org/en/rel_0_9/core/events.html for further details.
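
The ping pattern can be sketched with the standard library alone. The example below is an assumption-laden illustration: it uses sqlite3 rather than the real connection pool, takes a single argument instead of the three the listener receives, and defines its own DisconnectionError as a stand-in for the SQLAlchemy exception.

```python
import sqlite3


class DisconnectionError(Exception):
    """Hypothetical stand-in for SQLAlchemy's DisconnectionError."""


def ping_connection_sketch(dbapi_connection):
    # Run an inexpensive query and discard the result.
    try:
        cursor = dbapi_connection.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchall()
        cursor.close()
    except sqlite3.Error as exc:
        # Tell the caller that this connection is no longer usable.
        raise DisconnectionError(str(exc))


conn = sqlite3.connect(":memory:")
ping_connection_sketch(conn)  # healthy connection: returns quietly
conn.close()
```

Pinging the connection again after close() raises DisconnectionError, which is the signal a pool would use to discard the connection.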

pdp_util.session_scope(*args, **kwds)

Provide a transactional scope around a series of operations.
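
A transactional scope of this kind is commonly written as a context manager. The following is a sketch of that general pattern, not the module's implementation: RecordingSession is a hypothetical stand-in for a database session, and the single session_factory argument is an assumption about what the real function accepts.

```python
from contextlib import contextmanager


class RecordingSession:
    """Hypothetical stand-in for a database session; it only records calls."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextmanager
def session_scope_sketch(session_factory):
    session = session_factory()
    try:
        yield session
        session.commit()    # no exception: make the work permanent
    except Exception:
        session.rollback()  # undo partial work, then re-raise
        raise
    finally:
        session.close()     # always release the session


with session_scope_sketch(RecordingSession) as sesh:
    pass  # issue queries here
```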

Miscellaneous Apps

class pdp_util.map.MapApp(**kwargs)
__call__(environ, start_response)

Call the MapApp, start the response, and generate the content

__init__(**kwargs)

Initialize the MapApp

Parameters:
  • root – The absolute URL of where this application lives
  • gs_url – Absolute URL to a GeoServer instance
  • templates – filesystem path to where the templates live
  • version – project version string
Return type:

MapApp

__weakref__

list of weak references to the object (if defined)

class pdp_util.counts.CountStationsApp(session_scope_factory=None)

Application for counting the number of stations that meet the query parameters

class pdp_util.counts.CountRecordLengthApp(session_scope_factory, max_stns)

Application for estimating the length of the dataset which would be returned by the stations which meet the given criteria

class pdp_util.legend.LegendApp(conn_params)

WSGI app that creates symbols for the network legend

Each station on the PCDS map is colored by the network attribute and the colors are stored in the crmp database. This app queries the color table on instantiation and then responds to requests with a png with the appropriate color. As such, if the database is updated during run time, the changes will not take effect. The app will always set the Last-Modified header to be the time of instantiation.

Network name is determined from the PATH_INFO matching [network_name].png. network_name must be the lower case of the actual network_name attribute. For example PATH_INFO = motie.png will return the symbol for the MoTIe network. If the network is not found, a white symbol is returned.

This app checks for the HTTP If-Modified-Since header and returns a 304 Not Modified response if possible.
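
The request handling described above (network name taken from PATH_INFO, a fixed Last-Modified time, and a 304 response when possible) might look roughly like the sketch below. It is illustrative only: LegendSketch is a hypothetical class, the colours come from a dict rather than the crmp database, and the body is the colour string instead of PNG bytes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class LegendSketch:
    """Illustrative only; colours come from a dict, not the database."""

    def __init__(self, colours):
        self.colours = colours  # keyed by lower-case network name
        # Fixed at instantiation, truncated to whole seconds like HTTP dates
        self.created = datetime.now(timezone.utc).replace(microsecond=0)

    def __call__(self, environ, start_response):
        since = environ.get('HTTP_IF_MODIFIED_SINCE')
        if since:
            try:
                if parsedate_to_datetime(since) >= self.created:
                    start_response('304 Not Modified', [])
                    return [b'']
            except (TypeError, ValueError):
                pass  # unusable header: fall through to a full response
        name = environ.get('PATH_INFO', '').lstrip('/')
        network = name[:-len('.png')] if name.endswith('.png') else name
        colour = self.colours.get(network, '#ffffff')  # white if not found
        headers = [('Content-Type', 'text/plain'),
                   ('Last-Modified', format_datetime(self.created, usegmt=True))]
        start_response('200 OK', headers)
        return [colour.encode('ascii')]  # a real app would return PNG bytes
```

Because the colour table and timestamp are captured in the constructor, later changes to the source of the colours are not seen until a new instance is created, which matches the behaviour described for the real app.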

__call__(...) <==> x(...)
__init__(conn_params)
Parameters:conn_params (dict) –
__weakref__

list of weak references to the object (if defined)

Modules for dispatching to Pydap

pdp_util.pcds_dispatch

class pdp_util.pcds_dispatch.PcdsDispatcher(**kwargs)

This class is a WSGI app which interprets parts of a URL and routes the request to one of several handlers

It is assumed that the URL points to something like http://tools.pacificclimate.org/data_portal/pydap/pcds/raw/MoE/0260011/

In this case PATH_INFO will be /raw/MoE/0260011/

The dispatcher breaks the URL path into three parts:

  1. is_climo = (raw|climo), i.e. whether the app should look for climatologies or raw observations
  2. network: the short network abbreviation
  3. station: this is the native_id in the database

If is_climo is unspecified, the app will route to pcic.pcds_index.PcdsIsClimoIndex

If is_climo is incorrectly specified, the app will return a 404 HTTPNotFound

If network is unspecified, the app will route to pcic.pcds_index.PcdsNetworkIndex

If network is specified as a non-existent network, it will just show an empty network listing

If station is unspecified, the app will route to pcic.pcds_index.PcdsStationIndex

If station is specified as a non-existent station, it will return a 404 HTTPNotFound

If any extra garbage is found on the end of an otherwise valid path, the app will redirect with an HTTPSeeOther to the pcic.pcds_index.PcdsStationIndex for the specified station
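
The routing rules above can be summarised as a small decision function. This is a sketch under stated assumptions: route_sketch is a hypothetical name, it returns a label rather than invoking a handler, the set of known stations is passed in instead of being queried from the database, and 'station data handler' is a placeholder for whatever serves a valid station path.

```python
def route_sketch(path_info, stations):
    """Return a label for the handler or status the rules above select.

    stations: set of (network, native_id) pairs (hypothetical in-memory
    stand-in for the database lookup).
    """
    parts = [p for p in path_info.split('/') if p]
    if not parts:
        return 'PcdsIsClimoIndex'
    if parts[0] not in ('raw', 'climo'):
        return '404 Not Found'
    if len(parts) == 1:
        return 'PcdsNetworkIndex'
    if len(parts) == 2:
        # An unknown network simply produces an empty listing
        return 'PcdsStationIndex'
    if (parts[1], parts[2]) not in stations:
        return '404 Not Found'
    if len(parts) > 3:
        # Trailing segments: redirect to the station's index
        return '303 See Other'
    return 'station data handler'
```

For example, with the station ('MoE', '0260011') known, '/raw/MoE/0260011/' selects the data handler, while '/raw/MoE/0260011/extra/' produces the redirect.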

__call__(...) <==> x(...)
__init__(**kwargs)

Initialize the app. Generally these arguments will all come out of the global config.

Parameters:
  • templates
  • app_root
  • ol_path
  • conn_params
__weakref__

list of weak references to the object (if defined)

pdp_util.pcds_index

class pdp_util.pcds_index.PcdsIndex(**kwargs)

WSGI app which is a base class for templating database dependent listings

The app should be configured with local args conn_params so that it can make a database connection

Subclasses must implement the get_elements() method which returns an iterable of 2-tuples (the things to list)

Subclasses may set the options in kwargs: title, short_name, long_name

Parameters:
  • conn_params (dict) –
  • app_root (str) – The absolute URL of where this application lives
  • templates (str) – filesystem path to where the templates live
__call__(...) <==> x(...)
__init__(**kwargs)

x.__init__(…) initializes x; see help(type(x)) for signature

__weakref__

list of weak references to the object (if defined)

get_elements(sesh)

Stub function

Raises:NotImplementedError
render(**kwargs)

Loads and renders the index page template and returns an HTML stream

Parameters:elements (list) – a list of (name, description) pairs which will be listed on the index page
Return type:str
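
The base-class contract (subclasses supply get_elements(), the base class renders the listing) can be sketched as follows. IndexSketch and StaticIndexSketch are hypothetical names, plain text stands in for the HTML template, and the example entries are invented for illustration.

```python
class IndexSketch:
    """Hypothetical simplification: no templating and no database."""

    def __init__(self, **kwargs):
        self.title = kwargs.get('title', 'Index')
        self.short_name = kwargs.get('short_name', 'Name')
        self.long_name = kwargs.get('long_name', 'Description')

    def get_elements(self, sesh):
        # Subclasses must override this and return 2-tuples
        raise NotImplementedError

    def render(self, elements):
        # Title, column headers, then one line per (name, description) pair
        lines = [self.title, '%s\t%s' % (self.short_name, self.long_name)]
        lines += ['%s\t%s' % pair for pair in elements]
        return '\n'.join(lines)


class StaticIndexSketch(IndexSketch):
    def get_elements(self, sesh):
        # Invented entries; a real subclass would query through sesh
        return [('alpha', 'First example entry'),
                ('beta', 'Second example entry')]
```
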
class pdp_util.pcds_index.PcdsIsClimoIndex(**kwargs)

WSGI app which renders an index page just showing “climo” and “raw”. Super simple.

__init__(**kwargs)
Parameters:
  • title (str) – Title for the index page
  • short_name (str) – First column header (usually a short name)
  • long_name (str) – Second column header (usually a longer description)
get_elements(sesh)

Stub function

Raises:NotImplementedError
class pdp_util.pcds_index.PcdsNetworkIndex(**kwargs)

WSGI app which renders an index page for all of the networks in the PCDS

__init__(**kwargs)
Parameters:
  • title (str) – Title for the index page
  • short_name (str) – First column header (usually a short name)
  • long_name (str) – Second column header (usually a longer description)
  • is_climo (Boolean) – Is this an index for climatologies rather than raw data?
get_elements(sesh)

Runs a database query and returns a list of (network_name, network_description) pairs for which there exists either climo or raw data.

class pdp_util.pcds_index.PcdsStationIndex(**kwargs)

WSGI app which renders an index page for all of the stations in a given PCDS network

__init__(**kwargs)
Parameters:
  • title (str) – Title for the index page
  • short_name (str) – First column header (usually a short name)
  • long_name (str) – Second column header (usually a longer description)
get_elements(sesh)

Runs a database query and returns a list of (native_id, station_name) pairs which are in the given PCDS network.

Data Delivery

pdp_util.agg

This module provides aggregation utilities to translate a single HTTP request into multiple OPeNDAP requests, returning a single response

class pdp_util.agg.PcdsZipApp(dsn, sesh=None)

WSGI application which accepts a set of PCDS filters in the request and responds with a generator which streams the OPeNDAP responses one by one

__call__(environ, start_response)

Fire off pydap requests and return an iterable (from ziperator())

__init__(dsn, sesh=None)

x.__init__(…) initializes x; see help(type(x)) for signature

__weakref__

list of weak references to the object (if defined)

pdp_util.agg.agg_generator(global_conf, **kwargs)

Factory function for the PcdsZipApp

Parameters:
  • global_conf – dict containing the key conn_params which is passed on to PcdsZipApp. Everything else is ignored.
  • kwargs – ignored
pdp_util.agg.get_all_metadata_index_responders(sesh, stations, climo=False)

This function is a generator which yields (name, generator) pairs where name is the filename (e.g. [network_name].csv) and generator streams a csv file with information on the network’s variables

Parameters:
  • stations – A list of (network_name, native_id) pairs representing the stations for which this response should include variable metadata
  • climo (bool) – Should these be climatological variables?
Return type:

iterator

pdp_util.agg.get_pcds_responders(dsn, stns, extension, clip_dates, environ)

Iterator object which coalesces a list of stations, compresses them, and returns the data for the response

Parameters:
  • dsn
  • stns – A list of (network_name, native_id) pairs representing the stations for which this response should include variable metadata
  • extension (str) – extension representing the response file type which should be appended to the request
  • clip_dates – pair of datetime.datetime objects representing the start and end times for which data should be returned (inclusive)
  • environ (dict) – WSGI environment variables which optionally set the download-climatology field
Return type:

iterator

pdp_util.agg.metadata_index_responder(sesh, network, climo=False)

This function creates a pydap csv response which lists variable metadata out of the database. It returns a generator for the contents of the file

Parameters:
  • sesh (sqlalchemy.orm.session.Session) – database session
  • network (str) – Name of the network for which variables should be listed
Return type:

generator

pdp_util.agg.ziperator(responders)

This function creates and returns an iterator which yields bytes for a ZipFile that contains a set of files from OPeNDAP requests. It spools the first gigabyte in memory using a SpooledTemporaryFile, after which it uses disk.

Parameters:responders – A list of (name, generator) pairs where name is the filename to use in the zip archive and generator should yield all bytes for a single file.
Return type:iterator
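
The spooling approach can be sketched with the standard library. This is a simplified illustration rather than the module's code: ziperator_sketch, max_in_memory and chunk_size are hypothetical names, and each file is joined in memory before being written, which the real function may well avoid.

```python
import zipfile
from tempfile import SpooledTemporaryFile


def ziperator_sketch(responders, max_in_memory=1024 ** 3, chunk_size=65536):
    # Data stays in memory up to max_in_memory bytes, then rolls over to disk
    with SpooledTemporaryFile(max_size=max_in_memory) as spool:
        with zipfile.ZipFile(spool, 'w', zipfile.ZIP_DEFLATED) as archive:
            for name, generator in responders:
                # Simplification: one whole file is held in memory at a time
                archive.writestr(name, b''.join(generator))
        # The archive is complete; stream it back out in chunks
        spool.seek(0)
        while True:
            chunk = spool.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Joining the yielded chunks gives a complete zip archive containing one member per (name, generator) pair.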

Raster Stuff

class pdp_util.raster.EnsembleCatalog(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})

WSGI app to list an ensemble catalog

__call__(...) <==> x(...)
__init__(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})

x.__init__(…) initializes x; see help(type(x)) for signature

__weakref__

list of weak references to the object (if defined)

class pdp_util.raster.RasterCatalog(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})

WSGI app which is a subclass of RasterServer. Filters the urls on call to permit only MetaData requests

__call__(environ, start_response)

An override of RasterServer’s __call__ which allows only MetaData requests

class pdp_util.raster.RasterMetadata(dsn)

WSGI app to query metadata from the MDDB.

__call__(environ, start_response)

Handle requests for metadata

__init__(dsn)

Initialize the application

Parameters:dsn – SQLAlchemy-style DSN string with database dialect and connection options. Example: “postgresql://scott:tiger@localhost/test”
__weakref__

list of weak references to the object (if defined)

class pdp_util.raster.RasterServer(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})

WSGI app which is a subclass of PyDap’s DapServer to do dynamic (non-filebased) configuration, for serving rasters

__call__(environ, start_response)

An override of Pydap’s __call__ which overrides catalog requests, but defers to pydap for data requests

__init__(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})

Initialize the application

Parameters:config (dict) – A config dict that can be read by yaml.load() and includes the key handlers. handlers must be a list of dicts each containing the keys: url and file.
pdp_util.raster.db_raster_catalog(session, ensemble, root_url)

A function which queries the database for all of the raster files belonging to a given ensemble. Returns a dict where keys are the dataset unique ids and the value is the filename for the dataset.

Parameters:
  • session – SQLAlchemy session for the pcic_meta database
  • ensemble – Name of the ensemble for which member files should be listed
  • root_url – Base URL which should be prepended to the beginning of each dataset ID
Return type:

dict

pdp_util.raster.db_raster_configurator(session, name, version, api_version, ensemble, root_url='/')

A function to construct a config dict which is usable for configuring Pydap for serving rasters

Parameters:
  • session – SQLAlchemy session for the pcic_meta database
  • name – Name of this server e.g. my-raster-server
  • version – Version of the server application
  • api_version – OPeNDAP API version?
  • ensemble – The identifier for the PCIC MetaData DataBase (Mddb) ensemble to configure
  • root_url – URL to prepend to all of the dataset ids
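
To make the shape of the resulting config concrete, the sketch below assembles a dict with the same keys as the default config shown for RasterServer. It is an assumption about how the pieces fit together, not the actual implementation: raster_config_sketch is a hypothetical name, and the catalog argument stands in for the dict that db_raster_catalog is described as returning.

```python
def raster_config_sketch(name, version, api_version, catalog):
    """Build a config dict shaped like the RasterServer default.

    catalog: dict mapping dataset id (assumed to be already prefixed with
    root_url) to the filename of the dataset.
    """
    # One handler per dataset, sorted by id for a stable order
    handlers = [{'url': url, 'file': filename}
                for url, filename in sorted(catalog.items())]
    return {'name': name, 'version': version,
            'api_version': api_version, 'handlers': handlers}
```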

Random Stuff

class pdp_util.filters.FormFilter

A simple class for validating form input and mapping the input to a database constraint on the crmp_network_geoserver table.

NOTE regarding filtering on variables:

Filtering on variables (parameters input-var and input-vars) is based on the column crmp_network_geoserver.vars. This column contains a comma-separated (actually: ‘, ‘-separated) list of variable identifiers aggregated over the history_id.

A variable identifier is formed from a row of pycds.Variable (table meta_vars) by concatenating (without separator) the column standard_name and the string derived from column cell_method by replacing all occurrences of the string ‘time: ‘ with ‘_’. It is unknown at the date of this writing why this replacement is performed. For reference, an identifier is formed by the following PostgreSQL expression (see CRMP database view collapsed_vars_v, column vars): ` array_to_string(array_agg(meta_vars.standard_name::text || regexp_replace(meta_vars.cell_method::text, 'time: '::text, '_'::text, 'g'::text)), ', '::text) `
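
For readers more comfortable with Python than SQL, the PostgreSQL expression above corresponds to the following sketch. The function names are hypothetical and the sample values used with them are invented for illustration; only the concatenation and replacement rules come from the expression itself.

```python
import re


def variable_identifier(standard_name, cell_method):
    # Mirrors: standard_name || regexp_replace(cell_method, 'time: ', '_', 'g')
    return standard_name + re.sub('time: ', '_', cell_method)


def vars_column(rows):
    # rows: iterable of (standard_name, cell_method) pairs for one history_id
    return ', '.join(variable_identifier(s, c) for s, c in rows)
```

With an illustrative row ('air_temperature', 'time: maximum'), variable_identifier returns 'air_temperature_maximum'.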

pdp_util.filters.validate_vars(environ)

Iterate over the POST variables and convert them to SQL constraints

Parameters:environ – dict which can include:
  • from-date
  • to-date
  • network-name
  • input-var
  • input-freq
  • input-polygon
  • only-with-climatology
Return type:list of callables or text SQL constraints