The pdp_util module¶
-
pdp_util.
get_session
(dsn)¶ Function which provides module-level database sessions.
This function creates a SQLAlchemy database engine and session factory for a given dsn. If the session factory has not yet been created for this invocation of the program, the function creates it and stores it at the module level. Subsequent invocations with the same arguments return the stored session factory, bound to the same engine, rather than building a new one. The engine takes care of connection pooling. See the SQLAlchemy docs for more details.
Example:
    from pdp_util import get_session

    conn_params = {'database': 'crmp', 'user': 'hiebert', 'host': 'monsoon.pcic.uvic.ca'}
    session_factory = get_session(conn_params)
    session = session_factory()
    query = session.execute("SELECT sum(count) FROM climo_obs_count_mv NATURAL JOIN meta_history WHERE ARRAY[station_id] <@ :stns", {'stns': [100, 203]})
    query.fetchone()

    next_session_factory = get_session(conn_params)
    next_session_factory == session_factory  # Returns True
    next_session_factory() == session  # Returns False
Parameters: dsn – dict or SQLAlchemy-style DSN string
Return type: session factory
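The module-level caching described above can be illustrated with a stdlib-only sketch. It is not the pdp_util implementation: make_factory and _factories are hypothetical names, and a plain closure stands in for the SQLAlchemy engine and session factory.

```python
# Hypothetical stand-in for get_session: one factory per DSN, cached at module level.
_factories = {}


def make_factory(dsn):
    """Return the cached session factory for dsn, creating it on first use."""
    # dicts are unhashable, so normalise a dict DSN into a sorted tuple key
    key = tuple(sorted(dsn.items())) if isinstance(dsn, dict) else dsn
    if key not in _factories:
        def factory():
            # Each call hands back a fresh "session" object tied to this DSN
            return {"dsn": key}
        _factories[key] = factory
    return _factories[key]


first = make_factory({"database": "crmp", "host": "localhost"})
second = make_factory({"host": "localhost", "database": "crmp"})
print(first is second)     # the factory is shared for equal DSNs
print(first() is first())  # each call to the factory builds a new session
```

Keying the cache on the DSN is what lets repeated calls share one engine and its connection pool while still handing out independent sessions.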
-
pdp_util.
ping_connection
(dbapi_connection, connection_record, connection_proxy)¶ This function is an event listener that “pings” (runs an inexpensive query and discards the results) each time a connection is checked out from the connection pool. If the ping fails, this function raises a DisconnectionError, which forces the current connection to be disposed. See http://docs.sqlalchemy.org/en/rel_0_9/core/events.html for further details.
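The ping-on-checkout idea can be sketched with the standard library alone. This is an illustration under assumptions, not the pdp_util listener: sqlite3 stands in for the real DB-API driver, and Disconnected is a hypothetical substitute for SQLAlchemy's DisconnectionError.

```python
import sqlite3


class Disconnected(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.DisconnectionError."""


def ping(dbapi_connection):
    """Run a cheap query, discard the result, and flag a dead connection."""
    try:
        cursor = dbapi_connection.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchall()
        cursor.close()
    except sqlite3.Error as exc:
        # A pool would react to this by disposing of the connection
        raise Disconnected() from exc


live = sqlite3.connect(":memory:")
ping(live)  # a healthy connection passes silently

dead = sqlite3.connect(":memory:")
dead.close()
try:
    ping(dead)
except Disconnected:
    print("connection discarded")
```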
-
pdp_util.
session_scope
(*args, **kwds)¶ Provide a transactional scope around a series of operations.
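A minimal sketch of what a transactional scope usually looks like, assuming session_scope follows the common commit/rollback/close pattern for context managers; it is not taken from the pdp_util source. FakeSession and scope are hypothetical names, and the fake session only records which methods were called.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in that records what happens to it."""

    def __init__(self):
        self.events = []

    def commit(self):
        self.events.append("commit")

    def rollback(self):
        self.events.append("rollback")

    def close(self):
        self.events.append("close")


@contextmanager
def scope(session_factory):
    """Commit on success, roll back on error, always close."""
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


with scope(FakeSession) as ok:
    pass  # database work would go here

try:
    with scope(FakeSession) as failed:
        raise ValueError("boom")
except ValueError:
    pass

print(ok.events)
print(failed.events)
```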
Miscellaneous Apps¶
-
class
pdp_util.map.
MapApp
(**kwargs)¶ -
__call__
(environ, start_response)¶ Call the MapApp, start the response, and generate the content
-
__init__
(**kwargs)¶ Initialize the MapApp
Parameters: - root – The absolute URL of where this application lives
- gs_url – Absolute URL to a GeoServer instance
- templates – filesystem path to where the templates live
- version – project version string
Return type:
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
pdp_util.counts.
CountStationsApp
(session_scope_factory=None)¶ Application for counting the number of stations that meet the query parameters
-
class
pdp_util.counts.
CountRecordLengthApp
(session_scope_factory, max_stns)¶ Application for estimating the length of the dataset which would be returned for the stations which meet the given criteria
-
class
pdp_util.legend.
LegendApp
(conn_params)¶ WSGI app that creates symbols for the network legend
Each station on the PCDS map is colored by the network attribute and the colors are stored in the crmp database. This app queries the color table on instantiation and then responds to requests with a png with the appropriate color. As such, if the database is updated during run time, the changes will not take effect until the app is instantiated again. The app will always set the Last-Modified header to be the time of instantiation.
The network name is determined from PATH_INFO, which must match [network_name].png. network_name must be the lower case of the actual network_name attribute. For example, PATH_INFO = motie.png will return the symbol for the MoTIe network. If the network is not found, a white symbol is returned.
This app checks for the HTTP If-Modified-Since header and returns a 304 Not Modified response if possible.
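The request handling described above can be sketched as a small WSGI callable. This is an illustration under assumptions, not the LegendApp code: COLOURS is a hypothetical in-memory table standing in for the database query, and the response body is the colour value as text rather than a rendered png.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical colour table; the real app reads it from the crmp database once.
COLOURS = {"motie": "#FF0000"}
# Instantiation time, truncated to whole seconds as HTTP dates are
STARTED = datetime.now(timezone.utc).replace(microsecond=0)


def legend_app(environ, start_response):
    """Look up a network colour from PATH_INFO, honouring If-Modified-Since."""
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if since:
        try:
            if parsedate_to_datetime(since) >= STARTED:
                start_response("304 Not Modified", [])
                return []
        except (TypeError, ValueError):
            pass  # unusable header: fall through to a full response

    name = environ.get("PATH_INFO", "").strip("/")
    network = name[:-len(".png")] if name.endswith(".png") else name
    colour = COLOURS.get(network, "#FFFFFF")  # white when the network is unknown

    headers = [
        ("Content-Type", "text/plain"),
        ("Last-Modified", format_datetime(STARTED, usegmt=True)),
    ]
    start_response("200 OK", headers)
    return [colour.encode("ascii")]
```

Because the colour table never changes after start-up, any client copy dated at or after the instantiation time is still current, which is why the 304 check only needs to compare against that one timestamp.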
-
__call__
(...) <==> x(...)¶
-
__init__
(conn_params)¶ Parameters: conn_params (dict) –
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
Modules for dispatching to Pydap¶
pdp_util.pcds_dispatch¶
-
class
pdp_util.pcds_dispatch.
PcdsDispatcher
(**kwargs)¶ This class is a WSGI app which interprets parts of a URL and routes the request to one of several handlers
It is assumed that the URL points to something like http://tools.pacificclimate.org/data_portal/pydap/pcds/raw/MoE/0260011/
In this case PATH_INFO will be /raw/MoE/0260011/
The dispatcher breaks the URL into three parts:
- is_climo = (raw|climo), i.e. should the app be looking for climatologies or raw observations
- network: the short network abbreviation
- station: this is the native_id in the database
If is_climo is unspecified, the app will route to pcic.pcds_index.PcdsIsClimoIndex
If is_climo is incorrectly specified, the app will return a 404 HTTPNotFound
If network is unspecified, the app will route to pcic.pcds_index.PcdsNetworkIndex
If network is specified as a non-existent network, it will just show an empty network listing
If station is unspecified, the app will route to pcic.pcds_index.PcdsStationIndex
If station is specified as a non-existent station, it will return a 404 HTTPNotFound
If any extra garbage is found on the end of an otherwise valid path, the app will redirect with an HTTPSeeOther to the pcic.pcds_index.PcdsStationIndex for the specified station
-
__call__
(...) <==> x(...)¶
-
__init__
(**kwargs)¶ Initialize the app. Generally these arguments will all come out of the global config.
Parameters: - templates –
- app_root –
- ol_path –
- conn_params –
-
__weakref__
¶ list of weak references to the object (if defined)
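The routing rules in the PcdsDispatcher description can be sketched as a pure function over PATH_INFO. This is a simplified illustration, not the dispatcher's code: split_path and route are hypothetical names, the return values are labels rather than WSGI apps, "station data" is a made-up label for the final hand-off, and the checks for non-existent networks or stations are omitted because they need the database.

```python
def split_path(path_info):
    """Split PATH_INFO into (is_climo, network, station, extra)."""
    parts = [p for p in path_info.split("/") if p]
    head = parts[:3]
    # Pad with None so that unspecified pieces are easy to detect
    is_climo, network, station = head + [None] * (3 - len(head))
    return is_climo, network, station, parts[3:]


def route(path_info):
    """Name the handler that the documented rules would select."""
    is_climo, network, station, extra = split_path(path_info)
    if is_climo is None:
        return "PcdsIsClimoIndex"
    if is_climo not in ("raw", "climo"):
        return "404 HTTPNotFound"
    if network is None:
        return "PcdsNetworkIndex"
    if station is None:
        return "PcdsStationIndex"
    if extra:
        return "HTTPSeeOther"
    return "station data"


for path in ("/", "/raw/", "/raw/MoE/", "/raw/MoE/0260011/"):
    print(path, "->", route(path))
```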
pdp_util.pcds_index¶
-
class
pdp_util.pcds_index.
PcdsIndex
(**kwargs)¶ WSGI app which is a base class for templating database dependent listings
The app should be configured with local args conn_params so that it can make a database connection.
Subclasses must implement the get_elements() method, which returns an iterable of 2-tuples (the things to list).
Subclasses may set the options in kwargs: title, short_name, long_name
Parameters: - conn_params (dict) –
- app_root (str) – The absolute URL of where this application lives
- templates (str) – filesystem path to where the templates live
-
__call__
(...) <==> x(...)¶
-
__init__
(**kwargs)¶ x.__init__(…) initializes x; see help(type(x)) for signature
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_elements
(sesh)¶ Stub function
Raises: NotImplementedError
-
render
(**kwargs)¶ Loads and renders the index page template and returns an HTML stream
Parameters: elements (list) – a list of (name, description) pairs which will be listed on the index page
Return type: str
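The subclassing contract for PcdsIndex (a stub get_elements() in the base class, overridden to return 2-tuples that render() lists) can be sketched without the real templates or database. IndexBase and ClimoIndex are hypothetical names, the row descriptions are made up, and a tab-separated text table stands in for the HTML template.

```python
class IndexBase:
    """Hypothetical stand-in for PcdsIndex: subclasses supply the rows."""

    def __init__(self, **kwargs):
        self.args = {"title": "Index", "short_name": "Name",
                     "long_name": "Description"}
        self.args.update(kwargs)

    def get_elements(self, sesh):
        # Stub: concrete index pages must override this
        raise NotImplementedError

    def render(self, elements):
        # A plain-text table stands in for the real HTML template
        lines = [self.args["title"],
                 "%s\t%s" % (self.args["short_name"], self.args["long_name"])]
        lines += ["%s\t%s" % pair for pair in elements]
        return "\n".join(lines)


class ClimoIndex(IndexBase):
    def get_elements(self, sesh):
        # This listing is fixed, so the session argument goes unused
        return [("climo", "Climatological values"),
                ("raw", "Raw observations")]


index = ClimoIndex(title="PCDS data")
page = index.render(index.get_elements(None))
print(page)
```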
-
class
pdp_util.pcds_index.
PcdsIsClimoIndex
(**kwargs)¶ WSGI app which renders an index page just showing “climo” and “raw”. Super simple.
-
__init__
(**kwargs)¶ Parameters: - title (str) – Title for the index page
- short_name (str) – First column header (usually a short name)
- long_name (str) – Second column header (usually a longer description)
-
get_elements
(sesh)¶ Stub function
Raises: NotImplementedError
-
-
class
pdp_util.pcds_index.
PcdsNetworkIndex
(**kwargs)¶ WSGI app which renders an index page for all of the networks in the PCDS
-
__init__
(**kwargs)¶ Parameters: - title (str) – Title for the index page
- short_name (str) – First column header (usually a short name)
- long_name (str) – Second column header (usually a longer description)
- is_climo (Boolean) – Is this an index for climatologies rather than raw data?
-
get_elements
(sesh)¶ Runs a database query and returns a list of (network_name, network_description) pairs for which there exists either climo or raw data.
-
-
class
pdp_util.pcds_index.
PcdsStationIndex
(**kwargs)¶ WSGI app which renders an index page for all of the stations in a given PCDS network
-
__init__
(**kwargs)¶ Parameters: - title (str) – Title for the index page
- short_name (str) – First column header (usually a short name)
- long_name (str) – Second column header (usually a longer description)
-
get_elements
(sesh)¶ Runs a database query and returns a list of (native_id, station_name) pairs which are in the given PCDS network.
-
Data Delivery¶
pdp_util.agg¶
This module provides aggregation utilities to translate a single HTTP request into multiple OPeNDAP requests, returning a single response
-
class
pdp_util.agg.
PcdsZipApp
(dsn, sesh=None)¶ WSGI application which accepts a set of PCDS filters in the request and responds with a generator which streams the OPeNDAP responses one by one
-
__call__
(environ, start_response)¶ Fire off pydap requests and return an iterable (from ziperator())
-
__init__
(dsn, sesh=None)¶ x.__init__(…) initializes x; see help(type(x)) for signature
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
pdp_util.agg.
agg_generator
(global_conf, **kwargs)¶ Factory function for the PcdsZipApp
Parameters: - global_conf – dict containing the key conn_params which is passed on to PcdsZipApp. Everything else is ignored.
- kwargs – ignored
-
pdp_util.agg.
get_all_metadata_index_responders
(sesh, stations, climo=False)¶ This function is a generator which yields (name, generator) pairs where name is the filename (e.g. [network_name].csv) and generator streams a csv file with information on the network’s variables
Parameters: - stations – A list of (network_name, native_id) pairs representing the stations for which this response should include variable metadata
- climo (bool) – Should these be climatological variables?
Return type: iterator
-
pdp_util.agg.
get_pcds_responders
(dsn, stns, extension, clip_dates, environ)¶ Iterator object which coalesces a list of stations, compresses them, and returns the data for the response
Parameters: - dsn –
- stns – A list of (network_name, native_id) pairs representing the stations for which this response should include variable metadata
- extension (str) – extension representing the response file type which should be appended to the request
- clip_dates – pair of datetime.datetime objects representing the start and end times for which data should be returned (inclusive)
- environ (dict) – WSGI environment variables which optionally set the download-climatology field
Return type: iterator
-
pdp_util.agg.
metadata_index_responder
(sesh, network, climo=False)¶ This function creates a pydap csv response which lists variable metadata out of the database. It returns a generator for the contents of the file
Parameters: - sesh (sqlalchemy.orm.session.Session) – database session
- network (str) – Name of the network for which variables should be listed
Return type: generator
-
pdp_util.agg.
ziperator
(responders)¶ This method creates and returns an iterator which yields bytes for a ZipFile that contains a set of files from OPeNDAP requests. The method will spool the first one gigabyte in memory using a SpooledTemporaryFile, after which it will use disk.
Parameters: responders – A list of (name, generator) pairs where name is the filename to use in the zip archive and generator should yield all bytes for a single file.
Return type: iterator
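The spooling approach described for ziperator can be sketched with the standard library. This is a simplified illustration, not the pdp_util implementation: zip_stream, spool_bytes and chunk_size are hypothetical names, and the sketch finishes writing the archive before it yields the first chunk, which may differ from how the real function interleaves work.

```python
import io
import zipfile
from tempfile import SpooledTemporaryFile


def zip_stream(responders, spool_bytes=1024 ** 3, chunk_size=64 * 1024):
    """Yield the bytes of a zip archive built from (name, generator) pairs."""
    # Data stays in memory up to spool_bytes, then rolls over to a disk file
    with SpooledTemporaryFile(max_size=spool_bytes) as spool:
        with zipfile.ZipFile(spool, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, generator in responders:
                # Write each member block by block instead of joining it first
                with archive.open(name, "w") as member:
                    for block in generator:
                        member.write(block)
        spool.seek(0)
        while True:
            chunk = spool.read(chunk_size)
            if not chunk:
                break
            yield chunk


responders = [("a.csv", iter([b"x,y\n", b"1,2\n"])), ("b.csv", iter([b"z\n"]))]
data = b"".join(zip_stream(responders))
with zipfile.ZipFile(io.BytesIO(data)) as check:
    print(check.namelist())
```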
Raster Stuff¶
-
class
pdp_util.raster.
EnsembleCatalog
(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})¶ WSGI app to list an ensemble catalog
-
__call__
(...) <==> x(...)¶
-
__init__
(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})¶ x.__init__(…) initializes x; see help(type(x)) for signature
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
pdp_util.raster.
RasterCatalog
(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})¶ WSGI app which is a subclass of RasterServer. Filters the urls on call to permit only MetaData requests
-
__call__
(environ, start_response)¶ An override of RasterServer’s __call__ which allows only MetaData requests
-
-
class
pdp_util.raster.
RasterMetadata
(dsn)¶ WSGI app to query metadata from the MDDB.
-
__call__
(environ, start_response)¶ Handle requests for metadata
-
__init__
(dsn)¶ Initialize the application
Parameters: dsn – SQLAlchemy-style DSN string with database dialect and connection options. Example: “postgresql://scott:tiger@localhost/test”
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
pdp_util.raster.
RasterServer
(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})¶ WSGI app which is a subclass of PyDap’s DapServer to do dynamic (non-filebased) configuration, for serving rasters
-
__call__
(environ, start_response)¶ An override of Pydap’s __call__ which overrides catalog requests, but defers to pydap for data requests
-
__init__
(dsn, config={'api_version': 0, 'handlers': [{'url': '/my.nc', 'file': '/opt/dockremap/miroc_3.2_20c_A1B_daily_nc3_0_100.nc'}, {'url': '/my.h5', 'file': '/opt/dockremap/pr+tasmax+tasmin_day_BCCA+ANUSPLIN300+CanESM2_historical+rcp26_r1i1p1_19500101-21001231.h5'}, {'url': '/stuff/', 'dir': '/home/data/climate/downscale/CMIP5/anusplin_downscaling_cmip5/downscaling_outputs/'}], 'name': 'testing-server', 'version': 0})¶ Initialize the application
Parameters: config (dict) – A config dict that can be read by yaml.load() and includes the key handlers. handlers must be a list of dicts, each containing the keys url and file.
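A config dict of the documented shape might look like the sketch below. The file paths are placeholders, and check_handlers is a hypothetical helper added only to make the "url and file" requirement concrete; it is not part of pdp_util.

```python
# Placeholder paths; only the key layout follows the parameter description.
config = {
    "name": "testing-server",
    "version": 0,
    "api_version": 0,
    "handlers": [
        {"url": "/my.nc", "file": "/path/to/my.nc"},
        {"url": "/my.h5", "file": "/path/to/my.h5"},
    ],
}


def check_handlers(config):
    """Return the handlers that lack either of the documented keys."""
    return [handler for handler in config.get("handlers", [])
            if not {"url", "file"} <= set(handler)]


print(check_handlers(config))
```

Note that the default value shown in the signature also contains a handler keyed by dir rather than file, which a strict check like this one would flag.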
-
-
pdp_util.raster.
db_raster_catalog
(session, ensemble, root_url)¶ A function which queries the database for all of the raster files belonging to a given ensemble. Returns a dict where keys are the dataset unique ids and the value is the filename for the dataset.
Parameters: - session – SQLAlchemy session for the pcic_meta database
- ensemble – Name of the ensemble for which member files should be listed
- root_url – Base URL which should be prepended to the beginning of each dataset ID
Return type: dict
-
pdp_util.raster.
db_raster_configurator
(session, name, version, api_version, ensemble, root_url='/')¶ A function to construct a config dict which is usable for configuring Pydap for serving rasters
Parameters: - session – SQLAlchemy session for the pcic_meta database
- name – Name of this server e.g. my-raster-server
- version – Version of the server application
- api_version – OPeNDAP API version?
- ensemble – The identifier for the PCIC MetaData DataBase (Mddb) ensemble to configure
- root_url – URL to prepend to all of the dataset ids
Random Stuff¶
-
class
pdp_util.filters.
FormFilter
¶ A simple class for validating form input and mapping the input to a database constraint on the crmp_network_geoserver table.
NOTE regarding filtering on variables:
Filtering on variables (parameters input-var and input-vars) is based on the column crmp_network_geoserver.vars. This column contains a comma-separated (actually: ‘, ‘-separated) list of variable identifiers aggregated over the history_id.
A variable identifier is formed from a row of pycds.Variable (table meta_vars) by concatenating (without separator) the column standard_name and the string derived from column cell_method by replacing all occurrences of the string ‘time: ‘ with ‘_’. It is unknown at the date of this writing why this replacement is performed. For reference, an identifier is formed by the following PostgreSQL expression (see CRMP database view collapsed_vars_v, column vars):
` array_to_string(array_agg(meta_vars.standard_name::text || regexp_replace(meta_vars.cell_method::text, 'time: '::text, '_'::text, 'g'::text)), ', '::text) `
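For readers more comfortable with Python than PostgreSQL, the same identifier construction can be sketched as follows. The function names and the sample rows are illustrative assumptions; only the concatenation, the 'time: ' to '_' replacement and the ', ' separator come from the description above.

```python
def variable_identifier(standard_name, cell_method):
    """Concatenate standard_name with cell_method, replacing 'time: ' by '_'."""
    return standard_name + cell_method.replace("time: ", "_")


def vars_column(rows):
    """Join the identifiers with ', ', as described for the vars column."""
    return ", ".join(variable_identifier(name, method) for name, method in rows)


# Illustrative (standard_name, cell_method) rows, not taken from the database
rows = [("air_temperature", "time: maximum"),
        ("lwe_thickness_of_precipitation_amount", "time: sum")]
print(vars_column(rows))
```

Since the pattern 'time: ' contains no regular-expression metacharacters, a plain str.replace behaves the same as the global regexp_replace in the SQL expression.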
-
pdp_util.filters.
validate_vars
(environ)¶ Iterate over the POST variables and convert them to SQL constraints
Parameters: environ – dict which can include: - from-date
- to-date
- network-name
- input-var
- input-freq
- input-polygon
- only-with-climatology
Return type: list of callables or text SQL constraints