Gridded data indexer (Grindex)
Results from discussion in Bergen, 08.09.2010
Background
Diana is now starting to read gridded data via Fimex. Fimex provides access to a single information unit, i.e. a model run or a NetCDF file. It does not provide functionality to efficiently locate and extract such an information unit from an archive, e.g. a directory of model files (/opdata ?, /starc/DNMI_HIRLAM4) or a database (WDB).
What Grindex should do
Grindex should be a general indexer API with user-configurable index types. It should let us quickly extract a set of information from a given data source. Grindex should be API-driven and easy to integrate with Diana and Fimex.
Example input types
- filter on model
- filter on referenceTime (from/to)
- additional restrictions
- all given as ascii-strings (boost::regex?)
- extra input required: data type (grib, felt, nc, wdb) and a config file for the input
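The filters listed above could be sketched roughly as follows. This is only an illustration: the names Entry, Filter and applyFilter are hypothetical, std::regex stands in for the boost::regex mentioned in the notes, and reference times are assumed to be stored as YYYYMMDDHH strings so that plain string comparison gives chronological order.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical index entry: one information unit known to the indexer.
struct Entry {
    std::string model;
    std::string referenceTime;  // assumed format: YYYYMMDDHH
};

// Hypothetical filter: all criteria are given as plain strings.
struct Filter {
    std::string modelRegex;  // regular expression matched against the model name
    std::string from;        // inclusive lower bound, empty = unbounded
    std::string to;          // inclusive upper bound, empty = unbounded
};

// Return the entries whose model matches the regex (case-insensitive)
// and whose reference time lies inside [from, to].
std::vector<Entry> applyFilter(const std::vector<Entry>& entries, const Filter& f) {
    const std::regex model(f.modelRegex, std::regex::icase);
    std::vector<Entry> out;
    for (const Entry& e : entries) {
        if (!std::regex_match(e.model, model)) continue;
        if (!f.from.empty() && e.referenceTime < f.from) continue;
        if (!f.to.empty() && e.referenceTime > f.to) continue;
        out.push_back(e);
    }
    return out;
}
```

Keeping every criterion as a string follows the "all given as ascii-strings" point; a real implementation would probably parse the times into a proper time type.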
Example data sources
- catalog (ftp, filesystem, http), filenames (patterns)
- fimex enabled files (parameters in fimex)
- wdb
- CSW (long term)
Expected return values
- count of results
- fimex vector<boost::shared_ptr<CDMReader> >
- dump to netcdf-files (via fimex/NetCDFWriter) for testing (command-line tool)
Concrete example of input
DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915 | HIRLAM4 = model, 2010/09/15/00 = reference time |
arctic_mfc-b2010091500-f2010091506.nc | arctic_mfc = model, 2010091500 = reference time |
/opdata/hirlam4/grdqh00.dat | hirlam4 = model, reference time from data content |
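As a rough sketch of how the first two examples above could be indexed from the file name alone, the following uses two hard-coded regular expressions. The names IndexEntry and parseFileName are hypothetical, std::regex is used in place of boost::regex, and the patterns are assumptions derived only from the example names. The third example is deliberately not handled, since its reference time has to come from the data content.

```cpp
#include <regex>
#include <string>

// Hypothetical result of indexing one file name.
struct IndexEntry {
    bool matched;               // false if no known pattern applies
    std::string model;
    std::string referenceTime;  // YYYYMMDDHH
};

// Extract model and reference time from a file name.
// Only the two name-based examples are covered.
IndexEntry parseFileName(const std::string& path) {
    // e.g. DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915
    static const std::regex archive(
        R"(DNMI_([^/]+)/(\d{4})/(\d{2})/(\d{2})/grdqh(\d{2})\.dat_\d{8})");
    // e.g. arctic_mfc-b2010091500-f2010091506.nc
    static const std::regex flat(R"(([^/]+)-b(\d{10})-f\d{10}\.nc)");

    std::smatch m;
    if (std::regex_match(path, m, archive)) {
        // Year, month and day come from the directories, the hour from the file name.
        return {true, m[1].str(), m[2].str() + m[3].str() + m[4].str() + m[5].str()};
    }
    if (std::regex_match(path, m, flat)) {
        return {true, m[1].str(), m[2].str()};
    }
    return {false, "", ""};
}
```

In Grindex itself the patterns would presumably come from the user-supplied search criteria or the config file rather than being compiled in.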
API outline
string searchCriteria = "MODEL,REFERENCETIME,FILENAMEMATCH:'*=MODEL/YY/MM/DD/grdqhHH.dat_*'";
Grindex* gr = new Grindex(uri, searchCriteria, dataformat, config);
string searchDSL = "model=*;referenceTime < 2007-08-09";
GrindexFind* found = gr->find(searchDSL);
size_t count = found->count();
vector<boost::shared_ptr<CDMReader> > readers = found->cdmReaders();
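The search DSL string in the outline is not specified further. One possible reading, shown here purely as a sketch, is a semicolon-separated list of clauses of the form field, operator, value, with =, < and > as the only operators. The names Clause and parseSearchDsl are hypothetical and not part of any agreed API.

```cpp
#include <string>
#include <vector>

// Hypothetical parsed clause, e.g. {"referenceTime", "<", "2007-08-09"}.
struct Clause {
    std::string field;
    std::string op;
    std::string value;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split the DSL on ';' and each clause on the first '=', '<' or '>'.
// Clauses without an operator are silently skipped in this sketch.
std::vector<Clause> parseSearchDsl(const std::string& dsl) {
    std::vector<Clause> clauses;
    std::size_t start = 0;
    while (start <= dsl.size()) {
        std::size_t end = dsl.find(';', start);
        if (end == std::string::npos) end = dsl.size();
        const std::string part = trim(dsl.substr(start, end - start));
        const std::size_t opPos = part.find_first_of("=<>");
        if (!part.empty() && opPos != std::string::npos) {
            clauses.push_back({trim(part.substr(0, opPos)),
                               part.substr(opPos, 1),
                               trim(part.substr(opPos + 1))});
        }
        start = end + 1;
    }
    return clauses;
}
```

A find() implementation could then evaluate each clause against the index; how wildcards such as model=* are interpreted is left open here.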
Fimex input from WDB after filtering
(Maybe this should be a part of fimex, but very related to Grindex?)
- dataprovider
- shape-name (grid-information, proj-string (also on latlong), proj-units (required for m), axes in m/degree)
- ref-time
- valid-time (from, to) [bounds]
- parameter-name (no convention yet)
- level-names (not level-numbers)
- level (from-to) (no level2-numbers)
- dataversion (primarily for EPS; different versions of the same data, i.e. a new model run with the same ref-time)
- reference to field
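The attributes above could be collected in a record like the one sketched below. All names and types are assumptions made for illustration (WDB's real schema and field reference type may differ), and the rule of keeping only the highest dataversion per otherwise identical field is one possible policy, not something decided in the discussion.

```cpp
#include <string>
#include <vector>

// Hypothetical record describing one WDB field after filtering.
struct WdbFieldInfo {
    std::string dataProvider;
    std::string shapeName;      // grid information, e.g. a proj-string
    std::string referenceTime;
    std::string validFrom;      // valid-time bounds
    std::string validTo;
    std::string parameterName;  // no naming convention yet
    std::string levelName;      // level name, not level number
    double levelFrom;
    double levelTo;
    int dataVersion;
    long long fieldRef;         // reference to the field (type is an assumption)
};

// True if both records describe the same field apart from version and reference.
bool sameFieldSlot(const WdbFieldInfo& a, const WdbFieldInfo& b) {
    return a.dataProvider == b.dataProvider && a.shapeName == b.shapeName &&
           a.referenceTime == b.referenceTime && a.validFrom == b.validFrom &&
           a.validTo == b.validTo && a.parameterName == b.parameterName &&
           a.levelName == b.levelName && a.levelFrom == b.levelFrom &&
           a.levelTo == b.levelTo;
}

// Keep only the highest dataVersion for each field slot, preserving first-seen order.
std::vector<WdbFieldInfo> latestVersions(const std::vector<WdbFieldInfo>& rows) {
    std::vector<WdbFieldInfo> out;
    for (const WdbFieldInfo& row : rows) {
        bool merged = false;
        for (WdbFieldInfo& kept : out) {
            if (sameFieldSlot(kept, row)) {
                if (row.dataVersion > kept.dataVersion) kept = row;
                merged = true;
                break;
            }
        }
        if (!merged) out.push_back(row);
    }
    return out;
}
```

For EPS data the versions would more likely all be kept, so a real reader would need this behaviour to be configurable.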