Gridded data indexer (Grindex)
Results from the discussion in Bergen, 08.09.2010
Background
Diana now reads gridded data via Fimex. Fimex provides access to a single information unit, e.g. a model run or a NetCDF file. It does not, however, provide functionality to efficiently extract such an information unit from an archive, e.g. a directory of model files (/opdata ?, /starc/DNMI_HIRLAM4) or a database (WDB).
What Grindex should do
Grindex should be a general indexer API with user-configurable index types. It should allow fast extraction of a set of information units from a given data source. Grindex should be API-driven and easy to integrate with Diana and Fimex.
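To make "user-configurable index types" more concrete, here is a minimal in-memory sketch, assuming that an index type can be expressed as a function mapping an entry to a string key. The names `Entry`, `Indexer`, `addIndexType` and `find` are hypothetical and are not part of Fimex or Diana.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one information unit (e.g. one model file).
struct Entry {
    std::string uri;
    std::map<std::string, std::string> attributes; // e.g. "model" -> "HIRLAM4"
};

// An index type maps an entry to the key it is indexed under.
typedef std::function<std::string(const Entry&)> KeyExtractor;

class Indexer {
public:
    // Register a user-configured index type such as "model".
    // Entries added earlier are NOT re-indexed under the new type.
    void addIndexType(const std::string& name, KeyExtractor extract) {
        extractors_[name] = extract;
    }

    // Store an entry and index it under every registered index type.
    void add(const Entry& e) {
        entries_.push_back(e);
        const std::size_t pos = entries_.size() - 1;
        for (const auto& kv : extractors_) {
            index_[kv.first].insert(std::make_pair(kv.second(e), pos));
        }
    }

    // All entries whose key for index type `name` equals `key`.
    std::vector<Entry> find(const std::string& name, const std::string& key) const {
        std::vector<Entry> result;
        auto it = index_.find(name);
        if (it == index_.end()) return result;
        auto range = it->second.equal_range(key);
        for (auto r = range.first; r != range.second; ++r) {
            result.push_back(entries_[r->second]);
        }
        return result;
    }

private:
    std::map<std::string, KeyExtractor> extractors_;
    std::map<std::string, std::multimap<std::string, std::size_t>> index_;
    std::vector<Entry> entries_;
};
```

Keeping the extractors as plain functions lets new index types be added without changing the indexer itself; a real implementation would also have to decide how to re-index when the configuration changes.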
Example input types
- filter on model
- filter on referenceTime (from/to)
- additional restrictions
- all given as ASCII strings (boost::regex?)
- extra input required: data type (grib, felt, nc, wdb) and a config file for the input
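The model and reference-time filters could be evaluated roughly as follows. This is a sketch, assuming reference times arrive as fixed-width ISO 8601 strings; `Filter` is a hypothetical name, and `std::regex` stands in for `boost::regex`, which offers the same `regex_match` call.

```cpp
#include <regex>
#include <string>

// Hypothetical filter built from ASCII-string inputs.
// std::regex is used here; boost::regex provides the same regex_match call.
struct Filter {
    std::regex model;          // pattern the whole model name must match
    std::string referenceFrom; // inclusive lower bound, "" means unbounded
    std::string referenceTo;   // inclusive upper bound, "" means unbounded

    Filter(const std::string& modelPattern,
           const std::string& from, const std::string& to)
        : model(modelPattern), referenceFrom(from), referenceTo(to) {}

    // Assumes fixed-width ISO 8601 times such as "2010-09-15T00:00:00",
    // which sort lexicographically in chronological order.
    bool matches(const std::string& modelName, const std::string& refTime) const {
        if (!std::regex_match(modelName, model)) return false;
        if (!referenceFrom.empty() && refTime < referenceFrom) return false;
        if (!referenceTo.empty() && refTime > referenceTo) return false;
        return true;
    }
};
```

For example, `Filter("HIRLAM.*", "2010-09-01T00:00:00", "2010-09-30T23:59:59")` accepts HIRLAM4 runs from September 2010 and rejects other models or months.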
Example data sources
- catalog (FTP, filesystem, HTTP), file names (patterns)
- Fimex-enabled files (parameters in Fimex)
- WDB
- CSW (long term)
Expected return values
- count of results
- Fimex vector<boost::shared_ptr<CDMReader> >
- dump to NetCDF files (via fimex/NetCDFWriter) for testing (command-line tool)
Concrete example of input
| Input | Interpretation |
| DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915 | HIRLAM4 = model, 2010/09/15/00 = reference time |
| arctic_mfc-b2010091500-f2010091506.nc | arctic_mfc = model, 2010091500 = reference time |
| /opdata/hirlam4/grdqh00.dat | hirlam4 = model, reference time from data content |
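The three filename conventions in the example inputs could be turned into model and reference time with regular expressions, as sketched below. The patterns are hard-coded here for illustration, whereas Grindex would presumably take them from configuration; `ParsedName` and `parseFilename` are hypothetical, and the `YYYY-MM-DDTHH` output format is an assumption.

```cpp
#include <regex>
#include <string>

// Model and reference time recovered from a file name. referenceTime is
// left empty when it has to be read from the data content instead.
struct ParsedName {
    std::string model;
    std::string referenceTime; // "YYYY-MM-DDTHH" (assumed format)
};

bool parseFilename(const std::string& path, ParsedName& out) {
    // DNMI_<model>/YYYY/MM/DD/grdqhHH.dat_YYYYMMDD
    static const std::regex hirlam(
        R"(DNMI_(\w+)/(\d{4})/(\d{2})/(\d{2})/grdqh(\d{2})\.dat_\d{8}$)");
    // <model>-bYYYYMMDDHH-fYYYYMMDDHH.nc
    static const std::regex bf(
        R"((\w+)-b(\d{4})(\d{2})(\d{2})(\d{2})-f\d{10}\.nc$)");
    // /opdata/<model>/grdqhHH.dat: no reference time in the name
    static const std::regex opdata(R"(/opdata/(\w+)/grdqh\d{2}\.dat$)");

    std::smatch m;
    // The first two patterns share the group layout: model, YYYY, MM, DD, HH.
    if (std::regex_search(path, m, hirlam) || std::regex_search(path, m, bf)) {
        out.model = m[1].str();
        out.referenceTime = m[2].str() + "-" + m[3].str() + "-" +
                            m[4].str() + "T" + m[5].str();
        return true;
    }
    if (std::regex_search(path, m, opdata)) {
        out.model = m[1].str();
        out.referenceTime.clear();
        return true;
    }
    return false; // unknown naming convention
}
```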
API outline
std::string searchCriteria = "MODEL,REFERENCETIME,FILENAMEMATCH:'*=MODEL/YY/MM/DD/grdqhHH.dat_*'";
Grindex* gr = new Grindex(uri, searchCriteria, dataformat, config);
std::string searchDSL = "model=*;referenceTime < 2007-08-09";
GrindexFind* found = gr->find(searchDSL);
size_t count = found->count();
std::vector<boost::shared_ptr<CDMReader> > readers = found->cdmReaders();
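The search DSL in the API outline has no formal grammar yet. A minimal parser, assuming terms separated by `;` and each term of the form `key op value` with `op` one of `=`, `<`, `>`, `<=`, `>=`, could look like this (`Term`, `trimSpaces` and `parseSearchDSL` are hypothetical names):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One condition of the search DSL, e.g. {"referenceTime", "<", "2007-08-09"}.
struct Term {
    std::string key, op, value;
};

// Strip leading and trailing spaces and tabs.
static std::string trimSpaces(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Hypothetical grammar: terms separated by ';', each "key op value" where op
// is the first '=', '<' or '>' in the term, optionally followed by '='
// (giving "<=" or ">="). Terms without an operator are skipped.
std::vector<Term> parseSearchDSL(const std::string& dsl) {
    std::vector<Term> terms;
    std::size_t start = 0;
    while (start <= dsl.size()) {
        std::size_t end = dsl.find(';', start);
        if (end == std::string::npos) end = dsl.size();
        const std::string part = dsl.substr(start, end - start);
        const std::size_t pos = part.find_first_of("<>=");
        if (pos != std::string::npos) {
            std::size_t len = 1;
            if (part[pos] != '=' && pos + 1 < part.size() && part[pos + 1] == '=')
                len = 2;
            terms.push_back(Term{trimSpaces(part.substr(0, pos)),
                                 part.substr(pos, len),
                                 trimSpaces(part.substr(pos + len))});
        }
        start = end + 1;
    }
    return terms;
}
```

For the query in the outline this yields the two terms `model = *` and `referenceTime < 2007-08-09`; treating `*` as a wildcard is left to whatever matches the terms against the index.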
Fimex input from WDB after filtering
(Maybe this should be part of Fimex, but it is closely related to Grindex?)
- dataprovider
- shape name (grid information, proj string (also for latlong), proj units (required for m), axes in m/degrees)
- reference time
- valid time (from, to) [bounds]
- parameter name (no naming convention yet)
- level names (not level numbers)
- level (from, to) (no level2 numbers)
- dataversion (primarily EPS; different versions of the same field, e.g. a new model run with the same reference time)
- reference to the field
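A sketch of how the fields listed above might be held and grouped before handing them to Fimex, assuming one information unit per combination of data provider, reference time and data version. `FieldInfo`, `UnitKey` and `groupIntoUnits` are hypothetical names, not WDB or Fimex APIs, and whether EPS data versions should become separate units or an extra dimension of one unit is still an open question; this sketch keeps them separate.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One field as returned from WDB after filtering; members mirror the list
// above. Names and types are assumptions, not a WDB or Fimex API.
struct FieldInfo {
    std::string dataprovider;
    std::string shapeName;          // grid information / proj string
    std::string referenceTime;
    std::string validFrom, validTo; // valid-time bounds
    std::string parameterName;
    std::string levelName;
    double levelFrom, levelTo;
    int dataversion;
    long long fieldRef;             // reference to the stored field
};

// Key identifying one information unit: (dataprovider, referenceTime, dataversion).
typedef std::tuple<std::string, std::string, int> UnitKey;

// Group fields so that each group could later be exposed as one CDMReader.
std::map<UnitKey, std::vector<FieldInfo>>
groupIntoUnits(const std::vector<FieldInfo>& fields) {
    std::map<UnitKey, std::vector<FieldInfo>> units;
    for (const FieldInfo& f : fields) {
        units[std::make_tuple(f.dataprovider, f.referenceTime, f.dataversion)]
            .push_back(f);
    }
    return units;
}
```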