Gridded data indexer (Grindex)
Results from discussion in Bergen, 08.09.2010
Background
Diana is now starting to read gridded data via Fimex. Fimex provides access to a single information unit, e.g. a model run or a NetCDF file. It does not provide functionality to efficiently extract such an information unit from an archive, e.g. a directory of model files (/opdata ?, /starc/DNMI_HIRLAM4) or a database (WDB).
What Grindex should do
Grindex should be a general indexer API with user-configurable index types. It should make it possible to extract a set of information from a given data source quickly. Grindex should be API-driven and easy to integrate with Diana and Fimex.
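A minimal C++ sketch of what "user-configurable index types" could look like, assuming each index type maps one data-source entry (e.g. a file path) to key/value pairs and is registered under a name that a configuration file can refer to. `IndexType`, `IndexTypeRegistry` and `FilenameIndex` are hypothetical names, not existing Grindex or Fimex classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical: an index type turns one data-source entry (e.g. a file path)
// into indexed key/value pairs such as model or reference time.
struct IndexType {
    virtual ~IndexType() = default;
    virtual std::map<std::string, std::string> index(const std::string& entry) const = 0;
};

// Hypothetical registry: index types are created by a user-configured name,
// so new types can be added without touching the search code.
class IndexTypeRegistry {
public:
    using Factory = std::function<std::unique_ptr<IndexType>()>;

    void add(const std::string& name, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[name] = std::move(factory);
    }

    std::unique_ptr<IndexType> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end())
            throw std::invalid_argument("unknown index type: " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example index type: records only the last path component as "filename".
struct FilenameIndex : IndexType {
    std::map<std::string, std::string> index(const std::string& entry) const override {
        const auto slash = entry.find_last_of('/');
        const std::string name = (slash == std::string::npos) ? entry : entry.substr(slash + 1);
        return {{"filename", name}};
    }
};
```

Keying the registry by string lets the search code stay unchanged when a new index type (for example one reading WDB metadata) is added.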
Example input types
- filter on model
- filter on referenceTime (from/to)
- additional restrictions
- all given as ASCII strings (boost::regex?)
- extra input required: data type (grib, felt, nc, wdb) and a config file for the input
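The input types above could be collected into one filter object. This sketch assumes reference times are stored as ISO 8601 strings of one fixed layout, so plain string comparison gives time order, and it uses `std::regex` in place of the boost::regex mentioned above (the two interfaces are very similar). `SearchFilter` and `matches` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical filter built from the input types above. All values are
// plain ASCII strings; patterns are ECMAScript regular expressions.
struct SearchFilter {
    std::string modelPattern = ".*";                  // regex on the model name
    std::string refTimeFrom;                          // inclusive, empty = open
    std::string refTimeTo;                            // inclusive, empty = open
    std::map<std::string, std::string> restrictions;  // extra key -> regex
};

// True if an indexed entry (key/value pairs) passes the filter. Reference
// times are compared as strings, which is only correct if every value uses
// the same ISO 8601 layout (e.g. "2010-09-15T00").
bool matches(const SearchFilter& f, const std::map<std::string, std::string>& entry) {
    assert(f.refTimeFrom.empty() || f.refTimeTo.empty() || f.refTimeFrom <= f.refTimeTo);
    auto get = [&entry](const std::string& key) {
        const auto it = entry.find(key);
        return it == entry.end() ? std::string() : it->second;
    };
    if (!std::regex_match(get("model"), std::regex(f.modelPattern)))
        return false;
    const std::string ref = get("referenceTime");
    const bool bounded = !f.refTimeFrom.empty() || !f.refTimeTo.empty();
    if (bounded && ref.empty()) return false;  // no reference time indexed
    if (!f.refTimeFrom.empty() && ref < f.refTimeFrom) return false;
    if (!f.refTimeTo.empty() && ref > f.refTimeTo) return false;
    for (const auto& r : f.restrictions)
        if (!std::regex_match(get(r.first), std::regex(r.second)))
            return false;
    return true;
}
```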
 
Example data sources
- catalog (ftp, filesystem, http), filenames (patterns)
- Fimex-enabled files (parameters in Fimex)
- WDB
- CSW (long term)
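One way to keep data sources interchangeable is a small interface that only enumerates candidate entries, with one implementation per source type. The sketch below shows a filesystem catalog filtered by a filename pattern, using C++17 `std::filesystem`; `DataSource` and `FileSystemSource` are hypothetical, and FTP, HTTP, WDB or CSW sources would each need their own implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <regex>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical interface: a data source only has to enumerate candidate
// entries (file paths, URLs, WDB keys, ...) for the indexer to inspect.
struct DataSource {
    virtual ~DataSource() = default;
    virtual std::vector<std::string> list() const = 0;
};

// Filesystem catalog: recursively lists regular files below `root` whose
// path relative to `root` matches a filename pattern (regex).
class FileSystemSource : public DataSource {
public:
    FileSystemSource(fs::path root, std::string pattern)
        : root_(std::move(root)), pattern_(pattern) {
        assert(!root_.empty() && "an empty root would mean the current directory");
    }

    std::vector<std::string> list() const override {
        std::vector<std::string> out;
        std::error_code ec;
        fs::recursive_directory_iterator it(root_, ec);
        if (ec) return out;  // missing or unreadable root: no entries
        for (; it != fs::recursive_directory_iterator(); it.increment(ec)) {
            if (ec) break;  // stop on iteration errors instead of throwing
            std::error_code fec;
            if (!it->is_regular_file(fec)) continue;
            const std::string rel = it->path().lexically_relative(root_).generic_string();
            if (std::regex_match(rel, pattern_)) out.push_back(rel);
        }
        return out;
    }

private:
    fs::path root_;
    std::regex pattern_;
};
```

With this split, the indexer only ever sees strings from `list()`, so a source type can be swapped by configuration without changing the indexing code.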
 
Expected return values
- count of results
- Fimex readers: vector<boost::shared_ptr<CDMReader> >
- dump to NetCDF files (via fimex/NetCDFWriter) for testing (command-line tool)
 
Concrete example of input
| Input | Interpretation |
| DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915 | HIRLAM4 = model, 2010/09/15/00 = reference time |
| arctic_mfc-b2010091500-f2010091506.nc | arctic_mfc = model, 2010091500 = reference time |
| /opdata/hirlam4/grdqh00.dat | hirlam4 = model, reference time read from the data content |
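The first two rows of the table could be indexed with one regular expression per naming scheme. The patterns here are assumptions derived only from those two example names; real archives may use other layouts. `FileKey` and `keyFromFilename` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Indexed attributes recovered from a file name; the reference time is
// normalised to "YYYY-MM-DDTHH".
struct FileKey {
    std::string model;
    std::string referenceTime;
};

// Hypothetical patterns for the first two table rows. Both capture
// (model, YYYY, MM, DD, HH) in the same group order.
std::optional<FileKey> keyFromFilename(const std::string& path) {
    static const std::regex hirlam(
        R"(DNMI_([A-Z0-9]+)/(\d{4})/(\d{2})/(\d{2})/grdqh(\d{2})\.dat_\d{8}$)");
    static const std::regex arctic(
        R"(([a-z_]+)-b(\d{4})(\d{2})(\d{2})(\d{2})-f\d{10}\.nc$)");
    std::smatch m;
    if (!std::regex_search(path, m, hirlam) && !std::regex_search(path, m, arctic))
        return std::nullopt;  // e.g. grdqh00.dat: reference time must come from the data
    assert(m.size() == 6);    // whole match + 5 capture groups
    return FileKey{m[1].str(),
                   m[2].str() + "-" + m[3].str() + "-" + m[4].str() + "T" + m[5].str()};
}
```

The third row has no date in its name, so a filename-based index type cannot handle it; it would need an index type that opens the file (e.g. through Fimex) to read the reference time.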
API outline
std::string searchCriteria = "MODEL,REFERENCETIME,FILENAMEMATCH:\"*=MODEL/YY/MM/DD/grdqhHH.dat_*\"";
Grindex* gr = new Grindex(uri, searchCriteria, dataformat, config);
std::string searchDSL = "model=*;referenceTime < 2007-08-09";
GrindexFind* found = gr->find(searchDSL);
size_t count = found->count();
std::vector<boost::shared_ptr<CDMReader> > readers = found->cdmReaders();
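The search DSL in the outline could be parsed into simple (key, operator, value) conditions. This sketch assumes a grammar inferred from the single example string: clauses separated by ';', operators <=, >=, =, < and >, and '*' as a wildcard value, with values compared as strings. `Condition`, `parseSearchDSL` and `satisfies` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One clause of the search DSL, e.g. "referenceTime < 2007-08-09".
struct Condition {
    std::string key, op, value;
};

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Hypothetical grammar: clauses separated by ';', each "key OP value" where
// OP is one of <=, >=, =, <, >.
std::vector<Condition> parseSearchDSL(const std::string& dsl) {
    std::vector<Condition> out;
    std::size_t start = 0;
    while (start <= dsl.size()) {
        std::size_t end = dsl.find(';', start);
        if (end == std::string::npos) end = dsl.size();
        const std::string clause = trim(dsl.substr(start, end - start));
        start = end + 1;
        if (clause.empty()) continue;
        const std::size_t pos = clause.find_first_of("<>=");
        if (pos == std::string::npos || pos == 0)
            throw std::invalid_argument("malformed clause: " + clause);
        // "<=" and ">=" are two characters; "=", "<" and ">" are one.
        const std::size_t len =
            (clause[pos] != '=' && pos + 1 < clause.size() && clause[pos + 1] == '=') ? 2 : 1;
        out.push_back({trim(clause.substr(0, pos)), clause.substr(pos, len),
                       trim(clause.substr(pos + len))});
    }
    return out;
}

// Values are compared as strings (adequate for same-format ISO dates);
// "=*" matches any present value.
bool satisfies(const Condition& c, const std::map<std::string, std::string>& entry) {
    const auto it = entry.find(c.key);
    if (it == entry.end()) return false;
    const std::string& v = it->second;
    if (c.op == "=") return c.value == "*" || v == c.value;
    if (c.op == "<") return v < c.value;
    if (c.op == ">") return v > c.value;
    if (c.op == "<=") return v <= c.value;
    assert(c.op == ">=");
    return v >= c.value;
}
```

An entry would be returned by `find` only if it satisfies every parsed condition; richer syntax (OR, ranges) would need a real grammar.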
Fimex input from WDB after filtering
(Maybe this should be part of Fimex, but it is closely related to Grindex?)
- dataprovider
- shape-name (grid information: proj-string (also for latlong), proj-units (required for m), axes in m/degree)
- ref-time
- valid-time (from, to) [bounds]
- parameter-name (no convention yet)
- level-names (not level-numbers)
- level (from-to) (no level2-numbers)
- dataversion (primarily EPS; also different versions of the same field, e.g. a new model run with the same ref-time)
- reference to field
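If each field from WDB carries the metadata listed above, the fields could be grouped into information units before being handed to Fimex. This sketch assumes that one unit corresponds to one combination of dataprovider, reference time and dataversion; that grouping rule, `WdbField` and `groupIntoUnits` are assumptions, not part of WDB or Fimex.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record holding the metadata listed above for one WDB field.
struct WdbField {
    std::string dataProvider;
    std::string shapeName;           // grid: proj string, proj units, axes
    std::string referenceTime;
    std::string validFrom, validTo;  // valid-time bounds
    std::string parameterName;       // no naming convention yet
    std::string levelName;           // level name, not level number
    double levelFrom = 0.0, levelTo = 0.0;
    int dataVersion = 0;             // e.g. EPS member or re-run
    std::string fieldRef;            // reference used to fetch the values
};

// Assumed grouping rule: fields sharing data provider, reference time and
// data version form one information unit (one Fimex reader).
using UnitKey = std::tuple<std::string, std::string, int>;

std::map<UnitKey, std::vector<WdbField>> groupIntoUnits(const std::vector<WdbField>& fields) {
    std::map<UnitKey, std::vector<WdbField>> units;
    for (const auto& f : fields)
        units[UnitKey{f.dataProvider, f.referenceTime, f.dataVersion}].push_back(f);
    return units;
}
```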