====== Gridded data indexer (Grindex) ======

Results from discussion in Bergen, 08.09.2010

===== Background =====

Diana is now starting to read gridded data via Fimex. Fimex provides access to an information unit, i.e. a model run or a netcdf-file. Fimex does not provide the functionality to efficiently extract the information unit from an archive, e.g. a directory with model-files (/opdata ?, /starc/DNMI_HIRLAM4) or a database (WDB).

===== What Grindex should do =====

The general indexer should be an indexer API with user-configurable, changeable index-types. It should enable us to extract a set of information from a given data-source quickly. Grindex should be API driven and easy to integrate with Diana and Fimex.

==== Example input types ====

  * filter on model
  * filter on referenceTime (from/to)
  * additional restrictions
  * all given as ascii-strings (boost::regex?)
  * extra input required: datatype (grib, felt, nc, wdb), config-file for input

==== Example data sources ====

  - catalog (ftp, filesystem, http), filenames (patterns)
  - fimex enabled files (parameters in fimex)
  - wdb
  - CSW (long term)

==== Expected return values ====

  - count of results
  - fimex ''vector< >'' of readers (as returned by ''cdmReaders()'' in the API outline)
  - dump to netcdf-files (via fimex/NetCDFWriter) for testing (command-line tool)

==== Concrete example of input ====

| DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915 | HIRLAM4 = model, 2010/09/15/00 = reference time |
| arctic_mfc-b2010091500-f2010091506.nc | arctic_mfc = model, 2010091500 = reference time |
| /opdata/hirlam4/grdqh00.dat | hirlam4 = model, reference time from data content |
|/starc/

==== API outline ====

  string searchCriteria = "MODEL,REFERENCETIME,FILENAMEMATCH:'*=MODEL/YY/MM/DD/grdqhHH.dat_*'";
  Grindex* gr = new Grindex(uri, searchCriteria, dataformat, config);
  string searchDSL = "model=*;referenceTime < 2007-08-09";
  GrindexFind* found = gr->find(searchDSL);
  size_t count = found->count();
  vector< > = found->cdmReaders();

===== Fimex input from WDB after filtering =====

(Maybe this should be a part of Fimex, but it is closely related to Grindex?)

  - dataprovider
  - shape-name (grid-information, proj-string (also on latlong), proj-units (required for m), axes in m/degree)
  - ref-time
  - valid-time (from, to) [bounds]
  - parameter-name (no convention yet)
  - level-names (not level-numbers)
  - level (from-to) (no level2-numbers)
  - dataversion (eps primarily, different versions of the same data (new model-run, same ref-time))
  - reference to field
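
As a rough illustration of the "Concrete example of input" rows and of the ''model=*;referenceTime < ...'' filter from the API outline, the sketch below reads model and reference time from a file name and then filters the resulting entries. It is only a sketch under assumptions: ''IndexEntry'', ''parseArchivePath'' and ''findEntries'' are hypothetical names, not part of Fimex, Diana or any agreed Grindex API. It uses ''std::regex'' where the notes suggest boost::regex. The model is taken verbatim from the path (''DNMI_HIRLAM4''); mapping it to ''HIRLAM4'' would presumably be left to the config-file.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical index record; not part of Fimex or Diana.
struct IndexEntry {
    std::string path;
    std::string model;          // taken verbatim from the path
    std::string referenceTime;  // "YYYY-MM-DDTHH", so string order equals time order
};

// Try to read model and reference time from the file name alone.
// Returns false when the name carries no reference time
// (e.g. /opdata/hirlam4/grdqh00.dat, where it must come from the data content).
bool parseArchivePath(const std::string& path, IndexEntry& out) {
    // Both patterns capture: 1 = model, 2 = year, 3 = month, 4 = day, 5 = hour.
    static const std::regex patterns[] = {
        // MODEL/YYYY/MM/DD/grdqhHH.dat_*
        std::regex(R"((?:.*/)?([^/]+)/(\d{4})/(\d{2})/(\d{2})/grdqh(\d{2})\.dat_.*)"),
        // MODEL-bYYYYMMDDHH-f<10 digits>.nc
        std::regex(R"((?:.*/)?([^/]+)-b(\d{4})(\d{2})(\d{2})(\d{2})-f\d{10}\.nc)"),
    };
    std::smatch m;
    for (const std::regex& re : patterns) {
        if (std::regex_match(path, m, re)) {
            out.path = path;
            out.model = m[1].str();
            out.referenceTime = m[2].str() + "-" + m[3].str() + "-" +
                                m[4].str() + "T" + m[5].str();
            return true;
        }
    }
    return false;
}

// Keep entries whose model matches ("*" accepts any model) and whose
// reference time is strictly before `before` (e.g. "2010-09-16").
std::vector<IndexEntry> findEntries(const std::vector<IndexEntry>& index,
                                    const std::string& model,
                                    const std::string& before) {
    std::vector<IndexEntry> found;
    for (const IndexEntry& e : index) {
        if ((model == "*" || e.model == model) && e.referenceTime < before)
            found.push_back(e);
    }
    return found;
}
```

A caller would run ''parseArchivePath'' over a directory listing, collect the entries for which it returns true, and pass them to ''findEntries''. Storing the reference time as a zero-padded string keeps the comparison trivial; a real implementation would more likely use a proper time type.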