Gridded data indexer (Grindex)

Results from discussion in Bergen, 08.09.2010

Background

Diana is now starting to read gridded data via Fimex. Fimex provides access to a single information unit, i.e. one model run or one NetCDF file. It does not provide functionality to efficiently extract such an information unit from an archive, e.g. a directory of model files (/opdata ?, /starc/DNMI_HIRLAM4) or a database (WDB).

What Grindex should do

The general indexer should be an indexer API with user-configurable, interchangeable index types. It should enable us to extract a set of information from a given data source quickly. Grindex should be API-driven and easy to integrate with Diana and Fimex.

Example input types

Example data sources

  1. catalog (ftp, filesystem, http), filenames (patterns)
  2. fimex enabled files (parameters in fimex)
  3. wdb
  4. CSW (long term)

Expected return values

  1. count of results
  2. fimex vector<boost::shared_ptr<CDMReader> >
  3. dump to netcdf-files (via fimex/NetCDFWriter) for testing (command-line tool)

Concrete example of input

DNMI_HIRLAM4/2010/09/15/grdqh00.dat_20100915: HIRLAM4 = model, 2010/09/15/00 = reference time
arctic_mfc-b2010091500-f2010091506.nc: arctic_mfc = model, 2010091500 = reference time
/opdata/hirlam4/grdqh00.dat: hirlam4 = model, reference time from data content
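
As an illustration of how the first two examples above could be turned into index keys, here is a minimal C++17 sketch using std::regex. The names IndexKey, parseMfcName and parseArchivePath are hypothetical and not part of Fimex or Diana, and the patterns are inferred from these two examples only. The second timestamp in the arctic_mfc name is matched but not interpreted. The third example is not covered, since its reference time has to be read from the data content.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical index key; not an existing Fimex or Diana type.
struct IndexKey {
    std::string model;
    std::string referenceTime; // digits only, YYYYMMDDHH
};

// Assumed layout: <model>-b<YYYYMMDDHH>-f<YYYYMMDDHH>.nc
// Only the "b" timestamp is used, as the reference time.
std::optional<IndexKey> parseMfcName(const std::string& filename) {
    static const std::regex pattern(R"(^(.+)-b(\d{10})-f(\d{10})\.nc$)");
    std::smatch m;
    if (!std::regex_match(filename, m, pattern)) return std::nullopt;
    return IndexKey{m[1].str(), m[2].str()};
}

// Assumed layout: [dirs/]<prefix>_<MODEL>/YYYY/MM/DD/grdqhHH.dat_YYYYMMDD
std::optional<IndexKey> parseArchivePath(const std::string& path) {
    static const std::regex pattern(
        R"(^(?:.*/)?[^/_]+_([^/]+)/(\d{4})/(\d{2})/(\d{2})/grdqh(\d{2})\.dat_\d{8}$)");
    std::smatch m;
    if (!std::regex_match(path, m, pattern)) return std::nullopt;
    // Concatenate year, month, day and hour into YYYYMMDDHH.
    return IndexKey{m[1].str(), m[2].str() + m[3].str() + m[4].str() + m[5].str()};
}
```

Both functions return an empty optional for names that do not fit the assumed layout, so a caller could try several parsers in turn.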

API outline

string searchCriteria = "MODEL,REFERENCETIME,FILENAMEMATCH:\"*=MODEL/YY/MM/DD/grdqhHH.dat_*\"";

Grindex* gr = new Grindex(uri, searchCriteria, dataformat, config);

string searchDSL = "model=*;referenceTime < 2007-08-09";
GrindexFind* found = gr->find(searchDSL);

size_t count = found->count();
vector<boost::shared_ptr<CDMReader> > readers = found->cdmReaders();
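
The grammar of the search DSL is not defined in these notes. The following is a minimal sketch of one possible reading, assuming clauses are separated by ';', each clause has the form field, operator, value with one of the operators '=', '<' or '>', and '*' is a wildcard value. Condition, parseSearchDSL and matches are hypothetical names, and the reference-time field is assumed to be spelled referenceTime.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed clause; the notes do not define this type.
struct Condition {
    std::string field;
    std::string op;    // "=", "<" or ">"
    std::string value;
};

// Removes leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits the DSL on ';', then each clause on its first '=', '<' or '>'.
// Clauses without an operator are skipped.
std::vector<Condition> parseSearchDSL(const std::string& dsl) {
    std::vector<Condition> out;
    std::size_t start = 0;
    while (start <= dsl.size()) {
        std::size_t end = dsl.find(';', start);
        if (end == std::string::npos) end = dsl.size();
        const std::string clause = trim(dsl.substr(start, end - start));
        start = end + 1;
        if (clause.empty()) continue;
        const auto pos = clause.find_first_of("=<>");
        if (pos == std::string::npos) continue;
        out.push_back({trim(clause.substr(0, pos)),
                       std::string(1, clause[pos]),
                       trim(clause.substr(pos + 1))});
    }
    return out;
}

// Evaluates one condition against a value by plain string comparison.
// Dates written as YYYY-MM-DD order correctly under this comparison.
bool matches(const Condition& c, const std::string& actual) {
    if (c.op == "=") return c.value == "*" || c.value == actual;
    if (c.op == "<") return actual < c.value;
    if (c.op == ">") return actual > c.value;
    return false;
}
```

A find() implementation could parse the DSL once and then apply matches() to the model and reference time of every indexed entry.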

Fimex input from WDB after filtering

(Maybe this should be part of Fimex, but it is closely related to Grindex?)

  1. dataprovider
  2. shape-name (grid-information, proj-string (also on latlong), proj-units (required for m), axes in m/degree)
  3. ref-time
  4. valid-time (from, to) [bounds]
  5. parameter-name (no convention yet)
  6. level-names (not level-numbers)
  7. level (from-to) (no level2-numbers)
  8. dataversion (primarily for EPS; different versions of the same data, i.e. a new model run with the same ref-time)
  9. reference to field
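
The filter fields listed above could be collected in a single structure. The following C++17 sketch is only an assumption about how that might look: WdbFilter, its member names and all member types are hypothetical, times are kept as plain text, and dataversion is assumed to be an integer. Item 9 is left out because the notes do not say what form the reference takes.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical filter container; not an existing WDB or Fimex type.
// An unset member means "do not filter on this field".
struct WdbFilter {
    std::optional<std::string> dataProvider;
    std::optional<std::string> shapeName;      // grid information, e.g. a proj string
    std::optional<std::string> referenceTime;  // kept as text in this sketch
    std::optional<std::pair<std::string, std::string>> validTime; // (from, to) bounds
    std::optional<std::string> parameterName;  // no naming convention yet
    std::optional<std::string> levelName;      // a name, not a level number
    std::optional<std::pair<double, double>> level; // (from, to)
    std::optional<int> dataVersion;            // assumed integer
};

// Counts the criteria that are set; zero means the filter selects everything.
int activeCriteria(const WdbFilter& f) {
    return f.dataProvider.has_value() + f.shapeName.has_value() +
           f.referenceTime.has_value() + f.validTime.has_value() +
           f.parameterName.has_value() + f.levelName.has_value() +
           f.level.has_value() + f.dataVersion.has_value();
}
```

Using optional members keeps "not specified" distinct from any real value, which matters for fields such as level, where 0 is a valid bound.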