How to manage different input metadata formats?

Discussion between Heiko and Egil 2008-10-23

Metadata-Storage

We currently have two places to store metadata:

  1. XML-files: These are the files as received from the (meta-)data provider in the original metadata-standard. Currently, providers are: digest_nc.pl (reading from nc-cf-1.0 files), quest, oai-pmh harvest
  2. SQL-database: The SQL database keeps a normalized and indexed view of the XML files. The SQL-database has a known set of supported metadata-names, e.g. institution, variable, datacollection_period, abstract. Those can be found in the table MetadataType. The normalized metadata in the SQL-database can be searched through the search-module, and can be exported to other formats in the oai-pmh module (currently, conversion to DIF).
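The idea of a normalized, indexed view can be sketched roughly as follows. This is only an illustration using an in-memory SQLite database: the table MetadataType and the four metadata names are taken from the description above, but the column layout, the Metadata table, the dataset names and the helper find_datasets are assumptions made for this example and do not describe the actual Metamod schema.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Known set of supported metadata names (column layout is assumed).
    CREATE TABLE MetadataType (name TEXT PRIMARY KEY);
    -- One row per (dataset, metadata name, value); hypothetical table.
    CREATE TABLE Metadata (
        dataset TEXT NOT NULL,
        name    TEXT NOT NULL REFERENCES MetadataType(name),
        value   TEXT NOT NULL
    );
    -- Index so that searches by name and value do not scan the table.
    CREATE INDEX Metadata_name_value ON Metadata (name, value);
""")

conn.executemany(
    "INSERT INTO MetadataType VALUES (?)",
    [("institution",), ("variable",), ("datacollection_period",), ("abstract",)],
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO Metadata VALUES (?, ?, ?)",
    [
        ("ds1", "institution", "met.no"),
        ("ds1", "variable", "sea_ice_area_fraction"),
        ("ds2", "institution", "met.no"),
        ("ds3", "institution", "other.org"),
    ],
)


def find_datasets(conn, name, value):
    """Return the sorted dataset names whose metadata `name` equals `value`."""
    rows = conn.execute(
        "SELECT DISTINCT dataset FROM Metadata "
        "WHERE name = ? AND value = ? ORDER BY dataset",
        (name, value),
    )
    return [row[0] for row in rows]
```

With such a layout, a search only ever has to know the normalized names, regardless of which provider delivered the original XML file.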

Problem description

Currently we receive metadata as attributes in netCDF files and from forms in the quest module. In the near future, we will also receive metadata as DIF XML from the harvest module. We can also expect other XML metadata formats (e.g. WMO XML profile). This situation raises three concerns:

  1. We need a general way to manage different metadata formats as input to the database.
  2. We need to transform the received metadata so that the same information is stored in the SQL database in the same way. We do not want to miss datasets when we search the database only because the search items were tagged differently in the source XML files. So we need to normalize the metadata.
  3. We also need to keep the metadata as we received them in their original XML formats.
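The second concern, normalization, could look roughly like the sketch below. It is not a proposed design, only an illustration of the principle: one mapping table per input format translates source tags to the normalized metadata names. The format keys, the source tag names, the sample values and the function normalize are all assumptions made for this example.

```python
# Hypothetical mapping tables: source-format tag -> normalized metadata name.
# The normalized names are examples from the SQL database description; the
# source tags are illustrative assumptions, not actual Metamod configuration.
TAG_MAPS = {
    "nc-cf": {"institution": "institution", "comment": "abstract"},
    "dif": {"Data_Center_Name": "institution", "Summary": "abstract"},
}


def normalize(source_format, raw):
    """Return {normalized_name: [values]} for one dataset's raw metadata."""
    mapping = TAG_MAPS[source_format]
    normalized = {}
    for tag, value in raw.items():
        name = mapping.get(tag)
        if name is None:
            # Unsupported tag: it stays available only in the original XML.
            continue
        normalized.setdefault(name, []).append(value.strip())
    return normalized


# The same institution, tagged differently in two input formats.
nc = normalize("nc-cf", {"institution": "met.no ", "history": "created"})
dif = normalize("dif", {"Data_Center_Name": "met.no", "Summary": "Sea ice"})
```

Both calls yield the same value under the normalized name institution, so a search on that name finds both datasets. Tags without a mapping are dropped from the normalized view, which is one reason the original XML files still have to be kept (concern 3).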

Interaction with the different modules

Here is the state as planned for Metamod 2.1. Not everything exists yet!

Outstanding problems

Possible solution