How to manage different input metadata formats?

Discussion between Heiko and Egil 2008-10-23

Metadata-Storage

We currently have two places to store metadata:

  1. XML-files: These are the files as received from the (meta-)data provider in the original metadata-standard. The current providers are digest_nc.pl (reading from nc-cf-1.0 files), quest, and the oai-pmh harvest.
  2. SQL-database: The SQL-database keeps a normalized and indexed view of the XML-files. It has a known set of supported metadata-names, e.g. institution, variable, datacollection_period, abstract. These are listed in the table MetadataType. The normalized metadata in the SQL-database can be searched through the search-module, and can be exported to other formats by the oai-pmh module (currently, conversion to DIF).
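To make the idea of a normalized, indexed view concrete, here is a minimal sketch using Python's sqlite3 module. Only the table name MetadataType and the four metadata-names come from the notes above; the Metadata table, its columns, and the helper functions are assumptions made for illustration and do not describe the actual Metamod schema.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MetadataType (name TEXT PRIMARY KEY);
-- Hypothetical table: one row per (dataset, metadata-name, value).
CREATE TABLE Metadata (
    dataset TEXT NOT NULL,
    name    TEXT NOT NULL REFERENCES MetadataType(name),
    value   TEXT NOT NULL
);
CREATE INDEX Metadata_name_value ON Metadata (name, value);
""")

# The supported metadata-names mentioned in the minutes.
conn.executemany(
    "INSERT INTO MetadataType (name) VALUES (?)",
    [("institution",), ("variable",), ("datacollection_period",), ("abstract",)],
)


def add_metadata(conn, dataset, name, value):
    """Store one value, rejecting names that are not in MetadataType."""
    known = conn.execute(
        "SELECT 1 FROM MetadataType WHERE name = ?", (name,)
    ).fetchone()
    if known is None:
        raise ValueError(f"unsupported metadata-name: {name}")
    conn.execute(
        "INSERT INTO Metadata (dataset, name, value) VALUES (?, ?, ?)",
        (dataset, name, value),
    )


def find_datasets(conn, name, value):
    """Return the datasets that carry the given name/value pair."""
    rows = conn.execute(
        "SELECT DISTINCT dataset FROM Metadata "
        "WHERE name = ? AND value = ? ORDER BY dataset",
        (name, value),
    )
    return [row[0] for row in rows]
```

Because every provider's metadata ends up under the same names, a single query on (name, value) finds datasets regardless of which XML format they arrived in.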

Problem description

Currently we receive metadata as attributes in netCDF files and from forms in the quest module. In the near future, we will also receive metadata as DIF XML from the harvest module. We can also expect other XML metadata formats (e.g. the WMO XML profile). This situation raises three concerns:

  1. We need a general way to manage different metadata formats as input to the database.
  2. We need to transform the received metadata so that the same information is stored in the SQL database in the same way. We do not want to miss datasets when we search the database only because the search items were tagged differently in the source XML files. So we need to normalize the metadata.
  3. We also need to keep the metadata as we received them, in their original XML formats.
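The normalization step in point 2 can be sketched as a per-format lookup table that maps source tags onto the internal metadata-names. The format keys ("nc-cf", "dif") and the source tag names in NAME_MAP are illustrative assumptions, not an agreed mapping; only the target names (institution, variable, abstract) come from these notes.

```python
# Hypothetical mapping from source-format tags to internal metadata-names.
NAME_MAP = {
    "nc-cf": {
        "institution": "institution",
        "standard_name": "variable",
        "summary": "abstract",
    },
    "dif": {
        "Data_Center": "institution",
        "Parameters": "variable",
        "Summary": "abstract",
    },
}


def normalize(source_format, raw):
    """Split raw metadata into normalized values and unmapped leftovers.

    raw maps source tag names to string values. The result is a pair
    (normalized, unmapped): normalized maps internal names to lists of
    values, unmapped keeps every tag that has no known translation.
    """
    mapping = NAME_MAP[source_format]
    normalized = {}
    unmapped = {}
    for key, value in raw.items():
        target = mapping.get(key)
        if target is None:
            unmapped[key] = value
        else:
            normalized.setdefault(target, []).append(value.strip())
    return normalized, unmapped
```

Returning the unmapped tags separately makes it visible what the SQL view does not cover, which is one reason for keeping the original XML as point 3 requires.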

Interaction with the different modules

Here is the state as planned for Metamod 2.1. Not everything exists yet!

  • search (PHP): read from SQL-database
  • base (Perl): read XML-files, normalize, write to SQL-database (import_dataset.pl)
  • quest (PHP): write to XML-files, read old parameters from SQL-database; this will change the metadata-format to our internal format
  • upload (Perl): write to XML-files (digest_nc.pl), eventually edit metadata via quest
  • pmh (PHP): read from SQL-database
  • harvest (PHP): write to XML-files

Outstanding problems

  • writing to SQL-database (base: import_dataset.pl) is asynchronous (once per hour) - this is required due to possible ftp-uploads
  • we don't keep track of changes to XML-files (history required?) (connected to previous)
  • pmh might translate metadata twice (once during harvest, once during output) - possible loss of information

Possible solution

  • store all XML files in a blob in the database, including a history
  • upload of XML-files to the SQL-database, including normalization, should be triggered automatically by the web-interface (base: import_dataset.pl). Asynchronous reading is then only required for upload (digest_nc.pl).
  • pmh: output original XML-files if requested metadata-standard = original metadata-standard
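The first and last points of the proposal can be sketched together: every received XML file is appended as a blob so that the history is preserved, and the export returns the stored original unchanged when the requested standard matches. The table XmlHistory, its columns, and the function names are hypothetical, and convert stands in for whatever conversion the pmh module would otherwise perform from the normalized metadata.

```python
import sqlite3


def open_store():
    """Create an in-memory store; each insert is a new history entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE XmlHistory (
               id       INTEGER PRIMARY KEY AUTOINCREMENT,
               dataset  TEXT NOT NULL,
               standard TEXT NOT NULL,
               xml      BLOB NOT NULL
           )"""
    )
    return conn


def store_xml(conn, dataset, standard, xml_bytes):
    """Append a new version; older versions are never overwritten."""
    conn.execute(
        "INSERT INTO XmlHistory (dataset, standard, xml) VALUES (?, ?, ?)",
        (dataset, standard, xml_bytes),
    )


def latest_xml(conn, dataset):
    """Return (standard, xml) of the newest version, or None."""
    return conn.execute(
        "SELECT standard, xml FROM XmlHistory "
        "WHERE dataset = ? ORDER BY id DESC LIMIT 1",
        (dataset,),
    ).fetchone()


def export(conn, dataset, requested_standard, convert):
    """Return the original XML if the standards match, else convert."""
    row = latest_xml(conn, dataset)
    if row is None:
        raise KeyError(dataset)
    standard, xml = row
    if standard == requested_standard:
        # No translation at all, so nothing can be lost.
        return bytes(xml)
    return convert(dataset, requested_standard)
```

With this arrangement a harvested DIF record that is requested again as DIF is never translated, which avoids the double translation listed under the outstanding problems.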
metamod/minutes2008_10_23.txt · Last modified: 2008-11-04 08:17:08 by heikok