How to manage different input metadata formats?

Discussion between Heiko and Egil 2008-10-23

We currently have two places to store metadata:

  1. XML-files: These are the files as received from the (meta-)data provider in the original metadata-standard. Currently, providers are: digest_nc.pl (reading from nc-cf-1.0 files), quest, oai-pmh harvest
  2. SQL-database: The SQL database keeps a normalized and indexed view of the XML files. The SQL-database has a known set of supported metadata-names, e.g. institution, variable, datacollection_period, abstract. Those can be found in the table MetadataType. The normalized metadata in the SQL-database can be searched through the search-module, and can be exported to other formats in the oai-pmh module (currently, conversion to DIF).

Currently we receive metadata as attributes in netCDF files and from forms in the quest module. In the near future, we will also receive metadata as DIF XML from the harvest module. We can also expect other XML metadata formats (e.g. WMO XML profile). This situation raises three concerns:

  1. We need a general way to manage different metadata formats as input to the database.
  2. We need to transform the received metadata so that the same information is stored in the SQL database in the same way. We do not want to miss datasets when we search the database only because the search items were tagged differently in the source XML files. So we need to normalize the metadata.
  3. We also need to keep the metadata as we received them in their original XML formats.
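
The normalization in point 2 can be sketched as a per-format table that maps source tag names onto the metadata-names known to the SQL database. This is only an illustration in Python, not Metamod code (the modules themselves are Perl and PHP). The mapping entries, the source-side tag names, and the function name `normalize` are assumptions; only the target names institution, variable, and abstract are taken from the list of supported metadata-names above.

```python
# Hypothetical mapping: source-format tag name -> normalized metadata-name.
# The source-side names are illustrative assumptions, not a verified profile.
TAG_MAP = {
    "nc-cf-1.0": {
        "institution": "institution",
        "standard_name": "variable",
        "comment": "abstract",
    },
    "dif": {
        "Data_Center_Name": "institution",
        "Parameters": "variable",
        "Summary": "abstract",
    },
}


def normalize(source_format, records):
    """Split (tag, value) pairs into normalized metadata and unmapped leftovers."""
    mapping = TAG_MAP[source_format]
    normalized = {}
    unmapped = []
    for tag, value in records:
        name = mapping.get(tag)
        if name is None:
            # Not searchable; survives only in the original XML file (point 3).
            unmapped.append((tag, value))
        else:
            normalized.setdefault(name, []).append(value.strip())
    return normalized, unmapped


# The same dataset described in two source formats.
nc = normalize("nc-cf-1.0", [
    ("institution", "met.no"),
    ("standard_name", "sea_ice_area_fraction"),
])
dif = normalize("dif", [
    ("Data_Center_Name", " met.no "),
    ("Parameters", "sea_ice_area_fraction"),
    ("Entry_ID", "x1"),
])
```

Both calls yield the same normalized dictionary, so a search on institution or variable finds the dataset regardless of the format it arrived in. The unmapped `Entry_ID` pair shows why the original XML still has to be kept.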

Here is the state as planned for Metamod 2.1. Not everything exists yet!

  • search (PHP): read from database
  • base (Perl): read XML-files, normalize, write to SQL-database (import_dataset.pl)
  • quest (PHP): write to XML-files, read old parameters from SQL-database; this will change the metadata-format to our internal format
  • upload (Perl): write to XML-files (digest_nc.pl); eventually edit metadata via quest
  • pmh (PHP): read from SQL-database
  • harvest (PHP): write to XML-files
  • writing to SQL-database (base: import_dataset.pl) is asynchronous (once per hour) - this is required due to possible ftp-uploads
  • we don't keep track of changes to XML-files (history required?) (connected to previous)
  • pmh might translate metadata twice (once during harvest, once during output) - possible loss of information
  • store all XML files in a blob in the database, including a history
  • upload of XML-files to SQL-database including normalization should be automatically triggered by web-interface (base:import_dataset.pl). Asynchronous reading only required by (upload:digest_nc.pl).
  • pmh: output original XML-files if requested metadata-standard = original metadata-standard
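
The last points (keeping all XML files in a blob with a history, and letting pmh return the original XML when the requested standard matches) could look roughly like the following. This is a minimal sketch in Python with an in-memory SQLite database, not the Metamod implementation. The table name `xml_history`, its columns, and the functions `store_xml` and `export_xml` are hypothetical, and the `convert` callback stands in for whatever conversion the pmh module performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE xml_history (
        dataset  TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        standard TEXT    NOT NULL,   -- metadata standard of the original file
        xml      BLOB    NOT NULL,   -- file content exactly as received
        PRIMARY KEY (dataset, version)
    )
""")


def store_xml(dataset, standard, xml_bytes):
    """Append a new version instead of overwriting, so the history is kept."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM xml_history WHERE dataset = ?",
        (dataset,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO xml_history VALUES (?, ?, ?, ?)",
        (dataset, version, standard, xml_bytes),
    )
    return version


def export_xml(dataset, requested_standard, convert):
    """Return the latest original XML untouched if the standards match."""
    row = conn.execute(
        "SELECT standard, xml FROM xml_history"
        " WHERE dataset = ? ORDER BY version DESC LIMIT 1",
        (dataset,),
    ).fetchone()
    if row is None:
        return None
    standard, xml_bytes = row
    if standard == requested_standard:
        return bytes(xml_bytes)  # no translation, so no loss of information
    return convert(xml_bytes, requested_standard)
```

Returning the stored blob directly avoids the double translation mentioned above for harvested records, and older versions remain available because rows are only ever appended.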
  • metamod/minutes2008_10_23.txt
  • Last modified: 2022-05-31 09:29:32
  • (external edit)