Minutes 2008-10-23
Heiko, Egil
Metadata-Storage
We currently have two places to store metadata:
- XML-files: These are the files as received from the (meta-)data provider in the original metadata-standard. The current providers are digest_nc.pl (reading from nc-cf-1.0 files), quest, and the oai-pmh harvest.
- SQL-database: The SQL database keeps a normalized and indexed view of the XML files. The SQL-database has a known set of supported metadata-names, e.g. institution, variable, datacollection_period, abstract. Those can be found in the table MetadataType. The normalized metadata in the SQL-database can be searched through the search-module, and can be exported to other formats in the oai-pmh module (currently, conversion to DIF).
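
The normalization step described above can be sketched roughly as follows. This is only an illustration in Python with SQLite, not the actual import_dataset.pl: the XML layout, the `Metadata` table, and the function names are assumptions made for the example. Only the `MetadataType` table and the example metadata-names come from these minutes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout, for illustration only; the real provider
# formats (nc-cf digest, quest, oai-pmh harvest) look different.
SAMPLE_XML = """<dataset name="example">
  <metadata name="institution">met.no</metadata>
  <metadata name="variable">air_temperature</metadata>
  <metadata name="variable">sea_ice_area_fraction</metadata>
  <metadata name="unsupported_key">dropped</metadata>
</dataset>"""

# Metadata-names mentioned in the minutes as examples of supported names.
SUPPORTED_NAMES = ("institution", "variable", "datacollection_period", "abstract")


def create_schema(conn):
    """Create a MetadataType table plus an assumed Metadata value table."""
    conn.execute("CREATE TABLE MetadataType (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Metadata ("
        " dataset TEXT NOT NULL,"
        " name TEXT NOT NULL REFERENCES MetadataType(name),"
        " value TEXT NOT NULL)"
    )
    # Index so that searches by name and value do not scan the whole table.
    conn.execute("CREATE INDEX idx_metadata ON Metadata (name, value)")
    conn.executemany(
        "INSERT INTO MetadataType (name) VALUES (?)",
        [(n,) for n in SUPPORTED_NAMES],
    )
    conn.commit()


def import_dataset(conn, xml_text):
    """Normalize one XML file into the Metadata table; return rows written."""
    root = ET.fromstring(xml_text)
    dataset = root.get("name")
    supported = {row[0] for row in conn.execute("SELECT name FROM MetadataType")}
    # Replace earlier rows so that a re-import does not create duplicates.
    conn.execute("DELETE FROM Metadata WHERE dataset = ?", (dataset,))
    written = 0
    for elem in root.findall("metadata"):
        name = elem.get("name")
        value = (elem.text or "").strip()
        if name in supported and value:
            conn.execute(
                "INSERT INTO Metadata (dataset, name, value) VALUES (?, ?, ?)",
                (dataset, name, value),
            )
            written += 1
    conn.commit()
    return written


def search(conn, name, value):
    """Return the datasets that carry the given metadata name and value."""
    rows = conn.execute(
        "SELECT DISTINCT dataset FROM Metadata WHERE name = ? AND value = ?",
        (name, value),
    )
    return [row[0] for row in rows]
```

Metadata-names that are not listed in MetadataType are silently dropped in this sketch, so the SQL view can hold less information than the XML file it was built from.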
Interaction with the different modules
This is the state as planned for Metamod 2.1; not everything exists yet!
- search: read from database
- base: read XML-files, normalize, write to SQL-database (import_dataset.pl)
- quest: write to XML-files, read old parameters from SQL-database. This will change the metadata-format to our internal format.
- upload: write to XML-files (digest_nc.pl), eventually edit metadata via quest
- pmh: read from SQL-database
- harvest: write to XML-files
Outstanding problems
- writing to SQL-database (base: import_dataset.pl) is asynchronous (once per hour) - this is required due to possible ftp-uploads
- we don't keep track of changes to XML-files (history required?) (connected to previous)
- pmh might translate metadata twice (once during harvest, once during output) - possible loss of information
Possible solution
- store all XML files in a blob in the database, including a history
- the import of XML-files into the SQL-database, including normalization, should be triggered automatically by the web-interface (base: import_dataset.pl). Asynchronous reading is then only required for upload (digest_nc.pl).
- pmh: output original XML-files if requested metadata-standard = original metadata-standard
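
A minimal sketch of how these three points could fit together, written in Python with SQLite purely for illustration. The `XmlFile` table, its columns, and the functions `store_xml`, `latest`, and `export` are all hypothetical names and not part of Metamod. The `on_stored` callback stands in for the normalization that would be triggered directly by the web-interface, and `convert` stands in for whatever format conversion the pmh module performs.

```python
import sqlite3


def create_schema(conn):
    """Assumed table: one row per stored revision of a dataset's XML file."""
    conn.execute(
        "CREATE TABLE XmlFile ("
        " dataset TEXT NOT NULL,"
        " revision INTEGER NOT NULL,"
        " format TEXT NOT NULL,"
        " content BLOB NOT NULL,"
        " stored_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (dataset, revision))"
    )
    conn.commit()


def store_xml(conn, dataset, fmt, content, on_stored=None):
    """Store a new revision; older revisions are kept as history."""
    row = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM XmlFile WHERE dataset = ?",
        (dataset,),
    ).fetchone()
    revision = row[0] + 1
    conn.execute(
        "INSERT INTO XmlFile (dataset, revision, format, content)"
        " VALUES (?, ?, ?, ?)",
        (dataset, revision, fmt, content),
    )
    conn.commit()
    # Trigger normalization right away instead of waiting for an hourly job.
    if on_stored is not None:
        on_stored(dataset, content)
    return revision


def latest(conn, dataset):
    """Return (format, content) of the newest revision, or None."""
    return conn.execute(
        "SELECT format, content FROM XmlFile"
        " WHERE dataset = ? ORDER BY revision DESC LIMIT 1",
        (dataset,),
    ).fetchone()


def export(conn, dataset, requested_format, convert):
    """Return the original XML if formats match, else a converted version."""
    row = latest(conn, dataset)
    if row is None:
        return None
    fmt, content = row
    if fmt == requested_format:
        # No translation at all, so no information can be lost.
        return content
    return convert(content, fmt, requested_format)
```

With this layout a harvested DIF record that is requested again as DIF is returned byte for byte, which avoids the double translation listed under the outstanding problems. The revision numbering here is not safe against concurrent writers; a real implementation would need a transaction or a sequence.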