====== How to manage different input metadata formats? ======

Discussion between Heiko and Egil 2008-10-23

===== Metadata-Storage =====

We currently have two places to store metadata:

  - XML-files: These are the files as received from the (meta-)data provider in the original metadata-standard. Currently, providers are: //digest_nc.pl (reading from nc-cf-1.0 files), quest, oai-pmh harvest//
  - SQL-database: The SQL-database keeps a normalized and indexed view of the XML-files. It has a known set of supported metadata-names, e.g. //institution, variable, datacollection_period, abstract//. Those can be found in the table //MetadataType//.

The normalized metadata in the SQL-database can be searched through the search-module, and can be exported to other formats in the //oai-pmh// module (currently, conversion to DIF).

==== Problem description ====

Currently we receive metadata as attributes in netCDF files and from forms in the //quest// module. In the near future, we will also receive metadata as DIF XML from the //harvest// module. We can also expect other XML metadata formats (e.g. WMO XML profile). This situation raises three concerns:

  - We need a general way to manage different metadata formats as input to the database.
  - We need to transform the received metadata so that the same information is stored in the SQL-database in the same way. We do not want to miss datasets in a search only because the search items were tagged differently in the source XML-files. So we need to normalize the metadata.
  - We also need to keep the metadata as we received them, in their original XML formats.

==== Interaction with the different modules ====

Here is the state as planned for Metamod 2.1; not everything exists yet!
  * search: read from database //PHP//
  * base: read XML-files, **normalize**, write to SQL-database (import_dataset.pl) //Perl//
  * quest: write to XML-files, read old parameters from SQL-database **this will change the metadata-format to our internal format** //PHP//
  * upload: write to XML-files (digest_nc.pl) //Perl//, eventually edit metadata via //quest//
  * pmh: read from SQL-database //PHP//
  * harvest: write to XML-files //PHP//

==== Outstanding problems ====

  * writing to the SQL-database (base: import_dataset.pl) is asynchronous (once per hour) - this is required due to possible ftp-uploads
  * we don't keep track of changes to XML-files (history required?) (connected to previous)
  * pmh might translate metadata twice (once during harvest, once during output) - possible loss of information

==== Possible solution ====

  * store all XML-files in a blob in the database, including a history
  * upload of XML-files to the SQL-database, including normalization, should be triggered automatically by the web-interface (base: import_dataset.pl). Asynchronous reading is only required by upload (digest_nc.pl).
  * pmh: output the original XML-files if requested metadata-standard = original metadata-standard
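
The normalization step and the pmh pass-through rule discussed above could be sketched roughly as follows. This is a minimal Python sketch for illustration only, not the Metamod implementation (which is Perl and PHP). The format keys, the source-side tag names and their mapping to the internal metadata-names are assumptions, as are the function names ''normalize'' and ''can_pass_through''; only the internal names //institution, variable, datacollection_period, abstract// come from the notes above.

```python
# Illustrative sketch only. The source tag names and their mapping to the
# internal metadata-names are assumptions, not the actual Metamod mapping.

# Per-format mapping: source tag/attribute name -> internal metadata-name.
NAME_MAPS = {
    "nc-cf-1.0": {
        "institution": "institution",
        "standard_name": "variable",
        "summary": "abstract",
    },
    "dif": {
        "Data_Center_Name": "institution",
        "Parameters": "variable",
        "Temporal_Coverage": "datacollection_period",
        "Summary": "abstract",
    },
}


def normalize(source_format, records):
    """Map (name, value) pairs from one input format to internal names.

    Returns (normalized, unmapped). normalized is a dict of internal name
    to a list of values. unmapped lists source names without a known
    mapping; those survive only in the original XML, which is one reason
    to keep it.
    """
    name_map = NAME_MAPS[source_format]
    normalized = {}
    unmapped = []
    for name, value in records:
        internal = name_map.get(name)
        if internal is None:
            unmapped.append(name)
            continue
        normalized.setdefault(internal, []).append(value.strip())
    return normalized, unmapped


def can_pass_through(original_format, requested_format):
    """True if pmh could return the stored original XML untranslated."""
    return original_format == requested_format


if __name__ == "__main__":
    # Made-up sample values: the same information tagged in two ways.
    cf, _ = normalize("nc-cf-1.0", [("institution", "Example Institute"),
                                    ("summary", "Sea ice")])
    dif, skipped = normalize("dif", [("Data_Center_Name", "Example Institute"),
                                     ("Summary", "Sea ice"),
                                     ("Entry_ID", "x1")])
    print(cf == dif)   # both inputs end up stored under the same names
    print(skipped)     # source names that exist only in the original XML
```

Keeping one mapping table per input format means a new format (e.g. a WMO XML profile) would only need a new table, while the search-module keeps querying the same internal names.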