Minutes of meetings between Heiko and Egil, 2009-12-03 and 2009-12-04.
The Osisaf project wishes to use Metamod for searching and serving data via http or opendap, but the most important users of Osisaf want to have data in well-defined places on an ftp server. To avoid data duplication, it is therefore important that Metamod leaves full control over data placement to the data-production step. Metamod should not move any Osisaf data or impose requirements on data names.
An independent upload task will be written. This task will be triggered based upon a URL which is:
Calling this URL will extract the metadata from the files named by filename and add the metadata to the dataset. dirkey is used as a password and needs to be set up when setting up the dataset. The data files need to be uploaded to the final data store (ftp server) before triggering the task. This task will only harvest metadata, not move any data.
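The trigger described above can be sketched as follows. The host name, path, and exact query-parameter names are assumptions for illustration (only the parameter names filename and dirkey appear in the minutes; the real URL is configured per dataset):

```python
# Hedged sketch of the metadata-harvest trigger; the base URL and
# parameter encoding are assumptions, not the actual Metamod interface.
from urllib.parse import urlencode
from urllib.request import urlopen


def build_trigger_url(base_url: str, filename: str, dirkey: str) -> str:
    """Build the URL that asks Metamod to harvest metadata from an
    already-uploaded file; dirkey acts as the dataset password."""
    return base_url + "?" + urlencode({"filename": filename, "dirkey": dirkey})


def trigger_harvest(base_url: str, filename: str, dirkey: str) -> int:
    """Call the trigger URL; the data file itself is never moved."""
    with urlopen(build_trigger_url(base_url, filename, dirkey)) as response:
        return response.status


# Example (hypothetical host and key):
print(build_trigger_url("http://example.met.no/metamod/upload",
                        "ice_conc_nh_20091203.nc", "SECRET"))
```

Note that the harvest step is deliberately decoupled from data placement: the file is already at its final ftp location when the URL is called.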
Comments by — Egil Støren 2009/12/08 14:34: This solution (with no restrictions on file names belonging to a given dataset) gives us some problems regarding THREDDS catalogs. Up to now, we have used the datasetScan element in the THREDDS catalog to create a dataset entry within THREDDS corresponding to all files in a directory. We can continue to use this method if all files in a dataset belong to the same directory. Then the consequences will not be very difficult to cope with: we can continue to use the same method to create THREDDS catalogs, and few changes need to be made in the web interface for creating new datasets.
On the other hand, if several datasets are to share the same directory, we have to make major changes both in THREDDS catalog generation and in the web interface for creating new datasets. If files in the same dataset are allowed to be situated in different directories, the situation will be even worse.
So my question is: Can we assume a one to one correspondence between dataset and lowest level directory in the data repository?
If we can not assume this correspondence, the next best alternative will be, for each dataset, to identify a regular expression (or wildcard expression) that will correspond to all files in the dataset. This expression would have to be entered by the data provider in the web interface when creating the dataset. This expression could then be used in the THREDDS catalog as a filter element inside the datasetScan element.
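As a sketch of this alternative: the wildcard expression entered by the data provider would end up as a filter inside the datasetScan element, along these lines (dataset name, path, location, and wildcard pattern are hypothetical):

```xml
<datasetScan name="OSISAF ice concentration, northern hemisphere"
             path="osisaf/ice_conc_nh" location="/data/osisaf/ice/">
  <filter>
    <!-- only files matching the provider-supplied pattern belong
         to this dataset, even if the directory is shared -->
    <include wildcard="ice_conc_nh_*.nc"/>
  </filter>
</datasetScan>
```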
Comment by — Heiko Klein 2009/12/09 08:32: Those are in fact two questions: 1) do all files belonging to a dataset reside in the same (lowest-level) directory, and 2) does each directory contain files of only one dataset?
I think question 1) should be made a requirement. If we additionally allow subdirectories for other uses by the data-provider, this will not be very hard to achieve. Putting data which belongs together in the same directory is generally considered best practice.
Question 2) will not hold. The southern-hemisphere and northern-hemisphere files of the ice products will (most likely) reside in one directory, but Metamod needs two different datasets to allow for efficient geographical search and display. I think it is not a big problem that a user sees a directory with both northern and southern hemisphere data after being redirected from Metamod to thredds.
We can think about a solution which creates virtual directories in thredds by writing an explicit mapping between a dataset and its files, without using a datasetScan. This solution is implemented in our thredds server for the MyOcean data, which puts all data files into one directory and maps the files to two virtual directories, one for weekly bulletins and one for best estimates. Since the main usage of the osisaf files is download by ftp, I think it will be easier to keep the real directory structure instead of having one (real) directory structure for ftp and another (virtual) for thredds.
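A minimal sketch of such an explicit mapping (names, paths, and service name are hypothetical): each virtual directory is a plain dataset element listing its files directly, instead of a datasetScan over a real directory:

```xml
<dataset name="Ice products, northern hemisphere (virtual)">
  <dataset name="ice_conc_nh_20091203.nc"
           urlPath="osisaf/ice/ice_conc_nh_20091203.nc">
    <serviceName>odap</serviceName>
  </dataset>
  <!-- one entry per file; the real ftp directory structure
       stays untouched -->
</dataset>
```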
Tentative date: 2010-01-15