Introducing two levels in the DataSet table
Minutes from discussion between Heiko and Egil 2009-02-17.
Background
A dataset imported to the database through the UPLOAD module represents a set of files. Currently, each dataset is represented by one record in the DataSet table. To make individual files searchable, the DataSet table should be extended with one record for each individual file. This gives a natural way to divide the records in the DataSet table into two levels. A record on the highest level represents one whole set of files (i.e. a directory), while a record on the lowest level represents a single file.
Implementation
Database
The 'DS_parent' field in the DataSet table can be used to link a file record to its corresponding directory record (file.DS_parent == directory.DS_id). Directory records are distinguished from file records by having DS_parent == 0.
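The parent link described above can be illustrated with a minimal sketch. It uses an in-memory SQLite database purely as a stand-in (the minutes do not say which database engine is used), and the DS_name column and the helper function names are hypothetical, introduced only for this example. Only DS_id and DS_parent come from the discussion.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DataSet table with one directory and two file records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DataSet ("
        " DS_id INTEGER PRIMARY KEY,"
        " DS_parent INTEGER NOT NULL DEFAULT 0,"
        " DS_name TEXT NOT NULL)"  # DS_name is a hypothetical column for this sketch
    )
    # Directory record: DS_parent == 0
    cur = conn.execute(
        "INSERT INTO DataSet (DS_parent, DS_name) VALUES (0, 'name1')"
    )
    dir_id = cur.lastrowid
    # File records: DS_parent points at the directory record's DS_id
    conn.executemany(
        "INSERT INTO DataSet (DS_parent, DS_name) VALUES (?, ?)",
        [(dir_id, "name1_xxx"), (dir_id, "name1_yyy")],
    )
    return conn


def directory_records(conn):
    """Return the names of all directory-level records (DS_parent == 0)."""
    rows = conn.execute(
        "SELECT DS_name FROM DataSet WHERE DS_parent = 0 ORDER BY DS_id"
    )
    return [row[0] for row in rows]


def file_records(conn, directory_name):
    """Return the names of the file-level records belonging to one directory."""
    rows = conn.execute(
        "SELECT f.DS_name FROM DataSet f"
        " JOIN DataSet d ON f.DS_parent = d.DS_id"
        " WHERE d.DS_parent = 0 AND d.DS_name = ?"
        " ORDER BY f.DS_id",
        (directory_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(directory_records(conn))
    print(file_records(conn, "name1"))
```

With this layout, a search restricted to whole datasets filters on DS_parent = 0, while a search on individual files filters on DS_parent <> 0 and can join back to the directory record when needed.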
XML files
Each dataset corresponds to a pair of XML files (.xmd and .xml). This convention will be continued. To distinguish between directory datasets and file datasets, a new directory level in the file system is introduced. The current convention uses one directory for each application connected to the same database:
.../XML/APP1
.../XML/APP2
etc.
In the new convention a new directory level will be introduced beneath each of these application directories. For example, for one of these application directories, the structure will be like this:
.../XML/APP1
    name1.xmd
    name1.xml
    name2.xmd
    name2.xml
    ...
.../XML/APP1/name1
    name1_xxx.xml
    name1_xxx.xmd
    name1_yyy.xml
    name1_yyy.xmd
.../XML/APP1/name2
    ...
These files are created by extracting metadata from netCDF files like name1_xxx.nc, name1_yyy.nc etc. The xml/xmd files on the highest level will remain exactly as they are today. The xml/xmd files on the lowest level will each represent one netCDF file only. The format of these files will be the same as those describing the directory level datasets. (This format is described on).
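As an illustration of the naming convention, the following sketch derives the expected .xmd/.xml paths for both levels. The function names are hypothetical, and the sketch assumes that a file-level XML pair takes its base name from the netCDF file name with the .nc extension removed, as the example names above suggest.

```python
from pathlib import PurePosixPath


def directory_level_paths(app_dir, dataset_name):
    """Return the (.xmd, .xml) paths for a directory-level dataset."""
    base = PurePosixPath(app_dir)
    return (base / f"{dataset_name}.xmd", base / f"{dataset_name}.xml")


def file_level_paths(app_dir, dataset_name, netcdf_file):
    """Return the (.xmd, .xml) paths for one netCDF file within a dataset.

    Assumption: the XML base name equals the netCDF file name without '.nc'.
    """
    stem = PurePosixPath(netcdf_file).stem
    subdir = PurePosixPath(app_dir) / dataset_name
    return (subdir / f"{stem}.xmd", subdir / f"{stem}.xml")


if __name__ == "__main__":
    # "XML/APP1" stands in for the application directory shown above.
    for path in directory_level_paths("XML/APP1", "name1"):
        print(path)
    for path in file_level_paths("XML/APP1", "name1", "name1_xxx.nc"):
        print(path)
```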