Introducing two levels in the DataSet table

Minutes from discussion between Heiko and Egil 2009-02-17.

Background

A dataset imported to the database through the UPLOAD module represents a set of files. For the time being, each dataset is represented by one record in the DataSet table. To be able to search on individual files, the DataSet table should be extended to encompass records for each individual file. This gives a natural way to divide the records in the DataSet table into two levels. A record on the highest level represents one whole set of files (i.e. a directory), while a record on the lowest level represents a single file.

Implementation

Database

The 'DS_parent' field in the DataSet table can be used to link a file record to its corresponding directory record (file.DS_parent == directory.DS_id). To distinguish a file record from a directory record, DS_parent == 0 for all directory records.
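The linking rule can be illustrated with a minimal sketch using an in-memory SQLite database. Only DS_id and DS_parent come from the description above. The DS_name column, the sample rows and the helper function names are assumptions made for illustration, and the real DataSet table has more fields.

```python
import sqlite3

# In-memory stand-in for the DataSet table (DS_name is an assumed column).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataSet ("
    " DS_id INTEGER PRIMARY KEY,"
    " DS_parent INTEGER NOT NULL DEFAULT 0,"
    " DS_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO DataSet (DS_id, DS_parent, DS_name) VALUES (?, ?, ?)",
    [
        (1, 0, "APP1/name1"),            # directory record: DS_parent == 0
        (2, 1, "APP1/name1/name1_xxx"),  # file record: DS_parent == DS_id of its directory
        (3, 1, "APP1/name1/name1_yyy"),
        (4, 0, "APP1/name2"),            # directory record without file records
    ],
)


def directory_records(conn):
    """Return the names of all directory-level records (DS_parent == 0)."""
    cur = conn.execute(
        "SELECT DS_name FROM DataSet WHERE DS_parent = 0 ORDER BY DS_id"
    )
    return [row[0] for row in cur]


def file_records(conn, directory_id):
    """Return the names of the file records linked to one directory record."""
    cur = conn.execute(
        "SELECT DS_name FROM DataSet WHERE DS_parent = ? ORDER BY DS_id",
        (directory_id,),
    )
    return [row[0] for row in cur]
```

Because DS_parent == 0 marks a directory record, no separate "level" column is needed in this sketch.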

XML files

Each dataset corresponds to a pair of XML files (.xmd and .xml). This convention will be continued. To distinguish between directory datasets and file datasets, a new directory level in the file system is introduced. The current convention uses one directory for each application connected to the same database:

.../XML/APP1
.../XML/APP2
etc.

In the new convention a new directory level will be introduced beneath each of these application directories. For example, for one of these application directories, the structure will be like this:

.../XML/APP1
         name1.xmd
         name1.xml
         name2.xmd
         name2.xml
         ...
.../XML/APP1/name1
         name1_xxx.xml
         name1_xxx.xmd
         name1_yyy.xml
         name1_yyy.xmd
.../XML/APP1/name2
         ...

The files in the new subdirectories are created by extracting metadata from netCDF files such as name1_xxx.nc and name1_yyy.nc. The xml/xmd files on the highest level will remain exactly as they are today. The xml/xmd files on the lowest level will each represent one netCDF file only. The format of these files will be the same as that of the files describing the directory-level datasets. (This format is described on the page "XML format for dataset descriptions".)
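As a rough illustration of the layout, the following sketch builds the .xmd/.xml pair for either level. The function name metadata_paths and its parameters are hypothetical, and the relative root "XML" merely stands in for the elided .../XML prefix.

```python
from pathlib import PurePosixPath


def metadata_paths(xml_root, application, directory, file_name=None):
    """Return the (.xmd, .xml) pair for a directory or file dataset.

    With file_name omitted, the pair sits directly in the application
    directory; otherwise it sits in the subdirectory named after the
    directory dataset.
    """
    base = PurePosixPath(xml_root) / application
    stem = base / directory if file_name is None else base / directory / file_name
    # Append the suffix as text so that dots inside a name are left untouched.
    return (str(stem) + ".xmd", str(stem) + ".xml")
```

For example, metadata_paths("XML", "APP1", "name1") yields the highest-level pair, while metadata_paths("XML", "APP1", "name1", "name1_xxx") yields the pair for a single netCDF file.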

The dataset ID field (XPath in the .xmd file: dataset/info@name) will now be restricted to contain one or two / characters: one / denotes a parent dataset (APPLICATION/DIRECTORY), two / denote a file dataset (APPLICATION/DIRECTORY/FILE).
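The restriction on the dataset ID could be checked along the following lines. This is only a sketch: the function name dataset_level is hypothetical, and rejecting empty path components (for example "APP1//x") is an assumption that goes beyond the rule stated above.

```python
def dataset_level(dataset_id):
    """Classify a dataset ID as 'directory' or 'file' by its / characters."""
    parts = dataset_id.split("/")
    # Assumption: every component between the slashes must be non-empty.
    if any(part == "" for part in parts):
        raise ValueError("empty component in dataset ID: %r" % dataset_id)
    if len(parts) == 2:   # one "/"  -> APPLICATION/DIRECTORY
        return "directory"
    if len(parts) == 3:   # two "/"  -> APPLICATION/DIRECTORY/FILE
        return "file"
    raise ValueError("dataset ID must contain one or two '/': %r" % dataset_id)
```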

Updating from Metamod 2.1 to Metamod 2.2 (adding file-metadata)

The current trunk of Metamod, which will lead to Metamod 2.2, now extracts both levels of metadata from newly uploaded files. File metadata for older netCDF files stored on the same machine in an OPeNDAP-enabled catalogue can be generated with the program createChildDatasets.pl, which is installed in the script directory and should be used as follows:

createChildDatasets.pl /outputdir

The output directory will contain all second-level metadata. After manual verification, it should be copied to the metadata directory of the project.

SEARCH module

The main functionality of the search user interface will be kept. Searching will initially be limited to the highest (directory) level. When the search results are displayed, a small [+] button will be shown for each directory dataset that has corresponding files. When this [+] button is pushed, the search result will be expanded with one row for each file record that matches the search criteria. (Possibly we should also have a button that lets the user see all the files belonging to that directory.) Each row with file information includes a link directly to the THREDDS page for that file. The subtable with file information should have a separate heading row customized to the fields that vary between the file records.
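The two-step search described above might be sketched as follows. This is not the SEARCH module's code: the record layout (plain dictionaries with a "tags" field), the sample data and the function name search are all assumptions chosen to keep the example self-contained.

```python
# Hypothetical sample records; only DS_id and DS_parent follow the DataSet table.
records = [
    {"DS_id": 1, "DS_parent": 0, "name": "APP1/name1", "tags": {"ice", "wind"}},
    {"DS_id": 2, "DS_parent": 1, "name": "APP1/name1/name1_xxx", "tags": {"ice"}},
    {"DS_id": 3, "DS_parent": 1, "name": "APP1/name1/name1_yyy", "tags": {"wind"}},
    {"DS_id": 4, "DS_parent": 0, "name": "APP1/name2", "tags": {"ice"}},
]


def search(records, matches):
    """Search on directory level and prepare the rows shown when [+] is pushed."""
    # Group file records by the DS_id of their directory record.
    children = {}
    for rec in records:
        if rec["DS_parent"] != 0:
            children.setdefault(rec["DS_parent"], []).append(rec)

    results = []
    for rec in records:
        if rec["DS_parent"] == 0 and matches(rec):
            files = children.get(rec["DS_id"], [])
            results.append({
                "directory": rec,
                # Show [+] only if the directory has file records at all.
                "expandable": bool(files),
                # Rows revealed by [+]: file records matching the criteria.
                "files": [f for f in files if matches(f)],
            })
    return results
```

Calling search(records, lambda rec: "ice" in rec["tags"]) returns both directory datasets; only the first is expandable, and expanding it shows the single matching file record.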

metamod/two_levels.txt · Last modified: 2009-02-24 10:01:06 by heikok