====== Introducing two levels in the DataSet table ======

Minutes from discussion between Heiko and Egil 2009-02-17.

===== Background =====
A dataset imported into the database through the UPLOAD module represents a set of files. For the time being, each dataset is represented by one record in the DataSet table. To make it possible to search on individual files, the DataSet table should be extended with a record for each individual file. This gives a natural division of the records in the DataSet table into two levels: a record on the highest level represents one whole set of files (i.e. a directory), while a record on the lowest level represents a single file.

===== Implementation =====

==== Database ====

The 'DS_parent' field in the DataSet table can be used to link a file record to its corresponding directory record (file.DS_parent == directory.DS_id). To distinguish a file record from a directory record, DS_parent == 0 for all directory records.
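
The linking rule above can be sketched as follows. This is only an illustration using an in-memory SQLite database, not the actual Metamod schema: ''DS_id'' and ''DS_parent'' are taken from these notes, while the ''DS_name'' column, the sample rows and the helper function names are assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DataSet table holding both levels (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DataSet ("
        " DS_id INTEGER PRIMARY KEY,"
        " DS_parent INTEGER NOT NULL,"
        " DS_name TEXT NOT NULL)"  # DS_name is an assumed column
    )
    rows = [
        (1, 0, "APP1/name1"),            # directory record: DS_parent == 0
        (2, 0, "APP1/name2"),            # directory record without files
        (3, 1, "APP1/name1/name1_xxx"),  # file record: DS_parent == directory DS_id
        (4, 1, "APP1/name1/name1_yyy"),
    ]
    conn.executemany("INSERT INTO DataSet VALUES (?, ?, ?)", rows)
    return conn


def directory_records(conn):
    """Return the names of all directory-level records (DS_parent == 0)."""
    cur = conn.execute(
        "SELECT DS_name FROM DataSet WHERE DS_parent = 0 ORDER BY DS_id"
    )
    return [name for (name,) in cur]


def file_records(conn, directory_id):
    """Return the names of all file records linked to one directory record."""
    cur = conn.execute(
        "SELECT DS_name FROM DataSet WHERE DS_parent = ? ORDER BY DS_id",
        (directory_id,),
    )
    return [name for (name,) in cur]
```

With this layout, one query on ''DS_parent'' is enough to select either level, and no extra "record type" column is needed.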

==== XML files ====

Each dataset corresponds to a pair of XML files (.xmd and .xml). This convention will be continued. To distinguish between directory datasets and file datasets, a new directory level in the file system is introduced. The current convention uses one directory for each application connected to the same database:

  .../XML/APP1
  .../XML/APP2
  etc.

In the new convention a new directory level will be introduced beneath each of these application directories. For example, for one of these application directories, the structure will be like this:

  .../XML/APP1
           name1.xmd
           name1.xml
           name2.xmd
           name2.xml
           ...
  .../XML/APP1/name1
           name1_xxx.xml
           name1_xxx.xmd
           name1_yyy.xml
           name1_yyy.xmd
  .../XML/APP1/name2
           ...

These files are created by extracting metadata from netCDF files like name1_xxx.nc, name1_yyy.nc etc. The xml/xmd files on the highest level will remain exactly as they are today. The xml/xmd files on the lowest level will each represent one netCDF file only. The format of these files will be the same as those describing the directory level datasets. (This format is described on [[xml-format]]).
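
As a rough sketch of the directory convention above, the location of an xmd/xml pair could be derived like this. The function name, its parameters and the root path used in the usage example are hypothetical; the only thing taken from these notes is the layout itself.

```python
from pathlib import PurePosixPath


def metadata_paths(xml_root, application, directory, file_stem=None):
    """Return the (.xmd, .xml) paths for a directory or file dataset.

    xml_root  -- the .../XML directory (assumed to be known by the caller)
    file_stem -- e.g. "name1_xxx" for a file dataset, None for a directory dataset
    """
    base = PurePosixPath(xml_root) / application
    if file_stem is None:
        # Directory level: the files sit directly in the application directory.
        stem = base / directory
    else:
        # File level: the files sit in a subdirectory named after the directory dataset.
        stem = base / directory / file_stem
    # Append the extension as plain text so that dots inside names are preserved.
    return (str(stem) + ".xmd", str(stem) + ".xml")
```

For example, ''metadata_paths("/data/XML", "APP1", "name1", "name1_xxx")'' gives the two paths below ''/data/XML/APP1/name1/'', where ''/data/XML'' is only a placeholder root.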

The dataset ID field (xpath in .xmd file: ''dataset/info@name'') will now be restricted to contain either one or
two ''/'' characters: one ''/'' denotes a parent (''APPLICATION/DIRECTORY''), two ''/'' denote a file (''APPLICATION/DIRECTORY/FILE'').
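
A minimal sketch of this naming rule, assuming that empty path components are not allowed (the notes do not say so explicitly). The function name ''dataset_level'' is made up for the example.

```python
def dataset_level(name):
    """Classify a dataset ID as 'directory' or 'file' by counting '/' separators."""
    parts = name.split("/")
    # Assumption: every component must be non-empty (no leading, trailing or double '/').
    if any(part == "" for part in parts):
        raise ValueError("empty component in dataset name: %r" % name)
    if len(parts) == 2:
        return "directory"  # APPLICATION/DIRECTORY
    if len(parts) == 3:
        return "file"       # APPLICATION/DIRECTORY/FILE
    raise ValueError("dataset name must contain one or two '/': %r" % name)
```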

=== Updating from Metamod 2.1 to Metamod 2.2 (adding file-metadata) ===

The current trunk of Metamod, which will lead to Metamod 2.2, now extracts both levels of metadata
from freshly uploaded files. File metadata for older netCDF files stored on the same machine in an OPeNDAP-enabled catalogue can be generated with the program ''createChildDatasets.pl'', which is installed in the script directory and should be used as follows:

  createChildDatasets.pl /outputdir

The outputdir will contain all second-level metadata. After manual verification, it should be copied
to the metadata directory of the project.


==== SEARCH module ====
The main functionality of the search user interface will be kept. Searching will initially be limited to the highest (directory) level. When the search results are displayed, a small [+] button will be shown for each directory dataset that has corresponding files. When this [+] button is pushed, the search result will be expanded with one row for each file record **//that matches the search criteria//**. (Possibly we should also have a button that lets the user see **//all//** the files belonging to that directory). Each row with file information includes a link directly to the THREDDS page for that file. The subtable with file information should have a separate heading row customized to the fields that vary between the file records.
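
The expansion behaviour could be sketched roughly as below. This is not the SEARCH module's code: records are represented as plain dictionaries, the ''variable'' field in the usage example is invented, and the search criteria are passed in as a function.

```python
def has_expand_button(records, directory_id):
    """A [+] button is shown only if the directory has at least one file record."""
    return any(r["DS_parent"] == directory_id for r in records)


def expand_directory(records, directory_id, matches, show_all=False):
    """Return the file records to list under one directory row.

    matches  -- function taking a record and returning True if it meets the search criteria
    show_all -- if True, ignore the criteria and return every file of the directory
    """
    children = [r for r in records if r["DS_parent"] == directory_id]
    if show_all:
        return children
    return [r for r in children if matches(r)]
```

For example, with file records carrying a hypothetical ''variable'' field, ''expand_directory(records, 1, lambda r: r["variable"] == "ice")'' would list only the matching files, while ''show_all=True'' corresponds to the possible "all files" button.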