containerMD: implementation guidelines and examples

The entry / content file: where to draw the line

In containerMD, an entry is a subdivision of a container; as such, it describes the contained file itself. Consequently, the ‹format› element of an entry gives the format of the content file, not the format of the container.

To record information from the content file headers that tells how a particular file should be handled, use an ‹extension› element. It was designed this way because such metadata are tied to a particular container format.

For example, if an ARC file contains an HTML file, the generic ‹entry› element expresses information about the HTML file itself, whereas the ‹ARCEntry› extension expresses information about the ARC record, such as the protocol and response code recorded when the HTML file was harvested.
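A minimal Python sketch of this split, using only the standard library's xml.etree.ElementTree. It builds one ‹entry› whose ‹format› describes the HTML content file, while an ‹ARCEntry› inside ‹extension› carries record-level data. Assumptions: the namespace URI is a placeholder, and the `name` attribute on ‹format› and the ‹protocol›/‹responseCode› children are hypothetical names chosen for illustration; check the containerMD schema for the real ones.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): use the one declared by the containerMD schema.
CMD = "urn:example:containerMD"
ET.register_namespace("containerMD", CMD)


def q(name: str) -> str:
    """Qualify a local element name with the containerMD namespace."""
    return f"{{{CMD}}}{name}"


def build_entry(content_format: str, protocol: str, response_code: int) -> ET.Element:
    entry = ET.Element(q("entry"))
    # Generic part: describes the content file itself (e.g. the harvested HTML file).
    fmt = ET.SubElement(entry, q("format"))
    fmt.set("name", content_format)  # hypothetical attribute name
    # Container-specific part: describes the ARC record wrapping the content file.
    ext = ET.SubElement(entry, q("extension"))
    arc = ET.SubElement(ext, q("ARCEntry"))
    ET.SubElement(arc, q("protocol")).text = protocol  # hypothetical child element
    ET.SubElement(arc, q("responseCode")).text = str(response_code)  # hypothetical child element
    return entry


entry = build_entry("HTML", "http", 200)
print(ET.tostring(entry, encoding="unicode"))
```

Keeping record-level fields out of ‹format› means the same generic ‹entry› layout can be reused for any container, while each container format brings its own extension.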

Recommended implementations

containerMD has been designed for the three uses listed below. Whenever it is used alongside a more general metadata schema, the recommended best practice is to use that schema for generic information and to use containerMD only for container-format-specific fields.

Use in conjunction with PREMIS

Recommended best practice is to use PREMIS semantic units for all information that is not specific to container files, and containerMD for all container-specific fields. In practice, this means using a premis:object of type "file" to express core information about the container file, and using containerMD inside an ‹objectCharacteristicsExtension› to express container-specific fields. These specific fields are ‹entriesInformation› (non-verbose mode) and the extension containers. We intend to add a mapping table between containerMD fields and PREMIS semantic units soon. Here is an example of PREMIS used for an ARC file:

The described ARC sample file (gunzip it, then open it with a web browser or a text editor)
Corresponding PREMIS sample with containerMD nested within it
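As a rough sketch of the nesting described in this section, the following Python code wraps a containerMD element in a PREMIS 2.x premis:object of type file, keeping core fields (identifier, composition level, size, format) in PREMIS and placing containerMD in ‹objectCharacteristicsExtension›. Assumptions: the containerMD namespace URI is a placeholder, the identifier type "local", the file name and the size are illustrative values, and the ‹entriesInformation› content is left empty.

```python
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"  # PREMIS 2.x namespace
XSI = "http://www.w3.org/2001/XMLSchema-instance"
CMD = "urn:example:containerMD"  # placeholder, not the official containerMD URI
for prefix, uri in (("premis", PREMIS), ("xsi", XSI), ("containerMD", CMD)):
    ET.register_namespace(prefix, uri)


def p(tag: str) -> str:
    """Qualify a local element name with the PREMIS namespace."""
    return f"{{{PREMIS}}}{tag}"


def premis_file_object(identifier: str, size: int, container_md: ET.Element) -> ET.Element:
    """Wrap a containerMD element in a premis:object of type file."""
    obj = ET.Element(p("object"), {f"{{{XSI}}}type": "premis:file"})
    oid = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(oid, p("objectIdentifierType")).text = "local"  # hypothetical scheme
    ET.SubElement(oid, p("objectIdentifierValue")).text = identifier
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    # Core, non-container-specific information stays in PREMIS semantic units.
    # Level 0: the ARC container itself, not a gzip wrapper around it.
    ET.SubElement(chars, p("compositionLevel")).text = "0"
    ET.SubElement(chars, p("size")).text = str(size)
    fmt = ET.SubElement(ET.SubElement(chars, p("format")), p("formatDesignation"))
    ET.SubElement(fmt, p("formatName")).text = "ARC"
    # Container-specific fields go into containerMD, nested in the extension slot.
    ET.SubElement(chars, p("objectCharacteristicsExtension")).append(container_md)
    return obj


cmd = ET.Element(f"{{{CMD}}}containerMD")
ET.SubElement(cmd, f"{{{CMD}}}entriesInformation")  # contents omitted in this sketch
obj = premis_file_object("example.arc", 1024, cmd)
print(ET.tostring(obj, encoding="unicode"))
```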

Use in conjunction with METS

Recommended best practice is to use containerMD as an extension schema in a ‹techMD› section. It is also recommended to use containerMD together with premis:object, following the best practices described above. This can be implemented either as parallel ‹techMD› sections referenced from the same ‹file›, or as a single premis:object ‹techMD› that itself nests a ‹containerMD› section within a ‹premis:objectCharacteristicsExtension›.
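The two METS layouts can be sketched in Python as follows. With `nested=False`, a PREMIS techMD and a containerMD techMD sit side by side and the ‹file› lists both IDs in its ADMID attribute; with `nested=True`, a single PREMIS techMD carries containerMD inside ‹objectCharacteristicsExtension›. This is a skeleton only, not schema-valid METS: the structMap and the required PREMIS fields are omitted, the IDs are hypothetical, and the containerMD namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
CMD = "urn:example:containerMD"  # placeholder URI
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("containerMD", CMD)):
    ET.register_namespace(prefix, uri)


def m(tag: str) -> str:
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def add_techmd(amd_sec, md_id, payload, mdtype, other=None):
    """Append a techMD/mdWrap/xmlData section wrapping `payload`."""
    tech = ET.SubElement(amd_sec, m("techMD"), {"ID": md_id})
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other
    wrap = ET.SubElement(tech, m("mdWrap"), attrs)
    ET.SubElement(wrap, m("xmlData")).append(payload)


def build_mets(nested: bool) -> ET.Element:
    mets = ET.Element(m("mets"))
    amd = ET.SubElement(mets, m("amdSec"))
    premis_obj = ET.Element(f"{{{PREMIS}}}object")  # skeleton: required PREMIS fields omitted
    cmd = ET.Element(f"{{{CMD}}}containerMD")
    if nested:
        # Option 2: one premis:object techMD that itself nests containerMD.
        chars = ET.SubElement(premis_obj, f"{{{PREMIS}}}objectCharacteristics")
        ET.SubElement(chars, f"{{{PREMIS}}}objectCharacteristicsExtension").append(cmd)
        add_techmd(amd, "TECH-PREMIS", premis_obj, "PREMIS:OBJECT")
        admid = "TECH-PREMIS"
    else:
        # Option 1: parallel techMD sections, both referenced from the same file.
        add_techmd(amd, "TECH-PREMIS", premis_obj, "PREMIS:OBJECT")
        add_techmd(amd, "TECH-CMD", cmd, "OTHER", other="containerMD")
        admid = "TECH-PREMIS TECH-CMD"
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    ET.SubElement(file_grp, m("file"), {"ID": "FILE-1", "ADMID": admid})
    return mets


print(ET.tostring(build_mets(nested=False), encoding="unicode"))
```

The parallel layout keeps each schema in its own wrapper; the nested layout keeps all technical metadata for the file in one PREMIS record.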

For verbose descriptions, that is, descriptions of the container and of every content file, containerMD is needed only for its container-specific extensions: METS and PREMIS can manage the whole-part relationship, and PREMIS can manage all core metadata about each file, whether container or contained.

For non-verbose descriptions, the containerMD ‹entriesInformation› element can also be used for its aggregation mechanisms, which summarize the entries instead of describing each one.
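To illustrate what aggregation means here, a small Python sketch that summarizes entries per content format (entry count and total size) instead of keeping one record per entry. The grouping key, the `Entry` type and the sample values are hypothetical; how such a summary maps onto the actual ‹entriesInformation› fields depends on the containerMD schema and is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical per-entry record, as a tool might extract it from a container."""
    name: str
    content_format: str
    size: int  # bytes


def aggregate_entries(entries):
    """Summarize entries per content format: number of entries and total size."""
    summary = defaultdict(lambda: {"count": 0, "size": 0})
    for e in entries:
        summary[e.content_format]["count"] += 1
        summary[e.content_format]["size"] += e.size
    return dict(summary)


entries = [
    Entry("http://example.org/", "text/html", 5120),
    Entry("http://example.org/logo.png", "image/png", 2048),
    Entry("http://example.org/about", "text/html", 3072),
]
print(aggregate_entries(entries))
# {'text/html': {'count': 2, 'size': 8192}, 'image/png': {'count': 1, 'size': 2048}}
```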

Here is an example of a METS file describing a gzipped ARC file. Most of the choices comply with the "Using PREMIS with METS" recommendations, with the following additional choices:

The METS transformFile feature is used so that descriptions can be linked to a particular state of the file: the GZ file or the ARC file.
A separate premis:object is used for each compositionLevel. This closely binds each composition level (GZ or ARC container) to its own file analysis processes.
There is one characterization event for the GZ file, and one for the ARC file.
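A Python sketch of these choices, under stated assumptions: one premis:object per composition level (1 for the gzip layer, 0 for the ARC container inside it), both referenced from a single ‹file› that declares the gzip decompression step with mets:transformFile. The IDs and the file URL are hypothetical, the characterization events are omitted, and this is one possible arrangement rather than BnF's actual layout.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def m(tag: str) -> str:
    return f"{{{METS}}}{tag}"


def p(tag: str) -> str:
    return f"{{{PREMIS}}}{tag}"


def premis_object(level: int, format_name: str) -> ET.Element:
    """Skeleton premis:object for one composition level (other fields omitted)."""
    obj = ET.Element(p("object"))
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = str(level)
    fmt = ET.SubElement(ET.SubElement(chars, p("format")), p("formatDesignation"))
    ET.SubElement(fmt, p("formatName")).text = format_name
    return obj


def gzipped_arc_mets(href: str) -> ET.Element:
    mets = ET.Element(m("mets"))
    amd = ET.SubElement(mets, m("amdSec"))
    ids = []
    # One techMD per state of the file: the GZ wrapper (level 1), then the ARC (level 0).
    for md_id, level, name in (("TECH-GZ", 1, "GZIP"), ("TECH-ARC", 0, "ARC")):
        tech = ET.SubElement(amd, m("techMD"), {"ID": md_id})
        wrap = ET.SubElement(tech, m("mdWrap"), {"MDTYPE": "PREMIS:OBJECT"})
        ET.SubElement(wrap, m("xmlData")).append(premis_object(level, name))
        ids.append(md_id)
    grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    f = ET.SubElement(grp, m("file"), {"ID": "FILE-1", "ADMID": " ".join(ids)})
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
    # Declares that the stored .gz must be decompressed with gzip to obtain the ARC file.
    ET.SubElement(f, m("transformFile"), {
        "TRANSFORMTYPE": "decompression",
        "TRANSFORMALGORITHM": "gzip",
        "TRANSFORMORDER": "1",
    })
    return mets


print(ET.tostring(gzipped_arc_mets("example.arc.gz"), encoding="unicode"))
```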

Note that this example shows only one possible use. Different choices can be made in a particular implementation (and BnF's own implementation does in fact differ).

Use as a standalone file

This approach is not recommended unless containerMD is used as the output format of a tool. It is the only implementation in which it makes sense to store provenance information inside containerMD (the assessmentInformation section).

The described ARC sample file (gunzip it, then open it with a web browser or a text editor)
Corresponding containerMD standalone file
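For the tool-output case, a minimal Python sketch of a tool writing a standalone containerMD document with provenance in ‹assessmentInformation›. Only the ‹containerMD›, ‹assessmentInformation› and ‹entriesInformation› names come from this page; the namespace URI, the ‹agentName› and ‹eventDateTime› children, the `entriesCount` attribute and the tool name are hypothetical placeholders to be replaced with what the containerMD schema actually defines.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

CMD = "urn:example:containerMD"  # placeholder namespace URI
ET.register_namespace("containerMD", CMD)


def c(tag: str) -> str:
    return f"{{{CMD}}}{tag}"


def standalone_report(tool_name: str, tool_version: str, entry_count: int) -> ET.ElementTree:
    root = ET.Element(c("containerMD"))
    # Provenance: only relevant when containerMD is used as a standalone tool output.
    assess = ET.SubElement(root, c("assessmentInformation"))
    ET.SubElement(assess, c("agentName")).text = f"{tool_name} {tool_version}"  # hypothetical
    ET.SubElement(assess, c("eventDateTime")).text = (  # hypothetical
        datetime.now(timezone.utc).isoformat()
    )
    info = ET.SubElement(root, c("entriesInformation"))
    info.set("entriesCount", str(entry_count))  # hypothetical attribute
    return ET.ElementTree(root)


buf = io.BytesIO()
standalone_report("example-arc-tool", "0.1", 3).write(buf, encoding="UTF-8", xml_declaration=True)
print(buf.getvalue().decode("utf-8"))
```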

Last updated: October 5, 2011