Bibliothèque nationale de France

containerMD: implementation guidelines and examples

Recommended implementations

containerMD has been designed for the three possible uses listed below. Whenever it is used in conjunction with a more general metadata scheme, recommended best practice is to declare generic information in that general scheme and to use containerMD only for fields specific to container formats.

Use in conjunction with PREMIS

Recommended best practice is to use PREMIS semantic units for all information that is not specific to container files, and containerMD for all container-specific fields. In practice, this means using a ‹premis:object› of type "file" to express core information about the container file, and using containerMD inside a ‹premis:objectCharacteristicsExtension› to express container-specific fields. These specific fields are ‹entriesInformation› (non-verbose mode) and the extension containers. We intend to add a mapping table between containerMD fields and PREMIS semantic units very soon. Here is an example of PREMIS used for an ARC file:
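
Independently of the BnF example, the nesting described above can be sketched in Python with the standard library. This is a minimal illustration, not a schema-valid record. The containerMD namespace URI, the identifier type, the format name and the placement of ‹entriesInformation› directly under ‹containerMD› are assumptions to check against the schema you actually use.

```python
import xml.etree.ElementTree as ET

# The PREMIS 2.x and XSI namespace URIs are the published ones.
# The containerMD URI is an assumption: check it against your schema copy.
PREMIS = "info:lc/xmlns/premis-v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
CMD = "http://bibnum.bnf.fr/ns/containerMD-v1"

for prefix, uri in (("premis", PREMIS), ("xsi", XSI), ("containerMD", CMD)):
    ET.register_namespace(prefix, uri)


def premis(parent, tag, text=None):
    """Append a PREMIS child element to parent and return it."""
    child = ET.SubElement(parent, f"{{{PREMIS}}}{tag}")
    child.text = text
    return child


def build_arc_object(identifier, size_bytes):
    """Return a premis:object of type "file" describing an ARC container."""
    obj = ET.Element(f"{{{PREMIS}}}object", {f"{{{XSI}}}type": "premis:file"})

    # Core information that is not container-specific stays in PREMIS.
    ident = premis(obj, "objectIdentifier")
    premis(ident, "objectIdentifierType", "local")  # hypothetical type
    premis(ident, "objectIdentifierValue", identifier)

    chars = premis(obj, "objectCharacteristics")
    premis(chars, "compositionLevel", "0")
    premis(chars, "size", str(size_bytes))
    designation = premis(premis(chars, "format"), "formatDesignation")
    premis(designation, "formatName", "ARC")  # hypothetical format label

    # Container-specific fields go into the extension, expressed in containerMD.
    ext = premis(chars, "objectCharacteristicsExtension")
    root = ET.SubElement(ext, f"{{{CMD}}}containerMD")
    # Left empty on purpose: its attributes and children come from the schema.
    ET.SubElement(root, f"{{{CMD}}}entriesInformation")
    return obj


if __name__ == "__main__":
    print(ET.tostring(build_arc_object("example-arc-001", 1024), encoding="unicode"))
```

The point of the sketch is the split of responsibilities: size, format and identifier are plain PREMIS, and only the content of ‹premis:objectCharacteristicsExtension› belongs to containerMD.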

Use in conjunction with METS

Recommended best practice is to use containerMD as an extension schema in a ‹mets:techMD› element. It is also recommended to use containerMD in conjunction with ‹premis:object›, according to the best practices mentioned above. This can be implemented either as parallel ‹mets:techMD› sections referred to from the same ‹mets:file›, or with a single ‹mets:techMD› containing a ‹premis:object› element itself containing a ‹containerMD› section in a ‹premis:objectCharacteristicsExtension› element.
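
The first option, parallel ‹mets:techMD› sections referred to from the same ‹mets:file›, can be sketched as follows. This is a skeleton under stated assumptions: the section and file IDs are hypothetical, the ‹mets:xmlData› elements are left empty where the ‹premis:object› and ‹containerMD› sections would go, and required parts of a complete METS document (such as the structMap) are omitted.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def mets(parent, tag, **attrs):
    """Append a METS child element with the given attributes and return it."""
    return ET.SubElement(parent, f"{{{METS}}}{tag}", attrs)


def tech_md(amd_sec, section_id, mdtype, other=None):
    """Add a mets:techMD holding an empty mets:xmlData; return the xmlData."""
    section = mets(amd_sec, "techMD", ID=section_id)
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other  # names the scheme when MDTYPE is OTHER
    wrap = mets(section, "mdWrap", **attrs)
    return mets(wrap, "xmlData")


def build_parallel_sections():
    """Return a METS skeleton with two techMD sections bound to one file."""
    root = ET.Element(f"{{{METS}}}mets")
    amd = mets(root, "amdSec")
    # Section 1 would hold the premis:object (core file information).
    tech_md(amd, "TMD.PREMIS", "PREMIS:OBJECT")
    # Section 2 would hold the containerMD section (container-specific fields).
    tech_md(amd, "TMD.CONTAINERMD", "OTHER", "containerMD")
    group = mets(mets(root, "fileSec"), "fileGrp")
    # ADMID lists both section IDs, separated by a space.
    mets(group, "file", ID="FILE.1", ADMID="TMD.PREMIS TMD.CONTAINERMD")
    return root


if __name__ == "__main__":
    print(ET.tostring(build_parallel_sections(), encoding="unicode"))
```

For the second option, the containerMD section would instead sit inside the ‹premis:objectCharacteristicsExtension› of the single ‹premis:object›, and the file would refer to one ‹mets:techMD› only.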

For verbose descriptions, that is, description of all the container and content files, you only need containerMD for its container-specific extensions. METS and PREMIS can manage the whole-part relationship, and PREMIS will be able to manage all core metadata about each file (container or contained).

For non-verbose descriptions, containerMD ‹entriesInformation› can also be used for its aggregation mechanisms.

Here is an example of a METS file describing a gzipped ARC file. Most of the choices comply with the "Using PREMIS with METS" recommendations, with the following additional choices:
  • The METS transformFile feature is used so that descriptions can be linked to a particular state of the file: the GZ file or the ARC file.
  • A separate premis:object is used for each compositionLevel. This closely binds each composition level (GZ or ARC container) to its own file analysis processes.
  • There is one characterization event for the GZ file, and one for the ARC file.
Note that this example shows only one possible use. Different choices can be made in a particular implementation (and the choices made at BnF are in fact different).
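
As a rough illustration of the transformFile choice only, the sketch below builds a ‹mets:file› for a gzipped ARC file with a single decompression step. It is not the BnF example. The file ID, the location and the "gzip" algorithm label are hypothetical, and the way each description is tied to the GZ or ARC state is deliberately left out because it is not specified here.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_gzipped_arc_file(file_id, href):
    """Return a mets:file whose stored form is GZ and decoded form is ARC."""
    file_el = ET.Element(f"{{{METS}}}file", {"ID": file_id})

    # Location of the file as stored, that is, the GZ state.
    ET.SubElement(
        file_el,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK}}}href": href},
    )

    # One transformation step: decompressing the GZ file yields the ARC file.
    ET.SubElement(
        file_el,
        f"{{{METS}}}transformFile",
        {
            "TRANSFORMTYPE": "decompression",
            "TRANSFORMALGORITHM": "gzip",  # hypothetical free-text label
            "TRANSFORMORDER": "1",
        },
    )
    return file_el


if __name__ == "__main__":
    element = build_gzipped_arc_file("FILE.1", "file:///example/harvest.arc.gz")
    print(ET.tostring(element, encoding="unicode"))
```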

Use as a standalone file

This approach is not recommended unless containerMD is used as an output format of a tool. This is the only implementation where it is relevant to store provenance information inside containerMD (assessmentInformation element).

containerMD for ARC files

ARC files contain ARC records, which in turn contain files in their payload, generally harvested from the web. Users may want to describe the harvested files, the ARC records, or both. Harvested files are described with ‹entriesInformation› or ‹entry› elements. ARC records are described with ‹ARCEntries› or ‹ARCEntry›, placed in an ‹entriesExtension› or ‹entryExtension› element. In each pair, the first element serves the non-verbose mode and the second the verbose mode of describing the content of a container file.

For example, if an ARC file contains an HTML file, the generic ‹entry› element will express information about the HTML file itself. Conversely, the ‹ARCEntry› extension will express information about the ARC record, e.g. the protocol and response code for the harvested HTML file.
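
The choice of element can be summarised as a small lookup table. The sketch below is only a reading aid: the function name and the "payload" and "record" labels are hypothetical, and the pairing of plural elements with the non-verbose mode follows the descriptions in these guidelines rather than a check against the schema.

```python
# (what is described, verbose mode?) -> (containerMD element, wrapper element)
# "payload" is the harvested file; "record" is the ARC record around it.
ARC_ELEMENTS = {
    ("payload", False): ("entriesInformation", None),
    ("payload", True): ("entry", None),
    ("record", False): ("ARCEntries", "entriesExtension"),
    ("record", True): ("ARCEntry", "entryExtension"),
}


def arc_element_for(target, verbose):
    """Return (element, wrapper) for the given target and description mode."""
    try:
        return ARC_ELEMENTS[(target, verbose)]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None


if __name__ == "__main__":
    print(arc_element_for("payload", True))   # ('entry', None)
    print(arc_element_for("record", False))   # ('ARCEntries', 'entriesExtension')
```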

containerMD for WARC files

The use of containerMD for WARC files follows the same rules as for ARC files, with the following additional principles:

  • The ‹WARCExtension› at the container level was deprecated, because information specific to the WARC file is better recorded in the ‹WARCEntries› element describing the warcinfo record.
  • Subtypes of ‹WARCEntries› were defined, corresponding to each of the 6 WARC record types used at BnF (warcinfo, response, resource, request, metadata, and revisit; the conversion and continuation types are not used and are therefore not defined). These subelements share common characteristics, and some of them also carry specific elements to describe the peculiarities of their content.
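
A tool producing containerMD for WARC files may want to check which record types have a subtype before describing them. The sketch below assumes the record type is available as the lowercase value of the WARC-Type header. The function names are hypothetical, and the subtype element names are not shown because they are not listed here.

```python
# The eight record types named in these guidelines.
WARC_RECORD_TYPES = frozenset({
    "warcinfo", "response", "resource", "request",
    "metadata", "revisit", "conversion", "continuation",
})

# The six record types for which a WARCEntries subtype was defined.
DEFINED_SUBTYPES = WARC_RECORD_TYPES - {"conversion", "continuation"}


def has_warc_entries_subtype(record_type):
    """Return True if a WARCEntries subtype was defined for this record type."""
    return record_type in DEFINED_SUBTYPES


def split_by_support(record_types):
    """Split record types into (describable, not describable) sorted lists."""
    seen = set(record_types)
    supported = sorted(t for t in seen if has_warc_entries_subtype(t))
    unsupported = sorted(t for t in seen if not has_warc_entries_subtype(t))
    return supported, unsupported


if __name__ == "__main__":
    print(split_by_support(["response", "request", "conversion", "response"]))
    # (['request', 'response'], ['conversion'])
```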

Also note that the verbose mode was not investigated for WARC files; the ‹WARCEntry› element is therefore not completely defined. Please feel free to contact us if you would like to elaborate this element further for your own needs.

containerMD for disk image files

Upon Harvard Libraries' request, the containerMD metadata schema was extended in 2020 to address the need for describing a disk image file, and in particular the media from which it was created (media type, manufacturer, serial number and capacity) and its file system.

All the required information is defined in a ‹diskImageContainer› element, which contains ‹diskImageType› (values: "logical", "physical" or "unspecified"), ‹sourceMedia› and ‹fileSystems› elements.
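
A minimal sketch of this structure, assuming the containerMD namespace URI shown in the code and the child order given above. The ‹sourceMedia› and ‹fileSystems› elements are left empty because their children are not listed here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI: check it against the schema version you use.
CMD = "http://bibnum.bnf.fr/ns/containerMD-v1"
ET.register_namespace("containerMD", CMD)

# The three values given for diskImageType in these guidelines.
DISK_IMAGE_TYPES = ("logical", "physical", "unspecified")


def build_disk_image_container(image_type):
    """Return a diskImageContainer element with its three child elements."""
    if image_type not in DISK_IMAGE_TYPES:
        raise ValueError(f"diskImageType must be one of {DISK_IMAGE_TYPES}")

    container = ET.Element(f"{{{CMD}}}diskImageContainer")
    ET.SubElement(container, f"{{{CMD}}}diskImageType").text = image_type
    # Placeholders: media type, manufacturer, serial number and capacity
    # belong under sourceMedia, with element names taken from the schema.
    ET.SubElement(container, f"{{{CMD}}}sourceMedia")
    ET.SubElement(container, f"{{{CMD}}}fileSystems")
    return container


if __name__ == "__main__":
    element = build_disk_image_container("physical")
    print(ET.tostring(element, encoding="unicode"))
```

Rejecting unknown values early keeps a producing tool from writing a ‹diskImageType› outside the three values listed above.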

Last updated: September 6th, 2020