HDF5 format
===========

Introduction
------------

Like many scientific data formats (CGNS, MED, SILO), |namespec| is based upon HDF5.

HDF5 (``_) is a very flexible file format developed by the `hdfgroup `_.
According to the web page: "*HDF5 is a unique technology suite that makes possible
the management of extremely large and complex data collections.*"

The main features of HDF5 are:

* A data model that can represent very complex data objects
* A portable data format
* A library that runs on all platforms and implements a high-level API with
  C, C++, Fortran and Java interfaces
* Access time and storage optimization
* Tools for viewing the data collection
* Complete documentation and a set of examples (tests) for all languages

XML would be a good candidate to express such data, since queries can be
performed with technologies like XPath. Unfortunately, |namespec| aims to be
scalable, portable and cross-language, and there is no XML solution for the
Fortran world to read/write XML documents.

Editing tools
^^^^^^^^^^^^^

Furthermore, the hdfgroup provides tools to view or manipulate HDF5 files:

* ``hdfview`` ``_ can:

  * view a file
  * create a new file
  * modify the content
  * modify attributes

* ``gif2h5`` - Converts a GIF file into HDF5
* ``h5import`` - Imports ASCII or binary data into HDF5
* ``h5diff`` - Compares two HDF5 files and reports the differences
* ``h5repack`` - Copies an HDF5 file to a new file with or without compression/chunking
* ``h52gif`` - Converts an HDF5 file into GIF
* ``h5cc, h5fc, h5c++`` - Simplifies compiling an HDF5 application
* ``h5dump`` - Enables the user to examine the contents of an HDF5 file and dump those contents to an ASCII file
* ``h5jam/h5unjam`` - Adds/removes text to/from the user block at the beginning of an HDF5 file
* ``h5ls`` - Lists selected information about file objects in the specified format
* ``h5repart`` - Repartitions a file or family of files
* ``h5copy`` - Copies objects to a new HDF5 file
* ``h5mkgrp`` - Makes a group in an HDF5 file
* ``h5stat`` - Displays object and metadata information for an HDF5 file

The Python world offers very good tools for editing HDF5 documents:

* h5py: "*The HDF5 library is a versatile, mature library designed for the storage
  of numerical data. The h5py package provides a simple, Pythonic interface to HDF5.
  A straightforward high-level interface allows the manipulation of HDF5 files,
  groups and datasets using established Python and NumPy metaphors.*"
  ( http://h5py.alfven.org/ )
* pytables is a Python module that handles the HDF5 format, as h5py does
  ( http://www.pytables.org/moin )
* vitables (``_) is a graphical interface to pytables, built upon the Python
  pytables module (``_)

Languages for technical computing also have some HDF5 capabilities:

* Matlab can read and write HDF5 with the functions
  ``hdf5info``, ``hdf5read`` and ``hdf5write``.

Data organization
^^^^^^^^^^^^^^^^^

An HDF5 file is organized hierarchically, like a file system (with directories
and files). The main kinds of objects are:

* Group. It looks like a directory in a file system. It can contain other objects.
* Dataset. It represents a multi-dimensional typed matrix and is contained in a group
  as a file is contained in a directory in a file system.
* Table. It is a special dataset and represents multi-column data.

Each object is located by an absolute path from the root node or by a relative
path from another node.

Each object can be described by attributes. An attribute is a (key, value) pair.
The value of an attribute can be of any HDF5 supported type: integer, real,
boolean, string.

A file can then be represented by a tree structure, like directories and files in a file system explorer tool.
Groups are directories and datasets (and tables) are files::

    data.h5/
    |-- dataset1[@type=a_type]
    |-- table1
    |-- group1
    |   |-- dataset2
    |   |-- dataset3
    |   |-- table2
    |   |-- group2
    |   |   |-- table3
    |   |   `-- table4
    |   `-- table5
    |-- table2
    `-- dataset3

The h5 extension is often associated with HDF5 files.

Elements are located by their absolute path from the root or by their relative path from
the parent group, for instance:

* ``/group1/group2/table3`` is a valid absolute path to reach table3 in group2 in group1
* ``group2/table3`` is a valid relative path to reach table3 from ``/group1``

Therefore, two elements can have the same name as long as they do not have the
same parent: ``/dataset3`` and ``/group1/dataset3`` can coexist in an HDF5 file.

In this document, attributes of HDF5 elements are represented like XML attributes:
they are preceded by ``@`` and written inside square brackets, with no quotes
around the value.

All HDF5 examples can be opened with hdfview (version 2.4). The preceding example,
opened with it, is presented below:

.. figure:: images/hdfview_hdf5h5.png
    :width: 50%
    :align: center

    HDFView main window

HDF5 modules
------------

There are two versions of HDF5 in production:

* version 1.6, whose last release is 1.6.4
* the new version 1.8, whose last release is 1.8.2.

The main feature that comes with version 1.8 is the Lite API:
"The HDF5 Lite API consists of higher-level functions which do more operations
per call than the basic HDF5 interface. The purpose is to wrap intuitive
functions around certain sets of features in the existing APIs.
This version of the API has two sets of functions: dataset and attribute
related functions."

The HDF5 format can be read and written with a library that is also developed
by the hdfgroup. This library can be downloaded from ``_.

.. note::
    Since the |namespec| specification is dedicated to scientific applications,
    examples are given in Fortran and sometimes in C.

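
Since the examples in this chapter rely on features that the text associates
with version 1.8, it can be useful to check which library version a program is
linked against. The following is only a minimal sketch: it assumes the
``h5get_libversion_f`` routine of the HDF5 Fortran API and uses the ``hdf5``
module described in the next paragraph. The program name and the printed
messages are illustrative.

.. code-block:: fortran

    program check_hdf5_version
        ! The HDF5 API
        use hdf5
        implicit none

        ! Major, minor and release numbers of the linked library
        integer :: majnum, minnum, relnum
        ! Error code
        integer :: hdferr

        ! Library initialization
        call h5open_f(hdferr)

        ! Ask the library for its version
        call h5get_libversion_f(majnum, minnum, relnum, hdferr)
        if (hdferr < 0) then
            print *, "h5get_libversion_f, KO"
        else
            print *, "HDF5 version :", majnum, minnum, relnum
            ! Assumption: anything older than 1.8 is reported as a warning
            if (majnum == 1 .and. minnum < 8) then
                print *, "Warning : this library is older than 1.8"
            end if
        end if

        ! Library finalization
        call h5close_f(hdferr)
    end program check_hdf5_version
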

|namespec| can be read almost entirely with the Lite API.

First of all, to manipulate an HDF5 file, the following modules have to be loaded
(see example1.f90):

.. code-block:: fortran

    ! The HDF5 API
    use hdf5
    ! The lite API
    use h5lt

Open and close a file
---------------------

The first step is the initialization of the HDF5 library, then we can open a file
(see example2.f90):

.. code-block:: fortran

    ! Variable declaration
    character(len=*), parameter :: filename = "data.h5"
    integer(hid_t) :: file_id
    integer :: hdferr

    ! Library initialization (native type reading)
    call h5open_f(hdferr)
    ! Generally, if hdferr is negative a problem occurred
    if (hdferr < 0) then
        print *, "h5open_f, KO"
    end if

    ! Open a file
    call h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, hdferr, H5P_DEFAULT_F)

* ``H5F_ACC_RDONLY_F`` is an HDF5 constant indicating that the file is opened in read-only mode
* ``file_id`` is the file identifier returned by HDF5
* ``hdferr`` is the error code returned by the function

.. note::
    Pay attention to the unfamiliar ``hid_t`` kind of ``file_id``: the Fortran
    integer kind must be respected.

Finally, close the file:

.. code-block:: fortran

    ! Close the file
    call h5fclose_f(file_id, hdferr)

As we can see, in Fortran the last argument is always ``hdferr`` (or any other
integer variable). This argument is the error code returned by HDF5 functions.
If ``hdferr`` is negative, something went wrong.
**It is a good habit to check the value of hdferr**, though for the sake of
clarity this is the last time the examples perform the check.

The HDF5 lite API
-----------------

|namespec| is designed to be easily readable by a person, and this legibility
carries over to the source code level. To help with this task, HDF5 provides an
API of higher-level functions which do more operations per call than the basic
HDF5 interface, so it becomes straightforward to walk through a |namespec| file.
For instance, it is possible to read the number of records and the number of
fields of a table with a single function:

.. code-block:: fortran

    ! Table's name
    character(len=*), parameter :: table_absolute_name = "/a_table"
    ! Number of columns (fields) in a table
    integer(hsize_t) :: nfields
    ! Number of rows (records) in a table
    integer(hsize_t) :: nrecords
    ! Error code
    integer :: hdferr

    call h5tbget_table_info_f(file_id, table_absolute_name, &
                              nfields, nrecords, hdferr)

|namespec| can be almost entirely read with the Lite API. The functions used
are presented in the following sections.

Query for table's information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Information about a table can be obtained with the function
``h5tbget_table_info_f``. The function returns:

* the number of columns (fields) of a table
* the number of rows (records) of a table.

(see read-table.f90)

``h5tbget_table_info_f`` is called as follows:

.. code-block:: fortran

    ! The parent id
    integer(hid_t) :: loc_id
    ! Table name
    character(len=*), parameter :: table_name = "/a_table"
    ! Number of columns (fields) in a table
    integer(hsize_t) :: nfields
    ! Number of rows (records) in a table
    integer(hsize_t) :: nrecords
    ! Error code
    integer :: hdferr

    call h5tbget_table_info_f(loc_id, table_name, nfields, nrecords, hdferr)

Read the records of a table
^^^^^^^^^^^^^^^^^^^^^^^^^^^

A table's records can be read with the function ``h5tbread_field_name_f``.
**It takes an already allocated buffer** and returns:

* the buffer containing the values read from the named column

(see read-table.f90)

.. code-block:: fortran

    ! The file id
    integer(hid_t) :: file_id
    ! Table name
    character(len=*), parameter :: table_name = "/a_table"
    ! The field's name to be read
    character(len=*), parameter :: field_name = "a_field"
    ! The reading start row
    integer(hsize_t) :: start
    ! Number of read rows
    integer(hsize_t) :: nrecords
    ! The type size
    integer(size_t) :: type_size
    ! If data are real; allocate(data_buffer(nrecords)) before the call
    real, dimension(:), allocatable :: data_buffer
    ! Error code
    integer :: hdferr

    call h5tbread_field_name_f(file_id, table_name, field_name, &
                               start, nrecords, type_size, data_buffer, hdferr)

Check the presence of an attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name (a plain name, not a path)
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Does attribute exist ?
    logical :: attribute_exists
    ! Error code
    integer :: hdferr

    call h5aexists_by_name_f(file_id, element_name, attribute_name, &
                             attribute_exists, hdferr, H5P_DEFAULT_F)

Read attribute's information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The function ``h5ltget_attribute_info_f`` reads information about an attribute.
It returns:

* The dimensions of the attribute (an attribute can be an array)
* The class identifier
* The size of the datatype in bytes

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Dimensions (to be allocated before the call)
    integer(hsize_t), dimension(:), allocatable :: dims
    ! Type class
    integer :: type_class
    ! Type size in bytes
    integer(size_t) :: type_size
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_info_f(file_id, element_name, attribute_name, &
                                  dims, type_class, type_size, hdferr)

Read a string attribute
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Attribute's value
    character(len=20) :: attribute_value = ""
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_string_f(file_id, element_name, attribute_name, &
                                    attribute_value, hdferr)

Read an integer attribute
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Attribute's value (the Lite API expects an array buffer)
    integer, dimension(1) :: attribute_value
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_int_f(file_id, element_name, attribute_name, &
                                 attribute_value, hdferr)

Read a float attribute
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Attribute's value (the Lite API expects an array buffer)
    real, dimension(1) :: attribute_value
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_float_f(file_id, element_name, attribute_name, &
                                   attribute_value, hdferr)

Read a double attribute
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Attribute's name
    character(len=*), parameter :: attribute_name = "an_attribute"
    ! Attribute's value (the Lite API expects an array buffer)
    double precision, dimension(1) :: attribute_value
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_double_f(file_id, element_name, attribute_name, &
                                    attribute_value, hdferr)

Read a dataset's information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The function ``h5ltget_dataset_info_f`` reads information about a dataset.
It returns:

* The dimensions of the dataset
* The class identifier
* The size of the datatype in bytes

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Dimensions (to be allocated to the dataset rank before the call)
    integer(hsize_t), dimension(:), allocatable :: dims
    ! Type class
    integer :: type_class
    ! Type size in bytes
    integer(size_t) :: type_size
    ! Error code
    integer :: hdferr

    call h5ltget_dataset_info_f(file_id, element_name, &
                                dims, type_class, type_size, hdferr)

Read a float dataset
^^^^^^^^^^^^^^^^^^^^

A dataset's values can be read with the function ``h5ltread_dataset_float_f``.
The data buffer must be allocated before the call.

.. code-block:: fortran

    ! File id
    integer(hid_t) :: file_id
    ! Element's name
    character(len=*), parameter :: element_name = "/an_element"
    ! Dimensions (here a one-dimensional dataset)
    integer(hsize_t), dimension(1) :: dims
    ! Dataset values; allocate(dataset_value(dims(1))) before the call
    real, dimension(:), allocatable :: dataset_value
    ! Error code
    integer :: hdferr

    call h5ltread_dataset_float_f(file_id, element_name, &
                                  dataset_value, dims, hdferr)

Inquire if a dataset exists
^^^^^^^^^^^^^^^^^^^^^^^^^^^

``h5ltfind_dataset_f`` inquires whether a dataset exists. It is a function that
returns 1 if the dataset exists and 0 otherwise.

.. code-block:: fortran

    ! File or group identifier
    integer(hid_t) :: loc_id
    ! Name of the dataset
    character(len=*), parameter :: dataset_name = "/an_element"
    ! Result: 1 if the dataset exists, 0 otherwise
    integer :: result

    result = h5ltfind_dataset_f(loc_id, dataset_name)

Groups functions
^^^^^^^^^^^^^^^^

In addition, some query functions about groups are used.

Read the number of members in a group:

.. code-block:: fortran

    ! File or group identifier
    integer(hid_t) :: loc_id
    ! Name of the group
    character(len=*), parameter :: group_name = "/an_element"
    ! Number of members in the group
    integer :: nmembers
    ! Error code
    integer :: hdferr

    call h5gn_members_f(loc_id, group_name, nmembers, hdferr)

Read the name of the members of a group:

.. code-block:: fortran

    ! File or group identifier
    integer(hid_t) :: loc_id
    ! Name of the group
    character(len=*), parameter :: group_name = "/an_element"
    ! Index of the member (HDF5 indices start at 0)
    integer :: index
    ! Name of the member, filled in by the call (length chosen arbitrarily)
    character(len=256) :: member_name
    ! Possible member types
    !   H5G_LINK_F if member is a link
    !   H5G_GROUP_F if member is a group
    !   H5G_DATASET_F if member is a dataset
    !   H5G_TYPE_F if member is a type
    integer :: member_type
    ! Error code
    integer :: hdferr

    call h5gget_obj_info_idx_f(loc_id, group_name, index, &
                               member_name, member_type, hdferr)

Integers and reals
------------------

By default in |namespec|, all integers are 32-bit integers.

As for the reals, the definition of |namespec| objects does not require reals
written on more than 32 bits. So by default, all reals are 32-bit floats and
complex numbers are made of two 32-bit floats (see :ref:`complextype`).

Longer reals can be used in ``arraySet`` (see :ref:`arrayset`) to take into
account the precision of computed numerical data.

In practice, HDFView and h5dump show the data type, which is useful to check
when writing data in |namespec| format.

.. _complextype:

The complex type
----------------

HDF5 does not natively provide a complex number type. However, it offers a
very powerful mechanism to create user-defined types.

There are two ways to organize a complex number:

* as an array of two elements: A(dim=2) = (r, i) is a two-element array where
  A(0) (A(1) in Fortran) is the real part and A(1) (A(2) in Fortran) is the
  imaginary part.
* as a dictionary (a compound datatype) with two key/value pairs,
  A("r") = r and A("i") = i.

|namespec| uses the compound approach. It is not the simplest formulation,
because it is not directly accessible from the Lite API, but it is the strategy
followed by some other tools like octave or pytables.

That is to say, a complex number is always a compound datatype of two elements
(r, i).

Read a complex type
^^^^^^^^^^^^^^^^^^^

Even if a complex type is a compound structure, the two real or double numbers
are written as if they were two consecutive elements of an array:

.. code-block:: fortran

    ! File or group identifier
    integer(hid_t) :: loc_id
    ! Name of the element
    character(len=*), parameter :: element_name = "/an_element"
    ! The complex attribute is a 2 elements array
    real, dimension(2) :: complex_attribute = (/0.0, 0.0/)
    ! Error code
    integer :: hdferr

    call h5ltget_attribute_float_f(loc_id, element_name, "complex_attribute", &
                                   complex_attribute, hdferr)
    print *, "Complex attribute value :", complex_attribute

``complex_attribute`` is defined as an array of two reals.
The ``h5ltget_attribute_float_f`` function fills in the array with the ``r``
field and the ``i`` field of the compound complex attribute structure.
Therefore, ``complex_attribute(1)`` equals ``r`` and ``complex_attribute(2)``
equals ``i``.

Table and Dataset
-----------------

We have seen that HDF5 defines tables and datasets.

A dataset is a multidimensional matrix in which every cell contains data of the
same nature (integer, float, ...). A table is like a spreadsheet: it has
several columns, which can contain data of different natures.

In |namespec|, datasets are used by default whenever all the data are of the
same nature, even if the data can be seen as columns.

For example, consider the data structure (name, path). A list of (name, path)
can be written with two columns:

======== ========
name     path
======== ========
$name1   $path1
$name2   $path2
$name3   $path3
======== ========

Tables are presented with column headers, as HDFView does.

One could create an HDF5 table with two string columns. However, since both
columns contain strings, a dataset is sufficient.

.. warning::
    |namespec| has made the choice to use a (n x 2) dataset

======== ========
$name1   $path1
$name2   $path2
$name3   $path3
======== ========

.. note::
    In fact, a table is a (n x 1) dataset with a compound datatype.
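
The difference between the two storage choices can be observed from a program
by asking for the dimensions and the type class of an element. The following is
a minimal sketch, not a reference implementation: the file name ``data.h5`` and
the path ``/names_and_paths`` are hypothetical, and the sketch assumes that the
Lite routine ``h5ltget_dataset_ndims_f`` and the class constants
``H5T_STRING_F`` and ``H5T_COMPOUND_F`` are available in the installed HDF5
Fortran library. Note that the Fortran interface reports dimensions in
column-major order, so a (n x 2) dataset is expected to appear as (2, n).

.. code-block:: fortran

    program inspect_element
        use hdf5
        use h5lt
        implicit none

        ! Hypothetical file and element names, for illustration only
        character(len=*), parameter :: filename = "data.h5"
        character(len=*), parameter :: element_name = "/names_and_paths"
        integer(hid_t) :: file_id
        integer :: rank, type_class, hdferr
        integer(size_t) :: type_size
        integer(hsize_t), dimension(:), allocatable :: dims

        call h5open_f(hdferr)
        call h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, hdferr, H5P_DEFAULT_F)
        if (hdferr < 0) stop "h5fopen_f, KO"

        ! The number of dimensions is needed to size dims
        call h5ltget_dataset_ndims_f(file_id, element_name, rank, hdferr)
        if (hdferr < 0) stop "h5ltget_dataset_ndims_f, KO"
        allocate(dims(rank))

        call h5ltget_dataset_info_f(file_id, element_name, &
                                    dims, type_class, type_size, hdferr)
        if (hdferr < 0) stop "h5ltget_dataset_info_f, KO"

        print *, "Dimensions :", dims
        if (type_class == H5T_STRING_F) then
            ! Every cell holds a string: the (n x 2) dataset choice
            print *, "String dataset"
        else if (type_class == H5T_COMPOUND_F) then
            ! One compound record per row: the table choice
            print *, "Compound dataset (table)"
        end if

        deallocate(dims)
        call h5fclose_f(file_id, hdferr)
        call h5close_f(hdferr)
    end program inspect_element
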