2. HDF5 format

2.1. Introduction

Like many scientific data formats (CGNS, MED, SILO), Amelet HDF is based upon HDF5.

HDF5 (http://www.hdfgroup.org/HDF5) is a very flexible file format developed by the HDF Group.

According to its web page:
“HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.”

The main features of HDF5 are:

  • A data model that can represent very complex data objects
  • A portable data format
  • A library that runs on a wide range of platforms and implements a high-level API with C, C++, Fortran and Java interfaces
  • Access time and storage optimization
  • Tools for viewing the data collection
  • A complete documentation and set of examples (tests) for all languages

XML would also be a good candidate to express such data, and queries could be performed with technologies like XPath. Unfortunately, Amelet HDF aims to be scalable, portable and cross-language, and there is no solution in the Fortran world for reading and writing XML documents.

2.1.1. Editing tools

Furthermore, the HDF Group provides tools to view or manipulate HDF5 files:

  • hdfview (http://www.hdfgroup.org/hdf-java-html/hdfview/) can:
    • view a file
    • create a new file
    • modify the content
    • modify attributes
  • gif2h5 - Converts a GIF file into HDF5
  • h5import - Imports ASCII or binary data into HDF5
  • h5diff - Compares two HDF5 files and reports the differences
  • h5repack - Copies an HDF5 file to a new file with or without compression/chunking
  • h52gif - Converts an HDF5 file into GIF
  • h5cc, h5fc, h5c++ - Simplifies compiling an HDF5 application
  • h5dump - Enables the user to examine the contents of an HDF5 file and dump those contents to an ASCII file
  • h5jam/h5unjam - Add/Remove text to/from user block at the beginning of an HDF5 file.
  • h5ls - Lists selected information about file objects in the specified format
  • h5repart - Repartitions a file or family of files
  • h5copy - Copies objects to a new HDF5 file
  • h5mkgrp - Makes a group in an HDF5 file
  • h5stat - Displays object and metadata information for an HDF5 file

The Python world offers very good tools for editing HDF5 files:

  • h5py : “The HDF5 library is a versatile, mature library designed for the storage of numerical data. The h5py package provides a simple, Pythonic interface to HDF5. A straightforward high-level interface allows the manipulation of HDF5 files, groups and datasets using established Python and NumPy metaphors.” ( http://h5py.alfven.org/ )
  • pytables is a Python module that handles the HDF5 format, as h5py does ( http://www.pytables.org/moin )
  • vitables (http://vitables.berlios.de) is a graphical interface to pytables, built upon the Python pytables module (http://www.pytables.org/moin)

Languages for technical computing also have some HDF5 capabilities:

  • Matlab provides capabilities to read/write HDF5 with the functions hdf5info, hdf5read and hdf5write.

2.1.2. Data organization

An HDF5 file is organized hierarchically, like a file system (with directories and files). The main kinds of objects are:

  • Group. It looks like a directory in a file system. It can contain other objects.
  • Dataset. It represents a typed multidimensional array and is contained in a group, as a file is contained in a directory in a file system.
  • Table. It is a special dataset and represents multi-column data.

Each object is located by an absolute or relative path from the root node or from another node.

Each object can be described by attributes. An attribute is a (key, value) pair. The value of an attribute can be of any type supported by HDF5: integer, real, boolean, string.

A file can then be represented by a tree structure, like directories and files in a file system explorer tool. Groups are directories and datasets (and tables) are files:

data.h5/
|-- dataset1[@type=a_type]
|-- table1
|-- group1
|   |-- dataset2
|   |-- dataset3
|   |-- table2
|   |-- group2
|   |   |-- table3
|   |   `-- table4
|   `-- table5
|-- table2
`-- dataset3

The h5 extension is often associated with HDF5 files. Elements are located by their absolute path from the root or by their relative path from the parent group, for instance:

  • /group1/group2/table3 is a valid absolute path to reach table3 in group2 in group1
  • group2/table3 is a valid relative path to reach table3 from /group1

Therefore, two elements can have the same name if they do not have the same parent: /dataset3 and /group1/dataset3 can coexist in an HDF5 file.
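
The path rules above behave like POSIX path rules. As a minimal sketch (plain Python with the standard posixpath module, no HDF5 library involved; the nested dictionary tree and the helpers resolve and lookup are hypothetical names introduced only for illustration), part of the example tree can be modelled and searched as follows:

```python
import posixpath

# Hypothetical in-memory model of part of the example tree:
# a dict stands for a group, a string for a dataset or a table.
tree = {
    "dataset3": "dataset",
    "group1": {
        "dataset3": "dataset",
        "group2": {"table3": "table", "table4": "table"},
    },
}


def resolve(parent, path):
    """Return the absolute path of `path` as seen from the group `parent`."""
    # posixpath.join keeps `path` unchanged when it is already absolute.
    return posixpath.normpath(posixpath.join(parent, path))


def lookup(root, absolute_path):
    """Walk the nested dicts from the root node along an absolute path."""
    node = root
    for name in filter(None, absolute_path.split("/")):
        node = node[name]
    return node


print(resolve("/group1", "group2/table3"))
print(lookup(tree, "/dataset3"), lookup(tree, "/group1/dataset3"))
```

Both lookups on the last line succeed because the two dataset3 entries live under different parents.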

In this document, attributes of HDF5 elements are represented like XML attributes: they are preceded by @ and enclosed in square brackets, and no quotes are used around the value.

All HDF5 examples can be opened with hdfview (version 2.4). The preceding example, opened with it, is presented below:

Figure: HDFView main window (_images/hdfview_hdf5h5.png)

2.2. HDF5 modules

There are two versions of HDF5 in production:

  • version 1.6, whose latest release is 1.6.4
  • the new version 1.8, whose latest release is 1.8.2. The main feature that comes with version 1.8 is the Lite API: “The HDF5 Lite API consists of higher-level functions which do more operations per call than the basic HDF5 interface. The purpose is to wrap intuitive functions around certain sets of features in the existing APIs. This version of the API has two sets of functions: dataset and attribute related functions.”

The HDF5 format can be read and written with a library that is also developed by the HDF Group. This library can be downloaded from http://www.hdfgroup.org/HDF5/release/obtain5.html.

Note

Since the Amelet HDF specification is dedicated to scientific applications, examples are given in Fortran and sometimes in C.

Amelet HDF can be read almost entirely with the Lite API.

First of all, to manipulate an HDF5 file, the following modules have to be loaded:

( see example1.f90 )

! The HDF5 API
use hdf5
! The lite API (h5lt* functions)
use h5lt
! The table API (h5tb* functions)
use h5tb

2.3. Open and close a file

The first step is to initialize the HDF5 library; then a file can be opened:

( see example2.f90 )

! Variable declaration
character(len=*), parameter :: filename = "data.h5"
integer(hid_t) :: file_id
integer :: hdferr

! Library initialization (native type reading)
call h5open_f(hdferr)

! Generally, a negative hdferr means that a problem occurred
if (hdferr < 0) then
    print *, "h5open_f, KO"
end if

! Open a file
call h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, hdferr, H5P_DEFAULT_F)
  • H5F_ACC_RDONLY_F is an HDF5 constant indicating that the file is opened in read-only mode
  • file_id is the file identifier returned by HDF5
  • hdferr is the error code returned by the function

Note

Pay attention to the unfamiliar hid_t kind of file_id: the Fortran type kind must be respected.

Finally, close the file:

! Close filename file
call h5fclose_f(file_id, hdferr)
! Close the HDF5 Fortran interface opened by h5open_f
call h5close_f(hdferr)

As we can see, in Fortran the error code is returned through an integer argument, named hdferr here, which is the last mandatory argument of each routine (optional arguments, such as the property list of h5fopen_f, may follow it). If hdferr is negative, something went wrong.

It is good practice to check the value of hdferr; for the sake of clarity, however, this is the last time the examples perform the check.

2.4. The HDF5 lite API

Amelet HDF is designed to be easily readable by a person, and this legibility carries over to the source code. To help with this, HDF5 provides an API of higher-level functions that do more operations per call than the basic HDF5 interface, which makes it straightforward to walk through an Amelet HDF file.

For instance, it is possible to read the number of records and the number of fields of a table with a single function:

! Table's name
character(len=*), parameter :: table_absolute_name = "/a_table"
! Number of columns (fields) in a table
integer(hsize_t) :: nfields
! Number of rows (records) in a table
integer(hsize_t) :: nrecords
! Error code
integer :: hdferr

call h5tbget_table_info_f(file_id, table_absolute_name, &
                          nfields, nrecords, hdferr)

Amelet HDF can be read almost entirely with the lite API. The functions used are presented in the following sections.

2.4.1. Query for table’s information

Information about a table can be obtained with the function h5tbget_table_info_f. The function returns:

  • the number of columns (fields) of the table
  • the number of rows (records) of the table.

(see read-table.f90)

The function h5tbget_table_info_f is called as follows:

! The parent id
integer(hid_t) :: loc_id
! Table name
character(len=*), parameter :: table_name = "/a_table"
! Number of columns (fields) in a table
integer(hsize_t) :: nfields
! Number of rows (records) in a table
integer(hsize_t) :: nrecords
! Error code
integer :: hdferr

call h5tbget_table_info_f(loc_id, table_name, nfields, nrecords, hdferr)

2.4.2. Read the records of a table

The records of a table can be read with the function h5tbread_field_name_f. It takes an already allocated buffer and returns:

  • the buffer filled with the values read from the named column

(see read-table.f90)

! The file id
integer(hid_t) :: file_id
! Table name
character(len=*), parameter :: table_name = "/a_table"
! The field's name to be read
character(len=*), parameter :: field_name = "a_field"
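! Index of the first record to read (zero-based)
integer(hsize_t) :: start
! Number of records to read
integer(hsize_t) :: nrecords
! Size in bytes of the field's type
integer(size_t) :: type_size
! If the data are reals; must be allocated with nrecords elements before the call
real, dimension(:), allocatable :: data_buffer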
! Error code
integer :: hdferr

call h5tbread_field_name_f(file_id, table_name, field_name, &
                           start, nrecords, type_size, data_buffer, hdferr)

2.4.3. Check the presence of an attribute

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name (an attribute name is not a path, so no leading slash)
character(len=*), parameter :: attribute_name = "an_attribute"
! Does attribute exist ?
logical :: attribute_exists
! Error code
integer :: hdferr

call h5aexists_by_name_f(file_id, element_name, attribute_name, &
                         attribute_exists, hdferr, H5P_DEFAULT_F)

2.4.4. Read attribute’s information

The function h5ltget_attribute_info_f reads information about an attribute. It returns:

  • The dimensions of the attribute (an attribute can be an array).
  • The class identifier
  • The size of the datatype in bytes

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Dimensions (must be allocated before the call)
integer(hsize_t), dimension(:), allocatable :: dims
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr

call h5ltget_attribute_info_f(file_id, element_name, attribute_name, &
                              dims, type_class, type_size, hdferr)

2.4.5. Read a string attribute

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value
character(len=20) :: attribute_value = ""
! Error code
integer :: hdferr

call h5ltget_attribute_string_f(file_id, element_name, attribute_name, &
                                attribute_value, hdferr)

2.4.6. Read an integer attribute

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer, even for a single value)
integer, dimension(1) :: attribute_value
! Error code
integer :: hdferr

call h5ltget_attribute_int_f(file_id, element_name, attribute_name, &
                             attribute_value, hdferr)

2.4.7. Read a float attribute

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer, even for a single value)
real, dimension(1) :: attribute_value
! Error code
integer :: hdferr

call h5ltget_attribute_float_f(file_id, element_name, attribute_name, &
                               attribute_value, hdferr)

2.4.8. Read a double attribute

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer, even for a single value)
double precision, dimension(1) :: attribute_value
! Error code
integer :: hdferr

call h5ltget_attribute_double_f(file_id, element_name, attribute_name, &
                                attribute_value, hdferr)

2.4.9. Read a dataset’s information

The function h5ltget_dataset_info_f reads information about a dataset. It returns:

  • The dimensions of the dataset
  • The class identifier
  • The size of the datatype in bytes

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Dimensions (must be allocated before the call)
integer(hsize_t), dimension(:), allocatable :: dims
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr

call h5ltget_dataset_info_f(file_id, element_name, &
                            dims, type_class, type_size, hdferr)

2.4.10. Read a float dataset

The values of a dataset can be read with the function h5ltread_dataset_float_f. The data buffer must be allocated before the call.

! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
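! Dimensions (allocated and filled beforehand, e.g. with h5ltget_dataset_info_f)
integer(hsize_t), dimension(:), allocatable :: dims
! Dataset values (here a one-dimensional dataset, allocated before the call)
real, dimension(:), allocatable :: dataset_value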
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr

call h5ltread_dataset_float_f(file_id, element_name, &
                              dataset_value, dims, hdferr)

2.4.11. Inquire if a dataset exists

h5ltfind_dataset_f checks whether a dataset exists. It returns 1 if the dataset exists and 0 otherwise.

! file or group identifier
integer(hid_t) :: loc_id
! name of the dataset, looked up among the members of loc_id
character(len=*), parameter :: dataset_name = "an_element"
! 1 if the dataset exists, 0 otherwise
integer :: dataset_exists

dataset_exists = h5ltfind_dataset_f(loc_id, dataset_name)

2.4.12. Groups functions

In addition, some query functions about groups are used.

Read the number of members in a group:

! file or group identifier
integer(hid_t) :: loc_id
! name of the group
character(len=*), parameter :: group_name = "/an_element"
! number of members in the group
integer :: nmembers
! error code
integer :: hdferr

call h5gn_members_f(loc_id, group_name, nmembers, hdferr)

Read the name of a member of a group:

! File or group identifier
integer(hid_t) :: loc_id
! Name of the group
character(len=*), parameter :: group_name = "an_element"
! Index of the member (zero-based)
integer :: index
! Name of the member, filled in by the call (the length is chosen for illustration)
character(len=80) :: member_name
! Possible member types
! H5G_LINK_F if member is a link
! H5G_GROUP_F if member is a group
! H5G_DATASET_F if member is a dataset
! H5G_TYPE_F if member is a type
integer :: member_type
! Error code
integer :: hdferr

call h5gget_obj_info_idx_f(loc_id, group_name, index, &
                           member_name, member_type, hdferr)

2.5. Integers and reals

By default in Amelet HDF, all integers are 32-bit integers.

As for reals, the definition of Amelet HDF objects does not require reals stored on more than 32 bits. So by default, all reals are 32-bit floats and complex numbers are pairs of 32-bit floats (see The complex type).

Longer reals can be used in arraySet (see ArraySet) to take into account the precision of computed numerical data.

In practice, HDFView and h5dump show the data type, which is useful for checking data written in the Amelet HDF format.
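
To make the effect of the 32-bit defaults concrete, here is a small sketch in plain Python (standard struct module only; it does not touch HDF5, and the helper names fits_int32 and as_float32 are hypothetical). It shows the range of a signed 32-bit integer and the rounding applied when a value is stored as a 32-bit real:

```python
import struct


def fits_int32(value):
    """Return True if `value` can be stored as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(fits_int32(2**31 - 1), fits_int32(2**31))
print(as_float32(0.5) == 0.5)  # 0.5 is exactly representable on 32 bits
print(as_float32(0.1) - 0.1)   # small, non-zero rounding error
```

The rounding error on the last line is the kind of precision loss that longer reals in arraySet are meant to avoid.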

2.6. The complex type

HDF5 does not natively provide a complex number type. However, it offers a very powerful mechanism for creating user-defined types.

There are two ways to organize a complex number:

  • as an array of two elements: A(dim=2) = (r, i) is a two-element array where A(0) (A(1) in Fortran) is the real part and A(1) (A(2) in Fortran) is the imaginary part.
  • as a dictionary with two key/value pairs with A(“r”) = r and A(“i”) = i.

Amelet HDF uses the second approach, implemented as an HDF5 compound datatype. It is not the simplest formulation, because it is not directly supported by the Lite API, but it is the strategy followed by other tools such as Octave and pytables.

That is to say, a complex number is always a compound datatype of two elements (r, i).

2.6.1. Read a complex type

Even though a complex type is a compound structure, the two real or double numbers are written as if they were two consecutive elements of an array:

! File or group identifier
integer(hid_t) :: loc_id
! Name of the element carrying the attribute
character(len=*), parameter :: element_name = "/an_element"
! The complex attribute is a 2 elements array
real, dimension(2) :: complex_attribute = (/0.0, 0.0/)
! Error code
integer :: hdferr

call h5ltget_attribute_float_f(loc_id, element_name, "complex_attribute", &
                               complex_attribute, hdferr)
print *, "Complex attribute value :", complex_attribute

complex_attribute is defined as an array of two reals. The h5ltget_attribute_float_f function fills the array with the r field and the i field of the compound complex attribute structure. Therefore, complex_attribute(1) equals r and complex_attribute(2) equals i.
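
The equivalence between the two layouts can be illustrated without HDF5. The sketch below, in plain Python with the standard struct module, assumes a packed compound of two little-endian 32-bit floats (r, i) with no padding; the actual layout in a file depends on how the compound datatype was defined, so this is an illustration rather than a description of the library's internals.

```python
import struct

# A complex value written field by field, as a packed compound (r, i)
# of two 32-bit floats.
r, i = 1.5, -2.0
compound_bytes = struct.pack("<f", r) + struct.pack("<f", i)

# The same 8 bytes read back as a two-element array of 32-bit floats.
as_array = struct.unpack("<2f", compound_bytes)

# Element 0 is the real part, element 1 the imaginary part.
value = complex(as_array[0], as_array[1])
print(len(compound_bytes), as_array, value)
```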

2.7. Table and Dataset

We have seen that HDF5 defines tables and datasets. A dataset is a multidimensional array in which every cell contains data of the same nature (integer, float, ...). A table is like a spreadsheet: it has several columns, which can contain data of different natures.

In Amelet HDF, datasets are used by default when all the data are of the same nature, even if the data can be seen as columns.

For example, consider the data structure (name, path), a list of (name, path) can be written with two columns :

name path
$name1 $path1
$name2 $path2
$name3 $path3

Tables are presented with column headers as HDFView does.

One could create an HDF5 table with two string columns, but since both columns contain strings, a dataset is sufficient.

Warning

Amelet HDF has made the choice to use an (n x 2) dataset:

$name1 $path1
$name2 $path2
$name3 $path3

Note

In fact, a table is a (n x 1) dataset with a compound datatype.
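
The difference between the two representations can be sketched in plain Python (no HDF5 involved; Record, table and dataset are hypothetical names used only for illustration). A table is a one-dimensional sequence of records with named fields, whereas the dataset chosen by Amelet HDF is a homogeneous (n x 2) array of strings:

```python
from collections import namedtuple

# Table view: n records of a compound type with named fields.
Record = namedtuple("Record", ["name", "path"])
table = [
    Record("$name1", "$path1"),
    Record("$name2", "$path2"),
    Record("$name3", "$path3"),
]

# Dataset view: an (n x 2) array whose cells all have the same type (string).
dataset = [[rec.name, rec.path] for rec in table]

# A column is selected by field name in a table, by index in a dataset.
paths_from_table = [rec.path for rec in table]
paths_from_dataset = [row[1] for row in dataset]
print(paths_from_table == paths_from_dataset)
```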