ETSF File Format
One of the central features of the ETSF electronic-structure and simulation programs is their interoperability. It is essential that the output of one code (typically to calculate the ground state properties of a system) be readable in an automatic way by other codes, which will use it to calculate other derived properties (typically optical, or transport properties). This is more involved than sharing simple physical quantities (atomic positions, energy) because of the sheer volume of data which must be handed over. In simple cases with a few atoms, the wavefunctions and related quantities will occupy several megabytes of disk space. In more extended or complex calculations this can quickly reach gigabytes. The efficient passing of data implies that it must be in a binary format, but it must also be platform-independent.
The ETSF has developed standard file formats to allow better integration and interoperability between codes, both those developed within the ETSF and any others. The format and a suite of libraries for file access are completely open source. The standardization philosophy is to build as much as possible on existing file formats, such as XML or NetCDF. We focus on the capability to read or write these formats on a wide variety of platforms and from different programming languages (in particular Fortran 90, C, and Python). Backward compatibility is also a strong concern.
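The requirement that binary data be platform-independent can be illustrated with a minimal Python sketch. It is not the ETSF format and does not use the ETSF libraries: the function names and the record layout (a 4-byte count followed by 8-byte floats) are hypothetical, chosen only to show why a fixed byte order and fixed type sizes let a file written on one machine be read on another. Formats such as NetCDF handle this kind of detail for the user.

```python
import struct
import sys

# Hypothetical record layout (an assumption for illustration only):
#   4-byte unsigned integer count, then `count` 8-byte IEEE 754 floats.
# The ">" prefix forces big-endian byte order and standard sizes,
# whatever the native conventions of the machine running the code.

def pack_coefficients(values):
    """Serialize a list of floats with an explicit, fixed byte order."""
    header = struct.pack(">I", len(values))
    body = struct.pack(f">{len(values)}d", *values)
    return header + body

def unpack_coefficients(blob):
    """Read back a record written by pack_coefficients on any platform."""
    (count,) = struct.unpack_from(">I", blob, 0)
    return list(struct.unpack_from(f">{count}d", blob, 4))

coefficients = [0.5, -1.25, 3.0]
blob = pack_coefficients(coefficients)
restored = unpack_coefficients(blob)

print("native byte order:", sys.byteorder)
print("bytes written:", len(blob))
print("round trip ok:", restored == coefficients)
```

Because the byte order is stated in the format string rather than inherited from the hardware, the same bytes are produced on little-endian and big-endian machines.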
ETSF specifications for Input/Output of wavefunctions and related data
In order to allow software to interoperate efficiently and exchange data, file format specifications are mandatory. Widely used file format specifications are still lacking in the field of first-principles calculations of material properties. One of the objectives of the ETSF is precisely to specify such file formats, for content that is relevant to our scientific activity in theoretical spectroscopy (wavefunctions, crystallographic data, densities and potentials, etc.).
The ETSF I/O specifications include the detailed NetCDF description of the fields and content of a valid ETSF-IO file. These specifications were first devised during the Nanoquanta Network of Excellence and have been continuously enhanced since, now reaching version 3.3.
The specifications are presented as a PDF document, which is organized in sections:
- Section 1 presents general considerations concerning the present file format specifications.
- Section 2 presents general specifications concerning NQ (Nanoquanta) NetCDF file formats.
- Section 3 deals with files containing crystallographic data, and presents a rather detailed NetCDF specification, ready for exchange of data among the NQ nodes. It also briefly presents other existing standards for files containing crystalline structures and atomic geometries.
- Section 4 deals with files containing density/potential, with the same level of detail.
- Section 5 deals with files containing wavefunctions, with the same level of detail.
- Section 6 deals with pseudopotential / PAW (projector augmented wave) setup files. It presents the existing specifications, and summarizes the debates and conclusions reached during a mini-workshop in Louvain-la-Neuve.
- Section 7 is an overview of the other contents relevant for NQ, and the status of the file format specification.
The ETSF File Format has been implemented as the ETSF I/O Library. Please see our Libraries and Tools page for more details and download instructions.
Deprecated versions (only for reference)
ETSF Coding Standards
While portability has always been an issue in the field of simulation, interoperability in scientific software development is only a recent concern, triggered both by individual efforts reaching a level of complexity where collaboration becomes a necessity and by the appealing networking capabilities now offered by modern intranets and the internet. Just as living cells long ago started to join and create multicellular organisms, evolutionary pressure now drives us towards "multicode simulations", promising greater horizons, greater achievements, and greater challenges as well.
The turning point was passed when technological applications reached the nanoscale, i.e. the scale at which the properties of systems start to vary with their size. In other words, adding or removing just one atom causes important variations in their properties, which invalidates their description by any statistical approach. The number of atoms involved was, however, still far too large for the systems to be tractable in an atomic-scale description. To address this issue, a huge effort has been undertaken to considerably increase the available computational power.
Another aspect is the context in which these efforts are conducted. In the world of academic research, most software development is performed by post-docs and PhD students, who are now in a much more precarious situation than in past decades. The turnover is very high, and new students regularly have to be trained before being fully operational, which is very demanding. It also makes long-term projects, such as scientific software, more difficult to carry out, since these non-permanent people usually participate in the project for a few years only and may leave it very abruptly at any time. It is thus critical to devise a strategy for preserving maintainability over time.
This document provides a set of generic rules to follow in order to develop good-quality scientific software for the ETSF. It was formerly known under the title Guidelines for code development and code documentation and was used as a reference within the Nanoquanta Network of Excellence, whose main objective was the creation of the ETSF. The content of the current document is a typo-fixed release of version 3 of these guidelines, published on February 2nd, 2008 by the Nanoquanta Integration Team 9 “Integration of theory and code developments”.
Instead of restarting from scratch at the beginning of the ETSF, we have decided to consider the guidelines written during the lifetime of Nanoquanta as the major version 1 of the coding standards, each release of the guidelines constituting a minor version. This means that we start at version 1.3 and can improve on it while preparing a major version 2 of the coding standards.