Simulation data processing

How to add raw simulation data to your Terminus server?

The first step is to add the raw simulation data to the Terminus server filesystem. The first question that needs to be answered is: where?

To avoid any confusion, the directory hierarchy within the Terminus data directory must exactly match the end of the URL of the corresponding simulation page on Galactica, starting after galactica-simulations.eu/db/. Since that URL is unique, it can be used to identify the target simulation data.

For example, the ORION_FIL_MHD simulation can be accessed at http://www.galactica-simulations.eu/db/STAR_FORM/ORION/ORION_FIL_MHD/. Hence the part of that URL that is useful on Terminus is STAR_FORM/ORION/ORION_FIL_MHD/. The top directory is the project category alias (Star formation), the intermediate directory is the project alias (Orion), and the innermost directory is the simulation alias (ORION_FIL_MHD).

To be accessible to the Terminus data processing services, the raw data of that simulation must be stored within the Terminus data directory, following the exact same path: ${TERMINUS_DATA_DIR}/STAR_FORM/ORION/ORION_FIL_MHD/. Into this directory, you can transfer all the simulation snapshot directories (e.g. RAMSES outputs). See Example for an additional example.
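For instance, following the ORION_FIL_MHD case above, the resulting layout on the Terminus server filesystem could look like this (the snapshot directory names are purely illustrative):

${TERMINUS_DATA_DIR}
└── STAR_FORM
    └── ORION
        └── ORION_FIL_MHD
            ├── output_00050
            ├── output_00060
            └── output_00080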

New in version 0.7

To help the Terminus server administrator properly configure all these raw simulation data directories under the Terminus data directory, a terminus_datasource_cfg CLI tool is provided with the installation of Terminus. Upon creation of simulation snapshot pages (and their association to post-processing services) on the Galactica simulation database web server (for projects hosted on your Terminus server), the Terminus server administrator is notified of new data path definitions to configure on the Terminus server filesystem. This notification takes the form of a single JSON file that can be used as the only argument of the terminus_datasource_cfg command to automatically define the proper data paths and symbolic links:

> terminus_datasource_cfg new_datasource.json
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Terminus datasource directory definitions ~~~~~~~~~~~~~~~~~~~~~~~~ #
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
Project category directory '/data/Terminus/STAR_FORM' already exists.
 -> Project directory '/data/Terminus/STAR_FORM/ORION' already exists.
     * Simulation directory '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD' already exists.
         - Created snapshot data symbolic link : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00050' => '/raid/data/Proj_ORION/fil/MHD/output_00050'.
         - Created snapshot data symbolic link : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00060' => '/raid/data/Proj_ORION/fil/MHD/output_00060'.
         - Created snapshot data symbolic link : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00080' => '/raid/data/Proj_ORION/fil/MHD/output_00080'.
         - Snapshot data symbolic link unchanged : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00120' => '/raid/data/Proj_ORION/fil/MHD/output_00120'.
         - Snapshot data symlink '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00150' has changed:
             + old target : '/raid/data/Proj_ORION/fil/Run4_phi3.45_beta0.5/output_00150'.
             + new target : '/raid/data/Proj_ORION/fil/MHD/output_00150'.
           Overwrite (y='yes', n='no', a='yes to all', x='no to all') [Y] ? y
           -> Deleted deprecated snapshot data symbolic link : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00150' => '/raid/data/Proj_ORION/fil/Run4_phi3.45_beta0.5/output_00150'.
         - Created snapshot data symbolic link : '/data/Terminus/STAR_FORM/ORION/ORION_FIL_MHD/output_00150' => '/raid/data/Proj_ORION/fil/MHD/output_00150'.
Done...
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #

Once this command has executed successfully, the Terminus server can immediately access the configured simulation data, answer incoming Galactica job requests and execute data processing jobs. No Terminus server restart is necessary.

How to create new post-processing services?

First, go to your defined Terminus service directory and create a new directory matching the name of the service (e.g. my_service). In that directory, copy and paste the following Python template for your new service into a new Python script, also matching the name of the service (e.g. my_service.py):

${HOME}/terminus
└── services
    └── my_service
        ├── my_service.py
        └── service.json
# -*- coding: utf-8 -*-
import os
import sys
#
# Do your custom imports here
import h5py  # If you want to export data as a HDF5 file, for example
#


def parse_data_ref(data_path, data_ref):
    """Fetch the required data

    :param data_path: path to data directory (string)
    :param data_ref: dataset reference (string)
    :return: path to the target snapshot dataset
    """
    #
    # Parse your data reference
    #
    # As an example, you could interpret the 'dataset reference' as a directory name, and concatenate it
    # with the base data directory path
    dataset_path = os.path.join(data_path, data_ref)
    if not os.path.isdir(dataset_path):
        raise IOError("Simulation snapshot data directory not found.")

    return dataset_path


def run(data_path, data_ref, argument_1="A", argument_2=256, argument_3=False, test=False):
    """My custom data processing service description

    :param data_path: path to data directory (string)
    :param data_ref: dataset reference (string)
    :param argument_1:
    :param argument_2:
    :param argument_3:
    :param test: Activate service test mode
    """
    path = parse_data_ref(data_path, data_ref)

    if test:
        #
        # Run your service in test mode
        #
        return

    #
    # Python code for the service in normal mode goes here!
    #

    # The data processing service must write its output data in a local (already created) "out/" directory.
    output_dir = "out"

    # Example => write data in a HDF5 file
    with h5py.File(os.path.join(output_dir, "results.h5"), 'w') as h5f:
        # Write result in a HDF5 dataset
        h5f.create_dataset("my_array", ...)


if __name__ == "__main__":
    run(sys.argv[1], sys.argv[2])

__all__ = ["run"]

Target data attributes

In the above script, the run Python function will be directly called by the Terminus job upon execution. The data_path attribute is automatically transferred by Galactica during the job submission stage and matches the URL-based directory path described in the previous section.

The data_ref attribute is the unique identifier of the simulation snapshot, as defined in Galactica. It is used to distinguish the various snapshots within a simulation and is transferred as a string (for example, data_ref could be the number of the snapshot to post-process). Both the data_path and data_ref attributes are provided to help you find the target data to run your service on (in this template, they are parsed by the parse_data_ref function).
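As an illustration, if your snapshots are RAMSES outputs named output_NNNNN, one possible (purely hypothetical) way to interpret data_ref as a snapshot number in the parse_data_ref function would be:

import os


def parse_data_ref(data_path, data_ref):
    """Hypothetical variant: interpret the dataset reference as a RAMSES output number (e.g. "50")."""
    iout = int(data_ref)  # raises ValueError if the reference is not a valid integer
    dataset_path = os.path.join(data_path, "output_{iout:05d}".format(iout=iout))
    if not os.path.isdir(dataset_path):
        raise IOError("Simulation snapshot data directory not found.")
    return dataset_path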

Test mode boolean attribute

The last attribute, test, is mandatory (set to False by default) and is used by Galactica to periodically test the Terminus data processing service. Indeed, Galactica schedules periodic job requests in test mode (test=True) to monitor the availability of the Terminus server, the consistency of the data processing services, the existence of the target data and the correct execution of the processing service.

In test mode, the Terminus service is run with the default values for the custom attributes (see next section); Galactica will not provide any value for these attributes. For performance reasons, you are advised to choose default values so that your service runs in the cheapest possible way (e.g. lowest resolution, small geometric region, …) with these default attribute values.
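A minimal sketch of this idea, reusing the parse_data_ref helper of the template and assuming hypothetical map_size and region_size custom attributes whose defaults are deliberately cheap:

def run(data_path, data_ref, map_size=64, region_size=0.05, test=False):
    # Small default values keep the periodic test jobs fast and memory-friendly;
    # real job requests will override them with user-provided values.
    path = parse_data_ref(data_path, data_ref)

    if test:
        # Test mode: only check that the target data is reachable, then return.
        return

    # Normal mode: the actual processing, using map_size and region_size, goes here.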

Custom service attributes

The attributes between data_ref and test are specific to your data processing service and can be customized at will. The values of these attributes will be defined by the requesting user in the online job submission form. In this template, only three attributes are used (argument_1, argument_2, argument_3), but you can use as many attributes as you need to perform your service.

You are encouraged to restrict the set of values considered valid for each parameter (e.g. a value range for numeric attributes, a limited set of possible resolutions, a restricted choice of physical quantities to process, etc.) and raise errors in case the received values are invalid. You can notify the Galactica administrator of these restrictions so that the online job request submission form is designed accordingly.
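For instance, a validation sketch using the attribute names of the template above, with arbitrary (purely illustrative) restrictions:

def run(data_path, data_ref, argument_1="A", argument_2=256, argument_3=False, test=False):
    # Illustrative restrictions only: adapt them to your own service.
    if argument_1 not in ("A", "B", "C"):
        raise ValueError("argument_1 must be one of 'A', 'B' or 'C'.")
    if not 32 <= int(argument_2) <= 1024:
        raise ValueError("argument_2 must be an integer in the [32, 1024] range.")
    if not isinstance(argument_3, bool):
        raise ValueError("argument_3 must be a boolean.")
    # ... rest of the service ...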

The choice of the default values for your custom attributes is important: they must be chosen so that the service runs in a fast and memory-friendly way (see the test mode section above).

Where must the service write its data?

During its execution, the data processing service runs in a dedicated job directory created in the Terminus job directory (see Terminus configuration):

${HOME}/terminus
└── jobs
    ├── 148_density_map
    │    └── out
    └── 156_datacube3D
         └── out

Within this job execution directory, an out/ directory has been created for you to put all the data files you want your service to provide. You only need to write data into this local out/ directory within your service script; this directory will be tarballed upon job completion and finally uploaded to the Galactica server.
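For example (the summary.txt file name is purely illustrative), any file written this way will end up in the tarball uploaded to Galactica:

import os

output_dir = "out"  # already created by Terminus inside the job execution directory

# Every file written into out/ is tarballed at job completion and uploaded to Galactica.
with open(os.path.join(output_dir, "summary.txt"), "w") as log_file:
    log_file.write("Data processing completed.\n")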

Service runtime configuration file

In addition to the service script file, Terminus requires a small JSON file named service.json that must contain the job type and the interpreter required to run the service script. This allows, for example, a user to provide a Python 2 service script while the Terminus server itself runs in a Python 3 environment, or even to separate the Python environments of each data processing service (each with its own dependencies).

Optionally, the n_nodes (number of nodes), n_cores (total number of cores) and timout_mn (SLURM job timeout in minutes) integer job parameters can be set (if you configured the Terminus server to submit processing jobs as SLURM jobs). These parameter values define the SLURM submission parameters #SBATCH -N n_nodes, #SBATCH -n n_cores and #SBATCH -t timeout_duration. If left undefined, the default number of nodes is 1, the default total number of cores is 8 and the default SLURM job timeout is 30 minutes.

The content of the service.json must look like this:

{
  "job":
    {
      "type": "python",
      "interpreter": "/path/to/interpreter/pythonX.Y",
      "n_nodes": 1,
      "n_cores": 16,
      "timout_mn": 30
    }
}
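If you rely on the default SLURM resources (1 node, 8 cores, 30-minute timeout), a minimal configuration only needs the mandatory keys (the interpreter path shown here is illustrative):

{
  "job":
    {
      "type": "python",
      "interpreter": "/path/to/my_service_env/bin/python"
    }
}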

Publish your new post-processing service on the Galactica web app

Once this is done, the final step for the service to be available on Galactica is to send some information to a Galactica admin:

  • the name of your service and the name of the Terminus server on which you deployed it,

  • a description of your data processing service, to document it on the Galactica web pages (full HTML content if you prefer, along with inserted images if required),

  • the custom attribute names of the run method, as well as their types and the restrictions on their valid values. For example, argument_1 must be an integer in the range [0, 10], argument_2 a character that can only take one of the values [x, y, z], argument_3 must be a string chosen in the set [‘gas density’, ‘gas temperature’, ‘gas metallicity’], etc.

Once the service has been defined by the Galactica admin on the web server, you will be able to connect this new data processing service to your project so that authenticated users can submit job requests online to manipulate your raw data.

Note

There is no need to restart your Terminus server instance when:
  • you create a new post-processing service script in your terminus_service_directory,

  • you modify an existing service script in your terminus_service_directory,

  • you activate a new Terminus post-processing service on Galactica.