Instructions for installing Corvus for calculations of optical constants

These instructions describe the steps needed in order to install and test Corvus and FEFF10, which are needed for "opcons" Corvus workflow simulations.

Prerequisites

The following are prerequisites for installing and running Corvus and FEFF10 with the opcons workflow:

  • Python version 3.7 or above
  • A message passing interface (MPI) implementation - we suggest Open MPI, but many others will work as well.
  • A Fortran compiler - we suggest gfortran from GNU, but many other compilers will work.
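
As a quick sanity check, one can verify that the prerequisites are visible in the shell before starting. The tool names below are typical defaults (the MPI Fortran wrapper in particular varies between distributions; "mpif90" is the Open MPI name assumed later in these instructions):

```shell
# Print the path of each prerequisite tool found in PATH,
# or a note if it is missing (names are common defaults).
for tool in python3 gfortran mpif90; do
  command -v "$tool" || echo "$tool not found in PATH"
done
```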

Installation

  1. Create a directory to hold the sources and change into it. Here we use "~/src/OpCons"

    mkdir -p ~/src/OpCons
    cd ~/src/OpCons

  2. Clone the "convert-to-3" branch of Corvus and the "fullspectrum" branch of FEFF10 using git

    git clone --branch convert-to-3 https://github.com/times-software/Corvus.git
    git clone --branch fullspectrum https://github.com/times-software/feff10.git


    Alternatively, the branches can be downloaded directly from the GitHub website

    Corvus
    FEFF10

    in tarball form and extracted in ~/src/OpCons.
  3. Compile FEFF

    In "~/src/OpCons/feff10/src" do:

    cp Compiler.mk.default Compiler.mk

    If the desired MPI Fortran compiler name is different from "mpif90", edit "Compiler.mk" and change the following line to the correct compiler:

    MPIF90 = mpif90
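
For example, with Intel MPI the Fortran wrapper is typically named "mpiifort", in which case the line would become (check your MPI installation's documentation for the correct wrapper name):

```makefile
# Hypothetical alternative: Intel MPI's Fortran wrapper
MPIF90 = mpiifort
```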

    After this, still in "~/src/OpCons/feff10/src" do:

    make mpi

  4. Configure and install Corvus and the Materials Project (MP) tools:

    In "~/src/OpCons/Corvus" do:

    cp corvus.conf.template corvus.conf

    Edit "corvus.conf" to include the full path to the FEFF executables. For example:

    feff : [HOME]/src/OpCons/feff10/bin/MPI

    where [HOME] is the home directory path for the current user.
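
The edit can also be scripted. For instance, the following appends the line with the path expanded for the current user (a sketch only; if the template already contains a feff entry, edit that line instead of appending a second one):

```shell
# Append the feff path to corvus.conf, expanding $HOME for the current user.
# Assumes corvus.conf is in the current directory and has no feff line yet.
echo "feff : $HOME/src/OpCons/feff10/bin/MPI" >> corvus.conf
```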

    The Corvus MP tools (crv_mp_mk_set and crv_mp_run_set) are located in "~/src/OpCons/Corvus/corvutils/mp", together with their configuration file (crv_mp.ini). This configuration file selects the MPI implementation to be used (MPISYS, with possible options openmpi, mvapich2 or slurm); the same directory contains a template with possible settings for the available MPI systems. Once the system is chosen, the Corvus MP tools set up all options needed to allocate the nodes and processors, for instance "-np" for Open MPI or "-N" for Slurm. Any additional options required by the user can be added through the MPIOPT_User configuration option.
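
Schematically, a configuration for Open MPI might look like the fragment below. This is only a sketch: the exact layout and comment syntax are given by the template in the same directory, and the MPIOPT_User value shown is purely illustrative:

```ini
# Sketch of crv_mp.ini settings (see the template for the real layout)
MPISYS = openmpi
MPIOPT_User = --bind-to core
```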

    crv_mp_mk_set is used to create a set of MP systems for which the opcons workflow can be run simultaneously (if the resources are available); crv_mp_run_set is used to run the calculations for a set.

    Corvus and its MP tools require Python 3.7 or higher (and python3-dev on some systems) for configuration and use. Please make sure you have the correct version (use "python --version") before configuring. With the correct version of Python, and still in "~/src/OpCons/Corvus", do:

    python setup.py install --user

    for a production deployment, or:

    python setup.py develop --user

    for a development one. We currently recommend the development deployment, since it simplifies debugging if a problem occurs. In what follows we assume a development installation.

    At this point FEFF, Corvus and the MP tools should be available for use.

Testing, tutorials, and examples

  1. Test that the Corvus MP tools are in the path

    In principle, the tools should be accessible after configuration if Python is properly installed. To check this, do:

    which crv_mp_mk_set

    This should return something similar to

    ~/.local/bin/crv_mp_mk_set

    If this is not the case, add "~/.local/bin" to the PATH environment variable. From here on we assume that this has been done and the tools are accessible throughout the user's environment.
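
For reference, the directory can be added with a line like the following (bash/zsh syntax; append it to ~/.bashrc or ~/.zshrc to make the change permanent):

```shell
# Prepend the user-local bin directory, where "setup.py install --user"
# and pip --user place console scripts, to the search path.
export PATH="$HOME/.local/bin:$PATH"
```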
  2. Create a simple test set and run it in serial mode

    crv_mp_mk_set creates a set of MP systems for which the opcons workflow can be run simultaneously (if the resources are available). This command requires that the user have a Materials Project API key. To obtain a key, please go to:

    https://materialsproject.org/open

    and follow the instructions under "API keys". From here on we assume the user has a valid key, which we will represent with "[KEY]". We also assume that the user is running the tests in the directory "[TESTS]".

    To create a simple test case with the label "LiBeB" do:

    crv_mp_mk_set --k [KEY] --nSCF 1 --nFMS 2 --v --ptel 'Li,Be,B' LiBeB

    Here "--nSCF" and "--nFMS" set the number of shells used for the SCF and FMS procedures, respectively, the "--v" option makes the output more verbose, and "--ptel" sets which elements of the periodic table to look for in the Materials Project. This command prints a summary of the systems found, and it creates the directory "[TESTS]/LiBeB" as well as an MSON file ("[TESTS]/LiBeB.mson") with all the information for the run.

    To run this set we use crv_mp_run_set. First we execute it on a single processor, in order to verify that the installation was successful:

    crv_mp_run_set --np 1 --v LiBeB


    The standard output shows that the calculation was split into 3 serial steps (one for each system), and that 1 processor was assigned to each serial run. The estimated time cost is in units of a single edge run. The partition that crv_mp_run_set will use can be found without running the calculation by using the dry-run ("--dr") option.

    The output for each individual system can be found in the run directories listed above (e.g. "[TESTS]/LiBeB/Be/mp-87/corvus_oc") in the following files:

    • absorption_conv.dat
    • absorption.dat
    • epsilon_conv.dat
    • epsilon.dat
    • epsilon_drude.dat
    • index_conv.dat
    • index.dat
    • loss_conv.dat
    • loss.dat
    • reflectance_conv.dat
    • reflectance.dat
    • sumrules_conv.dat
    • sumrules.dat

    For testing purposes, these reference results, as well as others for different tests, can be found in:

    ~/src/OpCons/Corvus/corvutils/mp/examples

    For storage efficiency, this directory only contains the input and final opcons output results. All the intermediate workflow files have been removed.

  3. Run the simple test in parallel mode

    Next we test the parallelization of the runs. First, we need to recreate the set using the command

    crv_mp_mk_set --k [KEY] --nSCF 1 --nFMS 2 --v --ptel 'Li,Be,B' LiBeB


    again. This will save the old results to "[TESTS]/LiBeB.save.000"; recreating the set is needed because Corvus reuses old results, and the timings would otherwise not be representative.

    If 10 processors are available, all the edges can be run simultaneously with:

    crv_mp_run_set --np 10 --v LiBeB

    This time the output shows that the run was done in a single serial step, with all systems running simultaneously, for an estimated speedup of 10 relative to the single-processor run. In practice, the calculation runs approximately 7 times faster.

    Given the partitioning of the runs over systems and edges, it is sometimes difficult to estimate the most advantageous choice for the total number of processors. For this purpose the crv_mp_run_set tool provides an estimate of the time it would take to complete the simulations, in units of a single edge run:

    crv_mp_run_set --t --np 32 LiBeB


    This command estimates the time ("--t") and efficiency of the calculation for up to "--np" processors.

    The estimates are computed using the default "serial fraction" of the code of 0.5, as well as the default number of processors per node ("--ppn") of 64. The serial fraction ("--sf") characterizes the parallelization efficiency ("--sf 0.0" => perfectly parallelizable code, "--sf 1.0" => completely serial code) and depends on the size of the systems to be computed. At present, we recommend making the estimates with the default serial fraction; once we gather more data we will have better estimates of the best values for different system types.
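
Purely as an illustration of what a serial fraction means (this is the standard Amdahl's-law formula, speedup(n) = 1 / (sf + (1 - sf)/n), and not necessarily the exact model crv_mp_run_set applies to its partitioned runs):

```shell
# Illustration only: Amdahl's-law speedup for a few serial fractions;
# sf=0.0 is perfectly parallelizable, sf=1.0 is completely serial.
for sf in 0.0 0.1 0.5; do
  awk -v sf="$sf" 'BEGIN {
    printf "sf=%.1f:", sf
    for (n = 1; n <= 64; n *= 4)
      printf "  n=%-2d s=%5.2f", n, 1 / (sf + (1 - sf) / n)
    print ""
  }'
done
```

Note that with sf = 0.5 the speedup can never exceed 2, which is the sense in which the serial fraction sets a theoretical limit on efficiency.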

    These estimates show that certain "magic numbers" of processors, related to the number of edges to compute, provide theoretical 100% efficiency, and that away from them the efficiency of the calculation drops quickly towards the theoretical limit set by the serial fraction. The "Cost" value gives an estimate of how much more a calculation would cost relative to the purely serial calculation. In a perfectly parallel calculation the cost would be identical, but in practice the actual cost is usually higher. Users can use this estimate to assess the trade-off between speedup and cost.