JobManager¶
Introduction¶
The purpose of the JobManager module is to provide a Python wrapper for submitting and tracking jobs in a queue environment.
Configuration¶
The JobManager was initially built for a PBS queue environment, so many of its commands will have to be modified for use with a different queue system. These customizations will likely take place in the following files.
- The submit and write_submit functions in the structopt/utilities/job_manager.py file will likely need to be updated to reflect your specific queue environment.
- The dictionaries held in structopt/utilities/rc.py are a first attempt to store settings specific to the queue environment. Many queue-specific variables are drawn from here.
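As a rough illustration of the kind of change involved, the sketch below renders the header of a PBS-style submit script from a dictionary shaped like the submit_parameters used later on this page. The helper name render_pbs_header, the job name, the reading of 'cores' as cores per node, and the reading of 'walltime' as whole hours are all assumptions made for this example. None of them are taken from job_manager.py, whose write_submit function may look quite different.

```python
def render_pbs_header(params, job_name="structopt"):
    """Build the header of a PBS submit script from a parameter dict.

    Hypothetical helper for illustration only; not part of StructOpt.
    Assumes 'cores' means cores per node and 'walltime' is in hours.
    """
    lines = [
        "#!/bin/bash",
        "#PBS -N {}".format(job_name),
        "#PBS -q {}".format(params["queue"]),
        "#PBS -l nodes={}:ppn={}".format(params["nodes"], params["cores"]),
        "#PBS -l walltime={}:00:00".format(params["walltime"]),
    ]
    return "\n".join(lines) + "\n"


submit_parameters = {"system": "PBS", "queue": "morgan2",
                     "nodes": 1, "cores": 12, "walltime": 12}
header = render_pbs_header(submit_parameters)
print(header)
```

Porting to another scheduler would mostly mean swapping the #PBS lines for that scheduler's own directives, which is why keeping the queue-specific values in one dictionary is convenient.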
Submitting jobs¶
Single job¶
The script below is an example of submitting a single job to a queue using the JobManager. The optimization run is a short run of an Au55 nanoparticle using only LAMMPS. A large part of the script defines the inputs that go into the JobManager class. These inputs are described below.
- calcdir: This is a string that tells where the calculation is run. Note that the calculation itself is run within the calcdir/logs{time} directory, which is created when the job starts to run on the queue. Unless it is an absolute path, calcdir is interpreted relative to the directory that the job script is run from.
- optimizer: This is a string naming the optimizer file used for the calculation. These files can be found in the structopt/optimizers folder. When the job is run, a copy of this script is placed inside the calcdir directory and accessed from there.
- StructOpt_parameters: This is a dictionary object that should mirror the input file you are trying to submit.
- submit_parameters: This dictionary holds the submit parameters. These will be specific to the queue system in use. In this example, we specify the submission system, queue, number of nodes, number of cores, and walltime.
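The rule for relative calcdir values can be pictured with a small standard-library sketch. The helper name resolve_calcdir is hypothetical and is not how the JobManager is known to do it internally. It only shows what "relative to the directory the job script is run from" means in practice.

```python
import os


def resolve_calcdir(calcdir, start=None):
    """Return an absolute path for calcdir.

    Hypothetical helper for illustration; not part of StructOpt.
    Absolute paths are returned unchanged, relative ones are joined
    onto `start` (the current working directory by default).
    """
    start = os.getcwd() if start is None else start
    if os.path.isabs(calcdir):
        return calcdir
    return os.path.normpath(os.path.join(start, calcdir))


print(resolve_calcdir('job_manager_examples/Au55-example'))
```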
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued
calcdir = 'job_manager_examples/Au55-example'

LAMMPS_parameters = {"use_mpi4py": True,
                     "MPMD": 0,
                     "keep_files": False,
                     "min_style": "cg",
                     "min_modify": "line quadratic",
                     "minimize": "1e-8 1e-8 5000 10000",
                     "pair_style": "eam",
                     "potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
                     "thermo_steps": 0}

StructOpt_parameters = {
    "seed": 0,
    "structure_type": "cluster",
    "generators": {"sphere": {"number_of_individuals": 20,
                              "kwargs": {"atomlist": [["Au", 55]],
                                         "cell": [20, 20, 20]}}},
    "fitnesses": {"LAMMPS": {"weight": 1.0,
                             "kwargs": LAMMPS_parameters}},
    "relaxations": {"LAMMPS": {"order": 0,
                               "kwargs": LAMMPS_parameters}},
    "convergence": {"max_generations": 10},
    "mutations": {"move_atoms": {"probability": 0.1},
                  "rotate_cluster": {"probability": 0.1}},
    "crossovers": {"rotate": {"probability": 0.7}},
    "predators": {"best": {"probability": 1.0}},
    "selections": {"rank": {"probability": 1.0,
                            "kwargs": {"unique_pairs": False,
                                       "unique_parents": False}}},
    "fingerprinters": {"keep_best": True,
                       "diversify_module": {"probability": 1.0,
                                            "kwargs": {"module": "LAMMPS",
                                                       "min_diff": 0.001}}},
    "post_processing": {"XYZs": -1},
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'
job = JobManager(calcdir, optimizer, StructOpt_parameters, submit_parameters)
job.optimize()
Running this script should raise an exception called structopt.utilities.exceptions.Submitted with the jobid. This is normal behavior and communicates that the job has been submitted successfully.
Multiple jobs¶
One advantage of the JobManager is that it allows one to submit multiple jobs to the queue. This is often useful for tuning the optimizer against different inputs. The script below is an example of submitting the same job with different seeds.
In the previous script, successfully submitting a single job with the JobManager.optimize method resulted in an exception. We can catch this exception with a try and except statement. The script below does this and, upon a successful submission, prints the jobid to the user.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued

LAMMPS_parameters = {"use_mpi4py": True,
                     "MPMD": 0,
                     "keep_files": False,
                     "min_style": "cg",
                     "min_modify": "line quadratic",
                     "minimize": "1e-8 1e-8 5000 10000",
                     "pair_style": "eam",
                     "potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
                     "thermo_steps": 0}

StructOpt_parameters = {
    "seed": 0,
    "structure_type": "cluster",
    "generators": {"sphere": {"number_of_individuals": 20,
                              "kwargs": {"atomlist": [["Au", 55]],
                                         "cell": [20, 20, 20]}}},
    "fitnesses": {"LAMMPS": {"weight": 1.0,
                             "kwargs": LAMMPS_parameters}},
    "relaxations": {"LAMMPS": {"order": 0,
                               "kwargs": LAMMPS_parameters}},
    "convergence": {"max_generations": 10},
    "mutations": {"move_atoms": {"probability": 0.1},
                  "rotate_cluster": {"probability": 0.1}},
    "crossovers": {"rotate": {"probability": 0.7}},
    "predators": {"best": {"probability": 1.0}},
    "selections": {"rank": {"probability": 1.0,
                            "kwargs": {"unique_pairs": False,
                                       "unique_parents": False}}},
    "fingerprinters": {"keep_best": True,
                       "diversify_module": {"probability": 1.0,
                                            "kwargs": {"module": "LAMMPS",
                                                       "min_diff": 0.001}}},
    "post_processing": {"XYZs": -1},
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'
seeds = [0, 1, 2, 3, 4]
for seed in seeds:
    # Each seed gets its own calculation directory
    StructOpt_parameters['seed'] = seed
    calcdir = 'job_manager_examples/Au55-seed-{}'.format(seed)
    job = JobManager(calcdir, optimizer, StructOpt_parameters, submit_parameters)
    try:
        job.optimize()
    except Submitted:
        # Submitted signals a successful submission, not a failure
        print(calcdir, job.get_jobid(), 'submitted')

job_manager_examples/Au55-seed-0 936454.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-1 936455.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-2 936456.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-3 936457.bardeen.msae.wisc.edu submitted
job_manager_examples/Au55-seed-4 936458.bardeen.msae.wisc.edu submitted
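The loop above reuses a single StructOpt_parameters dictionary and overwrites its seed on every pass. Whether the JobManager keeps its own copy of the input is not stated here, so if you would rather keep each run's inputs independent, one option is to build a separate copy per seed. The sketch below shows that pattern with the standard library only. The helper name parameters_for_seed and the trimmed-down base dictionary are assumptions for illustration.

```python
import copy


def parameters_for_seed(base_parameters, seed):
    """Return an independent copy of base_parameters with its seed replaced.

    Hypothetical helper for illustration; not part of StructOpt.
    """
    parameters = copy.deepcopy(base_parameters)
    parameters["seed"] = seed
    return parameters


# A trimmed-down stand-in for the full StructOpt_parameters dictionary
base = {"seed": 0, "convergence": {"max_generations": 10}}

# Map each calculation directory to its own parameter dictionary
runs = {"job_manager_examples/Au55-seed-{}".format(seed): parameters_for_seed(base, seed)
        for seed in [0, 1, 2, 3, 4]}
```

Each entry in runs can then be passed to its own JobManager instance without later iterations changing earlier inputs.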
Tracking jobs¶
In the previous section, we covered how to submit a new job from an empty directory. This is done by first initializing an instance of the structopt.utilities.job_manager.JobManager class with a calculation directory along with some input files and then submitting the job with the JobManager.optimize method. The JobManager.optimize method knows what to do because, upon initialization, the JobManager detected an empty directory. If the directory is not empty and contains a StructOpt job, the JobManager also knows what to do with it when optimize is run again. This is all communicated through exceptions.
The four primary exceptions raised by the optimize method are listed below along with their explanations.
- Submitted: This exception is raised if a job is submitted from the directory. This happens when JobManager.optimize is called in an empty directory, or when JobManager.optimize is called with the kwarg restart=True in a directory where a job is not queued or running.
- Queued: The job is queued and has not started running. There should be no output files to be analyzed.
- Running: The job is running and output files should be continuously updated. These output files can be used for analysis before the job has finished running.
- UnknownState: This is raised if the calcdir is not an empty directory and the JobManager doesn't detect it as a StructOpt run. A StructOpt run is detected when a structopt.in.json file is found in the calcdir.
Note that if no exception is raised, the job is done and is ready to be analyzed. JobManager.optimize does nothing in this case.
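The mapping from exception to job state can be wrapped in a small helper. The sketch below is self-contained, so it defines stand-in exception classes and a FakeJob class purely for demonstration. With StructOpt installed, the exceptions would instead be imported from structopt.utilities.exceptions (that UnknownState lives in the same module is an assumption). The helper name job_state is hypothetical too.

```python
# Stand-ins so the sketch runs without StructOpt installed (assumption:
# the real classes live in structopt.utilities.exceptions).
class Submitted(Exception):
    pass


class Queued(Exception):
    pass


class Running(Exception):
    pass


class UnknownState(Exception):
    pass


def job_state(job):
    """Call job.optimize() and translate the outcome into a status string.

    Note that calling optimize() may itself submit the job, as described above.
    """
    try:
        job.optimize()
    except Submitted:
        return "submitted"
    except Queued:
        return "queued"
    except Running:
        return "running"
    except UnknownState:
        return "unknown"
    # No exception: the job has finished and can be analyzed
    return "done"


class FakeJob:
    """Minimal stand-in for a JobManager instance, for demonstration only."""

    def __init__(self, exception=None):
        self.exception = exception

    def optimize(self):
        if self.exception is not None:
            raise self.exception


print(job_state(FakeJob(Running())))
```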
One way of using three of these exceptions (Submitted, Queued, and Running) is shown below. If the job is submitted or queued, we want the script to stop there and not submit the job again. If it is running, additional commands can be used to track the progress of the job.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued
calcdir = 'job_manager_examples/Au55-example'

LAMMPS_parameters = {"use_mpi4py": True,
                     "MPMD": 0,
                     "keep_files": False,
                     "min_style": "cg",
                     "min_modify": "line quadratic",
                     "minimize": "1e-8 1e-8 5000 10000",
                     "pair_style": "eam",
                     "potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
                     "thermo_steps": 0}

StructOpt_parameters = {
    "seed": 0,
    "structure_type": "cluster",
    "generators": {"sphere": {"number_of_individuals": 20,
                              "kwargs": {"atomlist": [["Au", 55]],
                                         "cell": [20, 20, 20]}}},
    "fitnesses": {"LAMMPS": {"weight": 1.0,
                             "kwargs": LAMMPS_parameters}},
    "relaxations": {"LAMMPS": {"order": 0,
                               "kwargs": LAMMPS_parameters}},
    "convergence": {"max_generations": 10},
    "mutations": {"move_atoms": {"probability": 0.1},
                  "rotate_cluster": {"probability": 0.1}},
    "crossovers": {"rotate": {"probability": 0.7}},
    "predators": {"best": {"probability": 1.0}},
    "selections": {"rank": {"probability": 1.0,
                            "kwargs": {"unique_pairs": False,
                                       "unique_parents": False}}},
    "fingerprinters": {"keep_best": True,
                       "diversify_module": {"probability": 1.0,
                                            "kwargs": {"module": "LAMMPS",
                                                       "min_diff": 0.001}}},
    "post_processing": {"XYZs": -1},
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'
job = JobManager(calcdir, optimizer, StructOpt_parameters, submit_parameters)
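try:
    job.optimize()
except (Submitted, Queued):
    # Nothing to analyze yet: the job has not started running
    print(calcdir, job.get_jobid(), 'submitted or queued')
except Running:
    # The job is running; progress-tracking commands could go here
    pass
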
job_manager_examples/Au55-example 936453.bardeen.msae.wisc.edu submitted or queued
Restarting jobs¶
Sometimes jobs need to be restarted or continued from the last generation. The JobManager does this by submitting a new job from the same calcdir folder the previous job was run in. Because calculations take place in unique log{time} directories, the new job will run in a new log{time} directory. Furthermore, the JobManager modifies the structopt.in.json file so that the initial population of the new job is taken from the XYZ files of the last generation of the previous run. The code below is an example of restarting the first run of this example. The key difference between this code and the code presented in the previous section is that a restart=True kwarg has been added to the JobManager.optimize call.
from structopt.utilities.job_manager import JobManager
from structopt.utilities.exceptions import Running, Submitted, Queued
calcdir = 'job_manager_examples/Au55-example'

LAMMPS_parameters = {"use_mpi4py": True,
                     "MPMD": 0,
                     "keep_files": False,
                     "min_style": "cg",
                     "min_modify": "line quadratic",
                     "minimize": "1e-8 1e-8 5000 10000",
                     "pair_style": "eam",
                     "potential_file": "$STRUCTOPT_HOME/potentials/Au_u3.eam",
                     "thermo_steps": 0}

StructOpt_parameters = {
    "seed": 0,
    "structure_type": "cluster",
    "generators": {"sphere": {"number_of_individuals": 20,
                              "kwargs": {"atomlist": [["Au", 55]],
                                         "cell": [20, 20, 20]}}},
    "fitnesses": {"LAMMPS": {"weight": 1.0,
                             "kwargs": LAMMPS_parameters}},
    "relaxations": {"LAMMPS": {"order": 0,
                               "kwargs": LAMMPS_parameters}},
    "convergence": {"max_generations": 10},
    "mutations": {"move_atoms": {"probability": 0.1},
                  "rotate_cluster": {"probability": 0.1}},
    "crossovers": {"rotate": {"probability": 0.7}},
    "predators": {"best": {"probability": 1.0}},
    "selections": {"rank": {"probability": 1.0,
                            "kwargs": {"unique_pairs": False,
                                       "unique_parents": False}}},
    "fingerprinters": {"keep_best": True,
                       "diversify_module": {"probability": 1.0,
                                            "kwargs": {"module": "LAMMPS",
                                                       "min_diff": 0.001}}},
    "post_processing": {"XYZs": -1},
}

submit_parameters = {'system': 'PBS',
                     'queue': 'morgan2',
                     'nodes': 1,
                     'cores': 12,
                     'walltime': 12}

optimizer = 'genetic.py'
job = JobManager(calcdir, optimizer, StructOpt_parameters, submit_parameters)
job.optimize(restart=True)
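After one or more restarts, a calcdir will contain several log directories, one per run. A small standard-library sketch for locating the most recent one is given below. The helper name latest_log_dir is hypothetical, and both the "log" prefix match and the use of modification time to pick the newest directory are assumptions made for this example rather than documented JobManager behavior.

```python
import glob
import os


def latest_log_dir(calcdir, prefix="log"):
    """Return the most recently modified '<prefix>*' subdirectory of calcdir.

    Hypothetical helper for illustration; not part of StructOpt.
    Returns None when no matching directory exists.
    """
    pattern = os.path.join(calcdir, prefix + "*")
    candidates = [path for path in glob.glob(pattern) if os.path.isdir(path)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


print(latest_log_dir('job_manager_examples/Au55-example'))
```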