The OSG is a Distributed High Throughput Computing (DHTC) environment, which means that users can access compute cores on over 100 different computing sites across the nation with a single job submission. This also means that your jobs must meet a set of criteria to be eligible to run on OSG. The table below lists rule-of-thumb characteristics that can help you decide whether OSG is a viable option for a given job.
Characteristics of an OSG-friendly job:

Variable | Suggested Values |
---|---|
Memory/Process | <= 2GB |
Type of job | serial (i.e. mostly single core) |
Network traffic (input or output files) | <= 2GB each side |
Running Time | Ideal time is 1-10 hours - max is 72 hours |
Runtime Disk Usage | <= 10GB |
Binary Type | Portable RHEL6/7 |
Total CPU Time (of job workflow) | Large, typically >= 1000 hours |
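
The rule-of-thumb values above can be turned into a quick self-check before submitting. The following is only a rough sketch: `JobProfile` and `osg_concerns` are hypothetical names invented for this illustration, not part of any OSG tool, and the thresholds are simply the suggested values from the table, which are guidelines rather than hard limits.

```python
from dataclasses import dataclass

# Suggested values copied from the table above (guidelines, not enforced limits).
MAX_MEMORY_GB = 2
MAX_TRANSFER_GB = 2          # applies separately to input and to output
MAX_RUNTIME_HOURS = 72
MAX_DISK_GB = 10
MIN_WORKFLOW_CPU_HOURS = 1000


@dataclass
class JobProfile:
    """Hypothetical description of one job in a workflow."""
    memory_gb: float
    cores: int
    input_gb: float
    output_gb: float
    runtime_hours: float
    disk_gb: float
    workflow_cpu_hours: float  # total across all jobs in the workflow


def osg_concerns(job):
    """Return a list of reasons the job may be a poor fit for OSG."""
    concerns = []
    if job.memory_gb > MAX_MEMORY_GB:
        concerns.append("memory per process exceeds 2GB")
    if job.cores > 1:
        concerns.append("job is not serial (uses more than one core)")
    if job.input_gb > MAX_TRANSFER_GB or job.output_gb > MAX_TRANSFER_GB:
        concerns.append("input or output transfer exceeds 2GB")
    if job.runtime_hours > MAX_RUNTIME_HOURS:
        concerns.append("running time exceeds 72 hours")
    if job.disk_gb > MAX_DISK_GB:
        concerns.append("runtime disk usage exceeds 10GB")
    if job.workflow_cpu_hours < MIN_WORKFLOW_CPU_HOURS:
        concerns.append("workflow total CPU time is below 1000 hours")
    return concerns


if __name__ == "__main__":
    job = JobProfile(memory_gb=1.5, cores=1, input_gb=0.5, output_gb=1.0,
                     runtime_hours=4, disk_gb=3, workflow_cpu_hours=5000)
    print(osg_concerns(job) or "no concerns found")
```

An empty list means the job matches every suggested value; a non-empty list names the rows of the table that deserve a second look.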
The relatively short runtime is necessary due to job pre-emption. Jobs belonging to resource owners on the machine where your job is running may pre-empt (or kill) your job unexpectedly. When this happens, your job’s progress is not automatically saved, and it will have to start over from the beginning. For this reason, it is good practice to build automatic checkpointing into your job, or break a large job into multiple small jobs if it is at all possible.
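
As an illustration of the checkpointing idea, here is a minimal sketch of a job that records its progress periodically and resumes from the last saved step when restarted. The file name `checkpoint.json`, the function names, and the placeholder computation are all assumptions made for this example. The sketch also assumes that the checkpoint file is still available when the job restarts; arranging that depends on how your job is configured and is not shown here.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT):
    """Write to a temporary file, then rename it over the old checkpoint,
    so a kill in the middle of a write leaves the previous checkpoint intact."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, every=100):
    """Run n_steps of work, saving progress after every `every` steps."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        state["total"] += step * step  # placeholder for the real computation
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # record the final state
    return state["total"]


if __name__ == "__main__":
    print(run(1000))
```

If the job is killed and later restarted with the checkpoint file in place, `run` loses at most the steps since the last save instead of starting over. The `every` interval is a trade-off: saving more often loses less work on pre-emption but spends more time writing to disk.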