Job Description JSON Schema
The job description JSON for Tibanna Pony and Zebra differs from the JSON for Tibanna, but, like it, it defines an individual execution. The config part is largely the same. The Pony/Zebra input JSON does not have an args field; instead it has its own set of fields.
The first step of the Pony/Zebra step function converts this input JSON to a Unicorn input JSON and passes it to the second step (run_task).
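As a rough illustration of that conversion, the sketch below maps the Pony/Zebra fields described on this page onto a Unicorn-style args/config structure. It is a hypothetical, simplified stand-in, not Tibanna's own code. The function name to_unicorn_input is made up. The Unicorn key names (input_files keyed by argument name, input_parameters, output_S3_bucket) are assumed from the Unicorn input format. The real first step also consults 4DN metadata and derives the output bucket from _tibanna.env, and this sketch skips both.

```python
def to_unicorn_input(job: dict) -> dict:
    """Hypothetical, simplified sketch of the Pony/Zebra -> Unicorn conversion.

    Assumption: only fields described on this page are mapped; metadata
    lookups and env-based bucket resolution done by the real step are omitted.
    """
    args = {
        "app_name": job["app_name"],
        # Assumed Unicorn layout: input files keyed by CWL argument name.
        "input_files": {},
        "input_parameters": dict(job.get("parameters", {})),
    }
    if "output_bucket" in job:
        args["output_S3_bucket"] = job["output_bucket"]
    for f in job.get("input_files", []):
        # This sketch assumes bucket_name and object_key are present.
        entry = {"bucket_name": f["bucket_name"], "object_key": f["object_key"]}
        if "rename" in f:
            entry["rename"] = f["rename"]
        args["input_files"][f["workflow_argument_name"]] = entry
    # Per this page, dependency is forwarded to run_task inside args.
    if "dependency" in job:
        args["dependency"] = job["dependency"]
    # config is passed on to the second step; copied unchanged here.
    return {"args": args, "config": dict(job.get("config", {}))}
```

Passing the example job description below as a dict would give args.input_files the keys bwa_index, fastq1 and fastq2, with rename kept only for bwa_index.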
Example job description
{
  "description": [
    "This input json runs a bwa-mem workflow, which is part of the 4DN Hi-C pipeline,",
    "on the hg38 genome reference."
  ],
  "app_name": "bwa-mem",
  "_tibanna": {
    "env": "fourfront-webdev",
    "run_type": "bwa-mem"
  },
  "output_bucket": "elasticbeanstalk-fourfront-webdev-wfoutput",
  "workflow_uuid": "0fbe4db8-0b5f-448e-8b58-3f8c84baabf5",
  "parameters": {"nThreads": 4},
  "input_files": [
    {
      "object_key": "4DNFIZQZ39L9.bwaIndex.tgz",
      "workflow_argument_name": "bwa_index",
      "uuid": "1f53df95-4cf3-41cc-971d-81bb16c486dd",
      "bucket_name": "elasticbeanstalk-fourfront-webdev-files",
      "rename": "hg38.tar.gz"
    },
    {
      "workflow_argument_name": "fastq1",
      "bucket_name": "elasticbeanstalk-fourfront-webdev-files",
      "uuid": "1150b428-272b-4a0c-b3e6-4b405c148f7c",
      "object_key": "4DNFIVOZN511.fastq.gz"
    },
    {
      "workflow_argument_name": "fastq2",
      "bucket_name": "elasticbeanstalk-fourfront-webdev-files",
      "uuid": "f4864029-a8ad-4bb8-93e7-5108f462ccaa",
      "object_key": "4DNFIRSRJH45.fastq.gz"
    }
  ],
  "config": {
    "instance_type": "t3.large",
    "EBS_optimized": true,
    "ebs_size": 30,
    "ebs_type": "gp2",
    "shutdown_min": 30,
    "password": "",
    "log_bucket": "tibanna-output",
    "key_name": "4dn-encoded",
    "spot_instance": true,
    "spot_duration": 360,
    "behavior_on_capacity_limit": "wait_and_retry",
    "overwrite_input_extra": false,
    "cloudwatch_dashboard": false,
    "email": true,
    "public_postrun_json": true
  },
  "custom_pf_fields": {
    "out_bam": {
      "genome_assembly": "GRCh38"
    }
  },
  "wfr_meta": {
    "notes": "a nice workflow run"
  },
  "custom_qc_fields": {
    "award": "/awards/5UM1HL128773-04/",
    "lab": "/labs/bing-ren-lab/"
  },
  "push_error_to_end": true,
  "dependency": {
    "exec_arn": [
      "arn:aws:states:us-east-1:643366669028:execution:tibanna_unicorn_default_7412:md5_test",
      "arn:aws:states:us-east-1:643366669028:execution:tibanna_unicorn_default_7412:md5_test2"
    ]
  }
}
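If you generate job descriptions from a script rather than writing them by hand, serializing a Python dict with json.dumps produces the same structure (Python True/False become JSON true/false). The sketch below rebuilds a trimmed version of the example. The helper make_input_file and the output file name bwa-mem_input.json are hypothetical. output_bucket is left out because _tibanna specifies env.

```python
import json
from pathlib import Path


def make_input_file(arg_name, uuid, bucket, key, rename=None):
    """Build one input_files entry (hypothetical helper, not part of Tibanna)."""
    entry = {
        "workflow_argument_name": arg_name,
        "uuid": uuid,
        "bucket_name": bucket,
        "object_key": key,
    }
    if rename is not None:
        entry["rename"] = rename  # optional: rename on download from S3
    return entry


FILES_BUCKET = "elasticbeanstalk-fourfront-webdev-files"

job = {
    "app_name": "bwa-mem",
    # output_bucket omitted: it is derived from _tibanna.env.
    "_tibanna": {"env": "fourfront-webdev", "run_type": "bwa-mem"},
    "workflow_uuid": "0fbe4db8-0b5f-448e-8b58-3f8c84baabf5",
    "parameters": {"nThreads": 4},
    "input_files": [
        make_input_file("bwa_index", "1f53df95-4cf3-41cc-971d-81bb16c486dd",
                        FILES_BUCKET, "4DNFIZQZ39L9.bwaIndex.tgz", rename="hg38.tar.gz"),
        make_input_file("fastq1", "1150b428-272b-4a0c-b3e6-4b405c148f7c",
                        FILES_BUCKET, "4DNFIVOZN511.fastq.gz"),
        make_input_file("fastq2", "f4864029-a8ad-4bb8-93e7-5108f462ccaa",
                        FILES_BUCKET, "4DNFIRSRJH45.fastq.gz"),
    ],
    "config": {"instance_type": "t3.large", "ebs_size": 30,
               "log_bucket": "tibanna-output", "key_name": "4dn-encoded",
               "spot_instance": True},
}

if __name__ == "__main__":
    # Write the job description as an indented JSON file.
    Path("bwa-mem_input.json").write_text(json.dumps(job, indent=2))
```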
- The description field is optional and intended for humans; Tibanna ignores it.
- The app_name field contains the name of the workflow.
- The output_bucket field specifies the bucket where all the output files go. It is not required if _tibanna specifies env, in which case the bucket name is determined automatically from the env value.
- The workflow_uuid field contains the uuid of the 4DN workflow metadata.
- The parameters field contains a dictionary of workflow-specific parameters.
- The additional_benchmarking_parameters field contains a set of additional parameters that are not required for the workflow run but are required by the benchmarking function (e.g. resource usage depends on the number of reads, which is not a parameter of the workflow run).
- The input_files field specifies, for each input, the argument name (matching the name in the CWL), the uuid of the input file metadata, and its bucket and object key. workflow_argument_name and uuid are required fields. bucket_name and object_key are required only if the content is a list. rename (optional) can be used to rename a file when it is downloaded from S3 to the instance where the workflow will be executed.
- The config field is passed directly on to the second step, where instance_type, ebs_size and EBS_optimized are auto-filled if not given.
- The spot_instance field (optional), if set to true, requests a spot instance instead of an on-demand instance.
- The spot_duration field (optional), if set, requests a fixed-duration spot instance instead of a regular spot instance. The value is the duration in minutes. This field has no effect if spot_instance is false or not set.
- The behavior_on_capacity_limit field (optional) sets what Tibanna does when it hits an AWS instance limit or a spot instance capacity limit. The default value is fail. If set to wait_and_retry, Tibanna waits until the instance becomes available and reruns (at 10-minute intervals, for up to 1 week). If set to retry_without_spot, Tibanna automatically switches to a regular instance of the same type when a spot instance is not available; this value applies only when spot_instance is true.
- The overwrite_input_extra field (optional) allows overwriting an existing extra file if the workflow has an output of type Output to-be-extra-input file (i.e., one that creates an extra file of an input rather than a new processed file object). Default: false.
- The cloudwatch_dashboard field (optional), if set to true, creates a CloudWatch dashboard for the job, which lets users trace memory, disk and CPU utilization during and after the run.
- The email field (optional), if set to true, sends a notification email to 4dndcic@gmail.com when a workflow run finishes.
- The public_postrun_json field (optional) is recommended to be set to true, so that the postrun JSON files become publicly available when they are created.
- The key_name field is recommended to be set to 4dn-encoded, which is the key used by the 4DN DCIC team.
- The push_error_to_end field (optional), if set to true, passes any error to the last step so that the metadata can be updated with the proper error status. Default: true.
- The custom_pf_fields field (optional) contains a dictionary that is passed directly to the processed file metadata. Each key may be either ALL (applies to all processed files) or the argument name of a specific processed file (or both).
- The wfr_meta field (optional) contains a dictionary that is passed directly to the workflow run metadata.
- The custom_qc_fields field (optional) contains a dictionary that is passed directly to an associated Quality Metric object.
- The dependency field (optional) sets dependent jobs. The job will not start until its dependencies finish successfully; if a dependency fails, the current job also fails. exec_arn is a list of step function execution ARNs. The job waits at the run_task step, not at the start_task step (for consistency with Unicorn). This field is passed to run_task as dependency inside the args field.
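Several of the rules above interact, so it can help to check a job description before submitting it. The sketch below encodes only the constraints stated on this page. It is a hypothetical helper, not Tibanna's own validator. The names check_job and pf_fields_for are made up. The set of allowed behavior_on_capacity_limit values is assumed to be just the three named here. The merge order in pf_fields_for (the argument-specific entry overriding ALL) is also an assumption.

```python
# Assumption: only the three values named on this page are valid.
VALID_CAPACITY_BEHAVIORS = {"fail", "wait_and_retry", "retry_without_spot"}


def check_job(job: dict) -> list:
    """Return a list of problems found in a Pony/Zebra job description.

    Hypothetical sketch of the rules listed above; not Tibanna's validator.
    """
    problems = []
    if "output_bucket" not in job and "env" not in job.get("_tibanna", {}):
        problems.append("output_bucket is required unless _tibanna.env is set")
    for i, f in enumerate(job.get("input_files", [])):
        for key in ("workflow_argument_name", "uuid"):
            if key not in f:
                problems.append(f"input_files[{i}] is missing {key}")
    config = job.get("config", {})
    spot = config.get("spot_instance", False)
    if "spot_duration" in config and not spot:
        problems.append("spot_duration has no effect unless spot_instance is true")
    behavior = config.get("behavior_on_capacity_limit", "fail")  # default: fail
    if behavior not in VALID_CAPACITY_BEHAVIORS:
        problems.append(f"unknown behavior_on_capacity_limit: {behavior}")
    elif behavior == "retry_without_spot" and not spot:
        problems.append("retry_without_spot applies only when spot_instance is true")
    return problems


def pf_fields_for(job: dict, arg_name: str) -> dict:
    """Fields for one processed file: ALL first, then the argument-specific
    entry (assumption: the specific entry wins on conflicting keys)."""
    custom = job.get("custom_pf_fields", {})
    merged = dict(custom.get("ALL", {}))
    merged.update(custom.get(arg_name, {}))
    return merged
```

An empty list from check_job means none of these rules were violated. It does not guarantee that Tibanna will accept the job.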