Overview of the Tibanna code structure¶
Tibanna Pony (tibanna_4dn) and Zebra (tibanna_cgap) are built upon Tibanna Unicorn (tibanna, independent of any data portal). Pony and Zebra reuse Unicorn code by either importing it or inheriting from it. Code shared between Pony and Zebra that is not part of Unicorn is stored in the shared component tibanna_ffcommon. All of these use AWSEM (Autonomous Workflow Step Executor Machine) at the core: an EC2 instance that is auto-configured by Tibanna, does its job autonomously, and terminates itself at the end.
Repository & Directory structure¶
https://github.com/4dn-dcic/tibanna
tibanna: code for Unicorn
awsf: code that runs on AWSEM (commonly used by Unicorn, Pony and Zebra)
https://github.com/4dn-dcic/tibanna_ff
tibanna_4dn: code for Pony
tibanna_cgap: code for Zebra
tibanna_ffcommon: code shared between Pony and Zebra that is not part of Unicorn
Each of the three variants (Unicorn, Pony or Zebra) consists of a core API (core.py), a CLI (__main__.py), Lambdas (/lambdas) and a set of Python modules that are used by the former three.
AWSEM¶
The code in awsf is not part of any Python package; instead, the scripts in the folder are pulled by an AWSEM EC2 instance directly from the public tibanna GitHub repo. Currently, awsf still uses Python 2.7, whereas all the other code is based on Python 3.6. awsf uses Python 2.7 because it runs on the pre-built Tibanna AMI, which is based on Python 2.7, and we haven’t updated the AMI yet.
AWS Lambda¶
The code for each AWS Lambda function is defined in its own .py file inside the lambdas directory under the package directory (e.g. tibanna_4dn, tibanna_cgap or, in the case of Unicorn, tibanna in the tibanna repo).
Unicorn¶
A Unicorn step function consists of two AWS Lambda functions: run_task_awsem and check_task_awsem.
Pony¶
A Pony step function consists of four Lambda functions: in addition to run_task_pony and check_task_pony, it has start_run_pony and update_ffmeta_pony. Additionally, tibanna_4dn’s deploy_pony and deploy_core functions allow deploying other Lambdas that are not part of a Pony step function. These include the following:
run_workflow_pony: a Lambda function that triggers a workflow run on the tibanna_pony step function and serves as a fourfront endpoint.
validate_md5_s3_trigger_pony: a Lambda function that gets triggered upon file upload to a fourfront bucket. Once triggered, it invokes the tibanna_initiator step function, which in turn invokes the validate_md5_s3_initiator_pony Lambda.
validate_md5_s3_initiator_pony: a Lambda function that triggers md5sum and fastqc workflow runs on the tibanna_pony_tmp_md5 step function.
status_wfr_pony: a Lambda function whose purpose is not documented here.
The .py files for the Lambdas do not have the suffix pony in their file names, but the Lambdas do always have the suffix, to differentiate them from unicorn or zebra Lambdas.
Zebra¶
Like Pony, a Zebra step function consists of four Lambda functions: in addition to run_task_zebra and check_task_zebra, it has start_run_zebra and update_ffmeta_zebra. Additionally, tibanna_cgap’s deploy_zebra and deploy_core functions allow deploying other Lambdas that are not part of a Zebra step function. These include the following:
run_workflow_zebra: a Lambda function that triggers a workflow run on the tibanna_zebra step function and serves as a cgap endpoint.
validate_md5_s3_trigger_zebra: a Lambda function that gets triggered upon file upload to a cgap bucket. Once triggered, it invokes the tibanna_initiator_zebra step function, which in turn invokes the validate_md5_s3_initiator_zebra Lambda.
validate_md5_s3_initiator_zebra: a Lambda function that triggers md5sum and fastqc workflow runs on the tibanna_zebra_tmp_md5 step function.
status_wfr_zebra: a Lambda function whose purpose is not documented here.
The .py files for the Lambdas do not have the suffix zebra in their file names, but the Lambdas do always have the suffix, to differentiate them from unicorn or pony Lambdas.
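The naming convention for Pony and Zebra Lambdas can be illustrated with a small sketch. The helper functions and constants below are hypothetical and are not part of tibanna_ff. The sketch also assumes that each file stem is simply the Lambda name with the variant suffix removed, which the text implies but does not spell out. The order of the lists carries no meaning.

```python
# Illustrative only: these names are hypothetical, not tibanna_ff APIs.

# Lambdas that make up a Pony or Zebra step function, written without the
# variant suffix (assumed to match the .py file stems). Order is arbitrary.
STEP_FUNCTION_LAMBDAS = ["run_task", "check_task", "start_run", "update_ffmeta"]

# Lambdas that can be deployed but are not part of the step function.
EXTRA_LAMBDAS = [
    "run_workflow",
    "validate_md5_s3_trigger",
    "validate_md5_s3_initiator",
    "status_wfr",
]

# Package directory that holds the lambdas folder for each variant.
VARIANT_PACKAGES = {"pony": "tibanna_4dn", "zebra": "tibanna_cgap"}


def lambda_name(file_stem, variant):
    """Return the suffixed Lambda name, e.g. run_task -> run_task_pony."""
    if variant not in VARIANT_PACKAGES:
        raise ValueError(f"unknown variant: {variant}")
    return f"{file_stem}_{variant}"


def source_path(file_stem, variant):
    """Return the assumed repo-relative path of the Lambda's .py file."""
    if variant not in VARIANT_PACKAGES:
        raise ValueError(f"unknown variant: {variant}")
    return f"{VARIANT_PACKAGES[variant]}/lambdas/{file_stem}.py"


for stem in STEP_FUNCTION_LAMBDAS + EXTRA_LAMBDAS:
    print(source_path(stem, "zebra"), "->", lambda_name(stem, "zebra"))
```

Keeping the suffix out of the file names lets the Pony and Zebra packages use the same file names, while the suffix keeps the Lambda names distinct.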
AWS Step Functions¶
The code that describes a step function structure is in stepfunction.py in tibanna, tibanna_ffcommon, tibanna_4dn and tibanna_cgap. The step function classes of tibanna_4dn (class StepFunctionPony) and tibanna_cgap (class StepFunctionZebra) inherit from that of tibanna_ffcommon (class StepFunctionFFAbstract), which in turn inherits from that of tibanna (class StepFunctionUnicorn). Class StepFunctionFFAbstract is not itself a functional step function; it serves as a common component that both StepFunctionPony and StepFunctionZebra can inherit from.
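The inheritance chain can be sketched with stand-in classes. Only the four class names and their parent-child relationships come from the text. The attributes lambda_suffix and lambda_stems and the method lambda_names are hypothetical, added purely to show how a shared abstract layer could leave variant-specific details to its subclasses. The real classes in stepfunction.py may look quite different.

```python
# Stand-in classes: only the class names and the inheritance chain come from
# the documentation. Every attribute and method here is a hypothetical example.


class StepFunctionUnicorn:
    lambda_suffix = "awsem"                    # hypothetical attribute
    lambda_stems = ["run_task", "check_task"]  # hypothetical attribute

    def lambda_names(self):
        """Build suffixed Lambda names from the stems (hypothetical method)."""
        return [f"{stem}_{self.lambda_suffix}" for stem in self.lambda_stems]


class StepFunctionFFAbstract(StepFunctionUnicorn):
    # Shared by Pony and Zebra; no suffix of its own, so not usable directly.
    lambda_suffix = ""
    # Listed in arbitrary order; no execution order is implied.
    lambda_stems = ["run_task", "check_task", "start_run", "update_ffmeta"]

    def lambda_names(self):
        if not self.lambda_suffix:
            raise NotImplementedError("use StepFunctionPony or StepFunctionZebra")
        return super().lambda_names()


class StepFunctionPony(StepFunctionFFAbstract):
    lambda_suffix = "pony"


class StepFunctionZebra(StepFunctionFFAbstract):
    lambda_suffix = "zebra"


# Method resolution order confirms the chain described in the text.
print([cls.__name__ for cls in StepFunctionPony.__mro__])
# ['StepFunctionPony', 'StepFunctionFFAbstract', 'StepFunctionUnicorn', 'object']
print(StepFunctionZebra().lambda_names())
```

In this sketch the abstract class raises an error when used directly, mirroring the point that StepFunctionFFAbstract exists only to be inherited from.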