.. _pegasus-aws-batch:

=================
pegasus-aws-batch
=================

pegasus-aws-batch - a client to run tasks on Amazon AWS Batch.
   ::

      pegasus-aws-batch [-h] [-C propsfile] [-L level] [-l logfile]
                        [--merge-logs prefix] [-a AWS account id] [-r AWS region]
                        [--create] [--delete] [-p prefix]
                        [-j jsonfile or arn or job-definition name]
                        [--ce jsonfile or arn or compute-environment name]
                        [-q jsonfile or arn or queue name] [-s s3 bucket URL]
                        [-f file1[,file2 …]] job submit file



Description
===========

**pegasus-aws-batch** is a client to run tasks on Amazon AWS Batch. It
also allows you to create and delete entities such as the job
definition, compute environment and job queue required by the AWS
Batch service.

The tool also allows you to upload files from your local machine to
the S3 bucket specified on the command line or in the properties. This
allows you to ship data to S3 that your jobs running in AWS Batch
require.

The tool automatically fetches the stdout of your jobs from CloudWatch
Logs and places it on the local filesystem.



Options
=======

**-h**; \ **--help**
   Show help message for subcommand and exit.

**-C** *propsfile*; \ **--conf**\ =\ *propsfile*
   Path to the properties file containing the properties to configure
   the tool.

**-L** *level*; \ **--log-level**\ =\ *level*
   Sets the log level for the tool.

**-l** *logfile*; \ **--log-file**\ =\ *logfile*
   Path to the file where you want the client to log. By default, the
   client logs to its stdout and stderr.

**-m** *prefix*; \ **--merge-logs**\ =\ *prefix*
   By default, the tool pulls down each task's stdout and stderr to
   separate files, with the name determined by the job name specified
   in the job submit file. The prefix is used for merging all the
   tasks' stdout to a single file starting with the prefix and ending
   in .out. Similar behavior applies to the tasks' stderr.

**-a** *AWS account id*; \ **--account**\ =\ *AWS account id*
   The AWS account to use for running jobs on AWS Batch. Can also be
   specified in the properties using the property
   **pegasus.aws.account**.

**-r** *AWS region*; \ **--region**\ =\ *AWS region*
   The AWS region in which the S3 bucket and other entities required
   by AWS Batch exist. Can also be specified in the properties using
   the property **pegasus.aws.region**.

**-c**; \ **--create**
   Only create the batch entities specified by the -j, --ce, -q and
   --s3 options. Don't run any jobs.

**-d**; \ **--delete**
   Delete the batch entities specified by the -j, --ce, -q and --s3
   options. Don't run any jobs.

**-p** *prefix*; \ **--prefix**\ =\ *prefix*
   The prefix to use for naming the batch entities created. Default
   suffixes -job-definition, -compute-environment, -job-queue and
   -bucket are added depending on the batch entity being created.

**-j** *jsonfile or arn or job-definition name*; \ **--job-definition** *jsonfile or arn or job-definition name*
   The JSON file containing the job definition specification to
   register for executing jobs, or the ARN of an existing job
   definition, or the basename of an existing job definition. The JSON
   file format is the same as the AWS Batch format
   https://docs.aws.amazon.com/batch/latest/userguide/job-definition-template.html
   . A sample job definition file is listed in the configuration
   section. The value for this option can also be specified in the
   properties using the property **pegasus.aws.batch.job_definition**.
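For example, you might first create the batch entities from the sample
JSON specifications listed in the configuration section, and then run
a bag of jobs against them. This is a hedged sketch: the *mybatch*
prefix and the submit file name *sample-job-submit.json* are
illustrative, and the entity basenames follow from the default
suffixes that **--prefix** documents above.

::

   # create the job definition, compute environment and job queue once
   $ pegasus-aws-batch --create --prefix mybatch \
         --job-definition sample-job-definition.json \
         --compute-environment conf/sample-compute-env.json \
         --job-queue conf/sample-job-queue.json

   # run the jobs in the submit file against the entities created above
   $ pegasus-aws-batch \
         --job-definition mybatch-job-definition \
         --compute-environment mybatch-compute-environment \
         --job-queue mybatch-job-queue \
         --s3 s3://mybatch-bucket \
         --merge-logs mybatch \
         sample-job-submit.json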
**--ce** *jsonfile or arn or compute environment name*; \ **--compute-environment** *jsonfile or arn or compute environment name*
   The JSON file containing the compute environment specification to
   create in the Amazon cloud for executing jobs, or the ARN of an
   existing compute environment, or the basename of an existing
   compute environment. The JSON file format is the same as the AWS
   Batch format
   https://docs.aws.amazon.com/batch/latest/userguide/compute-environment-template.html
   . A sample compute-environment file is listed in the configuration
   section. The value for this option can also be specified in the
   properties using the property
   **pegasus.aws.batch.compute_environment**.

**-q** *jsonfile or arn or job queue name*; \ **--job-queue** *jsonfile or arn or job queue name*
   The JSON file containing the job queue specification to create in
   the Amazon cloud for managing jobs, or the ARN of an existing job
   queue, or the basename of an existing job queue. The queue is
   associated with the compute environment on which the jobs are run.
   The JSON file format is the same as the AWS Batch format
   https://docs.aws.amazon.com/batch/latest/userguide/job-queue-template.html
   . A sample job-queue file is listed in the configuration section.
   The value for this option can also be specified in the properties
   using the property **pegasus.aws.batch.job_queue**.

**-s** *s3 URL*; \ **--s3** *s3 URL*
   The S3 bucket to use for the lifecycle of the client. If not
   specified, then a bucket is created based on the prefix passed. The
   value for this option can also be specified in the properties using
   the property **pegasus.aws.batch.s3_bucket**.

**-f** *file*\ [,*file*,…]; \ **--files** *file*\ [,*file*,…]
   A comma-separated list of files that need to be copied to the
   associated S3 bucket before any task starts.

*job submit file*
   A JSON formatted file that contains the job description of the jobs
   that need to be executed. A sample job submit file is listed in the
   configuration section.



.. _AWS_CONFIGURATION:

Configuration
=============

Each user should specify a configuration file that
**pegasus-aws-batch** will use for authentication tokens. It is the
same as the standard Amazon EC2 credentials file, and the default
Amazon search path semantics apply.

Sample File
-----------

::

   $ cat ~/.aws/credentials
   aws_access_key_id = XXXXXXXXXXXX
   aws_secret_access_key = XXXXXXXXXXX

Configuration Properties
------------------------

**endpoint** (site)
   The URL of the web service endpoint. If the URL begins with
   *https*, then SSL will be used.

**pegasus.aws.account** (aws account)
   The AWS account to use. Can also be specified by the -a option.

**pegasus.aws.region** (region)
   The AWS region to use. Can also be specified by the -r option.

**pegasus.aws.batch.job_definition** (the json file or existing ARN or basename)
   Can also be specified by the -j option.

**pegasus.aws.batch.compute_environment** (the json file or existing ARN or basename)
   Can also be specified by the --ce option.

**pegasus.aws.batch.job_queue** (the json file or existing ARN or basename)
   Can also be specified by the -q option.

**pegasus.aws.batch.s3_bucket** (the S3 URL)
   Can also be specified by the --s3 option.
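For example, these properties can be collected in a single file that
is passed to the tool via **--conf**. A sketch, assuming the batch
entities were created with the prefix *mybatch*; the account id,
region, basenames and bucket below are placeholders to replace with
your own values.

::

   $ cat pegasus-aws-batch.properties
   pegasus.aws.account = 123456789012
   pegasus.aws.region  = us-west-2
   pegasus.aws.batch.job_definition      = mybatch-job-definition
   pegasus.aws.batch.compute_environment = mybatch-compute-environment
   pegasus.aws.batch.job_queue           = mybatch-job-queue
   pegasus.aws.batch.s3_bucket           = s3://mybatch-bucket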
Example JSON Files
------------------

Example JSON files are listed below.

Job Definition File
===================

A sample job definition file. Update to reflect your settings.

::

   $ cat sample-job-definition.json
   {
     "containerProperties": {
       "mountPoints": [],
       "image": "XXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/awsbatch/fetch_and_run",
       "jobRoleArn": "batchJobRole",
       "environment": [
         {
           "name": "PEGASUS_EXAMPLE",
           "value": "batch-black"
         }
       ],
       "vcpus": 1,
       "command": [
         "/bin/bash",
         "-c",
         "exit $AWS_BATCH_JOB_ATTEMPT"
       ],
       "volumes": [],
       "memory": 500,
       "ulimits": []
     },
     "retryStrategy": {
       "attempts": 1
     },
     "parameters": {},
     "type": "container"
   }

Compute Environment File
========================

A sample compute environment file. Update to reflect your settings.

::

   $ cat conf/sample-compute-env.json
   {
     "state": "ENABLED",
     "type": "MANAGED",
     "computeResources": {
       "subnets": [
         "subnet-a9bb63cc"
       ],
       "type": "EC2",
       "tags": {
         "Name": "Batch Instance - optimal"
       },
       "desiredvCpus": 0,
       "minvCpus": 0,
       "instanceTypes": [
         "optimal"
       ],
       "securityGroupIds": [
         "sg-91d645f4"
       ],
       "instanceRole": "ecsInstanceRole",
       "maxvCpus": 2,
       "bidPercentage": 20
     },
     "serviceRole": "AWSBatchServiceRole"
   }

Job Queue File
==============

A sample job queue file. Update to reflect your settings.

::

   $ cat conf/sample-job-queue.json
   {
     "priority": 10,
     "state": "ENABLED",
     "computeEnvironmentOrder": [
       {
         "order": 1
       }
     ]
   }

Job Submit File
===============

A sample job submit file that lists the bag of jobs that need to be
executed on AWS Batch.

::

   $ cat merge_diamond-findrange-4_0_PID2_ID1.in
   {
     "SubmitJob": [
       {
         "jobName": "findrange_ID0000002",
         "executable": "pegasus-aws-batch-launch.sh",
         "arguments": "findrange_ID0000002.sh",
         "environment": [
           {
             "name": "S3CFG_aws_batch",
             "value": "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/.s3cfg"
           },
           {
             "name": "TRANSFER_INPUT_FILES",
             "value": "/scitech/input/pegasus-worker-4.9.0dev-x86_64_rhel_7.tar.gz,/scitech/input/00/00/findrange_ID0000002.sh"
           },
           {
             "name": "BATCH_FILE_TYPE",
             "value": "script"
           },
           {
             "name": "BATCH_FILE_S3_URL",
             "value": "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/pegasus-aws-batch-launch.sh"
           }
         ]
       },
       {
         "jobName": "findrange_ID0000003",
         "executable": "pegasus-aws-batch-launch.sh",
         "arguments": "findrange_ID0000003.sh",
         "environment": [
           {
             "name": "S3CFG_aws_batch",
             "value": "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/.s3cfg"
           },
           {
             "name": "TRANSFER_INPUT_FILES",
             "value": "/scitech/input/pegasus-worker-4.9.0dev-x86_64_rhel_7.tar.gz,/scitech/input/00/00/findrange_ID0000003.sh"
           },
           {
             "name": "BATCH_FILE_TYPE",
             "value": "script"
           },
           {
             "name": "BATCH_FILE_S3_URL",
             "value": "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/pegasus-aws-batch-launch.sh"
           }
         ]
       }
     ]
   }

File Transfers
==============

The tool allows you to upload files to the associated S3 bucket from
the local filesystem in two ways.

a. Common Files Required For All Jobs

   You can use the command line option **--files** to give a
   comma-separated list of files to transfer.

b. TRANSFER_INPUT_FILES Environment Variable

   You can also associate, in the job submit file, an environment
   variable named **TRANSFER_INPUT_FILES** for each job, which the
   tool will transfer at the time of job submission. The value of the
   environment variable is a comma-separated list of files, as
   illustrated in the sample job submit file above.

Return Value
============

**pegasus-aws-batch** returns a zero exit status if the operation is
successful. A non-zero exit status is returned in case of failure. If
you run any jobs using the tool, then the tool will return a non-zero
exit code in case one or more of your tasks fail.
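A wrapper script can therefore key off the exit status to detect task
failures. A minimal sketch, assuming the properties and submit file
from the earlier examples; the file names are illustrative.

::

   #!/bin/bash
   # run the bag of jobs and check whether all tasks succeeded
   if pegasus-aws-batch --conf pegasus-aws-batch.properties \
          sample-job-submit.json; then
       echo "all tasks succeeded"
   else
       echo "one or more tasks failed" >&2
       exit 1
   fi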