11.2. pegasus-aws-batch

pegasus-aws-batch - a client to run tasks on Amazon AWS Batch.

pegasus-aws-batch [-h]
                    [-C propsfile]
                    [-L <error, warn, info, debug, trace>]
                    [-l logfile]
                    [--merge-logs prefix]
                    [-a AWS account id]
                    [-r AWS region]
                    [--create]
                    [--delete]
                    [-p prefix]
                    [-j jsonfile or arn or job-definition name]
                    [--ce jsonfile or arn or compute-environment name]
                    [-q jsonfile or arn or queue name]
                    [-s s3 bucket URL]
                    [-f file1[,file2 …]]
                    job submit file

11.2.1. Description

pegasus-aws-batch is a client to run tasks on Amazon AWS Batch. It also allows you to create and delete entities required by the AWS Batch service, such as a job definition, a compute environment and a job queue. The tool also allows you to upload files from your local machine to the S3 bucket specified on the command line or in the properties, so that you can ship data to S3 that your jobs running in AWS Batch require. The tool automatically fetches the stdout of your jobs from CloudWatch Logs and places it on the local filesystem.
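
For example, a hypothetical end-to-end invocation might look like the following; the file names here are placeholders, and batch.properties is assumed to supply the account, region and batch entity properties described below:

$ pegasus-aws-batch -C batch.properties --files input.txt job-submit.json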

11.2.2. Options

-h; --help

Show the help message and exit.

-C propsfile; --conf=propsfile

Path to the properties file containing the properties to configure the tool.

-L <error, warn, info, debug, trace>; --log-level=<error, warn, info, debug, trace>

Sets the log level for the tool.

-l logfile; --log-file=logfile

Path to the file where you want the client to log. By default, the client logs to its stdout and stderr.

-m prefix; --merge-logs=prefix

By default, the tool pulls down each task's stdout and stderr to separate files, with the names determined by the job names specified in the job submit file. This option instead merges the stdout of all tasks into a single file starting with the given prefix and ending in .out. Similar behavior is applied for the tasks' stderr.
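
For illustration, a run with merged logs might look like the following; the prefix and file names are placeholders, and the .err suffix for the merged stderr file is an assumption based on the .out convention described above:

$ pegasus-aws-batch -C batch.properties --merge-logs mybatch-run job-submit.json
$ ls mybatch-run.*
mybatch-run.err  mybatch-run.out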

-a AWS account id; --account=AWS account id

The AWS account to use for running jobs on AWS Batch. Can also be specified in the properties using the property pegasus.aws.account.

-r AWS region; --region=AWS region

The AWS region in which the S3 bucket and other entities required by AWS Batch exist. Can also be specified in the properties using the property pegasus.aws.region.

-c; --create

Only create the batch entities specified by the -j, --ce, -q and --s3 options. Don't run any jobs.

-d; --delete

Delete the batch entities specified by the -j, --ce, -q and --s3 options. Don't run any jobs.

-p prefix; --prefix=prefix

The prefix to use for naming the batch entities created. Default suffixes -job-definition, -compute-environment, -job-queue and -bucket are added, depending on the batch entity being created.
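
For example, with --prefix mybatch the created entities would be named:

mybatch-job-definition
mybatch-compute-environment
mybatch-job-queue
mybatch-bucket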

-j jsonfile or arn or job-definition name; --job-definition jsonfile or arn or job-definition name

The JSON file containing the job definition specification to register for executing jobs, or the ARN of an existing job definition, or the basename of an existing job definition. The JSON file format is the same as the AWS Batch format: https://docs.aws.amazon.com/batch/latest/userguide/job-definition-template.html. A sample job definition file is listed in the configuration section.

The value for this option can also be specified in the properties using the property pegasus.aws.batch.job_definition.

--ce jsonfile or arn or compute-environment name; --compute-environment jsonfile or arn or compute-environment name

The JSON file containing the compute environment specification to create in the Amazon cloud for executing jobs, or the ARN of an existing compute environment, or the basename of an existing compute environment. The JSON file format is the same as the AWS Batch format: https://docs.aws.amazon.com/batch/latest/userguide/compute-environment-template.html. A sample compute environment file is listed in the configuration section.

The value for this option can also be specified in the properties using the property pegasus.aws.batch.compute_environment.

-q jsonfile or arn or job-queue name; --job-queue jsonfile or arn or job-queue name

The JSON file containing the job queue specification to create in the Amazon cloud for managing jobs, or the ARN of an existing job queue, or the basename of an existing job queue. The queue is associated with the compute environment on which the jobs run. The JSON file format is the same as the AWS Batch format: https://docs.aws.amazon.com/batch/latest/userguide/job-queue-template.html. A sample job queue file is listed in the configuration section.

The value for this option can also be specified in the properties using the property pegasus.aws.batch.job_queue.

-s s3 URL; --s3 s3 URL

The S3 bucket to use during the lifecycle of the client. If not specified, a bucket is created based on the prefix passed.

The value for this option can also be specified in the properties using the property pegasus.aws.batch.s3_bucket.

-f file[,file,…]; --files file[,file,…]

A comma-separated list of files that need to be copied to the associated S3 bucket before any task starts.

job submit file

A JSON formatted file that contains the job description of the jobs that need to be executed. A sample job description file is listed in the configuration section.

11.2.3. Configuration

Each user should specify a configuration file from which pegasus-aws-batch picks up authentication tokens. It is the same as the standard Amazon EC2 credentials file, and the default Amazon credentials search path semantics apply.

11.2.3.1. Sample File

$ cat ~/.aws/credentials

[default]
aws_access_key_id = XXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXX

11.2.3.2. Configuration Properties

endpoint (site)

The URL of the web service endpoint. If the URL begins with https, then SSL will be used.

pegasus.aws.account (aws account)

The AWS account to use. Can also be specified by the -a option.

pegasus.aws.region (region)

The AWS region to use. Can also be specified by the -r option.

pegasus.aws.batch.job_definition (the json file or existing ARN or basename)

Can also be specified by the -j option.

pegasus.aws.batch.compute_environment (the json file or existing ARN or basename)

Can also be specified by the --ce option.

pegasus.aws.batch.job_queue (the json file or existing ARN or basename)

Can also be specified by the -q option.

pegasus.aws.batch.s3_bucket (the S3 URL)

Can also be specified by the --s3 option.
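
For reference, a minimal sketch of a properties file that could be passed via the -C option; all values below are placeholders:

$ cat batch.properties

pegasus.aws.account = 123456789012
pegasus.aws.region = us-west-2
pegasus.aws.batch.job_definition = sample-job-definition.json
pegasus.aws.batch.compute_environment = conf/sample-compute-env.json
pegasus.aws.batch.job_queue = conf/sample-job-queue.json
pegasus.aws.batch.s3_bucket = s3://mybatch-bucket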

11.2.3.3. Example JSON Files

Example JSON files are listed below.

11.2.4. Job Definition File

A sample job definition file. Update to reflect your settings.

$ cat sample-job-definition.json

{
  "containerProperties": {
    "mountPoints": [],
    "image": "XXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/awsbatch/fetch_and_run",
    "jobRoleArn": "batchJobRole"  ,
    "environment": [ {
            "name": "PEGASUS_EXAMPLE",
            "value": "batch-black"
         }],
    "vcpus": 1,
    "command": [
      "/bin/bash",
      "-c",
      "exit $AWS_BATCH_JOB_ATTEMPT"
    ],
    "volumes": [],
    "memory": 500,
    "ulimits": []
  },
  "retryStrategy": {
    "attempts": 1
  },
  "parameters": {},
  "type": "container"
}
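
Assuming the sample file above, the job definition alone could be registered with the --create option; the prefix, account id and region here are placeholders:

$ pegasus-aws-batch --create --prefix mybatch \
      --account 123456789012 --region us-west-2 \
      --job-definition sample-job-definition.json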

11.2.5. Compute Environment File

A sample compute environment file. Update to reflect your settings.

$ cat conf/sample-compute-env.json

{
  "state": "ENABLED",
  "type": "MANAGED",
  "computeResources": {
    "subnets": [
      "subnet-a9bb63cc"
    ],
    "type": "EC2",
    "tags": {
      "Name": "Batch Instance - optimal"
    },
    "desiredvCpus": 0,
    "minvCpus": 0,
    "instanceTypes": [
      "optimal"
    ],
    "securityGroupIds": [
      "sg-91d645f4"
    ],
    "instanceRole": "ecsInstanceRole" ,
    "maxvCpus": 2,
    "bidPercentage": 20
  },
  "serviceRole": "AWSBatchServiceRole"
}

11.2.6. Job Queue File

A sample job queue file. Update to reflect your settings.

$ cat conf/sample-job-queue.json

{
  "priority": 10,
  "state": "ENABLED",
  "computeEnvironmentOrder": [
    {
      "order": 1
    }
  ]
}
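
Given the three sample files above, all the batch entities could be created in one invocation; the prefix and region are placeholders:

$ pegasus-aws-batch --create --prefix mybatch --region us-west-2 \
      --job-definition sample-job-definition.json \
      --compute-environment conf/sample-compute-env.json \
      --job-queue conf/sample-job-queue.json

Passing the same options with --delete instead of --create would tear the entities down.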

11.2.7. Job Submit File

A sample job submit file that lists the bag of jobs that need to be executed on AWS Batch.

$ cat merge_diamond-findrange-4_0_PID2_ID1.in
{
  "SubmitJob" : [ {
    "jobName" : "findrange_ID0000002",
    "executable" : "pegasus-aws-batch-launch.sh",
    "arguments" : "findrange_ID0000002.sh",
    "environment" : [ {
      "name" : "S3CFG_aws_batch",
      "value" : "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/.s3cfg"
    }, {
      "name" : "TRANSFER_INPUT_FILES",
      "value" : "/scitech/input/pegasus-worker-4.9.0dev-x86_64_rhel_7.tar.gz,/scitech/input/00/00/findrange_ID0000002.sh"
    }, {
      "name" : "BATCH_FILE_TYPE",
      "value" : "script"
    }, {
      "name" : "BATCH_FILE_S3_URL",
      "value" : "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/pegasus-aws-batch-launch.sh"
    } ]
  }, {
    "jobName" : "findrange_ID0000003",
    "executable" : "pegasus-aws-batch-launch.sh",
    "arguments" : "findrange_ID0000003.sh",
    "environment" : [ {
      "name" : "S3CFG_aws_batch",
      "value" : "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/.s3cfg"
    }, {
      "name" : "TRANSFER_INPUT_FILES",
      "value" : "/scitech/input/pegasus-worker-4.9.0dev-x86_64_rhel_7.tar.gz,/scitech/input/00/00/findrange_ID0000003.sh"
    }, {
      "name" : "BATCH_FILE_TYPE",
      "value" : "script"
    }, {
      "name" : "BATCH_FILE_S3_URL",
      "value" : "s3://pegasus-batch-bamboo/mybatch-bucket/run0001/pegasus-aws-batch-launch.sh"
    } ]
  } ]
}
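
Such a file is passed as the last argument when running jobs. A sketch of an invocation, assuming entities created earlier with the prefix mybatch (all names here are placeholders):

$ pegasus-aws-batch --region us-west-2 \
      --job-definition mybatch-job-definition \
      --compute-environment mybatch-compute-environment \
      --job-queue mybatch-job-queue \
      --s3 s3://mybatch-bucket \
      merge_diamond-findrange-4_0_PID2_ID1.in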

11.2.8. File Transfers

The tool allows you to upload files from the local filesystem to the associated S3 bucket in two ways.

  1. Common Files Required For All Jobs

     You can use the command line option --files to give a comma-separated list of files to transfer.

  2. TRANSFER_INPUT_FILES Environment Variable

     You can also associate, in the job submit file, an environment variable named TRANSFER_INPUT_FILES with each job; the files it lists are transferred by the tool at the time of job submission. The value of the environment variable is a comma-separated list of files.
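
For example, the common-files mechanism might be used as follows (file names are placeholders); the TRANSFER_INPUT_FILES mechanism is illustrated in the sample job submit file above:

$ pegasus-aws-batch -C batch.properties \
      --files worker-package.tar.gz,common-setup.sh \
      job-submit.json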

11.2.9. Return Value

pegasus-aws-batch returns a zero exit status if the operation is successful. A non-zero exit status is returned in case of failure. If you run any jobs using the tool, the tool also returns a non-zero exit code if one or more of your tasks fail.
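
A minimal sketch of checking the exit status from a shell; the file names are placeholders:

$ pegasus-aws-batch -C batch.properties job-submit.json
$ echo $?
0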