org.griphyn.cPlanner.cluster.aggregator
Class MPIExec

java.lang.Object
  extended by org.griphyn.cPlanner.cluster.aggregator.Abstract
      extended by org.griphyn.cPlanner.cluster.aggregator.MPIExec
All Implemented Interfaces:
JobAggregator

public class MPIExec
extends Abstract

This class aggregates smaller jobs so that they are launched at the remote end by mpiexec on n nodes, where n is the nodecount associated with the aggregated job being launched. The mpiexec executable is a VDS tool distributed in the VDS worker package and can usually be found at $PEGASUS_HOME/bin/mpiexec.

Version:
$Revision: 450 $
Author:
Karan Vahi vahi@isi.edu

Field Summary
static String COLLAPSE_LOGICAL_NAME
          The logical name of the transformation that is able to run multiple jobs sequentially.
 
Fields inherited from class org.griphyn.cPlanner.cluster.aggregator.Abstract
DERIVATION_NAMESPACE, DERIVATION_VERSION, FAT_JOB_PREFIX, mBag, mClusteredADag, mDirectory, mGridStartFactory, mLogger, mProps, mSiteHandle, mTCHandle, TRANSFORMATION_NAMESPACE, TRANSFORMATION_VERSION
 
Fields inherited from interface org.griphyn.cPlanner.cluster.JobAggregator
VERSION
 
Constructor Summary
MPIExec()
          The default constructor.
 
Method Summary
 boolean abortOnFristJobFailure()
          Returns a boolean indicating whether to fail the aggregated job on detecting the first failure during execution of constituent jobs.
 String aggregatedJobArguments(AggregatedJob job)
          Returns the arguments with which the AggregatedJob needs to be invoked.
 AggregatedJob construct(List jobs, String name, String id)
          Constructs a new aggregated job that contains all the jobs passed to it.
protected  AggregatedJob enable(AggregatedJob mergedJob, List jobs)
          Enables the constituent jobs that make up an aggregated job.
 boolean entryNotInTC(String site)
          Determines whether there is NOT an entry in the transformation catalog for the job aggregator executable on a particular site.
 String getCollapserLFN()
          Returns the logical name of the transformation that is used to collapse the jobs.
 void initialize(ADag dag, PegasusBag bag)
          Initializes the JobAggregator implementation.
 void setAbortOnFirstJobFailure(boolean fail)
          Setter method to indicate whether the first failure of a constituent job should abort the whole aggregated job.
 
Methods inherited from class org.griphyn.cPlanner.cluster.aggregator.Abstract
construct, entryNotInTC, getCompleteTranformationName, getTCEntry, setDirectory
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COLLAPSE_LOGICAL_NAME

public static final String COLLAPSE_LOGICAL_NAME
The logical name of the transformation that is able to run multiple jobs sequentially.

See Also:
Constant Field Values
Constructor Detail

MPIExec

public MPIExec()
The default constructor.

Method Detail

initialize

public void initialize(ADag dag,
                       PegasusBag bag)
Initializes the JobAggregator implementation.

Specified by:
initialize in interface JobAggregator
Overrides:
initialize in class Abstract
Parameters:
dag - the workflow that is being clustered.
bag - the bag of objects that is useful for initialization.

construct

public AggregatedJob construct(List jobs,
                               String name,
                               String id)
Constructs a new aggregated job that contains all the jobs passed to it. The new aggregated job appears as a single job in the workflow and replaces the jobs it contains.

The aggregated job is executed at a site using mpiexec, which executes each of the smaller jobs in the aggregated job on n nodes, where n is the nodecount associated with the job. All the sub jobs are in turn launched via kickstart if kickstart is installed at the site where the job resides.

Specified by:
construct in interface JobAggregator
Overrides:
construct in class Abstract
Parameters:
jobs - the list of SubInfo objects that need to be collapsed. All the jobs being collapsed should be scheduled at the same pool, to maintain correct semantics.
name - the logical name of the jobs in the list passed to this function.
id - the id that is given to the new job.
Returns:
the AggregatedJob object corresponding to the aggregated job containing the jobs passed in the input list, or null if the list of jobs is empty.
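
The contract described above (null for an empty list, all jobs on one site) can be illustrated with a minimal, self-contained sketch. The classes Job and Cluster are hypothetical stand-ins, not the planner's SubInfo and AggregatedJob, and the exception thrown for mixed sites is an assumption of the sketch; this page only says the jobs should share a site.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConstructSketch {

    /** Hypothetical stand-in for a job to be clustered; not the planner's SubInfo. */
    static final class Job {
        final String id;
        final String site;

        Job(String id, String site) {
            this.id = id;
            this.site = site;
        }
    }

    /** Hypothetical stand-in for the aggregated job; not the planner's AggregatedJob. */
    static final class Cluster {
        final String id;
        final String name;
        final String site;
        final List<Job> members;

        Cluster(String id, String name, String site, List<Job> members) {
            this.id = id;
            this.name = name;
            this.site = site;
            this.members = members;
        }
    }

    /** Returns null for an empty list, otherwise one cluster holding every job. */
    static Cluster construct(List<Job> jobs, String name, String id) {
        if (jobs.isEmpty()) {
            return null; // documented behaviour: nothing to aggregate
        }
        String site = jobs.get(0).site;
        for (Job job : jobs) {
            // Assumption of this sketch: reject mixed sites instead of silently accepting them.
            if (!site.equals(job.site)) {
                throw new IllegalArgumentException(
                        "job " + job.id + " is on " + job.site + ", expected " + site);
            }
        }
        // Copy the list so later changes by the caller do not alter the cluster.
        return new Cluster(id, name, site,
                Collections.unmodifiableList(new ArrayList<>(jobs)));
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("ID1", "siteA"));
        jobs.add(new Job("ID2", "siteA"));
        Cluster c = construct(jobs, "analyze", "merge_1");
        System.out.println(c.id + " holds " + c.members.size() + " jobs on " + c.site);
        System.out.println(construct(new ArrayList<Job>(), "analyze", "merge_2"));
    }
}
```

Callers of the real method would likewise need to check for a null return before using the result.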

enable

protected AggregatedJob enable(AggregatedJob mergedJob,
                               List jobs)
Enables the constituent jobs that make up an aggregated job. Makes sure that they are all enabled via no kickstart.

Specified by:
enable in class Abstract
Parameters:
mergedJob - the clustered job
jobs - the constituent jobs
Returns:
AggregatedJob

getCollapserLFN

public String getCollapserLFN()
Returns the logical name of the transformation that is used to collapse the jobs.

Returns:
the logical name of the collapser executable.
See Also:
COLLAPSE_LOGICAL_NAME

entryNotInTC

public boolean entryNotInTC(String site)
Determines whether there is NOT an entry in the transformation catalog for the job aggregator executable on a particular site.

Parameters:
site - the site at which existence check is required.
Returns:
true if an entry does not exist, false otherwise.
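
Because the method answers the negated question, a true result means the entry is missing. The sketch below mirrors that contract with a hypothetical in-memory catalog (a map from site handle to logical names). It does not use the planner's transformation catalog classes, the logical name "example-collapser" is a placeholder, and what the caller does for a missing entry is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CatalogCheckSketch {

    /** Hypothetical catalog: site handle -> logical names registered at that site. */
    private final Map<String, Set<String>> entries = new HashMap<>();

    /** Logical name of the aggregator executable to look for (placeholder value). */
    private final String collapserLfn;

    CatalogCheckSketch(String collapserLfn) {
        this.collapserLfn = collapserLfn;
    }

    /** Records that a logical name is available at a site. */
    void register(String site, String lfn) {
        entries.computeIfAbsent(site, s -> new HashSet<>()).add(lfn);
    }

    /** Mirrors the negated contract: true when NO entry exists for the site. */
    boolean entryNotInTC(String site) {
        Set<String> names = entries.get(site);
        return names == null || !names.contains(collapserLfn);
    }

    public static void main(String[] args) {
        CatalogCheckSketch tc = new CatalogCheckSketch("example-collapser");
        tc.register("siteA", "example-collapser");

        for (String site : new String[] {"siteA", "siteB"}) {
            if (tc.entryNotInTC(site)) {
                // Assumed caller reaction: leave the jobs unclustered on this site.
                System.out.println(site + ": no entry, leaving jobs unclustered");
            } else {
                System.out.println(site + ": entry found, clustering can proceed");
            }
        }
    }
}
```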

aggregatedJobArguments

public String aggregatedJobArguments(AggregatedJob job)
Returns the arguments with which the AggregatedJob needs to be invoked. At present an empty argument string is returned.

Specified by:
aggregatedJobArguments in class Abstract
Parameters:
job - the AggregatedJob for which the arguments have to be constructed.
Returns:
argument string

setAbortOnFirstJobFailure

public void setAbortOnFirstJobFailure(boolean fail)
Setter method to indicate whether the first failure of a constituent job should abort the whole aggregated job. Ignores any value passed, as MPIExec does not handle it for the time being.

Parameters:
fail - indicates whether to abort or not.
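
Since the passed value is ignored, a caller cannot assume the setter took effect. The sketch below shows one way to probe for that by writing the flag and reading it back. AbortPolicy and both implementations are hypothetical stand-ins rather than the planner's classes, the getter keeps the spelling used on this page, and the fixed false returned by IgnoringPolicy is an assumption, because this page does not state what MPIExec returns.

```java
public class AbortFlagSketch {

    /** Hypothetical two-method slice of an aggregator; not the planner's JobAggregator. */
    interface AbortPolicy {
        void setAbortOnFirstJobFailure(boolean fail);

        boolean abortOnFristJobFailure(); // spelling follows the documented method name
    }

    /** Behaves like the documented setter: the value passed in is dropped. */
    static final class IgnoringPolicy implements AbortPolicy {
        @Override
        public void setAbortOnFirstJobFailure(boolean fail) {
            // Intentionally empty: the flag is not supported by this implementation.
        }

        @Override
        public boolean abortOnFristJobFailure() {
            return false; // assumption of this sketch, not taken from the page
        }
    }

    /** Contrasting implementation that stores the flag. */
    static final class StoringPolicy implements AbortPolicy {
        private boolean abort;

        @Override
        public void setAbortOnFirstJobFailure(boolean fail) {
            this.abort = fail;
        }

        @Override
        public boolean abortOnFristJobFailure() {
            return abort;
        }
    }

    /** Sets the flag both ways and reads it back; note it leaves the flag set to false. */
    static boolean honoursAbortFlag(AbortPolicy policy) {
        policy.setAbortOnFirstJobFailure(true);
        boolean sawTrue = policy.abortOnFristJobFailure();
        policy.setAbortOnFirstJobFailure(false);
        boolean sawFalse = !policy.abortOnFristJobFailure();
        return sawTrue && sawFalse;
    }

    public static void main(String[] args) {
        System.out.println("ignoring policy honours flag: " + honoursAbortFlag(new IgnoringPolicy()));
        System.out.println("storing policy honours flag: " + honoursAbortFlag(new StoringPolicy()));
    }
}
```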

abortOnFristJobFailure

public boolean abortOnFristJobFailure()
Returns a boolean indicating whether to fail the aggregated job on detecting the first failure during execution of constituent jobs.

Returns:
boolean indicating whether to fail or not.


Copyright © 2007 The University of Southern California. All Rights Reserved.