Pegasus WMS is designed to manage and automate the execution of complex workflows, including those used in AI and machine learning. Its features make it well suited to AI workloads, particularly in research, high-throughput computing (HTC), and high-performance computing (HPC) environments.

Here’s how Pegasus WMS can be applied to AI workflows:

  1. Workflow Automation: Pegasus can automate the multiple stages of an AI pipeline, such as data preprocessing, model training, hyperparameter tuning, and evaluation. It handles dependencies between tasks, ensuring they execute in the correct order.
  2. Scalability: AI workflows, especially those involving deep learning, often require significant computational resources. Pegasus can distribute tasks across HPC clusters, cloud environments, or hybrid infrastructures.
  3. Reproducibility: Pegasus ensures reproducibility by maintaining logs, data lineage, and workflow descriptions. This is crucial for AI workflows where results need to be reproducible for validation.
  4. Resource Management: Pegasus integrates with resource schedulers like HTCondor, SLURM, or Kubernetes to manage compute resources efficiently.
  5. Data Management: AI workflows often involve large datasets. Pegasus can handle data staging, movement, and cleanup across different storage systems.
  6. Support for Diverse Environments: Pegasus workflows can be executed on local machines, clusters, or cloud platforms, making it adaptable to the needs of AI projects.
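Conceptually, the dependency handling described in point 1 works like a DAG scheduler: each task declares what it depends on, and the engine runs a task only after all of its prerequisites have finished. The sketch below is plain Python, not the Pegasus API, and the stage names are illustrative; it models a toy AI pipeline (preprocess, train, tune, evaluate) and derives an execution order that respects every dependency:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A toy AI pipeline: each task maps to the set of tasks it depends on.
# (Names are hypothetical; a real Pegasus workflow declares jobs and
# their input/output files through the Pegasus workflow API instead.)
pipeline = {
    "preprocess": set(),
    "train": {"preprocess"},
    "tune": {"train"},
    "evaluate": {"tune"},
}

def run_pipeline(dag):
    """Execute tasks in an order that respects every dependency."""
    order = []
    for task in TopologicalSorter(dag).static_order():
        order.append(task)  # a real engine would dispatch the job here
    return order

print(run_pipeline(pipeline))
# → ['preprocess', 'train', 'tune', 'evaluate']
```

Because this example is a simple chain, the order is fully determined; with a wider DAG, independent tasks could be dispatched in parallel, which is how Pegasus exploits HTC/HPC resources.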

Example AI Workflows with Pegasus

Some of the following examples are standalone workflows, while others are specific to the ACCESS Pegasus platform.

LLM-RAG (Retrieval Augmented Generation)

Retrieval-Augmented Generation (RAG) is a technique that enhances large language models by incorporating an information retrieval step. Relevant passages are retrieved from a database or document corpus and combined with the original query to provide additional context; the model then processes this augmented input to generate more accurate and contextually relevant responses. RAG offers benefits such as improved accuracy, access to up-to-date information, and better contextual grounding, making it useful for applications like question answering, summarization, and conversational AI. This workflow interacts with a local LLM, first asking a question without RAG and then the same question with RAG; the workflow's output consists of both replies.

[ACCESS Pegasus Example]

Lung Segmentation

A workflow that employs supervised learning to locate lungs in chest X-ray images. The lung instance segmentation workflow predicts lung masks from chest X-rays using a U-Net model.

[GitHub] [ACCESS Pegasus Example]

Mask Detection

The workflow addresses the problem of determining what percentage of the population is properly wearing masks to better track our collective efforts in preventing the spread of COVID-19 in public spaces. It uses the FastRCNNPredictor deep learning model to detect masks on faces.

[GitHub] [ACCESS Pegasus Example]

Orca Sound

This workflow processes and analyzes hydrophone data from sensors at three locations in the state of Washington, and uses trained machine learning models to automatically identify orca whistles.

[GitHub] [ACCESS Pegasus Example]