Volta Basin Water Allocation System Workflow: couples the Mike Basin and GAMS models to improve their hydrologic simulation and economic optimization.
A new paradigm and services for efficient distributed execution of scientific data-intensive workflows in the GLOWA Volta Grid Infrastructure
Motivation
The GLOWA Volta decision support system integrates several heterogeneous, distributed simulation systems and databases. These systems are invoked as parts of scientific workflows that calculate optimal policies for the sustainable management of natural resources. Practical evaluation showed that currently available workflow management systems lack some of the facilities needed to handle the complexity of the interaction between these heterogeneous resources. A new approach for executing such workflows was therefore developed.
Workflow Execution Paradigm
Our approach is realized as a bundle of grid services for workflow management deployed on every grid node (see the figure). The bundle consists of four main services:
Task service: responsible for executing the submitted task, caching the task's output, and notifying the submitting scheduler of the task status.
The task service contacts the Mediator Catalog to retrieve mediators that transform task input and output data.
Data management service: dedicated to reference-based data movement between nodes and to data provenance. Here, an OGSA-DAI service exposes the generated service/application data as web services, which allows direct remote access to the data. Remote interaction between services located on different nodes is based on publish/subscribe notification events supported by the Web Services Notification framework.
Resource information service: provides information about resources, covering both relatively static information such as system configuration and more dynamic information such as instantaneous load.
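The interplay of task execution, output caching, notification, and reference-based data access can be illustrated with a minimal Python sketch. It is not the GLOWA Volta implementation: the class names (`DataStore`, `TaskService`), the `ref://` reference format, and the callback-style notification are assumptions made for illustration, and plain in-process calls stand in for the grid services, OGSA-DAI, and the Web Services Notification framework.

```python
import hashlib
import json


class DataStore:
    """Stands in for the data management service on one node (hypothetical)."""

    def __init__(self, node):
        self.node = node
        self._data = {}

    def put(self, key, value):
        # Keep the data on this node and hand out a reference instead.
        self._data[key] = value
        return f"ref://{self.node}/{key}"

    def get(self, ref):
        # Resolve a reference that points at this node.
        prefix = f"ref://{self.node}/"
        if not ref.startswith(prefix):
            raise KeyError(f"{ref} is not stored on node {self.node}")
        return self._data[ref[len(prefix):]]


class TaskService:
    """Runs tasks, caches outputs, and notifies subscribers (hypothetical)."""

    def __init__(self, store):
        self.store = store
        self._cache = {}        # task key -> output reference
        self._subscribers = []  # callbacks standing in for notification consumers
        self.executions = 0

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def submit(self, name, func, inputs):
        # The cache key depends on the task name and its inputs.
        payload = json.dumps([name, inputs], sort_keys=True).encode()
        key = hashlib.sha256(payload).hexdigest()[:16]
        if key in self._cache:
            status = "cached"
        else:
            self.executions += 1
            self._cache[key] = self.store.put(key, func(**inputs))
            status = "completed"
        ref = self._cache[key]
        # Only the short reference travels in the notification, not the data.
        for notify in self._subscribers:
            notify({"task": name, "status": status, "ref": ref})
        return ref


store = DataStore("node-a")
service = TaskService(store)
events = []
service.subscribe(events.append)

inputs = {"rain": 2, "area": 5}
ref = service.submit("runoff", lambda rain, area: rain * area, inputs)
again = service.submit("runoff", lambda rain, area: rain * area, inputs)
result = store.get(ref)
```

In this sketch the second submission returns the same reference without running the task again, and the subscriber receives only a reference that it can resolve against the node holding the data.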
Features
- Support for distributed execution of workflows.
- Reduced communication traffic through reference-based data movement.
- Full control over long-running applications.
- Dynamic data transformation.
- Support for smart re-run through data caching.
- Distributed fault handling and load balancing.
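Load balancing and fault handling could build on resource information along the following lines. This is a sketch under stated assumptions: `NodeInfo`, `pick_node`, `run_with_failover`, the node names, and the load metric are hypothetical, and the selection rule (lowest instantaneous load among nodes that meet a static requirement) is one simple policy, not necessarily the one used in the project.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    name: str
    cpus: int     # relatively static: system configuration
    load: float   # dynamic: instantaneous load, 0.0 (idle) to 1.0 (busy)


def pick_node(nodes, min_cpus, exclude=()):
    """Return the least-loaded node that satisfies the static requirement."""
    candidates = [n for n in nodes if n.cpus >= min_cpus and n.name not in exclude]
    if not candidates:
        raise RuntimeError("no suitable node available")
    return min(candidates, key=lambda n: n.load)


def run_with_failover(task, nodes, min_cpus):
    """Run on the least-loaded node; reschedule elsewhere if execution fails."""
    failed = []
    while True:
        # Raises RuntimeError once every suitable node has failed.
        node = pick_node(nodes, min_cpus, exclude=failed)
        try:
            return node.name, task(node)
        except Exception:
            failed.append(node.name)  # remember the fault and try another node


# Hypothetical nodes for illustration only.
nodes = [
    NodeInfo("node-a", cpus=8, load=0.7),
    NodeInfo("node-b", cpus=16, load=0.2),
    NodeInfo("node-c", cpus=2, load=0.1),
]


def flaky_task(node):
    # Simulate a fault on one node to exercise the failover path.
    if node.name == "node-b":
        raise ConnectionError("node unreachable")
    return "done"


chosen, outcome = run_with_failover(flaky_task, nodes, min_cpus=4)
```

Here `node-b` is tried first because it has the lowest load among nodes with at least four CPUs; after the simulated fault, the task is rescheduled on `node-a`.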
 Reference-based data movement and dynamic data transformation (applied where the data is located)
 Smart re-run (through data caching), distributed fault handling and load balancing
 Centralized Monitor (Scheduler), Distributed Execution (Task services on remote nodes)
Person in Charge: Mahmoud El-Gayyar