Support Architecture for Large-Scale Subsurface Analysis (SALSSA)
Overview
In this SciDAC SAP, we have developed a user environment that integrates data management, workflow, and visualization tools to support model execution and analysis. The framework has been used to run both the Subsurface Transport Over Multiple Phases (STOMP) and Smoothed Particle Hydrodynamics (SPH) codes, though the framework itself is generic and can be applied to other models. We leverage technologies developed by SciDAC centers, including the Kepler workflow engine (SDM Center) for job execution and data staging, and the VisIt visualization system from VACET. The framework includes a data and provenance tracking system that records the simulations, their inputs and outputs, and the analyses. Large output files can be sent to an archive or referred to via their URIs. SALSSA can be thought of as both an activity tracking system and a dashboard that summarizes modeling and analysis activity.
Key Capabilities
Please see our latest poster for more information.
- Tool Integration
SALSSA has a general mechanism for integrating tools through a registry. This applies both to simulators and analysis or setup preparation tools.
- Models: STOMP and SPH
- Editors: desktop editors
- Analysis tools (see below)
- Scripts
- Job Launching
- File staging and execution on workstations and queued machines
- Job monitoring
- Easily extended to support new queueing systems
- Execution of multiple parallel jobs
- Simple load balancing of multiple jobs across multiple machines
- Terminate jobs
- Archive of simulations
- Analysis
We primarily integrate off-the-shelf and open source visualization and analysis tools, but also include some simple, tightly integrated plotting capabilities for real-time monitoring.
- Parallel rendering (via VisIt)
- Plotting via gnuplot
- Real-time monitoring of output file plots
- GMV and TecPlot
- Data Management
SALSSA employs a two-tier architecture for data and metadata management. We automatically keep track of user activities and the associated data files, metadata, and provenance. The SALSSA Organizer presents different views of the data store.
- Tracks all inputs and outputs of each activity
- Tracks relationships between activities
- Supports multiple views of the data including provenance graphs, tables of simulations, and context views of individual simulations
- Tracks large simulation outputs via archive references
- Extensible metadata extraction through the registry or Python scripts that can be applied to any file type
- Provenance model based on the Open Provenance Model (OPM)
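The activity and relationship tracking listed under data management can be illustrated with a minimal sketch. It is not SALSSA's implementation or API: the `ProvenanceStore` class, its methods, and the file names are hypothetical, and only the two edge names ("used" and "wasGeneratedBy") are taken from OPM. The sketch records the inputs and outputs of each activity and then walks the edges backwards to find everything a given output depends on.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical in-memory provenance store (illustration only)."""

    def __init__(self):
        # OPM "used" edges: activity -> set of input artifacts
        self.used = defaultdict(set)
        # OPM "wasGeneratedBy" edges: artifact -> producing activity
        self.generated_by = {}

    def record(self, activity, inputs, outputs):
        """Record one activity together with its inputs and outputs."""
        self.used[activity].update(inputs)
        for artifact in outputs:
            self.generated_by[artifact] = activity

    def lineage(self, artifact):
        """Return (activities, artifacts) that lie upstream of an artifact."""
        activities, artifacts = set(), set()
        stack = [artifact]
        while stack:
            node = stack.pop()
            activity = self.generated_by.get(node)
            if activity is None or activity in activities:
                continue  # a raw input, or an activity already visited
            activities.add(activity)
            for source in self.used[activity]:
                if source not in artifacts:
                    artifacts.add(source)
                    stack.append(source)
        return activities, artifacts


# Assumed example chain: setup -> simulation -> plotting.
store = ProvenanceStore()
store.record("setup", ["template.in"], ["stomp.in"])
store.record("stomp_run", ["stomp.in"], ["plot.dat"])
store.record("plot", ["plot.dat"], ["saturation.png"])

activities, artifacts = store.lineage("saturation.png")
print(sorted(activities))
print(sorted(artifacts))
```

A provenance graph view or a table of simulations could, in principle, be derived from the same two edge sets; a large output would be stored as an archive reference (for example a URI string) rather than as the file itself.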
SALSSA Software Releases
Several development versions of the SALSSA data management and workflow environment have been released. These releases are available for download, and each is briefly described in the release summaries below. SALSSA releases prior to version 3.0 require an installed data management system and are therefore available only on site for project staff and collaborators. Starting with version 3.0, SALSSA can be run without a shared data management system.
SALSSA 3.0 (May 2011)
This release added the capability for task-parallel job execution. A set of simulations can now be submitted as a single job in which all simulations run independently. Because only one job is placed in the queue, the latency of waiting for each individual job to get its turn in the queue is avoided. A spreadsheet interface for SPH execution was created, supporting the setup of a multi-simulation SPH job within the SALSSA Organizer. Also new in this version is the way the SALSSA software is provided: IzPack is now used to package SALSSA and generate a single installer application for each supported platform, greatly simplifying distribution and installation. Other features developed for this release include an offline mode that removes the requirement for a shared data management server, execution on NERSC systems, an upgrade of the Kepler workflow software, improvements to archive/unarchive, and documentation describing how to add your own simulation code to SALSSA (see Adding a New Code).
SALSSA 2.4 (Jan 2010)
This release added context panels for individual simulations. From these panels, a user can easily get a real-time plot of the simulation progress and see a summary of the input parameters, job status, and output file locations. Other features developed for this release include a parameter study editor for the SPH code, the capability to reconnect to jobs after a failure or after you shut your system down, and support for archiving large simulation outputs.
SALSSA 2.2 (Feb 09)
This version of SALSSA was used to perform the Smoothed Particle Hydrodynamics Validation Study. New features include:
- Support for SPH Slit, Pore, Cylinder codes
- Job launching to queued machines
SALSSA 2.0 (Sep 08)
This prototype provides a graph-based view of activities. The activities can be chained together to an arbitrary depth and complexity. Jobs can be launched on multiple workstations. Data management and provenance are based entirely on RDF relationships. SALSSA v2.0 has been tested by running several of the simulations described on the Benchmark Application page, and has been provided to Idaho National Laboratory collaborators for use in performing additional design simulations.
SALSSA 1.0 (Jan 08)
An initial prototype with the capability to set up STOMP parameter studies, run jobs on workstations, and view the runs, jobs, and job files.
Additional Information
Project Contributors
- Karen Schuchardt, Pacific Northwest National Laboratory (PI) karen.schuchardt@pnl.gov
- Jeff Daily, Pacific Northwest National Laboratory
- Todd Elsethagen, Pacific Northwest National Laboratory
- Jared Chase, Pacific Northwest National Laboratory
- Khushbu Agarwal, Pacific Northwest National Laboratory
- Vicky Freedman, Pacific Northwest National Laboratory
Past Contributors
- Lisong Sun, Pacific Northwest National Laboratory
- Gary Black, Pacific Northwest National Laboratory