The Metadata Catalog Service (MCS)

Overview

MCS is a standalone metadata catalog service that stores descriptive information (metadata) about logical data items (e.g., files). It was developed as part of the Grid Physics Network (GriPhyN) and NVO projects, which aim to support large-scale scientific experiments. MCS also allows users to aggregate data items into collections. It provides system-defined as well as user-defined attributes for logical items and collections. One distinguishing characteristic of MCS is that users can dynamically define and add metadata attributes, and MCS can report the names of the user-defined attributes. As a result, different MCS instances can be created with different contents. MCS has been implemented to run on top of standard web services or on top of the OGSA-DAI grid service. In the latter case, MCS leverages OGSA-DAI's authentication capabilities to provide secure access to the metadata.
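To make the idea of dynamically defined attributes concrete, here is a minimal in-memory sketch. It is not the MCS API: the class `ToyCatalog`, its method names, the `lfn:` naming scheme, and the attribute names are all assumptions made for illustration. It only shows the kind of behavior described above: fixed system attributes, user attributes added at run time, collections, and attribute-based lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory stand-in for a metadata catalog (not the MCS API)."""

    # Attribute names that always exist (illustrative choice of names).
    system_attributes: tuple = ("creator", "creation_time")
    # User-defined attribute name -> expected Python type, added at run time.
    user_attributes: dict = field(default_factory=dict)
    # Logical name -> {attribute name: value}.
    items: dict = field(default_factory=dict)
    # Collection name -> set of logical names.
    collections: dict = field(default_factory=dict)

    def define_attribute(self, name, value_type):
        """Register a new user-defined attribute."""
        if name in self.system_attributes or name in self.user_attributes:
            raise ValueError(f"attribute already defined: {name}")
        self.user_attributes[name] = value_type

    def list_user_attributes(self):
        """Return the names of the user-defined attributes."""
        return sorted(self.user_attributes)

    def add_item(self, logical_name, collection=None, **attributes):
        """Store metadata for a logical item, optionally inside a collection."""
        for name, value in attributes.items():
            if name in self.system_attributes:
                continue
            if name not in self.user_attributes:
                raise KeyError(f"undefined attribute: {name}")
            if not isinstance(value, self.user_attributes[name]):
                raise TypeError(f"wrong type for attribute: {name}")
        self.items[logical_name] = dict(attributes)
        if collection is not None:
            self.collections.setdefault(collection, set()).add(logical_name)

    def query(self, **conditions):
        """Return the logical names whose attributes equal all given values."""
        return sorted(
            name
            for name, attrs in self.items.items()
            if all(attrs.get(key) == value for key, value in conditions.items())
        )


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.define_attribute("instrument", str)
    catalog.add_item("lfn:run42.dat", collection="survey",
                     creator="alice", instrument="ccd1")
    catalog.add_item("lfn:run43.dat", collection="survey",
                     creator="bob", instrument="ccd2")
    print(catalog.list_user_attributes())
    print(catalog.query(instrument="ccd1"))
```

In this sketch, an attribute must be defined before it is used, so two catalog instances can end up with entirely different user-defined attribute sets.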

MCS may be used for storing and accessing metadata about logical files. A logical file name uniquely identifies the content of a file. MCS can be used in conjunction with the Globus Replica Location Service (RLS), which locates the physical instances of logical files. Attributes of logical files include, among others, the file creator and the creation timestamp.

The figure below illustrates the simplest scenario for attribute-based data access via the MCS and the other components of the data grid, including the Replica Location Service and the particular storage system where the data resides.

 

Usage Scenario

  1. The client application (or a request planner tasked with satisfying a client’s request) queries the Metadata Service based on some attributes of the desired data.
  2. The MCS returns the logical file names of one or more data items corresponding to those attributes.
  3. The client application next queries the Replica Location Service with the logical file names.
  4. The RLS returns the physical file names of the requested files to the client application.
  5. The client application then contacts the physical storage system where the file resides (using GridFTP, for example).
  6. The file is returned by the physical storage system.
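The six steps above can be sketched as three lookups chained together. Everything in this example is a hypothetical stand-in: the dictionaries play the roles of the MCS, the RLS, and a storage system, and the function names, attribute names, and `gsiftp://storage.example.org` URLs are invented for illustration. No real MCS, RLS, or GridFTP client calls are shown.

```python
# Hypothetical in-memory stand-ins for the three services. None of these
# names or structures come from the MCS, RLS, or GridFTP software.

# Metadata catalog: logical file name -> attributes.
METADATA_CATALOG = {
    "lfn:run42.dat": {"creator": "alice", "instrument": "ccd1"},
    "lfn:run43.dat": {"creator": "bob", "instrument": "ccd2"},
}

# Replica location service: logical file name -> physical file names.
REPLICA_LOCATIONS = {
    "lfn:run42.dat": ["gsiftp://storage.example.org/data/run42.dat"],
    "lfn:run43.dat": ["gsiftp://storage.example.org/data/run43.dat"],
}

# Storage system: physical file name -> file contents.
STORAGE = {
    "gsiftp://storage.example.org/data/run42.dat": b"contents of run 42",
    "gsiftp://storage.example.org/data/run43.dat": b"contents of run 43",
}


def query_metadata(**attributes):
    """Steps 1-2: return logical names whose metadata matches all attributes."""
    return [
        name
        for name, attrs in METADATA_CATALOG.items()
        if all(attrs.get(key) == value for key, value in attributes.items())
    ]


def locate_replicas(logical_names):
    """Steps 3-4: map each logical name to its known physical names."""
    return {name: REPLICA_LOCATIONS.get(name, []) for name in logical_names}


def fetch(physical_name):
    """Steps 5-6: retrieve the file contents from the storage system."""
    return STORAGE[physical_name]


def get_data_by_attributes(**attributes):
    """Run the whole scenario and return logical name -> file contents."""
    results = {}
    replicas = locate_replicas(query_metadata(**attributes))
    for logical_name, physical_names in replicas.items():
        if physical_names:
            # Use the first replica; a real planner might choose by cost.
            results[logical_name] = fetch(physical_names[0])
    return results


if __name__ == "__main__":
    print(get_data_by_attributes(instrument="ccd1"))
```

The point of the separation is that the client never needs to know physical locations when it asks for data by attribute; the logical file name is the only key shared between the metadata lookup and the replica lookup.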

Publications

Grid-Based Metadata Services, Ewa Deelman, Gurmeet Singh, Malcolm P. Atkinson, Ann Chervenak, Neil P. Chue Hong, Carl Kesselman, Sonal Patil, Laura Pearlman, Mei-Hui Su. 16th International Conference on Scientific and Statistical Database Management (SSDBM04), 21-23 June 2004, Santorini Island, Greece.

Artemis: Integrating Scientific Data on the Grid, Rattapoom Tuchinda, Snehal Thakkar, Yolanda Gil, Ewa Deelman. In Proceedings of the 16th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), July 25-29, San Jose, CA.

A Metadata Catalog Service for Data Intensive Applications, Gurmeet Singh, Shishir Bharathi, Ann Chervenak, Ewa Deelman, Carl Kesselman, Mary Manohar, Sonal Patil, Laura Pearlman. SC 2003.

Talks

"Metadata Catalog Service for Data Intensive Applications", talk given by Gurmeet Singh at SC 2003.

"A Metadata Catalog Service for Data Intensive Applications", talk given by Ewa Deelman, January 2003.

Software

The code for MCS is available at http://gaul.isi.edu/mcs

The Java API description for the first implementation by Gurmeet Singh can be found here