The NIMH Center for Collaborative Genomic Studies on Mental Disorders was established through the NIMH Human Genetics Initiative in 1998 to leverage and increase the value of human genetic samples and data produced through NIMH-funded research. The NIMH Repository and Genomics Resource (NIMH-RGR) plays a key role in facilitating psychiatric genetic research by providing a collection of over 150,000 well-characterized, high-quality patient and control samples spanning a wide range of mental disorders.

NRGR Quality Control – Pegasus Workflow

NRGR Quality Control System (QC)

Prior to the development of the QC workflow, researchers submitted data in various formats via email, which made curating the data tedious and error-prone. The QC system standardized the data submission process based on the following design decisions:

Enable users to submit data through the web.
Automate data checks and report errors in real time.
Standardize file types for submissions: MS Excel-compatible CSV/TSV files.
Standardize the file format for defining data dictionaries.
Standardize file templates, using system-defined data dictionaries for well-known data and user-provided data dictionaries for phenotypic (free-form) data.
Reserve manual curation for advanced checks only.
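
As an illustration of how a data dictionary can drive automated checks on a CSV/TSV submission, here is a minimal sketch in Python. It is not the NRGR implementation: the dictionary layout, the column names (subject_id, sex, age, age_of_onset), and the function check_syntax are all hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical data dictionary: column name -> expected type and whether a
# value is required. The real NRGR dictionary format is not shown in this text.
DATA_DICTIONARY = {
    "subject_id": {"type": str, "required": True},
    "sex": {"type": str, "required": True},
    "age": {"type": int, "required": True},
    "age_of_onset": {"type": int, "required": False},
}


def check_syntax(text, dictionary=DATA_DICTIONARY, delimiter=","):
    """Return a list of (row_number, column, message) errors for a submission."""
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []

    # Assumption: every dictionary column must appear in the header (row 1),
    # even when individual values in that column are optional.
    for column in dictionary:
        if column not in header:
            errors.append((1, column, "missing column"))
    if errors:
        return errors

    # Data rows start at row 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        for column, spec in dictionary.items():
            value = (row[column] or "").strip()
            if not value:
                if spec["required"]:
                    errors.append((row_number, column, "missing value"))
                continue
            try:
                spec["type"](value)  # e.g. int("abc") raises ValueError
            except ValueError:
                errors.append(
                    (row_number, column, f"expected {spec['type'].__name__}")
                )
    return errors
```

Passing delimiter="\t" would apply the same checks to a TSV file. Returning row and column positions rather than raising on the first problem is one way to support the kind of per-cell error reporting described above.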

Automated Checks

File type checks: Is the file a valid CSV/TSV?
Syntax checks: Are all columns present and populated with the correct data types?
Semantic checks: For example, is an individual's age greater than the age of onset?
Pedigree checks: For example, is an individual identified as a father male?
Checks against the centralized repository database.
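
The semantic and pedigree checks listed above could be sketched as follows. This is only an illustration under stated assumptions, not the NRGR code: the field names (subject_id, sex, age, age_of_onset, father_id, mother_id), the "M"/"F" sex coding, and both function names are hypothetical, and treating an age equal to the age of onset as valid is a choice made for this example.

```python
def check_semantics(record):
    """Semantic check on one record: age of onset must not exceed current age."""
    errors = []
    age = record.get("age")
    onset = record.get("age_of_onset")
    # Assumption: equal values pass; only onset > age is reported.
    if age is not None and onset is not None and onset > age:
        errors.append(
            f"{record['subject_id']}: age of onset {onset} exceeds age {age}"
        )
    return errors


def check_pedigree(records):
    """Pedigree check: a listed father must be male, a listed mother female."""
    by_id = {r["subject_id"]: r for r in records}
    errors = []
    for r in records:
        for role, expected_sex in (("father_id", "M"), ("mother_id", "F")):
            parent = by_id.get(r.get(role))
            # Parents absent from this submission are skipped here; they
            # would be a matter for the check against the central database.
            if parent is not None and parent["sex"] != expected_sex:
                errors.append(
                    f"{r['subject_id']}: {role} {parent['subject_id']} "
                    f"has sex {parent['sex']}, expected {expected_sex}"
                )
    return errors


# Small hypothetical family: C1 lists P1 (recorded as F) as the father.
family = [
    {"subject_id": "P1", "sex": "F", "age": 60, "age_of_onset": None},
    {"subject_id": "P2", "sex": "F", "age": 58, "age_of_onset": 30},
    {"subject_id": "C1", "sex": "M", "age": 25, "age_of_onset": 31,
     "father_id": "P1", "mother_id": "P2"},
]
semantic_errors = [e for r in family for e in check_semantics(r)]
pedigree_errors = check_pedigree(family)
```

In this example, semantic_errors holds one message for C1 (onset 31 against age 25) and pedigree_errors holds one message for C1's father_id. Semantic checks look at a single record, while pedigree checks need the whole submission because they compare related individuals.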


The NRGR QC System has streamlined the process of data submissions, significantly reducing the time for curation and leading to faster publishing of data.