GMP Data Warehouse
Data Management Console

The GMP DWH Data Management Console (DMC) is a collaborative working environment for the members of Regional Organization Groups (ROGs), members of the Stockholm Convention Secretariat, involved data providers, and GMP DWH data managers.

DMC is implemented as a web-based application accessible over the internet using a standard internet browser. The application is password protected and is available only to users identified by ROGs and the Secretariat.

Open Data Management Console

Main features of DMC

Sampling sites catalogue

Distribution of Import templates

Data imports (= data acquisition)

Automatic data validation

Data harmonisation

Data management

Approval management

User management

DMC persists all its data directly into the GMP DWH Database and, for the convenience of its users, links directly into GMP DWH Data Visualizations.

Sampling sites catalogue

A list of all sampling sites is available in the form of a searchable catalogue. It contains all sites from GMP1 and GMP2 as well as new sites from GMP3. A sampling site is considered unique if it has a unique combination of name, material and monitoring network attributes. Records for new sampling sites are created automatically during the data import process.

Proper identification of sampling sites is an essential aspect of data continuity and long-term trend identification. If data were assigned to several accidentally duplicated sampling sites, it would not be possible to calculate a single time trend of chemical concentration, which is the main objective of the Global Monitoring Plan. For this reason, it is essential to have a well-curated and managed sampling sites catalogue.
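
The uniqueness rule can be sketched in a few lines of Python. This is only an illustration, not the DMC implementation: the function names, the in-memory dictionary and the whitespace and case normalisation are assumptions made here to show how accidental duplicates might be avoided.

```python
from typing import NamedTuple


class SiteKey(NamedTuple):
    """Identity of a sampling site: name, material and monitoring network."""
    name: str
    material: str
    network: str


def site_key(name: str, material: str, network: str) -> SiteKey:
    # Assumption: spellings that differ only in case or spacing are the same site.
    def norm(text: str) -> str:
        return " ".join(text.split()).casefold()

    return SiteKey(norm(name), norm(material), norm(network))


def find_or_create(catalogue: dict, name: str, material: str, network: str) -> SiteKey:
    """Return the key of an existing site, or register a new one (hypothetical helper)."""
    key = site_key(name, material, network)
    if key not in catalogue:
        catalogue[key] = {"name": name.strip(), "material": material, "network": network}
    return key


catalogue = {}
first = find_or_create(catalogue, "Site A", "Air", "Network X")
second = find_or_create(catalogue, "  site a ", "air", "Network X")  # same site
third = find_or_create(catalogue, "Site A", "Water", "Network X")    # different material
```

With this key, the first two calls resolve to one catalogue entry, while the third creates a separate site because the material differs.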

...
...

The set of required attributes of a new sampling site is defined for each core medium and described in the import template dedicated to that medium.

Import templates

Data import templates are used to force data providers to import data in the same format using a standardised set of code lists.

Import templates are distributed as Excel files with a predefined header, marked required columns and, where applicable, complete lists of predefined values.

There are eight different templates – one for each combination of material (Air, Water, Blood, Milk) and data type (primary, aggregated). One data import (file) can contain a single record or several thousands of records.
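
As a small illustration of where the number eight comes from, the following Python sketch enumerates every combination of material and data type. The file naming scheme is hypothetical and is not taken from the DMC.

```python
from itertools import product

MATERIALS = ("Air", "Water", "Blood", "Milk")
DATA_TYPES = ("primary", "aggregated")


def template_filename(material: str, data_type: str) -> str:
    # Hypothetical naming scheme, used only for this example.
    return f"gmp_import_{material.lower()}_{data_type}.xlsx"


# One template per (material, data type) pair: 4 x 2 = 8 entries.
TEMPLATES = {
    (material, data_type): template_filename(material, data_type)
    for material, data_type in product(MATERIALS, DATA_TYPES)
}

for (material, data_type), filename in TEMPLATES.items():
    print(f"{material:5} {data_type:10} -> {filename}")
```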

Data structure reference

...
...

Data imports

Data are imported into GMP DWH via so-called “data imports”. A data import is the basic working unit used to organise data processing, harmonisation, validation and approvals.

A data import is considered an indivisible unit and works as an envelope. If a correction is needed, the whole data import must be deleted or cancelled and imported again with corrected data records.

A data import consists of:

  • Metadata
    • Title and description
    • Material (Air, Water, Human blood, Human milk)
    • Data type (Primary, Aggregated)
    • Data provider identification – name and institution
    • Region (Africa, Asian and Pacific, CEE, GRULAC, WEOG, International waters)
    • Status (see State machine workflow of data import chapter)
    • Validation result
    • Notes of data manager
    • Time stamps – created at, updated at
  • Data
    • Data records stored in import templates and uploaded to the DMC
...
...
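
The structure listed above could be modelled roughly as follows. This is a sketch only: the class and field names, the default status value and the region check are assumptions for illustration and do not describe the actual GMP DWH database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

# Region names as listed in the text above.
REGIONS = ("Africa", "Asian and Pacific", "CEE", "GRULAC", "WEOG", "International waters")


class Material(Enum):
    AIR = "Air"
    WATER = "Water"
    HUMAN_BLOOD = "Human blood"
    HUMAN_MILK = "Human milk"


class DataType(Enum):
    PRIMARY = "Primary"
    AGGREGATED = "Aggregated"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class DataImport:
    # Metadata
    title: str
    description: str
    material: Material
    data_type: DataType
    provider_name: str
    provider_institution: str
    region: str
    status: str = "uploaded"                  # hypothetical initial status
    validation_result: Optional[str] = None
    manager_notes: str = ""
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    # Data: the records read from the uploaded import template
    records: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.region not in REGIONS:
            raise ValueError(f"unknown region: {self.region!r}")


example = DataImport(
    title="Example batch",
    description="Illustrative data import",
    material=Material.AIR,
    data_type=DataType.PRIMARY,
    provider_name="Example Provider",
    provider_institution="Example Institute",
    region="CEE",
)
```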

The life cycle of a data import

The data provider creates a new data import, fills in the necessary metadata and uploads data in the form of an import template.

Automatic validations start as soon as data are uploaded into the DMC, and their results are shown to the user. If any issues are found, the data import is excluded from further data processing and the data provider is encouraged to fix the identified problems.

Data imports of aggregated data are stored directly in the part of the GMP DWH Database dedicated to aggregated data.

In the case of primary data, the data import is checked by a data manager (see Validations by data managers chapter).

Fully validated primary data imports are processed via internal data processing pipelines that:

  • Derive additional parameters (see Derivation of additional parameters chapter)
  • Merge the imported data with primary data from other data imports
  • Calculate aggregations to make the data publishable (see Data aggregations chapter)

All fully validated data imports are considered approved by ROGs. If ROG members decide otherwise, they can revoke the approval of an individual data import and exclude all the data it contains from further processing.

Approved data are loaded into GMP DWH Data Visualizations to be publicly available.

Automatic validations

All data imports are checked using a set of automatic validations. Since the Data Management Console can receive data in 8 different formats (4 matrices x 2 data types), there are 8 slightly different sets of validation rules. In general, validation rules consist of:

  • Checks that required fields are present
  • Data type checks (numeric, date, textual…)
  • Compliance with standardised code list values
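
The three kinds of rules can be sketched as a single validation function. This is a minimal illustration, not the DMC rule set: the field names, the type rules and the code list below are hypothetical placeholders.

```python
from datetime import date

# Hypothetical rule set for one template; real templates define their own fields.
REQUIRED = ("site_name", "compound", "value", "sampling_date")
TYPES = {"site_name": str, "value": (int, float), "sampling_date": date}
CODE_LISTS = {"compound": {"PCB 28", "PCB 52"}}  # illustrative code list only


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passed."""
    errors = []

    # 1. Required fields must be present and non-empty.
    for name in REQUIRED:
        if record.get(name) in (None, ""):
            errors.append(f"{name}: required field is missing")

    # 2. Data type checks (empty values were already reported above).
    for name, expected in TYPES.items():
        value = record.get(name)
        if value in (None, ""):
            continue
        if not isinstance(value, expected):
            errors.append(f"{name}: unexpected type {type(value).__name__}")

    # 3. Values must come from the standardised code list.
    for name, allowed in CODE_LISTS.items():
        value = record.get(name)
        if value not in (None, "") and value not in allowed:
            errors.append(f"{name}: {value!r} is not in the code list")

    return errors


good = {"site_name": "Site A", "compound": "PCB 28", "value": 1.5,
        "sampling_date": date(2020, 1, 1)}
bad = {"site_name": "", "compound": "XYZ", "value": "n/a",
       "sampling_date": date(2020, 1, 1)}
good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

Collecting all problems instead of stopping at the first one lets a data provider fix a whole batch in one pass before uploading it again.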

Validations by data managers

The role of data managers is crucial in the process of data collection. Using a prepared set of QA/QC scripts, they check:

  • Proper usage of sampling sites within monitoring networks across all data imports
  • Identification of data duplications (multiple data imports of the same dataset by different data providers)
  • Identification of data overlaps (multiple data imports have some overlap)

If there is an issue in the data import, data managers communicate directly with data providers and ROG members. At the same time, the data manager changes the status of the data import and fills in a note describing the problem.
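
A check for duplications and overlaps might look like the sketch below. It is not one of the actual QA/QC scripts: it assumes that a record is identified by its site, compound and sampling date, and the field names are hypothetical.

```python
from itertools import combinations


def record_key(record: dict) -> tuple:
    # Assumption: site, compound and sampling date together identify a measurement.
    return (record["site"], record["compound"], record["sampling_date"])


def compare_imports(imports: dict) -> list:
    """Compare every pair of data imports and report duplicates and overlaps.

    `imports` maps a data import id to its list of records. Returns tuples of
    (id_a, id_b, kind, number_of_shared_records).
    """
    keys = {
        import_id: {record_key(r) for r in records}
        for import_id, records in imports.items()
    }
    findings = []
    for a, b in combinations(sorted(keys), 2):
        shared = keys[a] & keys[b]
        if not shared:
            continue
        kind = "duplicate" if keys[a] == keys[b] else "overlap"
        findings.append((a, b, kind, len(shared)))
    return findings


r1 = {"site": "Site A", "compound": "PCB 28", "sampling_date": "2020-01-01"}
r2 = {"site": "Site A", "compound": "PCB 28", "sampling_date": "2020-02-01"}
r3 = {"site": "Site B", "compound": "PCB 28", "sampling_date": "2020-01-01"}

findings = compare_imports({1: [r1, r2], 2: [r2, r1], 3: [r2, r3]})
```

In this example imports 1 and 2 are reported as a duplicate, and import 3 as overlapping with both.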

QA & QC

All data imported into the GMP DWH are imported via Data imports in batches.

A single data import batch can contain data for a single core medium – either air or water or human blood or human milk.

One data import batch can only contain data of the same data type – primary data or aggregated data.

A dedicated import template file must be used for each core medium and data type (there are 8 of them, see Data import templates).

Each data import batch can contain from a single record up to several thousands of records. The number of records in a data import batch is not limited.

It is considered good practice to create data import batches as logical units, for example data from one monitoring network in one medium as a separate data import batch.

It is not possible to change individual records in a batch. Data providers upload data in batches, and they discard or approve them as a batch.

To amend already uploaded data, the data provider erases the content of the faulty data import and uploads the corrected data batch again.

Data providers upload data.

Automatic validations are performed by the DMC system.

In the case of primary data, the GMP DWH data manager manually harmonises the imported data with the rest of the GMP database content. The data manager marks the data import as Validated by the Data Manager if the batch is in conformity. In case of errors, the data manager marks the data import batch as Invalid by the Data Manager, and the data provider then needs to correct the batch.

Data from valid data imports are synchronised with the GMP DWH Data Visualization module every 30 minutes. Both data providers and ROG members can see uploaded data. A personalised link to GMP DWH Data Visualization is provided on the detail page of each data import.

Data provider marks data as VERIFIED = data import process is done.

The data provider can only cancel a data import batch. Deletion is not supported, so that all performed operations can be tracked and audited.
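
The steps above for primary data can be read as a small state machine. The sketch below is an interpretation, not the documented DMC workflow: the statuses "validated_by_manager", "invalid_by_manager", "verified" and "cancelled" mirror the text, while "uploaded", "auto_valid" and "auto_invalid" and the exact set of allowed transitions are assumptions.

```python
# Allowed status changes for a primary data import (assumed, see lead-in).
TRANSITIONS = {
    "uploaded": {"auto_valid", "auto_invalid", "cancelled"},
    "auto_invalid": {"cancelled"},
    "auto_valid": {"validated_by_manager", "invalid_by_manager", "cancelled"},
    "invalid_by_manager": {"cancelled"},
    "validated_by_manager": {"verified", "cancelled"},
    "verified": set(),
    "cancelled": set(),   # terminal: batches are cancelled, never deleted
}


class InvalidTransition(ValueError):
    """Raised when a status change is not allowed."""


def change_status(current: str, new: str) -> str:
    if new not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current!r} -> {new!r} is not allowed")
    return new


def run(steps, start: str = "uploaded") -> list:
    """Apply a sequence of status changes and return the full history."""
    history = [start]
    for step in steps:
        history.append(change_status(history[-1], step))
    return history


happy_path = run(["auto_valid", "validated_by_manager", "verified"])
rejected = run(["auto_valid", "invalid_by_manager", "cancelled"])
```

Keeping a cancelled state instead of deleting the batch means the record remains available for the audit trail.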

Transparency and responsibility - audit trails

GMP DWH Data Management Console tracks all operations to provide precise audit trail records when any of the following activities takes place:

  • Data import batch is created
  • Data is uploaded via a data import
  • Data import batch is edited, modified, or its status is changed
  • New sampling site is created
  • Any page of GMP DWH Data Management Console is viewed

The username, the exact date and time, and information about the data import batch activity are recorded.
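
An audit record of this kind could be represented as in the sketch below. The class names, field names and activity labels are hypothetical; only the recorded items (username, date and time, activity, data import batch) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One immutable audit trail entry."""
    username: str
    timestamp: datetime
    activity: str                      # e.g. "data_import_created" (hypothetical label)
    data_import_id: Optional[int]      # None for activities not tied to a batch


class AuditTrail:
    """Append-only log: entries can be added and read, but not changed or removed."""

    def __init__(self) -> None:
        self._records = []

    def log(self, username: str, activity: str,
            data_import_id: Optional[int] = None) -> AuditRecord:
        record = AuditRecord(username, datetime.now(timezone.utc), activity, data_import_id)
        self._records.append(record)
        return record

    def records(self) -> tuple:
        return tuple(self._records)


trail = AuditTrail()
trail.log("provider_1", "data_import_created", data_import_id=1)
trail.log("provider_1", "data_uploaded", data_import_id=1)
trail.log("manager_1", "page_viewed")
```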