The GMP DWH Data Management Console (DMC) is a collaborative working environment for the members of Regional Organization Groups, members of the Stockholm Convention Secretariat, involved data providers, and GMP DWH data managers.
Sampling sites catalogue
Distribution of Import templates
Data imports (= data acquisition)
Automatic data validation
Data harmonisation
Data management
Approval management
User management
The DMC persists all its data directly in the GMP DWH Database and links directly to the GMP DWH Data Visualizations so that its users can work conveniently and efficiently.
A list of all sampling sites is available as a searchable catalogue. It contains all sites from GMP1 and GMP2 as well as new sites from GMP3. A sampling site is considered unique if its combination of name, material and monitoring network attributes is unique. Records for new sampling sites are created automatically during the data import process.
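The uniqueness rule can be illustrated with a minimal Python sketch. It is not the DMC implementation: the names `SiteKey`, `site_key` and `register_site` are hypothetical, and the whitespace and case normalisation is an assumption added only to show how near-identical spellings could be caught.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteKey:
    """Hypothetical identity key: name + material + monitoring network."""
    name: str
    material: str
    network: str


def _norm(value: str) -> str:
    # Assumption: collapse whitespace and ignore case when comparing attributes.
    return " ".join(value.split()).casefold()


def site_key(name: str, material: str, network: str) -> SiteKey:
    return SiteKey(_norm(name), _norm(material), _norm(network))


def register_site(catalogue: dict, name: str, material: str, network: str) -> bool:
    """Add the site if its key is new; return True when a record was created."""
    key = site_key(name, material, network)
    if key in catalogue:
        return False
    catalogue[key] = {"name": name, "material": material, "network": network}
    return True


# Illustrative, made-up site and network names.
catalogue = {}
created_first = register_site(catalogue, "Site A", "Air", "Network X")
created_again = register_site(catalogue, " site a ", "air", "Network X")
created_other = register_site(catalogue, "Site A", "Water", "Network X")
```

In this sketch the second call does not create a record because it differs from the first only in spacing and case, while the third does because the material differs.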
Proper identification of sampling sites is essential for data continuity and long-term trend identification. If data were assigned to several accidentally duplicated sampling sites, it would not be possible to calculate a single time trend of chemical concentration, which is the main objective of the Global Monitoring Plan. For this reason, a well-curated and well-managed sampling sites catalogue is essential.
The set of required attributes of a new sampling site is defined for each core medium and described in the dedicated import template.
Data import templates are used to force data providers to import data in the same format using a standardised set of code lists.
Import templates are distributed as Excel files with a predefined header, marked required columns and, where required, a complete list of predefined values.
There are eight different templates – one for each combination of material (Air, Water, Blood, Milk) and data type (primary, aggregated). One data import (file) can contain a single record or several thousands of records.
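The count of eight follows from combining four materials with two data types. The short Python sketch below enumerates the combinations; the file-name pattern and the `template_for` helper are invented for illustration and do not reflect the actual template names.

```python
from itertools import product

MATERIALS = ("Air", "Water", "Blood", "Milk")
DATA_TYPES = ("primary", "aggregated")

# Hypothetical file names, one per (material, data type) combination.
TEMPLATES = {
    (material, data_type): f"template_{material.lower()}_{data_type}.xlsx"
    for material, data_type in product(MATERIALS, DATA_TYPES)
}


def template_for(material: str, data_type: str) -> str:
    """Look up the template for a combination; raises KeyError if unknown."""
    return TEMPLATES[(material, data_type)]
```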
Data are imported into GMP DWH via so-called “data imports”. A data import is the basic working unit for organising data processing, harmonisation, validation and approvals.
A data import is treated as an indivisible unit and works as an envelope. If a correction is needed, the whole data import must be deleted or cancelled and imported again with corrected data records.
A data import consists of:
The data provider creates a new data import, fills in the necessary metadata and uploads the data in the form of an import template.
Automatic validations start immediately after the data are uploaded into the DMC, and their results are shown to the user. If any issues are found, the data import is excluded from further data processing and the data provider is encouraged to fix the identified problems.
Data imports of aggregated data are stored directly in the part of the GMP DWH Database dedicated to aggregated data.
In the case of primary data, the data import is checked by a data manager (see Validations by data manager chapter).
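The branching described so far can be summarised in a small Python sketch. The function name `route_import` and the returned labels are hypothetical; only the three outcomes themselves come from the description above.

```python
def route_import(data_type: str, auto_validation_passed: bool) -> str:
    """Return the next step for a data import (labels are illustrative only)."""
    if not auto_validation_passed:
        # Imports with validation issues are excluded until the provider fixes them.
        return "excluded"
    if data_type == "aggregated":
        # Aggregated data go straight to the aggregated part of the database.
        return "stored_as_aggregated"
    if data_type == "primary":
        # Primary data are checked by a data manager first.
        return "awaiting_data_manager"
    raise ValueError(f"unknown data type: {data_type!r}")
```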
Fully validated primary data imports are processed via internal data processing pipelines to:
perform derivation of additional parameters (see Derivation of additional parameters chapter)
combine imported data with primary data from other data imports
calculate aggregations to make data publishable (see Data aggregations chapter)
All fully validated data imports are considered approved by the ROGs. If ROG members decide otherwise, they can revoke the approval of an individual data import and exclude all its data from further processing.
Approved data are loaded into GMP DWH Data Visualizations to be publicly available.
All data imports are checked using a set of automatic validations. Since the Data Management Console can receive data in 8 different formats (4 matrices × 2 data types), there are 8 slightly different sets of validation rules. In general, validation rules consist of:
The role of data managers is crucial in the data collection process. Using a prepared set of QA/QC scripts, they check:
If there is an issue in a data import, data managers communicate directly with data providers and ROG members. At the same time, the data manager changes the status of the data import and fills in a note describing the problem.
All data imported into the GMP DWH are imported via Data imports in batches.
A single data import batch can contain data for a single core medium – either air or water or human blood or human milk.
One data import batch can only contain data of the same data type – primary data or aggregated data.
A dedicated import template file must be used for each core medium and data type (there are 8 of them, see Data import templates).
Each data import batch can contain from a single record up to several thousands of records. The number of records in a data import batch is not limited.
It is considered good practice to organise data import batches into logical units, for example, data from one monitoring network in one medium as a separate data import batch.
It is not possible to change individual records in a batch. Data providers upload data in batches, and they discard or approve them as a batch.
To amend already uploaded data, the data provider erases the content of the faulty data import and uploads a corrected data batch.
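The batch rules above (one core medium and one data type per batch, at least one record, no upper limit) can be expressed as a simple check. This is a sketch under assumptions: the record layout with `medium` and `data_type` keys and the `check_batch` function are hypothetical, not the DMC's actual validation code.

```python
def check_batch(records: list) -> list:
    """Return a list of problems; an empty list means the batch is homogeneous."""
    if not records:
        return ["batch contains no records"]
    problems = []
    media = {record["medium"] for record in records}
    data_types = {record["data_type"] for record in records}
    if len(media) > 1:
        problems.append(f"mixed core media: {sorted(media)}")
    if len(data_types) > 1:
        problems.append(f"mixed data types: {sorted(data_types)}")
    return problems


good_batch = [
    {"medium": "air", "data_type": "primary"},
    {"medium": "air", "data_type": "primary"},
]
mixed_batch = [
    {"medium": "air", "data_type": "primary"},
    {"medium": "water", "data_type": "primary"},
]
```

Because a batch is handled as a whole, a check like this would reject the entire batch rather than individual records.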
Data providers upload data.
Automatic validations are performed by the DMC system.
In the case of primary data, the GMP DWH Data Manager manually harmonises the imported data with the rest of the GMP database content. The Data Manager marks the data import as Validated by the Data Manager if the batch is in conformity. In case of errors, the Data Manager marks the data import batch as Invalid by the Data Manager, and the data provider then needs to correct the batch.
Data from valid data imports are synchronised with the GMP DWH Data Visualization module every 30 minutes. Both data providers and ROG members can see the uploaded data. A personalised link to GMP DWH Data Visualization is provided on the detail page of each data import.
The data provider marks the data as VERIFIED, which completes the data import process.
The data provider can only cancel a data import batch. Deletion is not supported, so that all performed operations can be tracked and audited.
GMP DWH Data Management Console tracks all operations to provide precise audit trail records when any of the following activities takes place:
The username, the exact date and time, and information about the data import batch activity are recorded.
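An audit record of this kind could look like the following Python sketch. The class and field names (`AuditRecord`, `AuditTrail`, `import_id`, `activity`) are assumptions made for illustration, as is the choice of UTC timestamps; the DMC's actual storage format is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry: who did what to which data import, and when."""
    username: str
    import_id: str
    activity: str
    # Assumption: timestamps are stored in UTC.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditTrail:
    """Append-only list of audit records; entries are never modified or removed."""

    def __init__(self) -> None:
        self._records = []

    def log(self, username: str, import_id: str, activity: str) -> AuditRecord:
        record = AuditRecord(username, import_id, activity)
        self._records.append(record)
        return record

    def for_import(self, import_id: str) -> list:
        return [r for r in self._records if r.import_id == import_id]


trail = AuditTrail()
trail.log("provider1", "import-1", "uploaded")
trail.log("provider1", "import-1", "cancelled")
trail.log("provider2", "import-2", "uploaded")
```

Keeping the trail append-only mirrors the rule that data imports are cancelled rather than deleted, so the history of a batch stays complete.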