Program Data Set Connector & Integration Driver: Tracker-to-Aggregate Integration Across Dissimilar DHIS2 Instances


Goals

The goal of this guide is to explain how to use the PDAC tool and the Integration Driver tool together to send Tracker data to one or more aggregate data sets across different systems.

Note that this guide covers aggregating Tracker data between two systems. Using PDAC on a single instance should be slightly more straightforward.

Overview

This document describes the process of implementing a complex metadata and data synchronization across multiple instances of DHIS2 that have similar metadata without matching UIDs. Furthermore, it describes how to use the Program Data Set Connector tool to aggregate tracker data and set up integration routes via BAO Systems’ Integration Driver to automate reporting to a centralized instance.

This document details the following general process for setting up this architecture:

  1. Organize your process:
    1. Understand which server is the Source server (source of data) and which is the Target server (data being transferred to).
  2. Install and configure PDAC on the Source server.
  3. Ensure the aggregate data set from the Target server is replicated onto the Source server.
    1. UIDs for data set metadata must be the same.
    2. Org Units can be different but it is best practice to keep a standardized hierarchy.
  4. Use PDAC to create mappings for aggregating and creating dimensional disaggregation on the Source server’s tracker-based data.
  5. Schedule automated integration routes in BAO Manager to send data from the Source server to the Target server.

PDAC

About

The Program Data Set Connector automates the process of configuring DHIS2 to support data transfer from the tracker to the aggregate data model. It allows you to create custom disaggregations on program indicators, and use them as true data dimensions in DHIS2. PDAC uses the concept of category option filters to map a single program indicator value to multiple disaggregations for aggregate data.
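
For illustration only: a category option filter is simply a program indicator filter expression restricted to one category option. Assuming a hypothetical “Gender” tracked entity attribute (the UID below is a placeholder, not from any real instance), the filter mapping a value to the “Female” option might look like:

  A{<gender attribute UID>} == 'Female'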

Step-by-Step Mapping Configuration Instructions

  1. Install PDAC via the App Hub in the App Management app.
    1. Alternatively, download the latest version of PDAC from the public repository (see “PDAC Repository, Issues/Feature Requests, & Support Channels” below) and load the .zip file into the App Management app via manual install.
  2. Open the app
  3. To aggregate the data from program indicators, select “Add Row”.
  4. Then configure the mapping by selecting the data set, program indicator, and data element.
  5. Complete the filters for your category options according to the requirements of your configuration.
  6. Click “Update” for each filter to save it to the PDAC tool.
  7. Make sure all filters have a value.
  8. Click “Save” to save the mapping row.
  9. If you need to make changes, use the “Edit” button for the row you’d like to change.
  10. Rows can be removed via the “Delete” button in that row.
  11. Once you’re ready to generate the aggregate indicator necessary for the mappings, choose “Generate Mapping” for one or many rows.
  12. Verify success in the PDAC app, and then navigate to the Maintenance app to verify the creation of the appropriate indicator group.
  13. To verify the data for export, see the section below titled “How it works: Data transfer process”.

When to use PDAC

The Event Reports and Event Visualiser apps are powerful and may suit your needs in some cases. Reasons the PDAC app might be needed include:

  • You need more than a count of events
  • The required filter cannot be created in Event Reports
  • You want to show the values alongside other (aggregate) data
  • You want to use the Data Visualiser app (additional chart types, etc.)
  • You need custom disaggregations on numeric value types
  • The data needs to be in the aggregate data model

PDAC Installation

  • Install the latest stable release via the App Hub:
    • Open the App Management app
    • Select App Hub
    • Search for “Program DataSet Connector”
    • Note: the app is not currently available in the App Hub for 2.36 due to a bug in DHIS2

PDAC Functionality

  1. Create a program indicator based on the original for every disaggregation
  2. Modify each program indicator filter so it only returns that disaggregation of the total
  3. Link each program indicator to a disaggregation on a data element
  4. DHIS2 will handle displaying these disaggregations as part of the total aggregate data element

Generating Program Indicator Disaggregations

The user provides three things:

  1. Data target: Aggregate data element
  2. Data source: Program indicator
  3. A filter for each category option on the data element

The app then does the following:

  1. Finds the COCs and AOCs from the data element and data set
  2. Builds PI filters for each disaggregation
  3. Creates PIs from the source PI, one for each disaggregation
  4. Creates indicators referencing each PI and stores the appropriate AOC, COC, and DE UIDs
  5. Finally, creates an indicator group whose name includes a URL that tells you how to access the data

How it works: Aggregate from tracker

Step 1: Indicators to data elements

  • The app creates a custom attribute, which adds a new field to indicators
    • In this field, the app stores the UID of the data element the indicator will map to
  • When exporting the indicator data, use outputIdScheme=ATTRIBUTE

Step 2: Indicators to disaggregations

  • By default, DHIS2 indicators and program indicators have two special fields:
    • Category option combination for aggregate data export
    • Attribute option combination for aggregate data export
  • When exporting indicator data, the indicator values will get category option combos and attribute option combos based on these fields.
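
As a rough sketch of what PDAC generates (all UIDs below are placeholders), each generated indicator carries both pieces of information: the target data element UID in the custom attribute, and the disaggregation in the aggregate-export fields:

{
  "name": "Example PI disaggregation (generated by PDAC)",
  "aggregateExportCategoryOptionCombo": "<COC UID>",
  "aggregateExportAttributeOptionCombo": "<AOC UID>",
  "attributeValues": [
    {
      "attribute": { "id": "<PDAC custom attribute UID>" },
      "value": "<target data element UID>"
    }
  ]
}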

Result: indicator values are exported as data element values with disaggregations.

How it works: Data transfer process

Step 1: Export the indicator data

  • The app creates an indicator group for each mapping whose name includes the URL needed to export all the related data; simply add the periods and org units needed. For example:
    • /api/analytics/dataValueSet.json?dimension=dx:IN_GROUP-XmPd1r6Lbqp&outputIdScheme=ATTRIBUTE:b8KbU93phhz&dimension=pe:LAST_MONTH&dimension=ou:EpNcvgH3Dld
  • Because of the configuration the app has created, the exported values will have the data element UIDs and disaggregations on them already.

Step 2: Import the data element data

  • The data is already in the correct format, so it can be imported directly to the dataValueSets endpoint.

This process can be automated with the Integration driver (BAO tool), or via a custom cron service run from the back end.
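
As a minimal manual sketch of the two steps (host names are placeholders and the demo credentials are illustrative; the group, attribute, and org unit UIDs are reused from the example above):

# Step 1: export the mapped indicator values as a data value set from the source server
curl -u admin:district -o payload.json \
  "https://source.example.org/api/analytics/dataValueSet.json?dimension=dx:IN_GROUP-XmPd1r6Lbqp&outputIdScheme=ATTRIBUTE:b8KbU93phhz&dimension=pe:LAST_MONTH&dimension=ou:EpNcvgH3Dld"

# Step 2: import the exported payload into the target server's dataValueSets endpoint
curl -u admin:district -H "Content-Type: application/json" -d @payload.json \
  "https://target.example.org/api/dataValueSets"

An Integration Driver route automates exactly this pull-then-push pair on a schedule.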

PDAC Features

Add Row

Click to open the modal to CREATE the configuration mappings for Dataset:DataElement:ProgramIndicator (aka “Row”) and assign the filters that are used to create the indicators in the data set.

Edit

Click to open the modal to EDIT the configuration mappings for Dataset:DataElement:ProgramIndicator (aka “Row”) and assign the filters that are used to create the indicators in the data set.

Generate Mappings

Generate mappings creates a new indicator group based on the indicators that are generated by the configuration of the mapping rows. Mappings can be generated any number of times, resulting in new indicator groups representing the same calculations, so be careful to avoid unintended duplicates.

Delete

Clicking “Delete” finds all the metadata associated with a mapping row and removes it from the database.

PDAC Benefits

In depth program analysis

The single-value program indicator can now be broken down by any of the applied disaggregations, allowing you to see values by, for example, Age, Gender, or Key Population, and any combination of these, as true data dimensions in DHIS2. The disaggregations can be anything you like: if you can create a filter for it, it can be disaggregated using the app.

This helps those analysing the data to spot trends and observe potential inequalities more easily.

Faster analytics

Because the program indicator values have been stored as aggregate data values, they do not need to be calculated on the fly in the analytics apps, so pivots, charts, and maps load much faster.

Planned improvements

There are still many plans for enhancements in the app, and using it on client projects has highlighted several key features which would be very useful:

  • Individual CO filters per mapping, using base values to autofill where possible
  • Ability to add a non-CO-specific filter which applies to all mapped PIs
  • Improved user feedback when mappings are generated
  • Retaining IDs when mappings are regenerated
  • Provide a way to run the data transfer periodically for active mappings

PDAC Challenges & Solutions

PDAC Version Support

PDAC is currently (as of Oct 1, 2021) supported on DHIS2 versions 2.33-2.36, with the caveat that 2.36 currently doesn’t work due to a bug in DHIS2.

Metadata Alignment

Data Set Import - PDAC Requirement

When integrating Tracker data into aggregate data sets across DHIS2 instances, the aggregate data set must exist on both servers with the same UIDs while PDAC is being used. The data set from the target instance can be deleted after the mappings are created/generated. However, if changes or updates are made on the Target server, the data set will need to be re-imported to update the mappings via PDAC.

  1. PDAC requires the dataset in order to create the mappings
  2. The UIDs need to be the same because:
    1. It is best practice for standardizing metadata.
    2. Data is exported/imported to the corresponding DE/COC mappings in the central server.
    3. PDAC uses them to assign the correct UIDs to the indicator attributes.

Importing Target Server’s Data Set to Source Server

Reminder: 

Source Server is the server with the tracker data being aggregated and imported into the target server’s aggregate data set.

Target Server is the server with the metadata and aggregated data. 

In order to use PDAC to generate the necessary metadata and API request for the integration driver, the Target server’s dataset(s) must be imported into the source server.

For instances that have matching metadata (including matching UIDs and Org Unit hierarchies) this is achieved simply through a metadata dependency export.
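
For reference, a metadata dependency export for a single data set can be pulled from the Target server and posted to the Source server via the Web API (hosts, credentials, and the data set UID below are placeholders):

# Export the data set with its dependencies from the Target server
curl -u admin:district -o dataset_metadata.json \
  "https://target.example.org/api/dataSets/<dataSet UID>/metadata.json"

# Import the export into the Source server
curl -u admin:district -H "Content-Type: application/json" -d @dataset_metadata.json \
  "https://source.example.org/api/metadata"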

For Instances that DO NOT have matching metadata, this is a complex and multi-layered challenge. For large amounts of data it is best achieved through a script that can alter a metadata dependency export or can otherwise query the metadata from the target server’s data set via the API. That is, there are two approaches:

  1. Generate a complete metadata dependency export, including the unnecessary org units and/or category-related metadata, and create a script to remove the unnecessary metadata objects and references.
  2. Generate a list of metadata and their UIDs, and query for them specifically to build a metadata dependency file that mimics only the needed parts of the data set for import in the source system.

Which option you choose is up to you and your team’s capabilities and is dependent upon knowing what metadata is needed vs. not needed. 
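
As a hedged illustration of the first approach, a dependency export can be trimmed with a small script before import; the exact keys to drop depend on your export, so treat this jq sketch as a starting point rather than a complete solution:

# Drop org units from the dependency export and clear the data set's org unit assignments
jq 'del(.organisationUnits) | .dataSets[].organisationUnits = []' \
  dataset_metadata.json > dataset_metadata_trimmed.json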

Category Structure Metadata Requirements

Each data set needs to be brought over with all of the relevant category structure metadata, including COCs, while leaving out any metadata that is not relevant to the source server. In the case of integration with Infolink monthly HIV data sets, the Implementing Partners, Support Types, and Funding Sources that are not relevant to the source country should be excluded. It is important to make sure all relevant category options, categories, category combinations, and category option combinations are imported. If any are left out, then COC generation will fail, PDAC will fail, and integration with the central server will fail.

Note: UIDs will need to be managed over time as well to maintain synchronization. Integration Driver routes succeed even if metadata is not aligned; however, the import summaries will show a failed data import if the metadata is not aligned.

COC Generation Failure

After importing the metadata from a target data set, the category structure in DHIS2 needs to be “shared” with the analytics “side” of the database. This will not change the UIDs of the imported COCs. The category structure, which relates to both metadata and analytics, has two key system processes that need to be run:

  1. Resource tables need to be run (the category structure is built and thus available to analytics as dimensions).
  2. COCs need to be generated for all possible pairs of category options based on the category combo metadata

If the resource tables are not run (process #1) before the COC regeneration process (#2) is run, there will likely be a failure, since the COC regeneration process will not have a full picture of the category structure from which to generate all category option combos. This has been seen to result in an error that may look like this:

Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.hisp.dhis.category.CategoryOptionCombo#7056298]

The resolution for failed COC regeneration, in this case, is to run resource tables to get the category structure properly built in the database. 
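
Both processes can be run from the Data Administration app or via the Web API; a sketch of the correct order, using endpoints from the DHIS2 maintenance API (host and credentials are placeholders):

# 1. Generate resource tables first
curl -u admin:district -X POST "https://source.example.org/api/resourceTables"

# 2. Then regenerate category option combos
curl -u admin:district -X POST "https://source.example.org/api/maintenance/categoryOptionComboUpdate"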

Indicators that need to be calculated across more than one program

This is currently not possible.

orgUnitOutputIdScheme Bug

https://jira.dhis2.org/browse/DHIS2-11769

A current challenge with no solution is that the key functionality of integrating data between servers depends on org unit UIDs being transformed on output to represent the target server’s OUs. The issue is currently under development and will be tested soon.

Program Indicator Data Types and Special Character Errors

This is a reminder to make sure your filters are valid and take into account the data types being compared or operated upon; mismatches can cause errors. Also be on the lookout for special characters that may result in exceptions such as “Invalid string token”.

TIP: A best practice is to test any filters in the Program Indicator app, and ensure they pass validation before adding them in the PDAC tool.

PDAC Q&A

Question 1

Can you explain what happens when you press update, and what happens if you don’t press update?

Answer:

Pressing update saves the category option filter to the datastore and makes the change available in the app. If you do not click update next to a row the changes to that row will not be saved.

Question 2

When editing filters for a row/mapping, what do we put as the filter value if there is an irrelevant category option?

Answer:

Using false in the filter will make all values on the aggregate side that include that disaggregation return nothing, so if certain category options are not relevant, putting false is a reasonable solution (though you might as well just delete these irrelevant PIs and indicators).

Question 3

We have multiple category options available in the category combo assigned to the data set. We want to use the filters to map to only one of them. Can we mark “true” for the valid category option and “false” for the others?

Answer:

Yes, this will work. The app will still generate all the program indicators for both; your valid category option will give you what you need, and then you can delete any involving the invalid category option. Because of the string naming convention the app uses, selecting the PIs you don’t need by category option for deletion is easy.

Question 4

Does PDAC know NOT to create program indicators for the combinations that include things we want to exclude?

Answer:

There is no skip logic for this currently. When we next update the app to store the filters on a per-mapping basis, we will also be able to add an option to disable category options. Currently, putting false will just mean all the generated program indicators that include that category option will return nothing.

PDAC Repository, Issues/Feature Requests, & Support Channels

Public Repository

https://github.com/plinnegan/program-dataset-connector

Open Issues & Feature Requests

Feel free to open an issue on the repository to document any known bugs or issues. Feature requests can also be added to the repository in the form of an issue. For any major features or development, reach out to Chase Freeman (cfreeman@baosystems.com) and he will connect you with the necessary parties.

Support

Email support@baosystems.com explaining the issue and add cfreeman@baosystems.com as a CC’d party. 

Integration Driver

The integration driver is a component for pulling and pushing information over HTTP between Web API-enabled system instances. It is accessible through BAO Manager’s interface as well as via an API.

Integration Driver Overview

What it does

  • Moves data between systems
  • Can be used to connect any system that has an HTTP-based API and understands the same data formats (JSON, XML, or ADX for DHIS2)
  • Pulls information from a source API, stores the information temporarily, and then pushes it to a destination API.
  • Can be scheduled to run at regular intervals or triggered on-demand
  • Note: the data payload will be stored temporarily in the BAO cloud while in transit, but will not be kept after the integration driver transaction has completed.

Steps to create an Integration Route

  1. Login to BAO Manager
    1. Be sure you’re using an account that has permission to administer Integration Routes and ensure your organization has signed up to use Integration Routes.
  2. Select “Integration Routes” from the menu on the left
    1. This switches your view from your organization’s “cloud instances” view to the overview table of integration routes (if there are any).
  3. Select “Create Routes” from the top right corner.
  4. Complete all of the required fields for each route, and select “Create Route” to save the route’s configuration
    1. Use the table below to reference the field requirements.
  5. Once back at the “Overview” table of routes, you can click on the name of a chosen route to:
    1. View its details
    2. Trigger the route
    3. Download the payload (the data being integrated)
    4. View Logs and Server Responses (Pull/Push Statuses).

Table of requirements

Configuration Panel

Info Panel: Trigger, Download, Delete, Status & Logs

Creating Integration Route from PDAC-Generated Indicator Group

  1. Generate a mapping (and its indicator group) using the PDAC instructions above.
  2. On the “source” instance, use the Maintenance app to find and select an indicator group that was created by PDAC.
  3. To query for the data values associated with that group, copy the API query string from the “name” field.
  4. Paste the query string into the “Pull channel” field of the integration route, making sure to prepend the instance’s base URL to the query string (see the completed example below this list).
  5. Add the two required dimensional parameters - Period & Org Unit.
    1. &dimension=pe:&dimension=ou:
  6. Use the target instance’s “dataValueSets” API endpoint for the “Push Channel” value, and include any import parameters that are needed (reference the table and the example below).
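
For example, a completed Pull Channel value might look like the following (the host, period, and org unit are illustrative; the indicator group and attribute UIDs are reused from the PDAC example above):

https://source.example.org/api/analytics/dataValueSet.json?dimension=dx:IN_GROUP-XmPd1r6Lbqp&outputIdScheme=ATTRIBUTE:b8KbU93phhz&dimension=pe:LAST_MONTH&dimension=ou:EpNcvgH3Dld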

Import Parameters

Parameter (allowed values): Description

dataElementIdScheme (uid | name | code | attribute:ID): Property of the data element object to use to map the data values.
orgUnitIdScheme (uid | name | code | attribute:ID): Property of the org unit object to use to map the data values.
categoryOptionComboIdScheme (uid | name | code | attribute:ID): Property of the category option combo and attribute option combo objects to use to map the data values.
dataSetIdScheme (uid | name | code | attribute:ID): Property of the data set object to use to map the data values.
categoryIdScheme (uid | name | code | attribute:ID): Property of the category object to use to map the data values (ADX only).
categoryOptionIdScheme (uid | name | code | attribute:ID): Property of the category option object to use to map the data values (ADX only).
idScheme (uid | name | code | attribute:ID): Property of any of the above objects, if they are not specified, to use to map the data values.
preheatCache (false | true): Indicates whether to preload metadata caches before starting to import data values; speeds up large import payloads with high metadata cardinality.
dryRun (false | true): Whether to save changes on the server or just return the import summary.
importStrategy (CREATE | UPDATE | CREATE_AND_UPDATE | DELETE): Save objects of all, new or update import status on the server.
skipExistingCheck (false | true): Skip checks for existing data values. Improves performance. Only use for empty databases or when the data values to import do not already exist.
skipAudit (false | true): Skip audit, meaning audit values will not be generated. Improves performance at the cost of the ability to audit changes. Requires the authority "F_SKIP_DATA_IMPORT_AUDIT".
async (false | true): Indicates whether the import should be done asynchronously or synchronously. The former is suitable for very large imports as it ensures that the request does not time out, although it has a significant performance overhead. The latter is faster but requires the connection to persist until the process is finished.
force (false | true): Indicates whether the import should be forced. Data import could be rejected for various data set locking reasons, for example approval, data input period, or expiry days. In order to override such locks and force data input, use force=true. However, one needs to be a superuser for this parameter to work.
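
For example, a Push Channel that maps org units by attribute and performs a trial run first might look like this (host and attribute UID are placeholders):

https://target.example.org/api/dataValueSets?orgUnitIdScheme=attribute:<attribute UID>&importStrategy=CREATE_AND_UPDATE&dryRun=true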

Period Formats

The period format is described in the following table.

Interval: Format (example = period)

Day: yyyyMMdd (20040315 = March 15, 2004)
Week: yyyyWn (2004W10 = Week 10, 2004)
Week Wednesday: yyyyWedWn (2015WedW5 = Week 5 with start Wednesday)
Week Thursday: yyyyThuWn (2015ThuW6 = Week 6 with start Thursday)
Week Saturday: yyyySatWn (2015SatW7 = Week 7 with start Saturday)
Week Sunday: yyyySunWn (2015SunW8 = Week 8 with start Sunday)
Bi-week: yyyyBiWn (2015BiW1 = Weeks 1-2, 2015)
Month: yyyyMM (200403 = March 2004)
Bi-month: yyyyMMB (200401B = January-February 2004)
Quarter: yyyyQn (2004Q1 = January-March 2004)
Six-month: yyyySn (2004S1 = January-June 2004)
Six-month April: yyyyAprilSn (2004AprilS1 = April-September 2004)
Year: yyyy (2004 = 2004)
Financial Year April: yyyyApril (2004April = April 2004-March 2005)
Financial Year July: yyyyJuly (2004July = July 2004-June 2005)
Financial Year Oct: yyyyOct (2004Oct = October 2004-September 2005)

Relative Periods

Relative periods are relative to the current date and allow, for example, the creation of dynamic reports. The available relative period values are:

  • THIS_WEEK
  • LAST_WEEK
  • LAST_4_WEEKS
  • LAST_12_WEEKS
  • LAST_52_WEEKS
  • THIS_MONTH
  • LAST_MONTH
  • THIS_BIMONTH
  • LAST_BIMONTH
  • THIS_QUARTER
  • LAST_QUARTER
  • THIS_SIX_MONTH
  • LAST_SIX_MONTH
  • MONTHS_THIS_YEAR
  • QUARTERS_THIS_YEAR
  • THIS_YEAR
  • MONTHS_LAST_YEAR
  • QUARTERS_LAST_YEAR
  • LAST_YEAR
  • LAST_5_YEARS
  • LAST_10_YEARS
  • LAST_10_FINANCIAL_YEARS
  • LAST_12_MONTHS
  • LAST_3_MONTHS
  • LAST_6_BIMONTHS
  • LAST_4_QUARTERS
  • LAST_2_SIXMONTHS
  • THIS_FINANCIAL_YEAR
  • LAST_FINANCIAL_YEAR
  • LAST_5_FINANCIAL_YEARS

Attribute Mapping

The integration driver is simply a tool to push and pull data from one API to another. It is based on HTTP GETs and POSTs, and it isn’t much smarter than knowing where to GET and POST the data you tell it to. No transformation is performed by the integration driver.

Attribute mapping is something the integration driver can leverage, not perform. Furthermore, attribute mapping is specific to DHIS2 in this case.

The primary use of attribute mapping is to output data from the source server using identifiers that map to metadata in the target server. In other words, attribute mapping is used to import source data values whose identifiers do not match the target metadata.

How to achieve attribute mapping in DHIS2

Note: PDAC takes care of some of this for you. To see how it works in PDAC, visit the section above: “How it works: Aggregate from tracker”

  1. Create an attribute (a type of DHIS2 metadata) and add it as a field to the relevant metadata (e.g. org units, indicators) in the Source instance.
  2. Add the UID or code of the corresponding metadata from the Target instance as the value of the attribute you created on the Source instance (i.e. on the metadata objects from the previous step).
  3. The Pull Channel in the integration driver then queries for the data using the attribute you defined previously, which outputs the matching value from the target server in place of the source server’s UID or code.
  4. To accomplish this, use the parameter “outputIdScheme” like this, where <attribute UID> is the UID of the attribute you want to substitute (the one you created):
    1. ?outputIdScheme=ATTRIBUTE:<attribute UID>

Doing this removes the need for transformations outside of the DHIS2 environment, keeping the integration driver a safe piece of integration software. There are other output ID schemes for org units and data elements that can be found in the documentation.
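
A hedged sketch of step 1 as a metadata payload (the name is a placeholder): an attribute of value type TEXT, flagged so it can be set on organisation units, created on the Source instance.

{
  "attributes": [
    {
      "name": "Target server org unit UID",
      "valueType": "TEXT",
      "organisationUnitAttribute": true
    }
  ]
}

Step 2 is then a matter of setting this attribute’s value, on each source org unit, to the UID of the corresponding org unit in the Target instance (via the Maintenance app or the metadata API).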

Integration Driver Tips

Identifier Schemes

Familiarize yourself with how, when, and why to use the various identifier schemes in DHIS2; it really is a powerful feature set.

https://docs.dhis2.org/en/full/develop/dhis-core-version-master/developer-manual.html#webapi_identifier_schemes

Import/Export Parameters for Data

Learn the parameters that can be applied to the push and pull channels. In particular, have a fundamental understanding of the async option when dealing with large data volumes, as well as the aforementioned identifier scheme parameters.

https://docs.dhis2.org/en/full/develop/dhis-core-version-master/developer-manual.html#webapi_data_values

Integration Driver API

Import this collection via the link here:

https://www.getpostman.com/collections/ff914b8dc87d4b8b299b

Resources

The following resources are available.

Get all routes or create single routes

GET /api/routes

POST  /api/routes

Methods for Routes by ID

GET /api/routes/{id}

PUT /api/routes/{id}

DELETE /api/routes/{id}

Trigger Routes

POST /api/routes/{id}/trigger

Examples

This section describes examples of API usage.

Create new integration route

POST /api/routes

Example JSON payload:

{
  "name": "Data values from demo to dev",
  "runTrigger": "EVERY_15_MIN",
  "pullChannel": {
    "url": "https://play.dhis2.org/demo/api/24/dataValueSets?dataSet=BfMAe6Itzgt&period=201601&orgUnit=DiszpKrYNg8",
    "username": "admin",
    "password": "district"
  },
  "pushChannel": {
    "url": "https://play.dhis2.org/dev/api/24/dataValueSets",
    "username": "admin",
    "password": "district"
  }
}

Example curl request:

curl -d @route.json "https://manager.baosystems.com/api/routes" -H "Content-Type:application/json" -v
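
Trigger an existing route

Example curl request (the route ID is a placeholder; same authentication context as the example above):

curl -X POST "https://manager.baosystems.com/api/routes/<route id>/trigger" -v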