Program Data Set Connector & Integration Driver: Tracker-to-Aggregate Integration Across Dissimilar DHIS2 Instances
Goals
The goal of this guide is to explain how to use the PDAC tool and the Integration Driver tool together to send Tracker data to one or more aggregate data sets across different systems.
Note that this guide covers aggregating Tracker data between two systems. Using PDAC on a single instance should be slightly more straightforward.
Overview
This document describes the process of implementing a complex metadata and data synchronization across multiple instances of DHIS2 that have similar metadata without matching UIDs. Furthermore, it describes how to use the Program Data Set Connector tool to aggregate tracker data and set up integration routes via BAO Systems' Integration Driver to automate reporting to a centralized instance.
This document details the following general process for setting up this architecture:
- Organize your process:
  - Understand which server is the Source server (the source of the data) and which is the Target server (the server the data is being transferred to).
- Install and configure PDAC on the Source server.
- Ensure the aggregate data set from the Target server is replicated onto the Source server:
  - UIDs for data set metadata must be the same.
  - Org units can be different, but it is best practice to keep a standardized hierarchy.
- Use PDAC to create mappings for aggregating and creating dimensional disaggregations on the Source server's tracker-based data.
- Schedule automated integration routes in BAO Manager to send data from the Source server to the Target server.
PDAC
About
The Program Data Set Connector automates the process of configuring DHIS2 to support data transfer from the tracker to the aggregate data model. It allows you to create custom disaggregations on program indicators, and use them as true data dimensions in DHIS2. PDAC uses the concept of category option filters to map a single program indicator value to multiple disaggregations for aggregate data.
Step-by-Step Mapping Configuration Instructions
- Install PDAC via the App Hub in the App Management app.
- Alternatively, download the latest version of PDAC HERE and load the .zip file into the App Management app via manual install.
- Open the app
- To aggregate the data from program indicators, select “Add Row”.
- Then configure the mapping by selecting the data set, program indicator, and data element.
- Complete the filters for your category options according to the requirements of your configuration.
- Click “Update” for each filter to save it to the PDAC tool.
- Make sure all filters have a value.
- Click “Save” to save the mapping row.
- If you need to make changes, use the “Edit” button for the row you’d like to change.
- Removing rows is possible via the “Delete” button in that row.
- Once you’re ready to generate the indicators necessary for the mappings, choose “Generate Mapping” for one or more rows.
- Verify success in the PDAC app, and then navigate to the Maintenance app to verify the creation of the appropriate indicator group.
- To verify the data for export, see the section below titled “How it works: Data transfer process”.
When to use PDAC
The event reports and event visualiser apps are powerful and may suit your needs in some cases. Reasons the PDAC app might be needed instead:
- Not just counting events
- Cannot create filter in event reports
- Want to show values alongside other data
- Want to use the data visualiser app (additional chart types, etc.)
- Custom disaggregations on numeric types
- Data needs to be in aggregate
PDAC Installation
- Using the latest stable release via the App Hub
- App Management
- App Hub
- Search “Program DataSet Connector”
- The app is not currently available in the App Hub for 2.36 due to a bug in DHIS2
PDAC Functionality
- Create a program indicator based on the original for every disaggregation
- Modify each program indicator filter so it only returns that disaggregation of the total
- Link each program indicator to a disaggregation on a data element
- DHIS2 will handle displaying these disaggregations as part of the total aggregate data element
Generating Program Indicator Disaggregations
The user provides three things:
- Data target: Aggregate data element
- Data source: Program indicator
- A filter for each category option on the data element
The app then does the following:
- Finds the category option combos (COCs) and attribute option combos (AOCs) from the data element and data set
- Builds PI filters for each disaggregation (see the sketch after this list)
- Creates PIs from the source PI with each disaggregation
- Creates indicators referencing each PI and storing the appropriate AOC, COC, and DE UIDs
- Finally, creates an indicator group with a URL that tells you how to access the data
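To make the filter-building step concrete, here is a minimal sketch (not the app's actual code) of how a base program indicator filter could be combined with per-option category option filters to produce one filter per disaggregation. The attribute names and filter expressions are hypothetical placeholders.

from itertools import product

# Hypothetical base filter on the source program indicator.
base_filter = "A{testResult} == 'POSITIVE'"

# Hypothetical category option filters, keyed by category, then by option,
# as entered per category option in a PDAC mapping row.
co_filters = {
    "Gender": {"Female": "A{gender} == 'Female'", "Male": "A{gender} == 'Male'"},
    "Age": {"<15": "A{age} < 15", "15+": "A{age} >= 15"},
}

# One program indicator filter per category option combination (COC):
# the base filter AND exactly one option filter from each category.
for options in product(*co_filters.values()):
    parts = [base_filter] + [
        co_filters[category][option] for category, option in zip(co_filters, options)
    ]
    print(", ".join(options), "->", " && ".join(f"({p})" for p in parts))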
How it works: Aggregate from tracker
Step 1: Indicators to data elements
- The app creates a custom attribute which adds a new field to indicators
- The app stores the UID of the data element the indicator will map to in that field
- When exporting the indicator data, use outputIdScheme=ATTRIBUTE
Step 2: Indicators to disaggregations
- By default, DHIS2 indicators and program indicators have two special fields:
- Category option combination for aggregate data export
- Attribute option combination for aggregate data export
- When exporting indicator data, the indicator values will get category option combos and attribute option combos based on these fields.
Result: indicator values are exported as data element values with disaggregations
How it works: Data transfer process
Step 1: Export the indicator data
- The app creates an indicator group for each mapping whose name includes the URL needed to export all the related data; simply add the periods and org units needed. For example:
- /api/analytics/dataValueSet.json?dimension=dx:IN_GROUP-XmPd1r6Lbqp&outputIdScheme=ATTRIBUTE:b8KbU93phhz&dimension=pe:LAST_MONTH&dimension=ou:EpNcvgH3Dld
- Because of the configuration the app has created, the exported values will have the data element UIDs and disaggregations on them already.
Step 2: Import the data element data
- The data is already in the correct format, so it can be imported directly to the dataValueSets endpoint.
This process can be automated with the Integration Driver (BAO tool) or via a custom cron service run from the back end, as sketched below.
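As a rough illustration of automating those two steps (for example from a nightly cron job), the sketch below reuses the example query string from above; the server URLs and credentials are placeholders and error handling is omitted.

import requests

SOURCE_URL = "https://source.example.org"   # placeholder source instance
TARGET_URL = "https://target.example.org"   # placeholder target instance
SOURCE_AUTH = ("admin", "district")         # placeholder credentials
TARGET_AUTH = ("admin", "district")         # placeholder credentials

# Step 1: export the indicator data from the source server using the query
# string PDAC stored in the indicator group name, plus period and org unit.
pull_url = (
    SOURCE_URL + "/api/analytics/dataValueSet.json"
    "?dimension=dx:IN_GROUP-XmPd1r6Lbqp"
    "&outputIdScheme=ATTRIBUTE:b8KbU93phhz"
    "&dimension=pe:LAST_MONTH"
    "&dimension=ou:EpNcvgH3Dld"
)
data_values = requests.get(pull_url, auth=SOURCE_AUTH).json()

# Step 2: the payload already carries the target data element UIDs and
# disaggregations, so it can be posted straight to dataValueSets.
response = requests.post(
    TARGET_URL + "/api/dataValueSets", json=data_values, auth=TARGET_AUTH
)
print(response.json())  # import summary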
PDAC Features
Add Row
Click to open the modal to CREATE the configuration mappings for Dataset:DataElement:ProgramIndicator (aka “Row”) and assign the filters that are used to create the indicators in the data set.
Edit
Click to open the modal to EDIT the configuration mappings for Dataset:DataElement:ProgramIndicator (aka “Row”) and assign the filters that are used to create the indicators in the data set.
Generate Mappings
Generate Mappings creates a new indicator group based on the indicators that are generated by the configuration of the mapping rows. Mappings can be generated any number of times, each time resulting in new indicator groups representing the same calculations, so take care not to create unwanted duplicates.
Delete
Clicking “Delete” finds all the metadata associated with a mapping row and removes it from the database.
PDAC Benefits
In depth program analysis
The single-value program indicator can now be broken down by any of the applied disaggregations, allowing you to see values by Age, Gender, or Key Population, for example, and any possible combination of these as true data dimensions in DHIS2. The disaggregations can be anything you like: if you can create a filter for it, it can be disaggregated using the app.
This helps those analysing the data to spot trends and observe potential inequalities in the data more easily.
Faster analytics
Because the program indicator values have been stored as aggregate data values, they do not need to be calculated on the fly in the analytics apps, so pivot tables, charts, and maps load much faster.
Planned improvements
There are still many plans for enhancements in the app, and using it on client projects has highlighted several key features which would be very useful:
- Individual CO filters per mapping, using base values to autofill where possible
- Ability to add a non CO specific filter which applies to all mapped PIs
- Improved user feedback when mappings are generated
- Retaining IDs when mappings are regenerated
- Provide a way to run the data transfer periodically for active mappings
PDAC Challenges & Solutions
PDAC Version Support
PDAC is currently (as of Oct 1, 2021) supported on DHIS2 versions 2.33-2.36, with the caveat that 2.36 currently doesn’t work due to a bug in DHIS2.
Metadata Alignment
Data Set Import - PDAC Requirement
When creating the metadata for integrating Tracker data into aggregate data sets across DHIS2 instances, the aggregate data set must exist on both servers with the same UIDs while PDAC is being used. The data set from the target instance can be deleted after the mappings are created/generated. However, if changes or updates are made on the Target server, the data set will need to be re-imported so the mappings can be updated via PDAC.
- PDAC requires the data set in order to create the mappings.
- The UIDs need to be the same for:
  - Best practice in standardizing metadata.
  - Exporting/importing data to the corresponding DE/COC mappings in the central server.
  - PDAC to assign the correct UIDs to the indicator attributes.
Importing Target Server’s Data Set to Source Server
Reminder:
Source Server is the server with the tracker data being aggregated and imported into the target server’s aggregate data set.
Target Server is the server with the metadata and aggregated data.
In order to use PDAC to generate the necessary metadata and API request for the integration driver, the Target server’s dataset(s) must be imported into the source server.
For instances that have matching metadata (including matching UIDs and Org Unit hierarchies) this is achieved simply through a metadata dependency export.
For instances that DO NOT have matching metadata, this is a complex, multi-layered challenge. For large amounts of data it is best achieved through a script that can alter a metadata dependency export, or that can otherwise query the metadata from the target server's data set via the API. That is, there are two approaches:
- Generate a complete metadata dependency export, including the unnecessary org units and/or category-related metadata, and create a script to remove the unnecessary metadata objects and references.
- Generate a list of metadata and their UIDs, and query for them specifically to build a metadata dependency file that mimics only the needed parts of the data set for import in the source system.
Which option you choose is up to you and your team’s capabilities and is dependent upon knowing what metadata is needed vs. not needed.
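As a hedged sketch of the second approach, the script below pulls the data set's metadata dependency export from the target server and keeps only the object types needed for PDAC and data import, leaving out org units. The server URL, credentials, data set UID, and the exact list of types to keep are assumptions you will need to adapt to your own metadata.

import json
import requests

TARGET_URL = "https://target.example.org"   # placeholder target instance
AUTH = ("admin", "district")                # placeholder credentials
DATASET_UID = "BfMAe6Itzgt"                 # placeholder data set UID

# Pull the metadata dependency export for the data set from the target server.
export = requests.get(
    f"{TARGET_URL}/api/dataSets/{DATASET_UID}/metadata.json", auth=AUTH
).json()

# Keep only the object types PDAC and the data import need, dropping org
# units and anything else tied to the target hierarchy. Adjust this set to
# match the category metadata that is actually relevant to the source server.
KEEP = {
    "dataSets",
    "dataElements",
    "categoryCombos",
    "categories",
    "categoryOptions",
    "categoryOptionCombos",
}
trimmed = {key: value for key, value in export.items() if key in KEEP}

# Save the trimmed file; import it on the source server (e.g. via the
# Import/Export app or POST /api/metadata), keeping the target UIDs intact.
with open("dataset_metadata_for_source.json", "w") as f:
    json.dump(trimmed, f, indent=2)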
Category Structure Metadata Requirements
Each data set needs to be brought over with all of the relevant category structure metadata, including COCs, while leaving out any metadata that is not relevant to the source server. In the case of integration with Infolink monthly HIV datasets, the Implementing Partners, Support Types, and Funding Sources that are not relevant to the source country should be excluded. It is important to make sure all category options, categories, category combinations, and category option combinations are imported. If any are left out, then COC generation will fail, PDAC will fail, and integration with the central server will fail.
Note: UIDs will need to be managed over time as well to maintain synchronization. Integration Driver routes run successfully even if metadata is not aligned; however, the import summaries will show a failed data import when the metadata is not aligned.
COC Generation Failure
After importing the metadata from a target data set, the category structure in DHIS2 needs to be “shared” with the analytics “side” of the database. This will not change the UIDs of the imported COCs. The category structure, which relates to both metadata and analytics, has two key system processes that need to be run:
- Resource tables need to be run (the category structure is built and thus available to analytics as dimensions).
- COCs need to be generated for all possible pairs of category options based on the category combo metadata
If the resource tables are not run (process #1) before the COC regeneration process (#2) is run, there will likely be a failure, since the COC regeneration process will not have the full picture it needs to process all category option combos. This has been seen to result in an error that may look like this:
Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.hisp.dhis.category.CategoryOptionCombo#7056298]
The resolution for failed COC regeneration, in this case, is to run resource tables to get the category structure properly built in the database.
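If you prefer to trigger these two steps over the Web API instead of the Data Administration app, a minimal sketch is below; it assumes an admin account on the source server and uses the standard DHIS2 resource table and maintenance endpoints.

import requests

SOURCE_URL = "https://source.example.org"   # placeholder source instance
AUTH = ("admin", "district")                # placeholder admin credentials

# 1. Generate resource tables so the imported category structure becomes
#    available to analytics. This runs as a background job on the server,
#    so wait for it to finish (check Data Administration) before step 2.
requests.post(f"{SOURCE_URL}/api/resourceTables", auth=AUTH)

# 2. Generate category option combos for all possible pairs of category
#    options based on the category combo metadata.
requests.post(f"{SOURCE_URL}/api/maintenance/categoryOptionComboUpdate", auth=AUTH)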
Indicators that need to be calculated across more than one program
This is currently not possible.
orgUnitOutputIdScheme Bug
https://jira.dhis2.org/browse/DHIS2-11769
A current challenge with no solution is that the key functionality of integrating data between servers depends on org unit UIDs being transformed on output to represent the target server's OUs. The issue is currently under development and will be tested soon.
Program Indicator Data Types and Special Character Errors
This is a reminder to make sure your filters are valid and take into account the data types being compared or operated upon; mismatched types can cause errors. Also be on the lookout for special characters, which may result in errors such as “Invalid string token”.
TIP: A best practice is to test any filters in the Program Indicator app, and ensure they pass validation before adding them in the PDAC tool.
PDAC Q&A
Question 1
Can you explain what happens when you press update, and what happens if you don’t press update?
Answer:
Pressing update saves the category option filter to the datastore and makes the change available in the app. If you do not click update next to a row the changes to that row will not be saved.
Question 2
When editing filters for a row/mapping, what do we put as the filter value if there is an irrelevant category option?
Answer:
Using false as the filter will make all values on the aggregate side that include that disaggregation return nothing, so if certain category options are not relevant, putting false is a reasonable solution (though you might as well just delete these irrelevant PIs and indicators).
Question 3
We have multiple category options available in the category combo assigned to the data set. We want to use the filters to map to only one of them. Can we mark “true” for the valid category option and “false” for the others?
Answer:
Yes, this will work. The app will still generate the program indicators for all of the category options; the valid category option will give you what you need, and you can then delete any involving the invalid category options. Because of the string naming convention the app uses, selecting the PIs you don’t need by category option for deletion is easy.
Question 4
Does PDAC know NOT to create program indicators for the combinations that include things we want to exclude?
Answer:
There is no skip logic for this currently. When we next update the app to store the filters on a per-mapping basis, we will also be able to add an option to disable category options. For now, putting false just means that all the generated program indicators including that category option will return nothing.
PDAC Repository, Issues/Feature Requests, & Support Channels
Public Repository
https://github.com/plinnegan/program-dataset-connector
Open Issues & Feature Requests
Feel free to open an issue on the repository to document any known bugs or issues. Feature requests can also be added to the repository in the form of an issue. For any major features or development, reach out to Chase Freeman (cfreeman@baosystems.com) and he will connect you with the necessary parties.
Support
Email support@baosystems.com explaining the issue and add cfreeman@baosystems.com as a CC’d party.
Integration Driver
The Integration Driver is a component for pulling and pushing information over HTTP between Web API-enabled system instances. It is accessible through BAO Manager's interface as well as via an API.
Integration Driver Overview
What it does
- Moves data between systems
- Can be used to connect any sort of system that has an HTTP-based API and understands the same data formats (JSON, XML, and ADX for DHIS2)
- Pulls information from a source API, stores the information temporarily, and then pushes it to a destination API.
- Can be scheduled to run at regular intervals or triggered on-demand
- Note: the data payload will be stored temporarily in the BAO cloud while in transit, but will not be kept after the Integration Driver transaction has completed.
Steps to create an Integration Route
- Login to BAO Manager
- Be sure you’re using an account that has permission to administer Integration Routes and ensure your organization has signed up to use Integration Routes.
- Select “Integration Routes” from the menu on the left
- This switches your view from your organization’s “cloud instances” view to the overview table of integration routes (if there are any).
- Select “Create Routes” from the top right corner.
- Complete all of the required fields for each route, and select “Create Route” to save the route’s configuration
- Use the table below to reference the field requirements.
- Once back at the “Overview” table of routes, you can click on the name of a chosen route to:
- View its details
- Trigger the route
- Download the payload (the data being integrated)
- View logs and server responses (pull/push statuses).
Table of requirements
Configuration Panel
Info Panel: Trigger, Download, Delete, Status & Logs
Creating Integration Route from PDAC-Generated Indicator Group
- Create the indicators and indicator group using the PDAC instructions above.
- On the “source” instance, use the Maintenance app to find and select an indicator group that was created by PDAC.
- To query for the data values associated with that group, copy the API query string from the “name” field.
- Paste the query string in the “Pull channel” field of the integration route and make sure to prepend the URL to the query string.
- Add the two required dimensional parameters, Period and Org Unit:
  - &dimension=pe:
  - &dimension=ou:
- Use the target instance’s “dataValueSets” API endpoint for the “Push Channel” value. Include any of the import parameters that are needed (see the reference table below).
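For example, reusing the indicator group and attribute UIDs from the PDAC example earlier, the completed channel values might look like the following (the instance URLs and import parameters are placeholders to adapt):
- Pull Channel: https://source.example.org/api/analytics/dataValueSet.json?dimension=dx:IN_GROUP-XmPd1r6Lbqp&outputIdScheme=ATTRIBUTE:b8KbU93phhz&dimension=pe:LAST_MONTH&dimension=ou:EpNcvgH3Dld
- Push Channel: https://target.example.org/api/dataValueSets?importStrategy=CREATE_AND_UPDATE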
Import Parameters
Parameter | Values | Description |
dataElementIdScheme | uid | name | code | attribute:ID | Property of the data element object to use to map the data values. |
orgUnitIdScheme | uid | name | code | attribute:ID | Property of the org unit object to use to map the data values. |
categoryOptionComboIdScheme | uid | name | code | attribute:ID | Property of the category option combo and attribute option combo objects to use to map the data values. |
dataSetIdScheme | uid | name | code | attribute:ID | Property of the data set object to use to map the data values. |
categoryIdScheme | uid | name | code | attribute:ID | Property of the category object to use to map the data values (ADX only). |
categoryOptionIdScheme | uid | name | code | attribute:ID | Property of the category option object to use to map the data values (ADX only). |
idScheme | uid | name | code | attribute:ID | Property of any of the above objects if they are not specified, to use to map the data values. |
preheatCache | false | true | Indicates whether to preload metadata caches before starting to import data values, will speed up large import payloads with high metadata cardinality. |
dryRun | false | true | Whether to save changes on the server or just return the import summary. |
importStrategy | CREATE | UPDATE | CREATE_AND_UPDATE | DELETE | Save objects of all, new or update import status on the server. |
skipExistingCheck | false | true | Skip checks for existing data values. Improves performance. Only use for empty databases or when the data values to import do not exist already. |
skipAudit | false | true | Skip audit, meaning audit values will not be generated. Improves performance at the cost of ability to audit changes. Requires authority "F_SKIP_DATA_IMPORT_AUDIT". |
async | false | true | Indicates whether the import should be done asynchronously or synchronously. The former is suitable for very large imports as it ensures that the request does not time out, although it has a significant performance overhead. The latter is faster but requires the connection to persist until the process is finished. |
force | false | true | Indicates whether the import should be forced. Data import could be rejected for various reasons related to data set locking, for example due to approval, data input period, or expiry days. To override such locks and force data input, use force=true. However, one needs to be a superuser for this parameter to work. |
Period Formats
The period format is described in the following table.
Interval | Format | Example | Description |
Day | yyyyMMdd | 20040315 | March 15, 2004 |
Week | yyyyWn | 2004W10 | Week 10 2004 |
Week Wednesday | yyyyWedWn | 2015WedW5 | Week 5 with start Wednesday |
Week Thursday | yyyyThuWn | 2015ThuW6 | Week 6 with start Thursday |
Week Saturday | yyyySatWn | 2015SatW7 | Week 7 with start Saturday |
Week Sunday | yyyySunWn | 2015SunW8 | Week 8 with start Sunday |
Bi-week | yyyyBiWn | 2015BiW1 | Week 1-2 2015 |
Month | yyyyMM | 200403 | March 2004 |
Bi-month | yyyyMMB | 200401B | January-February 2004 |
Quarter | yyyyQn | 2004Q1 | January-March 2004 |
Six-month | yyyySn | 2004S1 | January-June 2004 |
Six-month April | yyyyAprilSn | 2004AprilS1 | April-September 2004 |
Year | yyyy | 2004 | 2004 |
Financial Year April | yyyyApril | 2004April | Apr 2004-Mar 2005 |
Financial Year July | yyyyJuly | 2004July | July 2004-June 2005 |
Financial Year Oct | yyyyOct | 2004Oct | Oct 2004-Sep 2005 |
Relative Periods
Relative periods are relative to the current date and allow, for example, the creation of dynamic reports. The available relative period values are:
- THIS_WEEK
- LAST_WEEK
- LAST_4_WEEKS
- LAST_12_WEEKS
- LAST_52_WEEKS
- THIS_MONTH
- LAST_MONTH
- THIS_BIMONTH
- LAST_BIMONTH
- THIS_QUARTER
- LAST_QUARTER
- THIS_SIX_MONTH
- LAST_SIX_MONTH
- MONTHS_THIS_YEAR
- QUARTERS_THIS_YEAR
- THIS_YEAR
- MONTHS_LAST_YEAR
- QUARTERS_LAST_YEAR
- LAST_YEAR
- LAST_5_YEARS
- LAST_10_YEARS
- LAST_10_FINANCIAL_YEARS
- LAST_12_MONTHS
- LAST_3_MONTHS
- LAST_6_BIMONTHS
- LAST_4_QUARTERS
- LAST_2_SIXMONTHS
- THIS_FINANCIAL_YEAR
- LAST_FINANCIAL_YEAR
- LAST_5_FINANCIAL_YEARS
Attribute Mapping
The Integration Driver is simply a tool to push and pull data from one API to another. It is based on HTTP GETs and POSTs, and it really isn't much smarter than knowing where to GET and POST the data you tell it to. No transformation is performed by the Integration Driver.
Attribute mapping is something the Integration Driver can leverage, not perform. Furthermore, attribute mapping is specific to DHIS2 in this case.
The primary use of attribute mapping is to output data from the source server using identifiers that map to metadata in the target server. In other words, attribute mapping is used to import source data values whose identifiers do not otherwise match the target metadata.
How to achieve attribute mapping in DHIS2
Note: PDAC takes care of some of this for you. To see how it works in PDAC, visit the section above: “How it works: Aggregate from tracker”
- Create an attribute (a type of DHIS2 metadata) and add it as a field to the relevant metadata (e.g. org units, indicators) in the Source instance.
- Add the UID or code of the corresponding metadata from the Target instance as the value of the attribute you created on the Source instance (i.e. as the value of the field added in the previous step).
- The Pull Channel in the Integration Driver then needs to query for the data using the attribute you defined, which will output the matching value from the target server in place of the source server's UID or code.
- To accomplish this, use the outputIdScheme parameter, where the value after the colon is the UID of the attribute you created (the one you want to substitute):
  - ?outputIdScheme=ATTRIBUTE:<attribute UID>
Doing this removes the need for transformations outside of the DHIS2 environment, keeping the Integration Driver a simple, safe piece of integration software. There are other output ID schemes for org units and data elements that can be found in the documentation.
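Putting the steps above together, here is a rough sketch (not PDAC's exact behaviour) of creating an org unit mapping attribute on the source instance and storing a target UID in it. All UIDs, names, and credentials are placeholders, and the response-parsing details may differ slightly between DHIS2 versions.

import requests

SOURCE_URL = "https://source.example.org"   # placeholder source instance
AUTH = ("admin", "district")                # placeholder credentials

# 1. Create an attribute on the source instance that can be set on org units.
created = requests.post(
    f"{SOURCE_URL}/api/attributes",
    json={
        "name": "Target server org unit UID",
        "valueType": "TEXT",
        "organisationUnitAttribute": True,
    },
    auth=AUTH,
).json()
attribute_uid = created["response"]["uid"]

# 2. Store the matching target org unit UID as the attribute's value on the
#    corresponding source org unit (fetch the payload, amend it, PUT it back).
source_ou = "EpNcvgH3Dld"                   # placeholder source org unit UID
ou = requests.get(
    f"{SOURCE_URL}/api/organisationUnits/{source_ou}.json?fields=:owner", auth=AUTH
).json()
ou.setdefault("attributeValues", []).append(
    {"attribute": {"id": attribute_uid}, "value": "TargetOuUid01"}  # placeholder target UID
)
requests.put(f"{SOURCE_URL}/api/organisationUnits/{source_ou}", json=ou, auth=AUTH)

# 3. The Pull Channel can then use ?outputIdScheme=ATTRIBUTE:<attribute UID>
#    so the exported values carry the target server's identifiers.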
Integration Driver Tips
Identifier Schemes
Familiarize yourself with how, when, and why to use the various identifier schemes in DHIS2; it really is a powerful feature set.
Import/Export Parameters for Data
Learn the parameters that are available for the push and pull channels. In particular, have a fundamental understanding of the async option when dealing with large data, as well as the aforementioned identifier scheme parameters.
Integration Driver API
Postman Collection Link
Import this collection via the link here:
https://www.getpostman.com/collections/ff914b8dc87d4b8b299b
Resources
The following resources are available.
Get all routes or create single routes
GET /api/routes
POST /api/routes
Methods for Routes by ID
GET /api/routes/{id}
PUT /api/routes/{id}
DELETE /api/routes/{id}
Trigger Routes
POST /api/routes/{id}/trigger
Examples
This section describes examples of API usage.
Create new integration route
POST /api/routes
Example JSON payload:
{
"name": "Data values from demo to dev",
"runTrigger": "EVERY_15_MIN",
"pullChannel": {
"url": "https://play.dhis2.org/demo/api/24/dataValueSets?dataSet=BfMAe6Itzgt&period=201601&orgUnit=DiszpKrYNg8",
"username": "admin",
"password": "district"
},
"pushChannel": {
"url": "https://play.dhis2.org/dev/api/24/dataValueSets",
"username": "admin",
"password": "district"
}
}
Example curl request:
curl -d @route.json "https://manager.baosystems.com/api/routes" -H "Content-Type:application/json" -v
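A similar sketch for triggering an existing route on demand from a script; the route ID is a placeholder, and you should add whatever authentication your BAO Manager account requires.

import requests

MANAGER_URL = "https://manager.baosystems.com"
ROUTE_ID = "your-route-id"   # placeholder: ID of an existing route

# Trigger the route immediately rather than waiting for its schedule.
response = requests.post(f"{MANAGER_URL}/api/routes/{ROUTE_ID}/trigger")
print(response.status_code, response.text)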