Zeaware Data Loader for Dynamics overiew
Microsoft has made changes over the past couple of years the way Dynamics exports bulk data for the use in reporting and analytics. Both Dynamics F&O and Customer Engagement have shifted away from providing options to sync with a SQL database, instead offering support for syncing data incrementally to an Azure data lake storage account. Exporting data to Azure Data Lake provides customers with more flexibility in how they consume data.
To support export to a data lake, Microsoft provides options which includes Azure Synapse Link for Dataverse (CE) and Dynamics Data Lake Writer (F&O). However, utilizing the exported data can sometimes be challenging. The data is written in a CDM format that is essentially CSV, but with a JSON model or manifest files that contain details of the schema structure of the CSV data. As a result, the loading tool needs to understand this specific format to load the data correctly, otherwise it may struggle with schema changes and data mapping. There are also idiosyncrasies in the CSV file itself, such as double quotes and carriage returns within data, that can cause some off-the-shelf data loading tools that support CSV natively to fail to load the data.
To make the loading of this data simple, Zeaware has developed a tool called Zeaware Data Loader for Dynamics that natively understands the CDM/CSV format. It can efficiently track progress, run different entities in parallel, and automatically adjusts target database schemas when changes to the source schema occur.
Zeaware Data Loader currently supports different modes of operation which are “current state target” or “change history target”. These modes can be used together and also allows a single source table to populate multiple target tables using both modes.
Current State Target
The Current State Target mode maintains the target table based on primary key and applies row additions, updates and deletes on a single replica of a row within the target table. Deletes can either physically delete the row, or flag the row as deleted.
This is suitable when, for example, reporting queries are directly occurring on the target table and reports are only conserved with current data (e.g. what is my current inventory, how many appointments are schedule next week, what is my current delay on cases etc.).
This mode is also suitable when CDC is used on the target database to generate a change feed for further downstream ingestion.
Change History Target
The Change History Target mode is more of an audit trail of changes, where the target table is updated based on the primary key and a row history identifier. This means a copy of each row version as supplied by the Dynamics export process. Zeaware Data Loader can automatically add a flag to indicate the latest row version.
Reconciliation
One of the challenges with any data loading solution is being able to reconcile and validate the target database is an accurate copy and is in sync with the source. To support this Zeaware Data Loader automatically keeps a comprehensive audit of data loading activities which includes:
- Group history – For each CDM group, the tables that have changed in the group are captured including any schema changes that have occurred within this group.
- Partition history – When each partition is loaded the file is retained, meta data such as number of rows ingested, size of file, end position in file etc.
- Latency – Latency statistics are also retained and can be used to report delays between data landing in Azure Data Lake and being loaded into the target database.
Error Handling
Error handling is critically important, and Zeaware Data Load provides comprehensive error handling that deals with situations where corrupt data is detected. Error handling options include:
- Fail Process – Fail the entire process and raise an alert.
- Fail Entity – Fail the entity, raise an alert but continue loading unaffected entities.
- Skip & Recover – see below.
Skip & Recover error handling method will create a copy of a corrupt file and meta data file in a holding area. An error record is also written which details where in the file the corruption occurred. If the file can be manually corrected, an administrator can make adjusts and then place the recovered file in a special recovery folder. Zeaware Data Load will automatically detect the recovery file and will attempt to load, however will only load row changes that have not been superseded by subsequent loads (determined by row version information) when using “Current State Target” mode. This provides recovery and ensures all data is in sync and the latest changes are not override.
While the file is in the holding area, a meta data record is written to indicate the target table may be incomplete. This can be used to warn users that the data in their report may not be accurate and to take suitable precautions.
Supported Target Databases
Zeaware is adding additional target database formats but currently supports the following database in the BETA release:
- Azure SQL database
- SQL Server
- AWS Redshift