How to expose SAP ECC data with mass parallelism

Data extraction from SAP ECC

A leading global resource company asked Datavard to help expose ECC application data from its SAP system to an external auditor.

The extraction was planned to run several times a year, so we agreed that the main deliverable would be a ready-to-use setup, reusable whenever needed. Because runtime mattered, we leveraged our SLO trick: mass parallelism.

The very first step was an analysis, which showed that circa 60 ECC tables had to be exposed to the external auditing company. We started in the development environment and created three custom development components:

  • Datavard Glue Extractors for the tables in scope
  • Execution variants to execute the biggest tables in parallel portions
  • Execution program to control and monitor the parallelism

Then came testing and tuning of the parallelization. For example, on the quality system (similar in size to production), we increased the number of parallel portions for the BSEG table to 100. Not that we would necessarily run the extraction of a single table in 100 processes; the point was to treat the source database (SAP HANA in this case) kindly: smaller portions mean smaller cursors and database locks.
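The difference between the number of portions and the number of processes can be illustrated with a minimal Python sketch. It is not the Datavard Glue implementation: the function names, the numeric key range, and the worker count are hypothetical, and the extraction itself is replaced by a stand-in that only counts keys. It simply shows a table being cut into 100 small portions that are then worked off by a smaller, bounded pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def make_portions(low, high, count):
    """Split the inclusive key range [low, high] into up to `count` contiguous portions."""
    total = high - low + 1
    base, extra = divmod(total, count)
    portions = []
    start = low
    for i in range(count):
        # The first `extra` portions take one additional key so sizes stay balanced.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        portions.append((start, start + size - 1))
        start += size
    return portions


def extract_portion(portion):
    """Hypothetical stand-in for one extraction run; returns the number of keys covered."""
    lo, hi = portion
    return hi - lo + 1


def run_extraction(portions, max_workers):
    """Process all portions, with at most `max_workers` running at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(extract_portion, portions))


# Assumed numeric key range, 100 portions, illustrative pool of 30 workers.
portions = make_portions(1, 1_000_000, 100)
total = run_extraction(portions, max_workers=30)
```

With this split, each worker holds a cursor over 10,000 keys at a time instead of the whole table, and the pool size can be tuned independently of the portion count.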

After some tweaking, we reached an equilibrium of circa 800M data records in a little more than three hours, running in 30 parallel jobs. Performance depends on both the source and target systems as well as on the infrastructure throughput.
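To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes a runtime of exactly three hours; since the real run took slightly longer, the results are upper bounds rather than measured values.

```python
records = 800_000_000   # circa 800M records
hours = 3.0             # assumption: "a little more than three hours" rounded down
jobs = 30               # parallel jobs

# Overall and per-job throughput in records per second.
per_second = records / (hours * 3600)
per_job = per_second / jobs
```

This works out to roughly 74,000 records per second overall, or about 2,500 records per second for each of the 30 jobs.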

In the end, the project was a success and a strong starting point for further collaboration with the customer.
