Parallel Extraction – How to Extract Large Amount of SAP Data Faster

Extracting large amounts of data is a challenge

Large enterprises running SAP systems are extending their infrastructure to data lakes or the cloud. According to the Insight report, 52% of organizations migrated services and workloads to cloud-based technology in the past year. But extracting large amounts of data effectively is always a challenge. In this blog we will show you how to accomplish it faster using Datavard Glue.

How does it work?

Long story short: the more data you need to extract, the more time the extraction takes. A simple way to reduce this time is to split one large task into several smaller ones that run in parallel. Five painters can paint the same wall faster than one.
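The painter analogy can be sketched in a few lines of code. This is an illustration only, not Datavard Glue's actual API: `extract` is a hypothetical placeholder for one worker's share of the load, and the row count and worker count are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative only: split one extraction of 1,000,000 rows across
# five parallel workers, like five painters sharing one wall.
TOTAL_ROWS = 1_000_000
WORKERS = 5

def extract(start, end):
    # Placeholder for actually extracting rows [start, end)
    return end - start

# Divide the full row range into one non-overlapping chunk per worker
chunk = TOTAL_ROWS // WORKERS
ranges = [(i * chunk, (i + 1) * chunk if i < WORKERS - 1 else TOTAL_ROWS)
          for i in range(WORKERS)]

# Run all chunks concurrently and add up what each worker extracted
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    extracted = sum(pool.map(lambda r: extract(*r), ranges))

assert extracted == TOTAL_ROWS  # every row is covered exactly once
```

The key point is that the chunks must not overlap and must cover the whole range, so that each row is extracted exactly once.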

What are the requirements?

You need Datavard Glue to perform ETL (Extraction, Transformation, Loading) from SAP to the data lake storage of your choice, whether it is a cloud service such as Microsoft Azure, Google Cloud, AWS, or Hadoop, or any standard RDBMS.

What does it look like?

Following standard SAP principles, Datavard Glue uses variants to specify the data selection and start the data extraction. The variant defines which data is replicated and sets the maximum package size. After the variant is executed, a standard SAP background job is created to replicate the data. A real customer use case is described in a separate post.

Variants can be created based on any key field of the table for which ranges make sense. One background job is used per variant, so you need a free background job for each variant you want to run in parallel.
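The splitting of a key field into variant ranges can be sketched as follows. This is a hedged illustration, not Glue functionality: `key_ranges` is a hypothetical helper that divides an inclusive key interval (for example, document numbers) into one non-overlapping sub-range per variant.

```python
def key_ranges(low, high, n_variants):
    """Split the inclusive key interval [low, high] into n non-overlapping
    sub-ranges, one per extraction variant / background job."""
    span = high - low + 1
    base, extra = divmod(span, n_variants)
    ranges, start = [], low
    for i in range(n_variants):
        # Spread any remainder across the first `extra` variants
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# e.g. document numbers 1..1000 split across 4 variants
print(key_ranges(1, 1000, 4))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Each resulting range would become the selection of one variant, and each variant occupies one background job for the duration of its extraction.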

It is simple 🙂

There is a package size setting where you specify the number of records transported to the external storage in a single package. If more records than that need to be transported, they are divided into multiple packages. Once the variant is created and activated, you can execute the data extraction.
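The package-size behaviour described above amounts to simple chunking. A minimal sketch (the record count matches the blog's ongoing-extraction example; the package size of 50,000 is an assumption):

```python
def packages(records, package_size):
    """Yield successive packages of at most package_size records each."""
    for i in range(0, len(records), package_size):
        yield records[i:i + package_size]

rows = list(range(118_138))          # pretend result set
sizes = [len(p) for p in packages(rows, 50_000)]
print(sizes)  # [50000, 50000, 18138]
```

Only the final package is smaller than the configured maximum; every other package is filled completely.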

Real life numbers

To conclude, variant creation with Datavard Glue is simple, and parallel extraction makes data extraction much more efficient. The real-life numbers from a customer are quite interesting: in the heavy-lifting phase, 1.6 billion rows were extracted in 6 hours, while an ongoing extraction of 118,138 rows from a real-time InfoCube took 34 seconds.
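As a back-of-the-envelope check on the bulk-load figure above:

```python
# Sustained throughput implied by the heavy-lifting phase
rows = 1_600_000_000       # 1.6 billion rows
seconds = 6 * 3600         # 6 hours
print(round(rows / seconds))  # 74074 rows per second
```

That is roughly 74,000 rows per second sustained over the whole six-hour window, which is what makes splitting the load across parallel variants worthwhile.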
