At Datavard, we work on the integration between SAP and Big Data platforms such as Hadoop. Data in SAP systems can’t compete with Big Data in terms of sheer volume: compared to your average Hadoop cluster, SAP systems are rather small. However, in terms of both value and TCO, business data in SAP systems easily takes the lead. After all, SAP’s business solutions are widely used for finance and logistics, and most major companies on earth rely on SAP to run their ERP processes, from data entry to reporting.

But even well-structured data in SAP environments grows and keeps piling up. This can result in performance problems, but also in significant costs. It’s a simple formula: the more data you’ve got, the more SAP HANA you need, the more the system landscape will cost you. There is a simple strategy which can be implemented in SAP to overcome this: housekeeping and data offloading.

The concept of “data offloading” is typically referred to as “archiving” for SAP ERP solutions, and as “Nearline Storage” for SAP BW systems. What they have in common is the idea of moving data out of the online SAP database and storing it somewhere else, while keeping it as accessible as possible and as fast as necessary. Obviously, storing data on Hadoop is the recommended way to go if you’re aiming at flexibility, scalability, and “bang for the buck” (also known as TCO).

Benefits of Data Offloading

Data offloading can improve the system’s performance and at the same time significantly lower the TCO of the overall SAP system landscape. While TCO reduction is simple to see and evaluate, the impact on performance is two-fold. In some areas, you will see heavy performance improvements, while in other areas you will want to find a compromise between performance and costs:
• If you only keep recent data in the online database of your SAP BW systems, certain processes in the ETL data flows from ERP source systems will be faster. For example, request activation will be accomplished faster.
• The performance of reporting and the availability of offloaded data will decrease, though. How much will depend on your choice of target storage and the nature of your queries.

Data Management Strategies for hot, cold and frozen data

Offloading and archiving in SAP for all modules and applications requires the groundwork of implementing a data management strategy. Essentially, this involves categorizing data into Hot, Warm, and Cold data. Different goals and strategies exist for each of these categories.

• Hot data is considered business critical and needs to be available for business users at all times.
• Warm and cold data can be offloaded to reduce the SAP footprint. This data should still be available, but it does not need to be available at the speed of light.
• Dead and frozen data can be purged. This activity is called “housekeeping”.

Housekeeping is a challenge in itself. Up to 40% of the data in an SAP system can be purged, which reduces the SAP footprint and simplifies system operation, but identifying such data is sometimes difficult. Datavard’s OutBoard Housekeeping includes a collection of bots that analyze the data and carry out purging across complete landscapes, orchestrated from a central place (e.g. SAP Solution Manager).


To integrate SAP with Hadoop, Datavard provides a central component: Storage Management. Using Storage Management, the SAP NetWeaver ABAP stack can be connected to various storage types such as classical RDBMS, but more importantly HDFS, Hive, or Impala running on CDH. Storage Management bridges and translates between SAP technologies such as Open SQL and RFC on the one hand and REST, JDBC, and ODBC on the other.

Different solutions by Datavard leverage Storage Management:

  • Datavard OutBoard for Analytics allows you to implement data management processes for SAP BW, and implements a fully certified NLS (Nearline Storage) for SAP Business Warehouse
  • OutBoard Data Tiering implements data management for SAP HANA, including native HANA databases running custom applications, and SAP’s next generation data warehouse BW/4HANA
  • OutBoard for transactional systems implements ArchiveLink to use classical SAP archiving to Hadoop, and comes with a range of accelerators for implementing SAP archiving
  • DataFridge is Datavard’s solution for SAP system decommissioning, allowing you to offload SAP business data to CDH, including WORM storage (Write-Once-Read-Many), retention management, legal hold, and data purging. You can efficiently offload data from legacy SAP systems and keep the business data available for business users and auditors
  • Finally, Datavard Glue allows you to flexibly implement integration processes, ETL, and data modelling between SAP and Hadoop

Important considerations

Data management extends beyond offloading and archiving. Good data management includes several additional aspects:

  • Data Quality: you will want to make sure that you have relevant data, especially in terms of master data. Redundant master data, incorrectly used fields in master data, or “blank” fields will impact business processes.
  • Analysis and identification: Datavard supports you with the Datavard FitnessTest and Datavard Insights to identify and classify data in SAP.
  • Redundant data (e.g. in reporting) can make reconciliation difficult. Test automation (such as Datavard Validate) may help here.
  • Housekeeping: offloading is relevant for business data which you want to keep. However, SAP systems tend to accumulate a lot of technical data which can be purged.

Offload up to 45% of Data

You may be wondering what benefit data offloading would bring you. Without a magic crystal ball we cannot tell for your specific system, but we can look at the typical potential behind offloading cold and warm SAP data.

The customer example below shows how much aged data could be offloaded from SAP while keeping seamless access to it. Depending on which data is offloaded, the size of the SAP system can be cut by either 32% (conservative approach) or 45% (confident approach).

The aged data are split into two categories:

  • Cold data – data older than 2 years and not accessed by users (within the analyzed period)
  • Warm data – data older than 2 years and accessed by users only rarely (less than 100 times within the analyzed period)
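
The two rules above can be sketched in a few lines of Python. This is only an illustration of the classification logic, not how Datavard's tools implement it: the `DataSlice` structure, the function names, and the choice to treat everything that is neither cold nor warm as "hot" are our assumptions here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataSlice:
    """Hypothetical unit of data, e.g. one year of one table."""
    name: str
    newest_record: date  # date of the most recent record in the slice
    access_count: int    # user accesses within the analyzed period
    size_gb: float


def classify(s, today, min_age_years=2, rare_access_limit=100):
    """Return 'cold', 'warm' or 'hot' following the two rules above."""
    # Approximation: a year is counted as 365 days.
    cutoff = today - timedelta(days=365 * min_age_years)
    if s.newest_record >= cutoff:
        return "hot"   # assumption: data that is not aged yet stays online
    if s.access_count == 0:
        return "cold"  # aged and not accessed at all
    if s.access_count < rare_access_limit:
        return "warm"  # aged and accessed only rarely
    return "hot"       # assumption: aged but still heavily used


def offload_share(slices, today, tiers):
    """Fraction of the total size that falls into the given tiers."""
    total = sum(s.size_gb for s in slices)
    if total == 0:
        return 0.0
    moved = sum(s.size_gb for s in slices if classify(s, today) in tiers)
    return moved / total
```

Calling `offload_share(slices, today, {"cold"})` gives the share for offloading cold data only, while passing `{"cold", "warm"}` includes the rarely used data as well.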

How to Calculate ROI

Considering your benefits (reduced TCO for SAP) and investments (external storage, additional software, and implementation effort), you should calculate the return on investment of data offloading. It is usually very good, as storage costs, especially for SAP HANA, are much higher than the investment in storage for your cold and warm data.

In our example, the investment in data offloading pays for itself in around 2 years.

This figure is based on Datavard customer data and on offloading cold SAP data into an external Hadoop cluster, with the following results:

  • Offloading of aged cold SAP data from SAP HANA to cheaper Hadoop storage. Initially, the system is shrunk by 32% by offloading aged SAP data.
  • Seamless access to offloaded data through the NLS interface (using the same queries, DTPs, etc.)
  • Improved system performance when accessing the online SAP HANA data
  • Estimated SAP HANA storage saving of 3 TB by 2023
  • Return on investment reached after 2 years
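
The payback reasoning behind such a figure can be sketched in Python. This is a simplified model and not the calculation used for the customer case: the function name and its parameters are made up for illustration, and a real calculation would also have to consider data growth over time.

```python
def payback_years(upfront_investment, yearly_savings, yearly_running_costs=0.0):
    """Years until the savings have covered the upfront investment.

    upfront_investment:   one-off costs (external storage, software, implementation)
    yearly_savings:       reduction in SAP TCO per year, e.g. SAP HANA storage
    yearly_running_costs: yearly cost of operating the external storage
    All three are hypothetical inputs in the same currency.
    """
    net_yearly_benefit = yearly_savings - yearly_running_costs
    if net_yearly_benefit <= 0:
        raise ValueError("offloading never pays off with these figures")
    return upfront_investment / net_yearly_benefit
```

With placeholder figures that are not taken from the customer example, `payback_years(200_000, 130_000, 30_000)` returns `2.0`.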