Two Data Scenarios for Hadoop: Offloading and Integration
In my previous blog posts about integrating Hadoop and SAP applications, I highlighted two main scenarios that are interesting for SAP customers: offloading and integration. Integration involves ETL processes, populating data lakes, and bringing the results of Hadoop applications (e.g. machine learning) back into the SAP universe. Offloading is usually referred to as “archiving” in the SAP world.
Archiving for SAP ERP is nothing new – the need to offload data from transactional SAP systems is (nearly) as old as SAP itself. The SAP ERP solution includes the on-board SARA transaction to help customers offload (“archive”) data as needed. Offloaded data is read-only and available to end users only in a subset of SAP applications. However, data offloaded with SARA is safe and certified. With the ADK (Archive Development Kit), customers can extend archiving as needed to also cover their own extensions and programs.
In the Business Warehouse area, SARA-based archiving is possible but offers no particular advantages. There is a more powerful tool available: NLS (nearline storage). With NLS, offloaded data remains available to end users in BW queries. SAP offers an NLS implementation based on its in-house Sybase IQ database. At Datavard, we offer OutBoard as the best-of-breed NLS solution, which supports virtually any database or storage (DB2, Oracle, MSS, Hadoop with Hive or Impala, HDFS, to name just a few).
Archiving in the Cloud and Why BLOBs Are Better on Azure
At the beginning of the cloud era, one question our customers asked about both ERP and BW offloading was whether it made sense to consider cloud-based archiving. Cloud archiving may solve several issues with storing and securing data. Ideally, such a solution comes at a very affordable cost and even allows for the integration of data lakes and further access to and processing of offloaded data.
Our architecture team cooked up a simple test drive of our OutBoard for Transactional Systems and OutBoard for BW solutions. It showcases how easy it is to offload data from an SAP landscape into the cloud. For our test drive, we used Azure. The easiest way is to simply use data files and mount Azure as file storage through a secured tunnel from the on-premises system. While this is technically the easiest approach, it may have drawbacks from a data security point of view. What is worse, data in archive files is not easily accessible.
A better option is to use BLOBs (Binary Large Objects) on Azure, where data is stored as binary objects in the cloud. Azure BLOB storage has advantages over file storage in terms of performance, security, and handling of system restores (e.g. point-in-time recovery is possible).
Here are some details on our implementation:
- Of course, an Azure account is required (there are demo accounts available for trials).
- You need to create a BLOB storage and set up security. One of the steps is to define the authentication method to be used in HTTP requests (e.g. a token generator).
- All communication between Azure and the SAP backend, including the application logic implemented in ABAP, works via HTTP requests.
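To make the HTTP-based connection described above more concrete, here is a minimal sketch in Python of what pushing one archive file into Azure BLOB storage over plain HTTP can look like. It is not our ABAP implementation. It assumes token-based authentication with a shared access signature (SAS) appended to the URL, and the account name, container name, blob name, and token in the usage note are hypothetical placeholders.

```python
import urllib.request
from urllib.parse import quote


def build_put_blob_request(account, container, blob_name, sas_token, payload):
    """Build (but do not send) an HTTP PUT that stores `payload` as a block blob.

    Assumption: authentication uses a SAS token passed as the query string.
    All argument values are supplied by the caller; none come from the article.
    """
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{quote(container)}/{quote(blob_name)}?{sas_token.lstrip('?')}"
    )
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        # The Put Blob operation requires the blob type to be stated explicitly.
        headers={"x-ms-blob-type": "BlockBlob"},
    )


def send(request):
    """Send the request and return the HTTP status code (201 means created)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a real storage account and a valid token, a call such as `send(build_put_blob_request("myaccount", "archive", "fi/doc1.adk", token, data))` would upload the file. Keeping request construction separate from sending makes it possible to inspect the URL and headers without touching the network.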
Then we connected our SAP backend with Azure as storage, using the Datavard Storage Management component, which is at the heart of OutBoard. From SAP ERP, this storage type is used as an ArchiveLink implementation to push ADK files easily into the cloud. Using OutBoard’s WORM-certified storage, we can ensure that archives are not manipulated.
The Result: Why We Love Cloud Archiving
Accessing these ADK files from SAP ERP works through ArchiveLink. For example, SAP transaction FB03 can be used to display FI documents from the cloud archive. In our benchmarks, the speed of such data access was very good. What’s more, by combining storage management with Datavard Glue, our Big Data interface and middleware, the binary archive content can be made transparently available for further data processing in data lakes.
Why It Is Worth Archiving into the Cloud
- cloud storage offers secure, safe, and easy storage
- storage is affordable
- extension of storage is easy
- no worries about backups
- extending to new technologies (e.g. Hadoop data lakes) is easy