The latest trends in the SAP world point to the importance of gathering structured and unstructured data from various sources, e.g. SAP systems, file systems or social media, in one place: the data lake. This enables further analytics, diagnostics, and predictions, which in turn makes business decisions easier. Cloud-based storage options from Google, Microsoft and Amazon, or on-premise Hadoop data lakes, are suitable platforms that can be integrated with SAP ECC, SAP S/4HANA, SAP BW, SAP BW/4HANA and SAP HANA Native using Datavard’s solutions OutBoard DataTiering, OutBoard ERP Archiving, Glue or DataFridge. The typical architecture with SAP historical data integrated into the data lake, as well as a demo of archiving financial documents, is outlined in the blog post written by Jan Meszaros.
In this blog post, I will elaborate on two important aspects of moving historical SAP archives to a cloud-based or on-premise data lake. The first aspect is the architecture, with two possible scenarios for integrating SAP with a data lake. The second aspect is security when connecting SAP with data lakes.
Integration scenarios and their key benefits
Depending on the complexity of the SAP ECC or S/4HANA landscape, we recommend two ways to integrate OutBoard ERP Archiving, a holistic archiving solution that moves data between the SAP database and external storage. Both work regardless of the storage vendor (e.g. cloud-based or on-premises data lakes), and data is moved according to its usage or age within the customer’s landscape.
The first option is a centralized architecture, where OutBoard ERP Archiving is installed on an SAP system that is not itself subject to archiving. The so-called “client systems”, which are subject to archiving, communicate with OutBoard ERP Archiving on the central system via the ArchiveLink interface within the internal corporate network. A centralized architecture is recommended for organizations with complex SAP landscapes containing multiple SAP production systems. As the infographic below shows, one of the key advantages of a centralized deployment is that the archive service is installed only on the central system, while archiving of aged data or migration of the historical archive is enabled from all connected clients.
The second option is a decentralized architecture, where OutBoard ERP Archiving is installed on each client system. This type of deployment is recommended for organizations with only one production system line. The biggest advantage here is that the archiving client and the archiving service run on the same SAP system, which avoids any potential bottleneck in the network connection between a central system and its clients. An additional dedicated SAP system for hosting OutBoard ERP Archiving is not necessary.
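The difference between the two deployments comes down to where the archive service is installed. The following is a minimal Python sketch of that rule only, not part of any Datavard product; the function name, the topology labels and the system IDs (ARC, PR1, PR2) are hypothetical and chosen purely for illustration.

```python
from typing import List, Set


def archive_service_hosts(topology: str, central: str, clients: List[str]) -> Set[str]:
    """Return the systems on which the archive service is installed.

    topology: "centralized" or "decentralized" (labels assumed for this sketch).
    central:  ID of the dedicated central system (unused when decentralized).
    clients:  IDs of the systems whose data is archived.
    """
    if topology == "centralized":
        # One installation on the central system serves all connected clients.
        return {central}
    if topology == "decentralized":
        # Each client system hosts its own archive service.
        return set(clients)
    raise ValueError(f"unknown topology: {topology!r}")
```

With two production systems, the centralized variant needs a single installation on the central system, whereas the decentralized variant needs one installation per client.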
A security concept matters
Security of the archived data and of the communication interfaces is crucial, especially when connecting to cloud data lakes. Both the centralized and the decentralized architecture are built on secure communication between archiving client and archiving service, enabled by the ArchiveLink Signature concept and Secure Network Communication (SNC). Since outbound communication towards the storage media often leaves the internal corporate network, it must always be protected. Secure communication between OutBoard DataTiering and the cloud solution is achieved by using a secure protocol, e.g. HTTPS, TCP/IP with SSL, or secured NFS, depending on the API used by the storage connector, and a platform-specific authentication and authorization concept, e.g. Kerberos, Shared Access Signature (SAS) tokens, or Active Directory, together with user permission management.
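To illustrate what "always protected" can mean for HTTPS-based storage connectors, here is a minimal Python sketch using only the standard library. It is not how the Datavard storage connectors are implemented; the function names, the scheme allow-list and the example URLs are assumptions made for this illustration. It refuses endpoints that are not HTTPS and builds a TLS context that verifies the server certificate and host name.

```python
import ssl
from urllib.parse import urlparse

# Assumption for this sketch: only HTTPS counts as an encrypted scheme.
SECURE_SCHEMES = {"https"}


def is_secure_endpoint(url: str) -> bool:
    """Return True if the URL uses an encrypted scheme and names a host."""
    parts = urlparse(url)
    return parts.scheme in SECURE_SCHEMES and bool(parts.hostname)


def build_tls_context() -> ssl.SSLContext:
    """Create a client TLS context that verifies certificates and host names."""
    # The default context already requires a valid certificate chain
    # and checks that the certificate matches the host name.
    context = ssl.create_default_context()
    # Reject protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A connector could call `is_secure_endpoint` on its configured storage URL before opening a connection and pass the context to its HTTP client. Authentication, e.g. via Kerberos or a SAS token, would be handled separately by the platform-specific mechanism.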
OutBoard ERP Archiving is the only available solution that enables secured storing of archive data in the cloud data lake and makes it available for further data analytics.