Different options for your data lake in the cloud
Datavard Glue is a strong solution for tight, native integration of SAP systems with Data Lakes running on big data platforms. Many companies decide to run their Data Lake in the cloud. This addresses scalability well and minimizes the work of running, patching, and updating Hadoop components. Although many cloud providers offer Hadoop or similar technologies, many SAP customers prefer Microsoft Azure.
Users must then decide which of Microsoft’s Azure products to use for the SAP integration. There are several options when planning to integrate Microsoft Azure into the SAP landscape:
· Plain storage: Azure serves as additional storage space for the SAP landscape. This is a good choice for offloading cold and aged SAP data to improve SAP system operation and TCO. A reporting solution can also be connected directly to this storage. Available storage options are Azure Blob Storage and Data Lake Storage.
· Database as a service: companies rent a database in the cloud and use it as a middle layer for further processing or application development. Available databases range from PostgreSQL to Microsoft SQL Server.
· Analytics platform: a platform for data processing, with options including Hortonworks Hadoop (e.g. an HDInsight cluster), Spark on Databricks, or Azure SQL Data Warehouse.
Each of these options has pros and cons that need to be evaluated against a concrete use case. For example, a reporting dashboard for purchase orders may call for a different option than user behavior analysis. What matters is choosing a flexible solution that keeps as many future integration scenarios open as possible. Datavard Glue and its modular storage management make this possible.
For example, to connect an SAP ERP/BW system with HDInsight and Azure Data Lake storage, the Storage Management Layer of Datavard Glue needs to be configured on the SAP system with the correct logon and connection data for the cloud solution:
1. HDInsight: set up the connection to the underlying storage and the Hive database login.
2. ADLS: an Azure user with access rights and security certificates is required.
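The two sets of connection data can be pictured as simple records. The following Python sketch is purely illustrative. Datavard Glue is configured inside the SAP system, not through Python, and every class name, field name, and value here is a hypothetical placeholder rather than a real Glue or Azure setting. It only shows the kind of information each target needs, plus a basic completeness check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HiveConnection:
    """Hypothetical record for the HDInsight / Hive login data."""
    host: str
    port: int
    database: str
    user: str


@dataclass(frozen=True)
class AdlsConnection:
    """Hypothetical record for the ADLS access data."""
    account: str
    client_id: str
    certificate_path: str


def missing_fields(conn) -> list[str]:
    """Return the names of fields that are empty (empty string, None, or 0)."""
    return [f.name for f in fields(conn) if not getattr(conn, f.name)]


# Placeholder values only; none of these refer to a real system.
hive = HiveConnection(
    host="example-cluster.invalid", port=443, database="sap_landing", user=""
)
adls = AdlsConnection(
    account="exampleaccount",
    client_id="example-app-id",
    certificate_path="/path/to/cert.pem",
)

print(missing_fields(hive))  # the Hive user was left empty above
print(missing_fields(adls))
```

Checking for empty fields before saving a configuration is a cheap way to catch incomplete logon data early, whatever tool stores the settings.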
Once the configuration is completed, it is possible to set up and execute data flows for extraction of SAP data, including transformations, contextualization, lookups, enrichment, cleansing, masking and so on.
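To make the idea of a data flow concrete, here is a minimal Python sketch of three of the steps named above (cleansing, masking, and enrichment via a lookup) chained over extracted records. It is not how Datavard Glue implements its flows. The function names, the sample record, and the lookup table are assumptions made for illustration. The field names follow common SAP purchasing conventions (EBELN, LIFNR, BUKRS) but are used here only as examples.

```python
import hashlib


def cleanse(record):
    """Strip surrounding whitespace from all string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def mask(record, sensitive):
    """Replace sensitive values with a truncated SHA-256 digest."""
    out = dict(record)
    for name in sensitive:
        if out.get(name) is not None:
            digest = hashlib.sha256(str(out[name]).encode("utf-8")).hexdigest()
            out[name] = digest[:12]
    return out


def enrich(record, lookup, key, target):
    """Add a descriptive field by looking up the value of an existing field."""
    out = dict(record)
    out[target] = lookup.get(record.get(key), "UNKNOWN")
    return out


def run_flow(records, steps):
    """Apply each step in order to every record."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record


# Hypothetical sample data standing in for an SAP extract.
company_names = {"1000": "Example Corp"}
extracted = [{"EBELN": " 4500000001 ", "LIFNR": "0000100001", "BUKRS": "1000"}]

steps = [
    cleanse,
    lambda r: mask(r, ["LIFNR"]),
    lambda r: enrich(r, company_names, "BUKRS", "BUKRS_NAME"),
]
result = list(run_flow(extracted, steps))
print(result)
```

Keeping each step a small function from record to record means steps can be reordered or dropped per flow. Masking by hash keeps values joinable across tables without exposing the original identifier, although a production setup would likely need a keyed or salted scheme.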