
SAP and Data Lakes: How Much Does SLT Help?


SLT – a handy tool, but not all you need

With the SAP Landscape Transformation Replication Server (SLT), SAP provides a handy tool to tap into SAP data and replicate it to HANA for fast analytics and data processing. This is a proven solution for simple and easy data access and replication, and it can break open some of the uglier parts of the SAP data model; for example, SLT can read data from pool and cluster tables where classical ETL tools fail.

When it comes to replicating data to data lakes, however, SLT is only one piece of the overall system landscape and architecture you need to consider: from the HANA database, which you populate (perhaps even in real time) with ERP data, you still need to bring the data over to your data lake through ETL. On top of that, the data lands in HANA in SAP's data model, so you still need to "de-mystify" it through lookups, joins, data cleansing, etc. This is where a classical ETL tool such as SAP Data Services helps tremendously.
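To make that "de-mystify" step concrete, here is a minimal pandas sketch of the kind of lookup, join, and cleansing meant here. The tiny VBAK/KNA1 extracts and the friendly column names are invented for illustration; they are not the actual mappings of any tool:

```python
import pandas as pd

# Illustrative extract of the SAP sales header table VBAK, with SAP's cryptic column names.
vbak = pd.DataFrame({
    "VBELN": ["0000012345", "0000012346"],   # sales document number
    "ERDAT": ["20240103", "20240104"],       # creation date (YYYYMMDD as string)
    "KUNNR": ["0000100001", "0000100002"],   # sold-to customer number
})

# Illustrative customer master lookup (a slice of KNA1).
kna1 = pd.DataFrame({
    "KUNNR": ["0000100001", "0000100002"],
    "NAME1": ["ACME Corp", "Globex Inc"],
})

# Lookup/join: resolve the customer number to a readable name.
orders = vbak.merge(kna1, on="KUNNR", how="left")

# Cleansing: strip leading zeros, parse SAP date strings into real dates.
orders["VBELN"] = orders["VBELN"].str.lstrip("0")
orders["ERDAT"] = pd.to_datetime(orders["ERDAT"], format="%Y%m%d")

# De-mystify: rename cryptic SAP fields to friendly names.
orders = orders.rename(columns={
    "VBELN": "sales_document",
    "ERDAT": "created_on",
    "KUNNR": "customer_number",
    "NAME1": "customer_name",
})
print(orders)
```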

At Datavard, we try to simplify this and provide an out-of-the-box solution for native, ABAP-based SAP-to-Big-Data integration, including content and accelerators through our BPL (Business Process Library).

This screenshot shows how the Datavard Glue BPL "de-mystifies" SAP standard fields into friendly fields when using this feature to create tables on a Big Data platform such as Hive on Hadoop:

[Screenshot: Datavard Glue BPL mapping SAP standard fields to friendly field names]
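As a rough illustration of the idea (not the actual BPL content), the sketch below renders a Hive table definition from such a friendly-name mapping. The field list and Hive types are assumptions based on common VBAK fields:

```python
# Illustrative mapping from SAP field names to friendly Hive column names and types.
# The entries mirror common VBAK fields; the actual BPL content differs.
FIELD_MAP = {
    "VBELN": ("sales_document", "STRING"),
    "AUART": ("sales_document_type", "STRING"),
    "ERDAT": ("created_on", "DATE"),
    "NETWR": ("net_value", "DECIMAL(15,2)"),
}

def hive_ddl(table: str, field_map: dict) -> str:
    """Render a CREATE TABLE statement with friendly column names."""
    cols = ",\n  ".join(
        f"{friendly} {htype} COMMENT 'SAP field {sap}'"
        for sap, (friendly, htype) in field_map.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n) STORED AS PARQUET;"

print(hive_ddl("sales_orders", FIELD_MAP))
```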

The SAP Data Services application helps, but also comes with disadvantages

With SAP Data Services you can extract the data from your HANA database and either load it into some SQL database or create files for further processing. Such good old-fashioned CSV files have the advantage that you can easily copy them to your Big Data platform, e.g. to HDFS on Cloudera Hadoop. From there, you can use Hadoop-native tools to read the data, e.g. you can ingest them into Hive tables, use Sqoop or Pig, etc.
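A minimal sketch of that flow, assuming the hdfs and beeline command-line clients are available; the file path, HDFS directory, Hive database, and JDBC URL are placeholders:

```python
import subprocess

CSV_LOCAL = "/tmp/export/vbak.csv"   # file produced by SAP Data Services (placeholder path)
HDFS_DIR = "/data/sap/vbak"          # landing directory on HDFS (placeholder)

# 1. Copy the CSV file onto the Big Data platform.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", CSV_LOCAL, HDFS_DIR], check=True)

# 2. Expose the files to Hive as an external table over the landing directory.
ddl = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS sap_raw.vbak (
  vbeln STRING,
  erdat STRING,
  kunnr STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '{HDFS_DIR}';
"""
subprocess.run(["beeline", "-u", "jdbc:hive2://localhost:10000", "-e", ddl], check=True)
```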

This is a perfectly working data flow and a solution which I come across frequently with our SAP and Datavard customers. There are some disadvantages to this seemingly “obvious” solution:

  1. You need a whole bunch of tools and technologies. You need to be skilled with all of them to run the solution on a daily basis and perform troubleshooting.
  2. The TCO of this solution is high. You need not only software, but also hardware for additional HANA databases.
  3. Even though SLT helps you stream data out of SAP ERP in real time, this data is not immediately available on your data lake.
  4. None of the tools and technologies in this long data flow actually help you crack open the SAP data model, i.e. you mostly end up with a 1:1 copy of SAP data instead of ready-to-use, contextualized data.
  5. You need to implement data security on all levels, starting with SLT, then in your HANA database, then in Data Services, and then again on your data lake. This is perfectly possible of course, but you will most likely solve it by simply ring-fencing the complete data flow and implementing security only at the endpoints, ERP and data lake. That leaves some possible vulnerabilities, because you persist data along the way.
  6. Finally, the activities required are complex: you need to set up SLT and run the data extraction to HANA. SLT can implement some transformations, but is somewhat limited. SAP BODS (Data Services) compensates for this, but again requires configuration and development. The excellent Guru99 web site has a nice overview of the required activities (https://www.guru99.com/sap-ds-sap-data-services-in-sap-hana.html).

The real lifesaver when it comes to integrating SAP and Data Lakes

When looking at this list of tools and stages the data needs to pass through, and at the downsides of the complexity involved, I simply cannot help but give one piece of advice: look into our solution Datavard Glue, which can simplify the integration between SAP and data lakes tremendously. If you want a high-level introduction to how easy it can be to tap into ERP data from SAP, check the next blog post in this series, where I show how to read SAP data; filter, enrich, and contextualize it; and store it natively on Hive for further data processing.

Glue is natively integrated into SAP as an ABAP add-on, which makes security easy. At the same time, Glue natively integrates with Big Data platforms such as Hadoop, no matter whether they run on premises or in the cloud (e.g. on Azure). Glue provides change data capture through database triggers, similar to SLT, but beyond triggers it offers further methods (e.g. using the change log in SAP BW). You can also use business logic in SAP ERP to capture, for example, the output of business transactions, use procedural data extraction, and more.

The following figure shows a Glue extractor with the various change data capture methods Glue offers:

[Figure: Datavard Glue extractor showing the available change data capture methods]
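To illustrate the general idea behind the trigger-based method (a database-agnostic sketch using SQLite, not Glue's actual implementation): a trigger records every change to the source table in a log table, which the replicator then drains and ships to the data lake. All table and column names below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vbak (vbeln TEXT PRIMARY KEY, netwr REAL);

-- Illustrative change log populated by triggers, as in trigger-based CDC.
CREATE TABLE vbak_log (vbeln TEXT, op TEXT, netwr REAL);

CREATE TRIGGER vbak_ins AFTER INSERT ON vbak
BEGIN
  INSERT INTO vbak_log VALUES (NEW.vbeln, 'I', NEW.netwr);
END;

CREATE TRIGGER vbak_upd AFTER UPDATE ON vbak
BEGIN
  INSERT INTO vbak_log VALUES (NEW.vbeln, 'U', NEW.netwr);
END;
""")

# Simulate business activity on the source table.
conn.execute("INSERT INTO vbak VALUES ('12345', 100.0)")
conn.execute("UPDATE vbak SET netwr = 150.0 WHERE vbeln = '12345'")

# The replicator drains the log and would push the deltas to the data lake.
for row in conn.execute("SELECT vbeln, op, netwr FROM vbak_log"):
    print(row)   # e.g. ('12345', 'I', 100.0) then ('12345', 'U', 150.0)
conn.execute("DELETE FROM vbak_log")
conn.commit()
```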

Even better, Glue comes with prepared business content for various scenarios, such as prepared data models, ETL data flows, field mappings, a database of "friendly" field names, and business functions which help you contextualize and de-mystify cryptic SAP data without having to implement lookups or code in ABAP.

Last but not least: Datavard Glue is a software-only solution. You need no additional hardware and no additional databases, because Glue integrates natively between SAP and Big Data platforms. On the SAP side, Glue supports all SAP business applications, as well as the "cool" new SAP tools such as Vora, Data Hub, and Leonardo. On the Big Data side, Glue supports all major distributions and vendors for Hadoop, Spark, Machine Learning, and cloud computing.