Today, Industry 4.0, the Internet of Things, digitalization and data-driven business models are no longer mere theories, but models under real-world testing. Expectations have cooled, and the first models are being discussed realistically.
But traditional systems and structures are neither configured to master the flood of data nor to employ the methods and models that allow for effective data mining.
That brings us to the core of the problem and the opportunity: Big Data can only become big business when the right technology can be used for these two important tasks:
- Storage of large data volumes
- Processing of large data volumes
The two tasks cannot be intermixed, as their objectives differ. Processing large data volumes relies on high-value components for massive parallelization and analysis in main memory (in-memory). These technologies, such as SAP HANA, IBM BLU or Oracle Exadata, are extremely efficient, but also expensive. The objective of storing large data volumes, in contrast, is above all cost optimization. This is necessary because the value of most Big Data applications only becomes clear after the first tests.
Flexibility is essential: data volumes must be stored within a short time frame and without exceeding budget constraints – hence the cost pressure. The open source platform Hadoop was created for exactly this purpose: the storage, processing, and analysis of very large volumes of data – hundreds of terabytes or even petabytes.
Hadooponomics: The numbers speak for themselves
Hadoop is not merely an option but essential for Big Data scenarios, according to the market research company Forrester Research. To emphasize the economic benefits of the open source software, Forrester analysts coined the term “Hadooponomics”.
The numbers truly speak for themselves. According to data from Forrester, the costs for large Hadoop distributions amount to $2,000 to $3,000 per node per year. In contrast, a HANA node costs approximately $750,000 per year.
A well-known UK company compared the costs of conventional data storage with the estimated costs of a Hadoop replacement. One TB in an Oracle database generates costs of around 50,000 euros per year. Storing the same data volume in Hadoop costs roughly 3% of the Oracle figure. Given these immense cost differences, it is economical to work on key data (the most valuable and most frequently used) in SAP HANA and keep the remaining data available in Hadoop.
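The arithmetic behind these comparisons can be sketched in a few lines. The per-node and per-terabyte rates below are the figures quoted above; the 100-node cluster size is a hypothetical example chosen only to illustrate the scale of the difference.

```python
# Back-of-the-envelope cost comparison using the figures quoted in the text.
HADOOP_NODE_COST_PER_YEAR = 3_000    # upper end of the Forrester estimate, USD
HANA_NODE_COST_PER_YEAR = 750_000    # USD per year

ORACLE_COST_PER_TB_YEAR = 50_000     # EUR per TB per year (UK company's figure)
HADOOP_FRACTION = 0.03               # Hadoop at roughly 3% of the Oracle cost

def hadoop_cluster_cost(nodes: int) -> int:
    """Annual cost of a Hadoop cluster at the quoted per-node rate."""
    return nodes * HADOOP_NODE_COST_PER_YEAR

# A hypothetical 100-node Hadoop cluster vs. a single HANA node:
cluster_cost = hadoop_cluster_cost(100)
print(cluster_cost)                            # 300000
print(cluster_cost < HANA_NODE_COST_PER_YEAR)  # True

# Storing 1 TB in Hadoop vs. Oracle, per the ~3% figure:
hadoop_cost_per_tb = ORACLE_COST_PER_TB_YEAR * HADOOP_FRACTION
print(hadoop_cost_per_tb)                      # 1500.0
```

Even an entire Hadoop cluster of that size costs less than half of a single HANA node per year, which is why the tiering strategy described above – hot data in-memory, the rest in Hadoop – pays off.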