In this SAP HANA Cloud tutorial, we will discuss SAP HANA Cloud Data Lake, learn about its characteristics and components, and see how to create a data lake instance in SAP HANA Cloud and access it.
What is SAP HANA Cloud Data Lake?
A data lake is a central repository where all kinds of data can be stored. To make data-driven decisions, data from this repository can be accessed, reviewed, and used.
Using a cloud data lake enables your company to cut costs, boost performance, and gain faster access to large-scale, inexpensive data that produces insights. A cloud data lake combines relational capabilities with object storage and files.
Why do we need to build a cloud data lake in SAP HANA Cloud?
Data lakes are built to manage costs while storing data at petabyte scale, among other reasons:
- Cost-effective capacity: Use elastic scalability, start small, and expand your data lake to petabytes.
- Combine data of various temperatures: Easily shift data between hot, warm, and cold tiers.
- Get rid of data swamps: Instead of moving data around for data science purposes, give your specific use cases and lines of business a single, integrated data lake.
- Create applications for data science: Make use of large volumes of data and models to power your data science applications.
Characteristics of a data lake in SAP HANA Cloud
A data lake has three key characteristics:
- Landing Zone: A location where your raw data can reside.
- Staging Zone: A staging area where data is transformed for analytical purposes.
- Data Exploration Zone: An area where analytics, applications, and machine learning models consume the data.
Components of a data lake in SAP HANA Cloud
The data lake Relational Engine component and the default data lake Files component are the two essential parts of the SAP HANA Cloud data lake.
Data Lake Files
Data lake Files provides controlled access to the storage of structured, semi-structured, and unstructured data files.
Data lake Files works with multiple file containers. When you create a data lake instance, a default file container and a diagnostics file container are provided. You control the default file container, while SAP manages the diagnostics file container.
- File Container: Also referred to as the managed object store, this is a cloud storage repository that is owned and administered by SAP.
The client controls who has access to the file container. You can manage the content of the file container and write to it directly through its REST API, or indirectly through services like the data lake Relational Engine or SQL on Files.
- The file container enables you to use data lake Files as a repository for big data. You can store structured, semi-structured, and unstructured files in it.
The diagnostics file container only exists if the data lake Relational Engine is enabled when the data lake instance is created.
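As a quick illustration of the indirect access path described above, the sketch below loads a CSV file that was already uploaded to the default file container into a data lake Relational Engine table. This is only a hedged example: the table, columns, hdlfs:/// path, and load options are assumptions made for illustration, so check the LOAD TABLE options supported by your instance before using it.

```sql
-- Hypothetical target table in the data lake Relational Engine
CREATE TABLE SALES_RAW (
    ORDER_ID INT,
    REGION   VARCHAR(20),
    AMOUNT   DECIMAL(12,2)
);

-- Load a CSV file from the default file container
-- (the hdlfs:/// path, file layout, and options are assumptions for this example)
LOAD TABLE SALES_RAW (
    ORDER_ID,
    REGION,
    AMOUNT
)
USING FILE 'hdlfs:///raw/sales_2024.csv'
DELIMITED BY ','
ESCAPES OFF;
```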
Data Lake Relational Engine
The data lake Relational Engine stores and analyzes large volumes of data. To reduce expense, it uses low-cost storage while preserving high performance and full SQL access to the data.
Structured data is kept in the Relational Engine; if you want to keep unstructured data, use data lake Files instead. The data lake Relational Engine is activated by default when you provision a data lake instance, but it can be deactivated.
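To make the "full SQL access" point concrete, here is a minimal, hypothetical sketch of keeping structured data in the Relational Engine and querying it with standard SQL; the table, columns, and data are invented purely for illustration.

```sql
-- Hypothetical structured table in the data lake Relational Engine
CREATE TABLE CUSTOMER_ORDERS (
    ORDER_ID   INT PRIMARY KEY,
    CUSTOMER   VARCHAR(50),
    ORDER_DATE DATE,
    AMOUNT     DECIMAL(12,2)
);

INSERT INTO CUSTOMER_ORDERS VALUES (1, 'ACME',   '2024-01-15', 250.00);
INSERT INTO CUSTOMER_ORDERS VALUES (2, 'ACME',   '2024-02-03', 125.50);
INSERT INTO CUSTOMER_ORDERS VALUES (3, 'Globex', '2024-02-10', 980.00);
COMMIT;

-- Standard analytical SQL works as usual
SELECT CUSTOMER, SUM(AMOUNT) AS TOTAL_AMOUNT
FROM CUSTOMER_ORDERS
GROUP BY CUSTOMER
ORDER BY TOTAL_AMOUNT DESC;
```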
How to create a Data Lake in SAP HANA Cloud
A data lake can be built independently or integrated with a HANA database. It is optional in either case to enable the data lake Relational Engine. The data lake files component, however, is always active regardless of how you build the data lake.
Creating a standalone Data Lake instance
We are going to use SAP HANA Cloud Central to create a data lake instance with the Relational Engine component enabled.
Click the Create button in SAP HANA Cloud Central and select Data Lake.
Then choose SAP HANA Cloud, Data Lake and click the Next Step button as shown in the picture below.
The next step appears for providing the details of your data lake instance. In the Basic section, give the instance a name such as hana_data_lake and an optional description if you want. Then click the Next Step button.
Choose Allow All IP Addresses under the Allowed Connections section, then click the Next Step button.
The next step appears for setting the parameters for your data lake Relational Engine:
- The data lake instance has the Data Lake Relational Engine component activated by default.
- Provide a password for the user HDLADMIN under the Credentials section. Then re-enter the password to confirm it. Keep this password in mind.
- HDLADMIN is the default user created when a data lake Relational Engine is provisioned.
On the same page, accept the default values for the Coordinator and Workers sizes, then click Next Step.
The next step appears for configuring your data lake instance with advanced options.
- Choose Configure to be most compatible with SAP IQ under Initialization Mode.
- Accept the default values under the General and Nchar options.
- For your data lake instance, the Automatically Backup Database check box is already selected if you are a paid user; if you are on a trial account, it is disabled.
- Click the Review and Create button.
The next step appears for reviewing the settings of the instance. After reviewing, click the Create button to create the data lake instance.
After performing the above steps, you will see the created instance on SAP HANA Cloud Central.
How to access a data lake in SAP HANA Cloud
After creating the data lake, you can access it and begin storing, retrieving, and manipulating data. Before accessing the data lake, make sure that it is active.
To check whether the data lake instance is running, use the SAP BTP cockpit to access SAP HANA Cloud Central.
- On this page, you will see the running instances. By choosing to view all instances, you can find the details of your standalone data lake instance. If it is running, you are ready to connect; otherwise, start the data lake instance.
To connect to a data lake instance, you can use a number of different approaches.
- Interactive SQL command line interface (CLI)
- SAP HANA Database Explorer
- Interactive SQL (dbisql) – a graphical interface
- isql
Now, let's connect to the data lake using the SAP HANA Database Explorer:
Using SAP HANA Database Explorer
Once you have verified that the data lake instance is running, click on the three dots next to it and choose Open SQL Console.
After clicking Open SQL Console, you are asked for the credentials of the data lake instance that were defined while creating it. Enter the username and password for the instance.
After performing the above steps, you have access to the data lake instance using the SAP HANA Database Explorer.
- In the database explorer’s left-hand database menu, search for the name of your data lake instance.
- Now, to launch a new SQL console, click the SQL icon in the top-left area. You can run a few statements, like the sketch below, to verify the connection.
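Below is a small, hypothetical sketch you can run in the SQL console as the HDLADMIN user to verify the connection and store some first data; the FIRST_TEST table and its contents are made up for illustration.

```sql
-- Quick connection checks
SELECT CURRENT USER;       -- should return HDLADMIN
SELECT CURRENT TIMESTAMP;  -- confirms the instance is responding

-- Hypothetical first table, just to try storing and reading data
CREATE TABLE FIRST_TEST (
    ID   INT,
    NOTE VARCHAR(100)
);
INSERT INTO FIRST_TEST VALUES (1, 'Hello from the data lake Relational Engine');
COMMIT;
SELECT * FROM FIRST_TEST;
```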
To connect using another method, you can follow the official SAP HANA Cloud documentation.
Conclusion
In this SAP HANA Cloud tutorial, we have learned what a data lake in SAP HANA Cloud is and discussed its characteristics and its components, data lake Files and the Relational Engine. We also covered how to create a data lake in SAP HANA Cloud and the different approaches to access the data lake instance.
You may also like:
- What is SAP BTP Cockpit?
- Create a Database Instance in SAP HANA Cloud
- How to manage roles and privileges for users in SAP HANA Cloud?
- SAP HANA Cloud ABAP Environment
I am Chris Waldron, working as a Senior SAP HANA Consultant at Halliburton, Houston, Texas, United States. I have been working in SAP for more than 15 years, especially in SAP IT consulting and business consulting. I have worked in various industries across Sales & Distribution, Customer Relationship Management, banking, Risk Management, etc. I am an SAP Certified Development Specialist – ABAP for SAP HANA 2.0 and an SAP HANA Modeling certified consultant.