Hadoop Eco System Installation - PART 1
Hey Guys!
This article might be useful for beginners who wish to learn Big Data frameworks and build a career as a Big Data analyst, Big Data scientist or Big Data administrator. It helps that most of the tools and frameworks involved are open source.
Let us start with the definition of Big Data
Big Data is the domain that deals with voluminous data of high velocity and veracity, measured in mega-, giga-, tera-, peta-, exa- and zettabytes. Examples include social media analysis (recommenders), Online Transaction Processing (OLTP), e-commerce data (eBay, Amazon, Flipkart), ERP, Online Analytical Processing (OLAP), financial data, XML and JSON objects, Customer Relationship Management (CRM) data, sparse data from non-IT devices such as sensors, and large volumes of image, video and audio data (Netflix and Amazon Prime).
To start with hands-on practice, three methods are suggested.
1. The First Method
You can set up your own 'Virtual Machine' using Apache Hadoop and the Hadoop Distributed File System (HDFS), along with the ecosystem components required for data ingestion, data processing, data analytics and the user interface, based on the type of database, SQL or NoSQL. (NoSQL stands for "Not Only SQL" and provides mechanisms for storing and retrieving data modeled in ways other than the tabular relations used in relational databases.)
Hadoop installation requires Java and SSH essentials. (Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network.)
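Once Java, SSH and Hadoop itself are in place (the official single-node setup guide is linked just below), a quick sanity check of these prerequisites might look like the following minimal Python sketch. It only shells out to commands assumed to be on your PATH; the HDFS listing will work only after the Hadoop daemons are started.

import subprocess

def run(cmd):
    # Run a shell command and show its combined output
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print(f"$ {cmd}\n{result.stdout}{result.stderr}")

run("java -version")                                # Hadoop needs a working JDK
run("ssh -o BatchMode=yes localhost echo SSH OK")   # passwordless SSH to localhost
run("hdfs dfs -ls /")                               # HDFS root, once the daemons are up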
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
2. The Second Method
You can use the Cloudera QuickStart VM on VirtualBox
Cloudera QuickStart VMs (single-node cluster) make it easy to quickly get hands-on with CDH for testing, demo, and self-learning purposes, and include Cloudera Manager for managing your cluster. Cloudera QuickStart VM also includes a tutorial, sample data, and scripts for getting started.
VirtualBox download site: https://www.virtualbox.org/wiki/Downloads. Select your host operating system, then download and install it.
Cloudera QuickStart VM download link: https://www.cloudera.com/downloads/quickstart_vms/5-13.html. Select VirtualBox as your platform and click Download. You need to create a Cloudera account to download; it is free.
3. The Third Method
You can use the Hortonworks Sandbox on VirtualBox
To begin your hands-on work and deploy the sandbox in VirtualBox, click here:
https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/1/
Mindmap for Hadoop Ecosystem
Courtesy for the Cheat Sheet: http://mattturck.com/wp-content/uploads/2018/07/Matt_Turck_FirstMark_Big_Data_Landscape_2018_Final.png
Since Hadoop and its ecosystem include a vast number of tools and frameworks, I would recommend you start with the following:
Courtesy for the Image: https://www.slideshare.net/LiorSidi/hadoop-ecosystem-65935516
Be familiar with any one of the following NoSQL databases
MongoDB, CouchDB, Cassandra, HBase
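To get a first taste of a document-oriented NoSQL store, the minimal sketch below uses MongoDB through pymongo (pip install pymongo). The connection string, database and collection names are assumptions made only for illustration.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local MongoDB instance
db = client["retail_demo"]                          # hypothetical database
orders = db["orders"]                               # hypothetical collection

# Insert a document and read it back
orders.insert_one({"order_id": 1, "item": "book", "qty": 2})
print(orders.find_one({"order_id": 1}))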
Be familiar with any one of the following SQL databases
Oracle, MySQL, PostgreSQL, IBM Informix & DB2
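On the SQL side, a minimal sketch using MySQL through mysql-connector-python (pip install mysql-connector-python) is shown below; the credentials, database and table names are assumptions.

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="retail_user", password="secret",          # hypothetical credentials
    database="retail_db")                           # hypothetical database
cur = conn.cursor()
cur.execute("SELECT order_id, item, qty FROM orders LIMIT 5")  # hypothetical table
for row in cur.fetchall():
    print(row)
conn.close()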
Data Ingestion Tools
One data ingestion tool for SQL (Ex. Sqoop)
One data ingestion tool for NoSQL (Ex. Flume)
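As a taste of data ingestion, the hedged sketch below drives a Sqoop import (relational table into HDFS) from Python. The JDBC URL, credentials, table name and HDFS target directory are assumptions; the sqoop command must already be on the PATH.

import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/retail_db",   # hypothetical source database
    "--username", "retail_user", "--password", "secret",
    "--table", "orders",                               # hypothetical table
    "--target-dir", "/user/hadoop/orders",             # HDFS output directory
    "-m", "1",                                         # a single mapper for a small table
]
subprocess.run(sqoop_cmd, check=True)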
Data Streaming
Spark Streaming or Apache Storm
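A minimal Spark Streaming sketch, counting words arriving on a TCP socket, is shown below. The host and port are assumptions; for a quick test you can feed it lines with nc -lk 9999.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "SocketWordCount")
ssc = StreamingContext(sc, batchDuration=5)          # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)      # hypothetical socket source
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()                                      # print each batch's counts

ssc.start()
ssc.awaitTermination()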
Data Processing
SQL Data Processing tool (Ex. HDFS & MapReduce)
NoSQL Data Processing tool (Ex. HBase)
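To see MapReduce over HDFS in action without writing Java, the sketch below is a classic word count using Hadoop Streaming. The streaming jar path and the HDFS input/output directories are assumptions that vary by distribution.

# wordcount_streaming.py -- submit with (paths are assumptions):
#   hadoop jar /path/to/hadoop-streaming.jar \
#     -files wordcount_streaming.py \
#     -mapper "python wordcount_streaming.py map" \
#     -reducer "python wordcount_streaming.py reduce" \
#     -input /user/hadoop/input -output /user/hadoop/wc_out
import sys

def mapper():
    # Emit one "word<TAB>1" line per word
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts by key, so all counts for a word arrive contiguously
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()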
Data Analytics
SQL Processing (Ex. HIVE)
NoSQL Processing (Ex. PIG Latin)
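For SQL-style analytics with Hive, the hedged sketch below runs a HiveQL query from PySpark; Hive support must be configured (for example, hive-site.xml on the classpath), and the table and columns are assumptions.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HiveAnalyticsSketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive table: orders(order_id INT, item STRING, qty INT)
spark.sql("""
    SELECT item, SUM(qty) AS total_qty
    FROM orders
    GROUP BY item
    ORDER BY total_qty DESC
""").show()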
Machine Learning Algorithms
Spark Machine Learning Library (MLlib)
Mahout Framework
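As a first step into Spark MLlib, the minimal sketch below clusters a tiny, made-up dataset with k-means; the feature values and the choice of k are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("MLlibSketch").getOrCreate()

df = spark.createDataFrame(
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)], ["x", "y"])
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

model = KMeans(k=2, seed=42).fit(features)    # two clusters in the toy data
print(model.clusterCenters())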
User Interface
Hadoop User Experience (HUE) is recommended for programmers
Cloudera Search is recommended for non-IT users
Once you become versatile with the above-mentioned tools and frameworks, you can move on to the following:
Tableau, Oozie, ZooKeeper, Impala, HIVE + TEZ, Spark, Phoenix and Kafka, along with security, authentication and management tools, etc.
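As a small preview of that next stage, the hedged sketch below produces and consumes a message with Kafka via the kafka-python package (pip install kafka-python); the broker address and topic name are assumptions.

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")   # assumed local broker
producer.send("demo-topic", b"hello hadoop ecosystem")         # hypothetical topic
producer.flush()

consumer = KafkaConsumer("demo-topic",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)             # stop after 5s of silence
for message in consumer:
    print(message.value)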
The below mentioned links might be useful for self-learning on 'Hadoop ecosystem'.
http://hadooptutorial.info
https://data-flair.training/blogs/hadoop-tutorial/
Good article for MR1, MR2 (YARN)
Excellent Website for list of NoSQL
Sqoop Hands-on
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
Suggested YouTube Channels
edureka!
https://www.youtube.com/user/edurekaIN
Intellipaat
https://www.youtube.com/user/intellipaaat/videos
Stay tuned for future episodes - Learn Hadoop Eco System