Posts

Showing posts from November, 2018

SQL and NoSQL Databases - PART 4

Previous articles in this series: Hadoop Eco System Installation - PART 1, Hadoop Introduction & Terminologies - PART 2, Hadoop Data Format, Ingestion, Streaming - PART 3.

The Big Data Hadoop Eco System handles both SQL and NoSQL databases, depending on the data source and the business application.

SQL Database

What is an SQL database? SQL stands for Structured Query Language. SQL lets you access and manipulate databases.

Where is it needed? SQL is important for the following main reasons:
a. SQL helps you find the information or data you need easily.
b. SQL is a query language, not a general-purpose programming language. Its commands read almost like English.
c. It stores and retrieves data from the database quickly. SQL is used to query, insert, collect and manage data in the database.
d. Almost every relational database system supports SQL for further processing.

Popular SQL Databases: MySQL, Oracle, PostgreS
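
To make the "reads almost like English" point concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `employees`, its columns and its rows are made up for illustration and do not come from the article.

```python
import sqlite3


def run_demo():
    """Create a tiny table, insert rows, and query them back with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Hypothetical table used only for this illustration.
    cur.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )

    # Insert (store) data.
    cur.executemany(
        "INSERT INTO employees (name, dept) VALUES (?, ?)",
        [("Asha", "Analytics"), ("Ravi", "Engineering"), ("Meena", "Analytics")],
    )
    conn.commit()

    # Query (retrieve) data: the statement reads close to plain English.
    cur.execute(
        "SELECT name FROM employees WHERE dept = ? ORDER BY name",
        ("Analytics",),
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names


if __name__ == "__main__":
    print(run_demo())
```

The same SELECT and INSERT statements would look much the same against other SQL databases; only the connection code differs.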

Free e-Books and YouTube Gallery for Researchers

Dear Researcher,

This article may be useful to researchers at every stage of research activity, from problem statement, literature review, hypothesis formulation and design of experiments through to experimentation and research writing.

YouTube video to watch
Art of Research Writing: https://www.youtube.com/watch?v=02iqxtigkK4

Roadmaps and important links
To learn languages based on projects. Github: https://github.com/tuvtran/project-based-learning
Python Machine Learning Book. Github: https://github.com/rasbt/python-machine-learning-book
Coding Practice and Algorithms. Github: https://github.com/jwasham/coding-interview-university
What every programmer should know. Github: https://github.com/mtdvio/every-programmer-should-know
Awesome public datasets. Github: https://github.com/awesomedata/awesome-public-datasets
Awesome Machine Learning. Github: https://github.com/josephmisiti/awesome-machine-learning
Awesome Deep V

Hadoop Data Format, Ingestion, Streaming - PART 3

Previous articles in this series: Hadoop Eco System Installation - PART 1, Hadoop Introduction & Terminologies - PART 2.

Data Formats

A data format is a way of representing and storing raw data on secondary storage devices. Each data file format has its own pros and cons, depending on the business context and use case. Common choices are plain text files (CSV, TSV, XML or JSON files), binary files, and rich file formats such as Avro, ORC and Parquet.

To know more about data formats, see: https://techmagie.wordpress.com/category/big-data/data-formats/

Data Ingestion & Data Streaming

Data can be ingested in batches or streamed in real time. Both processes make it possible to collect, load, transfer, integrate and process data from a wide range of data sources.

Data Ingestion

Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. To ingest something or to take something in batches or chunks
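
As a small illustration of how plain text formats trade off against each other, the sketch below writes the same records as CSV and as JSON (one object per line) using only Python's standard library. The sample records and the function names `to_csv` and `to_json_lines` are invented for this example.

```python
import csv
import io
import json

# Hypothetical sample records, used only for illustration.
records = [{"id": 1, "city": "Chennai"}, {"id": 2, "city": "Pune"}]


def to_csv(rows):
    """Serialize rows as CSV text: compact, but every value becomes a string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "city"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Serialize rows as one JSON object per line: more verbose, keeps types."""
    return "\n".join(json.dumps(r) for r in rows) + "\n"


if __name__ == "__main__":
    csv_text = to_csv(records)
    json_text = to_json_lines(records)
    print(csv_text)
    print(json_text)

    # Reading back: CSV returns the id as the string "1", JSON as the integer 1.
    first_csv = next(csv.DictReader(io.StringIO(csv_text)))
    first_json = json.loads(json_text.splitlines()[0])
    print(type(first_csv["id"]).__name__, type(first_json["id"]).__name__)
```

CSV is smaller because field names appear only once in the header, while JSON repeats them per record but preserves numeric types. Binary formats such as Avro, ORC and Parquet need third-party libraries and are not shown here.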

Hadoop Introduction & Terminologies - PART 2

Please refer to the previous article for continuity: https://educationforempowerment.blogspot.com/2018/11/hadoop-eco-systems-part-1.html

Fundamentals

The foundation of the Big Data & Hadoop Eco System rests on distributed operating systems, distributed file systems, data structures and database management systems. A distributed system is a model in which components on networked computers communicate and coordinate their actions by passing messages.

How does a distributed system work? Consider a single machine with multiple I/O channels, each capable of streaming data at 100 MB/s.

Hadoop does not replace distributed systems; it is itself a distributed framework, designed to overcome the shortfalls of earlier distributed systems such as a high chance of system failure, limited bandwidth and high programming complexity.

Hadoop

Hadoop is a framework that allows distributed processing of large datasets across clusters of computers using simple programming models. Dou
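
The single-machine example in the Fundamentals section (several I/O channels, each streaming at 100 MB/s) can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the figures of 4 channels, 1 TB of data and 10 machines are assumptions chosen for the arithmetic, not numbers from the article, and real clusters add network and coordination overhead that this ignores.

```python
def read_time_seconds(data_mb, channels, mb_per_s_per_channel, machines=1):
    """Idealized time to read data_mb megabytes when every channel on every
    machine streams in parallel at the given rate (no overhead modeled)."""
    total_rate = machines * channels * mb_per_s_per_channel  # MB/s
    return data_mb / total_rate


if __name__ == "__main__":
    one_tb_in_mb = 1_000_000  # assumed dataset size: 1 TB (decimal)

    # Assumed: one machine with 4 channels at 100 MB/s each.
    single = read_time_seconds(one_tb_in_mb, channels=4, mb_per_s_per_channel=100)

    # Assumed: the same data spread across 10 such machines.
    cluster = read_time_seconds(
        one_tb_in_mb, channels=4, mb_per_s_per_channel=100, machines=10
    )

    print(f"single machine: {single / 60:.1f} minutes")
    print(f"10 machines:    {cluster / 60:.1f} minutes")
```

Under these assumptions the read time falls in proportion to the number of machines, which is the basic motivation for spreading large datasets across a cluster.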