BIG DATA ANALYTICS

– A Complete E2E Solution

Join us from 06-02-2017 to 08-02-2017.

Amar Sharma (Founder and CEO)

We are pleased to announce the onboarding of Mr. Amar Sharma. He is a graduate of IIT Roorkee (Electrical) and IIT Delhi (Computers). He has rich knowledge of Big Data and Analytics and possibly some of the best experience in the field. He has worked at many large companies (Yahoo, Microsoft, Motorola, Synopsys, etc.) and has provided consultancy to successful startups (Tiffin Ala-Carte, Cloud Theta, etc.).

BIG DATA ANALYTICS – A Complete E2E Solution

With the pace of business today, it’s easy to lose track of what’s going on. It’s also becoming increasingly difficult to derive value from data quickly enough that the data is still relevant. Companies often find that by the time the data has been crunched and visualized in a meaningful way, it is no longer relevant.

One strategy that many organizations use to make sense of the vast amounts of helpful data their infrastructure generates is to collect all the logging information from the various infrastructure components, crunch it by correlating time stamps and applying heuristics that take relationships between infrastructure entities into account, and present it in a report or dashboard that brings the important metrics to the surface.
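The collect, correlate and summarize idea can be sketched in a few lines of Python. This is only an illustration: the log format, the component names and the sample lines are assumptions made for the example, and a real deployment would read from actual log files or a log shipper.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines from two infrastructure components (format is assumed).
LOG_LINES = [
    "2017-02-06T10:00:05 web ERROR timeout talking to db",
    "2017-02-06T10:00:40 db WARN slow query",
    "2017-02-06T10:01:10 web INFO request served",
    "2017-02-06T10:01:15 db ERROR connection refused",
]

def parse_line(line):
    """Split a line into (timestamp, component, level, message)."""
    stamp, component, level, message = line.split(" ", 3)
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"), component, level, message

def errors_per_minute(lines):
    """Count ERROR entries per minute across all components."""
    counts = Counter()
    for line in lines:
        stamp, _component, level, _message = parse_line(line)
        if level == "ERROR":
            # Truncate the time stamp to the minute so entries from
            # different components fall into the same bucket.
            counts[stamp.replace(second=0)] += 1
    return counts

summary = errors_per_minute(LOG_LINES)
for minute, count in sorted(summary.items()):
    print(minute.isoformat(), count)
```

Bucketing by minute is the simplest form of time stamp correlation; a dashboard would typically plot these counts over time.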

Elasticsearch, along with Logstash and Kibana, provides a powerful platform for indexing, searching and analyzing your data. The ELK Stack is one way modern organizations choose to accomplish this. As the name (“stack”) implies, ELK is not a single tool but a useful combination of three different tools (Elasticsearch, Logstash, and Kibana), hence ELK. All three are open source projects maintained by Elastic. Elastic describes each tool well on its website; in short, the tools respectively provide fast searching over a large data set, collect and distribute large amounts of log data, and visualize the collected and processed data.

ActiveMQ, RabbitMQ, Kafka: A Brief Overview

Messaging is one of the most important aspects of modern programming techniques. The majority of today’s systems consist of several modules and external dependencies. If they could not communicate with each other efficiently, they would not be very effective at carrying out their intended functions.


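ActiveMQ is used to reliably communicate between two distributed processes. You could store messages in a database to pass them between processes, but each message then costs a row insert and, once it has been received, a delete. When you try to scale that up to thousands of messages per second, databases tend to fall over.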


Message-oriented middleware (MOM) such as ActiveMQ, on the other hand, is built to handle those use cases. It assumes that messages in a healthy system will be deleted very quickly, and it can optimize to avoid that overhead.


It can also push messages to consumers, instead of each consumer having to poll for new messages by running a SQL query. This further reduces the latency involved in processing new messages sent into the system. Arbitrary applications can communicate with each other over ActiveMQ.


For example, applications A and B could create queues A.B and B.A (read: messages for A from B and the other way round) and send messages for each other to the matching queue.
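The A.B / B.A naming scheme can be sketched with Python's standard `queue` module. The in-memory queues below only stand in for broker queues, and the `send` and `receive` helpers are hypothetical names introduced for the illustration; they are not part of any ActiveMQ client API.

```python
import queue

# In-memory stand-ins for broker queues; a real setup would create
# these as destinations on the message broker.
queues = {"A.B": queue.Queue(), "B.A": queue.Queue()}

def send(sender, receiver, body):
    """Put a message on '<receiver>.<sender>' (messages for receiver from sender)."""
    queues[f"{receiver}.{sender}"].put(body)

def receive(receiver, sender):
    """Take the next message for `receiver` that was sent by `sender`."""
    return queues[f"{receiver}.{sender}"].get_nowait()

send("B", "A", "hello A, this is B")   # lands on queue A.B
send("A", "B", "hello B, this is A")   # lands on queue B.A

msg_for_a = receive("A", "B")
msg_for_b = receive("B", "A")
print(msg_for_a)
print(msg_for_b)
```

Because each direction has its own queue, neither application ever reads back its own messages.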


ActiveMQ is designed for sending messages between two applications, or between two components inside one application. JMS, an API that ActiveMQ implements, is an important cornerstone of Java Enterprise applications.


This gives messages a fairly common format and semantics, which makes integration between different applications easier.


Of course, ActiveMQ also offers many more detailed features: wire protocols like OpenWire, STOMP and MQTT; JMS; EIP together with Apache Camel; message patterns like "request/reply" and "publish/subscribe"; JMS bridging; and clustering ("network of brokers"), which allows scaling and distribution.
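The two message patterns named above differ mainly in who receives a message. The sketch below shows the difference with a tiny in-memory broker. `TinyBroker` and all of its method names are hypothetical and exist only for this illustration; they do not mirror the ActiveMQ or JMS APIs.

```python
from collections import defaultdict

class TinyBroker:
    """Hypothetical in-memory broker illustrating two message patterns."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks
        self.handlers = {}                     # queue name -> request handler

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Publish/subscribe: every subscriber of the topic gets a copy."""
        for callback in self.subscribers[topic]:
            callback(message)

    def serve(self, queue_name, handler):
        self.handlers[queue_name] = handler

    def request(self, queue_name, message):
        """Request/reply: one handler consumes the request and returns a reply."""
        return self.handlers[queue_name](message)

broker = TinyBroker()

inbox_a, inbox_b = [], []
broker.subscribe("prices", inbox_a.append)
broker.subscribe("prices", inbox_b.append)
broker.publish("prices", "ACME 101.5")      # both inboxes receive the message

broker.serve("echo", lambda text: "reply: " + text)
reply = broker.request("echo", "ping")      # exactly one handler answers

print(inbox_a, inbox_b, reply)
```

A real broker delivers the reply asynchronously through a reply queue; the direct function call here is a simplification.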


RABBITMQ

RabbitMQ is a message queue system for processing tasks asynchronously, or for cases where it makes sense to decouple your application from another application or service. Messaging with RabbitMQ enables software applications to connect and scale. Applications can connect to each other as components of a larger application, or to user devices and data. Messaging through RabbitMQ is asynchronous: it decouples applications by separating the sending and receiving of data.
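The decoupling of sender and receiver can be illustrated without a broker by using a thread and an in-memory queue. This is a minimal sketch of the asynchronous pattern only; the task names are made up, and a real RabbitMQ setup would use a client library and a running broker in place of `queue.Queue`.

```python
import queue
import threading

task_queue = queue.Queue()   # stand-in for a broker queue
results = []

def worker():
    """Consume tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        results.append(task.upper())   # placeholder for real processing

consumer = threading.Thread(target=worker)
consumer.start()

# The producer returns right after each put; it never waits for processing.
for task in ["resize image", "send email"]:
    task_queue.put(task)
task_queue.put(None)   # tell the worker to stop

consumer.join()
print(results)
```

The producer and the worker share nothing except the queue, so either side can be replaced or scaled without changing the other.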


KAFKA

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.


It is used for two broad classes of application:


1. Building real-time streaming data pipelines that reliably move data between systems or applications


2. Building real-time streaming applications that transform or react to streams of data


Kafka has several features that make it a good fit for our requirements: scalability, data partitioning, low latency, and the ability to handle a large number of diverse consumers. Unlike traditional messaging systems, a message stored in Kafka does not have an explicit message ID; instead, it is addressed by its logical offset in the log.
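Data partitioning and the absence of message IDs are easier to see in a small model. Kafka addresses a message by its position (offset) within a partition of an append-only log. The sketch below is an in-memory illustration of that idea only: `TopicLog` is a hypothetical class, and the CRC32-based partition choice is an assumption made for the example, not Kafka's actual partitioner.

```python
import zlib

class TopicLog:
    """In-memory sketch of a partitioned, append-only topic."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append to the partition chosen by hashing the key; return (partition, offset)."""
        # CRC32 is an assumed stand-in for a real partitioning function.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Return all messages in a partition from `offset` onward."""
        return self.partitions[partition][offset:]

log = TopicLog(num_partitions=3)
p1, o1 = log.append("user-42", "login")
p2, o2 = log.append("user-42", "click")   # same key, so same partition

# Each consumer tracks its own offset, so diverse consumers can read
# the same data independently and at their own pace.
consumer_offsets = {"billing": 0, "audit": 1}
billing_view = log.read(p1, consumer_offsets["billing"])
audit_view = log.read(p1, consumer_offsets["audit"])
print(billing_view, audit_view)
```

Because the log is never modified on read, adding another consumer costs only one more stored offset.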


Program Schedule:

Day 1 : Introduction to Big Data and Hadoop

http://woir.in/day-1-assignments-tutorial-for-pvpsit-vijyawada/


Day 2 : NoSQL(Elasticsearch) and Real Time Analytics

http://woir.in/day-2-assignments-tutorial-for-pvpsit/


Day 3 : Fitment of Cloud, Message Broker into Analytics

http://woir.in/day-3-assignments-tutorial-for-pvpsit/


Resources e-Books