Showing posts with label MongoDB. Show all posts

Monday, June 30, 2014

Big Data - Overview

Every day we hear a lot about Big Data. What really is Big Data? Big Data is commonly defined by three properties: Volume, Velocity and Variety. Large volumes of data of different types, produced at high rates (gigabytes or terabytes per day), are considered Big Data. The variety can be structured, semi-structured or unstructured data. Examples of big data include clickstream logs, web logs, customer support chats and emails, social network posts, electronic health records, stock market data and weather data. If analyzed effectively, this data can give us valuable, actionable insights. Big data is a gold mine of knowledge: it helps in predicting user behavior, patterns and trends, in recommending items and services based on a user's profile, and in predicting weather phenomena, diseases, stock market trends and so on.

There are various tools for analyzing data, ranging from simple spreadsheets to RDBMSs, data warehouses (DWHs), Hadoop and NoSQL databases; the right choice depends on the complexity of the data. A small, structured dataset can be analyzed with a spreadsheet, but once it outgrows what a spreadsheet can handle, an RDBMS is the better fit. Semi-structured and unstructured data are tough to analyze with spreadsheets and RDBMSs, and the problem gets worse as the dataset grows massive. Hadoop and NoSQL technologies help to overcome these issues. Hadoop and its ecosystem components such as Hive and Pig solve the problem in a batch-oriented manner, whereas NoSQL technologies such as Cassandra, HBase and MongoDB provide a real-time environment for data analysis.
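To make the point about semi-structured data a bit more concrete, here is a minimal Python sketch. It is not tied to Hadoop or any NoSQL product; the sample clickstream records, the `RAW_EVENTS` list and the `count_page_views` helper are all made up for illustration. It only shows why such records do not fit neatly into fixed spreadsheet or table columns: every record can carry different fields, and some lines may not parse at all.

```python
import json
from collections import Counter

# Hypothetical clickstream records (assumption, not real data): each line is
# a JSON document, and the set of fields varies from record to record.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home"}',
    '{"user": "u2", "page": "/cart", "referrer": "search"}',
    '{"user": "u1", "page": "/cart", "device": {"os": "android"}}',
    'not valid json',
]


def count_page_views(lines):
    """Count views per page, skipping lines that cannot be parsed."""
    views = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed record: skip it instead of failing the job
        # Only JSON objects that actually carry a "page" field are counted.
        page = event.get("page") if isinstance(event, dict) else None
        if page is not None:
            views[page] += 1
    return views


page_views = count_page_views(RAW_EVENTS)
print(page_views.most_common())
```

On a handful of records this runs instantly on one machine. The same parse-then-aggregate idea is what batch tools apply across many machines once the logs grow to gigabytes or terabytes per day.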

Big data analysis mainly involves techniques such as machine learning, statistical modeling and natural language processing.
