Technologies

Hadoop

Hadoop is a distrubuted system that is implemented over a cluster of computers. There can be many nodes in a cluster but there can only be one control node known as namenode. A cluster can contain many nodes that are slaves to the namenode these are known as datanodes. These datanodes can usually hold the storage space for HDFS or Hive but also have resources to perform processes like Map/Reduce

Hive

Hive is a Hadoop implementation of a relational database that is used for data warehousing for large data sets. Each query written in its own language called HiveQl, acts like a normal SQL query but runs as a Map/Reduce job. Indexing is made easier and makes querying large datasets quicker and more efficient

HDFS

HDFS is the distributed storage facility that is included within Hadoop. Any datanode that is in the cluster and configured to use HDFS will be a replication of all other datanodes, this means that all data are synchronized across the cluster. In the event that a datanode has to be excluded from the cluster, the data that is stored on that datanode is already available within the cluster.

Oozie

It can be difficult to run a chain of processes together, but Oozie is technology that provides workflow functionality to the processes available in Hadoop.

Oozie also provides functionality to manage workflows. Through Oozie and the Red Sqirl Process Manager tab, you can review job details and control running jobs.

return to Red Sqirl help

Pig

Apache Pig is a platform for analyzing large data sets, which consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets. At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties: