streaming Archives - Oracle Alchemist

MapReduce with Hadoop Streaming in bash – Part 3

October 23, 2013 Steve Karam Big Data, Development, News 2 comments

In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word appeared across the collection. In the second part, we calculated ‘document frequency’

MapReduce with Hadoop Streaming in bash – Part 2

October 22, 2013 Steve Karam Big Data, Development, News 6 comments

In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose eight Stephen Crane poems, and our bash Map and Reduce jobs tokenized the text into words and counted how often each one appeared across the entire set. The

MapReduce with Hadoop Streaming in bash – Part 1

October 21, 2013 Steve Karam Big Data, Development, News 8 comments

So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming. Hadoop Streaming allows you to write MapReduce code in any language that can read from stdin and write to stdout. This includes Python, PHP, Ruby, Perl, bash, node.js, and
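
To make the stdin/stdout idea concrete, here is a minimal sketch of a term frequency mapper and reducer in bash. It is not the code from the articles themselves: the function names `tf_map` and `tf_reduce`, the lowercase normalization, and the tab-separated `word<TAB>count` format are all assumptions made for illustration. The last line is a local dry run where `sort` stands in for Hadoop's shuffle phase.

```shell
#!/usr/bin/env bash
# Hypothetical mapper (name and tokenization rules are assumptions):
# read text on stdin, emit one "word<TAB>1" line per token.
tf_map() {
  tr '[:upper:]' '[:lower:]' |      # normalize case
    tr -cs '[:alnum:]' '\n' |       # squeeze each run of non-alphanumerics into one newline
    awk 'NF { print $0 "\t1" }'     # skip empty lines, emit key<TAB>1
}

# Hypothetical reducer: read "word<TAB>count" lines, already sorted by word,
# sum the counts for each word and emit "word<TAB>total".
tf_reduce() {
  awk -F '\t' '
    $1 != prev { if (NR > 1) print prev "\t" sum; prev = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print prev "\t" sum }
  '
}

# Local dry run without a cluster: sort plays the role of the shuffle.
printf 'In the desert\nI saw a creature, in the dust\n' | tf_map | LC_ALL=C sort | tf_reduce
```

On a real cluster the two functions would live in separate scripts handed to the streaming jar through its `-mapper` and `-reducer` options, with Hadoop doing the sort between them. The reducer only works because its input arrives grouped by key, which is why the dry run needs the `sort`.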