In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word appeared in the collection. In the second part, we calculated ‘document frequency’
Category: Development
Anyone who says developers and DBAs can’t work together is wrong; it is vital that we work together. In many ways we share job roles and responsibilities, DevOps or no DevOps. These articles are developer-centric, or at the least feature important information for the DBA as it pertains to the development role.
MapReduce with Hadoop Streaming in bash – Part 2
In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce jobs tokenized the words and found their frequency across the entire set. The
MapReduce with Hadoop Streaming in bash – Part 1
So to commemorate my recent certification and because my Java absolutely sucks, I decided to do a common algorithm using Hadoop Streaming. Hadoop Streaming allows you to write MapReduce code in any language that can process stdin and stdout. This includes Python, PHP, Ruby, Perl, bash, node.js, and
Hadoop Developer Training – Day 4
This will be a short(ish) post, as my brain is relatively fried from the obscene amount of knowledge imparted by the Hadoop class (and it’s Friday so I’m allowed to be lazy nanny nanny boo boo). To be honest, this was probably the most fun day of class. While the
Hadoop Developer Training – Day 3
Cloudera Developer Training for Apache Hadoop is almost over, and I’m somewhat sad that my Hadoopin’ days are nearly done–in the classroom at least. However, the breadth of this training has been great and I can definitely say I’ve gotten my (company’s) money’s worth. Being that I’m three days in,
Hadoop Developer Training – Day 2
Yesterday I completed the second day of Cloudera Developer Training for Apache Hadoop. While the first day focused on Hadoop core technology like HDFS, the second day was all about MapReduce. That means it was the day that whole ‘developer’ thing was thrown into sharp relief. I’ve been a DBA
Hadoop Developer Training – Day 1
What’s the best way to follow up a week of Oracle OpenWorld? Cloudera Developer Training for Apache Hadoop of course. So today I had my first day. I won’t detail the course itself (though I hope there will be many Hadoop posts to come). But I would like to share
Daydreams 2 – Rollbacks (Ace Comic)
At the Council of IT…
Linked Data, RDF, and SPARQL
Getting Into Linked Data and the Semantic Web: This is a post about the RDF metadata model and SPARQL query language, two components of the Linked Data structured data methodology. All of this is part of the collective vision known as the Semantic Web, a Web of Data that can be
Genesis (Ace Comic)
The magic of creation…