I wanted to evaluate Apache Flume for the ingress of some web server access logs and get a feel for how it works. In this post I go over some of the things I found out about Apache Flume, as well as a few examples of situations in which it would be … Continue reading Using Apache Flume for log ingress?
Purpose: In this post I have captured my work configuring Pentaho Data Integration (PDI) for use with Hadoop. It is formatted as a tutorial on how to set up PDI 4.4 with Hadoop 1.2.0. Prerequisites: Java 1.6 or later (not the OpenJDK distribution, as it is not compatible with this version of PDI) … Continue reading Configuring PDI for use with Hadoop 1.2.0
Purpose: In this blog post, I have captured the work I did in setting up a small (four-node) Apache Hadoop cluster and documented it in a way that is easy to follow if you want to set up your own. I plan to use this cluster with Pentaho Data Integration (PDI) to do … Continue reading Setting up a small Hadoop cluster