Home Data Lab and Apache Spark Cluster

Recently I purchased an old server off of Craigslist. I wasn't looking for a server; I was looking for a few newer-ish laptops that people were selling for cheap. I wanted to add them to the two VERY old laptops I had already networked together. But imagine my surprise when I found a server with 2 processors, 12 cores, 12 GB of RAM and three 146 GB hard drives for $125. And it was just a few minutes from my house! I had to at least go look. It was an HP ProLiant DL385 G6. It was clean. And he was willing to part with it for $100. SOLD!

With a little research I found out it could hold up to 128 GB of RAM. eBay had 128 GB for $150 - SOLD!

I was hooked.

I have since bought two more servers for $25 apiece. That isn't a typo - $25 APIECE! One was an HP ProLiant DL165 G6, single processor with 6 cores, without RAM or HDD, but it was $25. The other was an HP ProLiant DL385 G5, 2 processors, 8 cores total, no HDD and something like 18 or 20 GB of RAM. Oh and no power supply ... but it was $25.

eBay to the rescue again. I bought a 2 TB 3.5" SATA HDD for $40 for the DL165. I put in 16 GB of memory, some of which came with the DL385 G5 and some of which came with the DL385 G6 when I bought it.

I am going to wait a bit to buy the power supply and HDDs for the G5.

My home network consists of the following:

  • HP Pavilion P6320y, 4 cores, 8 GB of RAM, running Arch Linux
  • HP ProLiant DL385 G6, 12 cores, 128 GB of RAM, 3x 146 GB HDD, running CentOS 7
  • HP ProLiant DL165 G6, 6 cores, 16 GB of RAM, 1x 2 TB HDD, running CentOS 7
  • Total: 22 cores and 152 GB of RAM
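
As a quick sanity check on those totals, here is a minimal Python sketch that adds up the cores and RAM from the list. The Node class and the totals function are names made up for this illustration; only the numbers come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One machine on the home network (hypothetical helper type)."""
    name: str
    cores: int
    ram_gb: int


# Figures copied from the list above.
NODES = [
    Node("HP Pavilion P6320y", cores=4, ram_gb=8),
    Node("HP ProLiant DL385 G6", cores=12, ram_gb=128),
    Node("HP ProLiant DL165 G6", cores=6, ram_gb=16),
]


def totals(nodes):
    """Return (total cores, total RAM in GB) for a list of nodes."""
    return sum(n.cores for n in nodes), sum(n.ram_gb for n in nodes)


if __name__ == "__main__":
    cores, ram_gb = totals(NODES)
    print(f"Total: {cores} cores and {ram_gb} GB of RAM")
```

Adding another Node entry to the list is all it takes to see what a new machine does to the totals.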

All for $300. Not bad for not setting out to have a cluster of servers. The Pavilion was given to me so no expense there.

I have Apache Spark running on all three, with a shared NFS directory set up.

The very near future plans

I plan on purchasing a power supply and a couple of 146 GB HDDs for the DL385 G5. It too will be loaded with CentOS 7 and Apache Spark. I'm not sure how much RAM I will install. I think all I have left are 1 GB modules, so I'll fill the G5 with however many I can fit. I think it will be around 8 GB of RAM.

This will bring my total cores to 30 and RAM to 160 GB.

Distant future plans

I would eventually like to

  • add another processor to the DL165 to bring its total cores to 12
  • max out the RAM on the DL165 and DL385 G5
  • add a firewall
  • obtain a static IP address so I can VPN into it whenever I want

Final thoughts

I've read threads on various forums that say doing what I've done is a waste of time and money. I would have to disagree. Prior to the servers my Linux experience was getting Arch Linux up and running on 3 or 4 computers. That helped immensely with my Linux knowledge.

Installing CentOS was super easy. In the grand scheme of things, setting up SSH and NFS was pretty painless as well. I found all the answers on the internet.

Setting up the Apache Spark nodes to run and talk to each other was a bit more work but nothing that was super frustrating. Almost every issue I ran into was caused by mistakes and typos I made.

Thus far it has been an enjoyable learning experience. The whole point of this is to do some "real" data analysis on some EMS and Fire data, to the point where I take it to my department and they realize they have to have a setup like mine. Pipe dream maybe, but I don't think it is so far-fetched!