Tuesday, January 28, 2014

Week 1: Zipfian Academy - Priming the Pump, Some Unix Shell, Python, Recommenders and Data Wrangling

Things got off to a speedy start for the Zipfian Academy winter cohort. I think this first week was meant to establish a pace for the course, and it was intense. We spent about 8 to 10 hours a day on lectures, sprints and working through problems. I would imagine some folks spent a few more hours on code and readings once they got back home.

In the first week we covered the following, in no particular order: an overview of probability, Git / GitHub, some Bayesian statistics, Bash / Unix shell, Python, pdb, IPython, TDD, OOP, SVD, matrix factorization and an overview of linear algebra. We also built a recommender system based on a large Amazon dataset.
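To give a flavor of how SVD and matrix factorization feed into a recommender, here is a minimal sketch in Python with NumPy. It is not the code we wrote in class and it does not use the Amazon dataset: the tiny ratings matrix is made up, the function names predict_ratings and recommend are my own, and treating 0 as "not rated" is an assumption for illustration.

```python
import numpy as np

def predict_ratings(ratings, k=2):
    """Approximate a user-by-item ratings matrix with a rank-k truncated SVD.

    Assumption: a 0 entry means "not rated".
    """
    ratings = np.asarray(ratings, dtype=float)
    mask = ratings > 0
    # Center each user's known ratings so unrated items sit at that user's mean.
    counts = np.maximum(mask.sum(axis=1), 1)
    user_means = (ratings * mask).sum(axis=1) / counts
    centered = np.where(mask, ratings - user_means[:, None], 0.0)
    # Keep only the k largest singular values (the "latent factors").
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return approx + user_means[:, None]

def recommend(ratings, user, k=2, n=1):
    """Return the indices of the top-n unrated items for one user."""
    preds = predict_ratings(ratings, k)
    unseen = np.flatnonzero(np.asarray(ratings)[user] == 0)
    ranked = unseen[np.argsort(-preds[user, unseen])]
    return ranked[:n].tolist()

# Made-up toy data: 4 users x 4 items, 0 = not rated.
R = [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 0, 5, 4],
     [0, 1, 4, 5]]
print(recommend(R, user=0))
```

On a real dataset the matrix would be huge and sparse, so you would probably reach for a sparse solver rather than a dense SVD, but the idea of scoring unrated items from a low-rank approximation is the same.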

A few interesting themes I noticed:
  • Pair programming is woven into the fabric of the program. We'll be pair programming for the first few weeks and will break off after about a month. It takes a lot of getting used to, especially if you haven't done it before.
  • Git / GitHub, probably the standard version control tooling at most tech shops, is also tightly integrated into the program. From the first day, we were expected to fork repos and to push and pull code.
  • The emergence of IPython as a truly awesome collaborative tool for data analysis. If you write Python and haven't used IPython and IPython notebooks, please stop reading this blog post and google them. You'll thank me, seriously.
  • The ever-powerful and omnipresent Bash and Unix command line. Every now and then, you're reminded how powerful these tools are for data analysis and data pipelining tasks.
  • It is amazing how much you can learn by osmosis when you find yourself immersed in a collaborative environment with very like-minded colleagues.
  • In thinking like a Data Scientist, your mind always has to be on the business problem you're trying to solve. It's not always about running the fastest or most exciting machine learning algorithm.
Looking at the curriculum for the next couple of weeks, calling it intense will probably be an understatement, but I'll try to put out weekly updates as time permits.

Sunday, January 19, 2014

Raspberry Pi + Motorola Atrix Lapdock

A few weeks ago, I got my hands on a Raspberry Pi and connected it to a Motorola Atrix Lapdock (basically a glorified monitor and keyboard). You could also run the Pi headless and connect to it via SSH or a VNC server.


The Lapdock was probably ahead of its time when it was released a few years ago. It was initially overpriced and didn't do well so Motorola eventually pulled the plug on it.

You can get one of these for 30 to 40 bucks on eBay. The whole setup cost less than 100 bucks, including the Raspberry Pi and the extra cables I had to buy.

I've seen some interesting projects built with this miniature board, like a Hadoop compute cluster based on the Pi, a 64-node supercomputer and the countless home automation and robot projects out there.

There are a few websites dedicated to projects with the Raspberry Pi. Of note are Adafruit and this aggregation site. You can also check out this very detailed guide, with wiring instructions, to pull this hack off.

The types of projects you could build with the Pi are limited only by your imagination. I've also seen it used as a remote data logging and aggregation device in bee hives, in connected security cameras and in motion detectors, and you can probably guess that the list goes on.