This is the last week of the structured part of the curriculum. Moving forward, it's projects, hiring day and interviews.

We started the week by reviewing material from the first three weeks: statistics, distributions, hypothesis testing, Bayesian A/B testing and different distance metrics (Jaccard, Euclidean, Hamming, cosine). Then we moved on to material from the next four weeks: web scraping with BeautifulSoup and Naive Bayes. We had a breakout on advanced web scraping with Scrapy and mechanize. You reach for these tools when you have complex scraping requirements like JavaScript loading, pagination, etc. Tools like Kimono and import.io were also introduced. These can be a lifesaver, especially when you're working on a tight schedule.
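
As a quick refresher on the four distance metrics, here is a minimal plain-Python sketch of each. It is an illustration, not the code we used in class: it assumes sets (or anything convertible to a set) for Jaccard, equal-length numeric vectors for Euclidean and cosine, and equal-length sequences for Hamming.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|, treating both inputs as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1 - len(a & b) / len(union)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def hamming_distance(u, v):
    """Number of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(u, v))


def cosine_distance(u, v):
    """1 - cos(theta), where theta is the angle between the two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 1 - 2/4 = 0.5
    print(euclidean_distance([0, 0], [3, 4]))                  # 5.0
    print(hamming_distance("karolin", "kathrin"))              # 3
    print(cosine_distance([1, 0], [0, 1]))                     # 1.0
```

Jaccard and Hamming suit sets and categorical strings. Cosine ignores vector magnitude, which is why it is a common choice for comparing text documents of different lengths.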

By midweek, we were reviewing more of the material covered earlier: more Naive Bayes, regression and outliers. The next day, we worked on an assessment: an NLP classification problem from one of our partner companies. The dataset was fairly small and the problem was well defined. Things got a bit interesting when I tried an "extensive" grid search with cross-validation over a pipeline of different models. It took more than 12 hours to get through a single model. This is where you either push the work to a beefy machine on AWS or run a randomized search, which tries a fixed number of sampled hyperparameter combinations instead of every one, to help you find good hyperparameters for your model. We also reviewed sample interview problems.
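To see why an exhaustive grid search blows up, it helps to count the fits. The sketch below is stdlib-only and illustrative. `PARAM_GRID` is a hypothetical search space (not the one from the assessment), and `toy_score` is a made-up stand-in for a cross-validated score. In scikit-learn, the same contrast is between `GridSearchCV` and `RandomizedSearchCV`.

```python
import random

# Hypothetical search space, loosely modeled on a text-classification pipeline
# (n-gram range, vocabulary pruning, smoothing). Not the grid from the assessment.
PARAM_GRID = {
    "ngram_max": [1, 2, 3],
    "min_df": [1, 2, 5, 10],
    "alpha": [0.01, 0.1, 0.5, 1.0, 2.0],
    "max_features": [5000, 10000, 50000, None],
}


def count_grid_fits(param_grid, cv_folds):
    """Model fits needed to try every combination with k-fold cross-validation."""
    n_combos = 1
    for values in param_grid.values():
        n_combos *= len(values)
    return n_combos * cv_folds


def randomized_search(param_grid, score_fn, n_iter, seed=0):
    """Score n_iter randomly drawn configurations and keep the best one.

    Each parameter is drawn independently, so a configuration can repeat.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated score; a real search fits models here."""
    return (-abs(params["alpha"] - 0.1)
            - 0.01 * params["min_df"]
            + 0.01 * params["ngram_max"])


if __name__ == "__main__":
    print(count_grid_fits(PARAM_GRID, cv_folds=5))  # 3 * 4 * 5 * 4 * 5 = 1200
    print(randomized_search(PARAM_GRID, toy_score, n_iter=20))
```

Even this modest grid needs 1,200 fits with 5-fold cross-validation. If `score_fn` ran 5-fold cross-validation, 20 random draws would cost 100 fits. The random search may miss the single best combination, but it usually lands close for a fraction of the compute.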

We finished off the week with another assessment. This one was more involved and had several parts: wrangling machine-generated log data from a video content site, classification, regression, clustering and building a recommendation engine. It really feels like things are winding down as we move to personal projects next week.
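
The post doesn't say how the recommendation engine was built. One common approach for view logs like these is item-based collaborative filtering on implicit feedback. The sketch below assumes the logs have already been wrangled into a mapping from user ID to the set of videos watched. The `VIEWS` data is made up.

```python
import math
from collections import defaultdict

# Hypothetical wrangled log data: user ID -> set of video IDs watched.
VIEWS = {
    "u1": {"v1", "v2", "v3"},
    "u2": {"v1", "v2"},
    "u3": {"v2", "v4"},
    "u4": {"v1", "v4"},
}


def item_similarities(user_items):
    """Cosine similarity between items, using binary watched/not-watched vectors."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:  # store only pairs that share at least one viewer
                s = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims


def recommend(user, user_items, sims, k=3):
    """Rank unseen items by summed similarity to the items the user has watched."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    sims = item_similarities(VIEWS)
    print(recommend("u2", VIEWS, sims))  # ['v3', 'v4']
```

Item-item similarities can be precomputed offline. That makes this approach cheap to serve, at the cost of weak recommendations for brand-new videos with few views.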

Highlights from the week:

- We had a guest lecture from @enjalot, who works on the bleeding edge of data visualization. His work includes a visualization of BART employee salaries that made the rounds during the BART strike.
