Chris McCormick    About    Membership    Blog Archive

Become an NLP expert with videos & code for BERT and beyond → Join NLP Basecamp now!

Getting Started with mlpack

I’ve recently needed to perform a benchmarking experiment with k-NN in C++, so I found mlpack as what appears to be a popular and high-performance machine learning library in C++.

DBSCAN Clustering

DBSCAN is a popular clustering algorithm which is fundamentally very different from k-means.

Interpreting LSI Document Similarity

In this post I’m sharing a technique I’ve found for showing which words in a piece of text contribute most to its similarity with another piece of text when using Latent Semantic Indexing (LSI) to represent the two documents. This has proven valuable to me in debugging bad search results from “concept search” using LSI. You’ll find the equations for the technique as well as example Python code.

Word2Vec Resources

While researching Word2Vec, I came across a lot of different resources of varying usefullness, so I thought I’d share my collection of links and notes on what they contain.