This post was written in my role as a researcher at Nearist, and will soon be on the Nearist website as well.
I recently created a project on GitHub called wiki-sim-search where I used gensim to perform concept searches on English Wikipedia.
I recently needed to run a benchmarking experiment with k-NN in C++, and found mlpack, which appears to be a popular, high-performance C++ machine learning library.
In part 2 of the word2vec tutorial (here’s part 1), I’ll cover a few additional modifications to the basic skip-gram model that are important for making it feasible to train.
DBSCAN is a popular clustering algorithm that differs fundamentally from k-means.