Show HN: Want something better than k-means? Try BanditPAM

Want something better than k-means? I'm happy to announce that BanditPAM, our SOTA k-medoids algorithm from NeurIPS 2020, is now publicly available! `pip install banditpam` or `install.packages("banditpam")` and you're good to go!

k-means is one of the most widely used algorithms for clustering data. However, it has several limitations: a) it requires L2 distance for efficient clustering, b) which in turn restricts the data you're clustering to vectors, and c) it doesn't require the means to be datapoints in the dataset.

Unlike k-means, the k-medoids problem requires cluster centers to be actual datapoints, which makes the cluster centers easier to interpret. k-medoids also works better with arbitrary distance metrics, so your clustering can be more robust to outliers if you use a metric like L1.

Despite these advantages, most people don't use k-medoids because prior algorithms were too slow. In our NeurIPS 2020 paper, BanditPAM, we reduced the per-iteration cost of the best known algorithm from O(n^2) to O(n log n) by using techniques from multi-armed bandits. We were inspired by prior research demonstrating that many algorithms can be sped up by sampling the data intelligently instead of performing exhaustive computations.

We've released our implementation, which is pip- and CRAN-installable. It's written in C++ for speed, but callable from Python and R. It also supports parallelization and intelligent caching at no extra complexity to end users. Its interface matches the sklearn.cluster.KMeans interface, so existing code needs only minimal changes.

PyPI: https://ift.tt/ErPjAhJ
CRAN: https://ift.tt/Bo3JfmW
Repo: https://ift.tt/PmSi5hb
Paper: https://ift.tt/1UNrLys

If you find our work valuable, please consider starring the repo or citing our work. These help us continue development on this project.

I'm Mo Tiwari (motiwari.com), a PhD student in Computer Science at Stanford University.
A special thanks to my collaborators on this project, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, and Ilan Shomorony, as well as the author of the R package, Balasubramanian Narasimhan.

(This is my first time posting on HN; I've read the FAQ before posting, but please let me know if I broke any rules.)

https://ift.tt/PmSi5hb
April 5, 2023 at 01:46AM
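
For readers who want to see the difference between a mean and a medoid concretely, here is a minimal pure-Python sketch. It is not the banditpam package or its API; every name in it (`l1`, `total_loss`, `exact_medoid`, `sampled_medoid`) is hypothetical and chosen for illustration. It shows the single-cluster case with L1 distance: the exhaustive search touches every pair of points, while the sampled variant scores each candidate against a random subset of reference points. The sampled variant uses one fixed sample size for all candidates, which is a deliberate simplification and not the adaptive bandit procedure from the paper.

```python
import random


def l1(a, b):
    """Manhattan (L1) distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_loss(points, medoids, dist):
    """k-medoids objective: each point pays the distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def exact_medoid(points, dist):
    """Exhaustive 1-medoid search: O(n^2) distance evaluations."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def sampled_medoid(points, dist, n_refs, rng):
    """Score every candidate against a random subset of reference points.

    Simplification (an assumption of this sketch): the same fixed number of
    reference points is used for every candidate. An adaptive method would
    spend more samples only on the candidates that are hard to tell apart.
    """
    refs = rng.sample(points, n_refs)
    return min(points, key=lambda c: sum(dist(c, r) for r in refs))


# Four points in a tight group plus one far-away outlier.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (100.0, 100.0)]

# The k-means style center is an average, so it need not be a datapoint.
mean = tuple(sum(coord) / len(points) for coord in zip(*points))

# The medoid is always one of the datapoints.
medoid = exact_medoid(points, l1)

print("mean:  ", mean)    # (21.2, 21.2), dragged toward the outlier
print("medoid:", medoid)  # (2.0, 2.0), an actual datapoint
print("loss:  ", total_loss(points, [medoid], l1))
print("sampled medoid:", sampled_medoid(points, l1, 3, random.Random(0)))
```

With the outlier present, the mean lands far from the group, while the medoid stays on a point inside it. When `n_refs` equals the number of points, `sampled_medoid` reduces to the exhaustive search; smaller values trade accuracy for fewer distance computations.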