Show HN: Want something better than k-means? Try BanditPAM

Want something better than k-means? I'm happy to announce that our SOTA k-medoids algorithm from NeurIPS 2020, BanditPAM, is now publicly available! `pip install banditpam` or `install.packages("banditpam")` and you're good to go!

k-means is one of the most widely used algorithms for clustering data. However, it has several limitations: a) it requires the use of (squared) L2 distance for efficient clustering, which also b) restricts the data you're clustering to be vectors, and c) its means need not be datapoints in the dataset.

Unlike in k-means, the k-medoids problem requires cluster centers to be actual datapoints, which makes your cluster centers easier to interpret. k-medoids also works with arbitrary distance metrics, so your clustering can be more robust to outliers if you're using a metric like L1.

Despite these advantages, most people don't use k-medoids because prior algorithms were too slow. In our NeurIPS 2020 paper, BanditPAM, we sped up the best known algorithm from O(n^2) to O(n log n) per iteration by using techniques from multi-armed bandits. We were inspired by prior research demonstrating that many algorithms can be sped up by sampling the data intelligently instead of performing exhaustive computations.

We've released our implementation, which is installable from PyPI and CRAN. It's written in C++ for speed but callable from Python and R. It also supports parallelization and intelligent caching at no extra complexity for end users. Its interface matches the sklearn.cluster.KMeans interface, so existing code needs only minimal changes.

PyPI: https://ift.tt/ErPjAhJ
CRAN: https://ift.tt/Bo3JfmW
Repo: https://ift.tt/PmSi5hb
Paper: https://ift.tt/1UNrLys

If you find our work valuable, please consider starring the repo or citing our work; both help us continue development on this project.

I'm Mo Tiwari (motiwari.com), a PhD student in Computer Science at Stanford University.
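The two technical ideas in the post (medoids are real datapoints that resist outliers under L1, and sampling can replace exhaustive distance computations) can be illustrated with a small pure-Python sketch. It does not use the banditpam package and is not its implementation: `exact_medoid`, `sampled_medoid`, the shared random reference batches, and the Hoeffding-style confidence interval that uses the largest observed distance as its bound are all hypothetical, illustrative choices. It also only picks a single medoid (k = 1), the simplest version of the search.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance; k-medoids accepts any metric here."""
    return sum(abs(x - y) for x, y in zip(a, b))


def exact_medoid(points: List[Point], dist: Metric) -> int:
    """Exhaustive search, O(n^2) distance calls: index of the point whose
    total distance to all points is smallest (first index wins ties)."""
    return min(range(len(points)),
               key=lambda i: sum(dist(points[i], q) for q in points))


def sampled_medoid(points: List[Point], dist: Metric, batch_size: int = 30,
                   delta: float = 1e-3, seed: int = 0) -> int:
    """Hypothetical bandit-style (successive elimination) search for a medoid.

    Each surviving candidate is scored against the same random batch of
    reference points; candidates whose lower confidence bound exceeds the
    best upper bound are dropped. When another batch would exceed n
    references, the survivors are scored exactly.
    """
    rng = random.Random(seed)
    n = len(points)
    candidates = list(range(n))
    sums = [0.0] * n
    m = 0        # references drawn so far (shared by all survivors)
    r_max = 0.0  # largest distance seen: a crude stand-in for a distance bound
    while len(candidates) > 1 and m + batch_size <= n:
        refs = [rng.randrange(n) for _ in range(batch_size)]
        for i in candidates:
            for j in refs:
                d = dist(points[i], points[j])
                sums[i] += d
                r_max = max(r_max, d)
        m += batch_size
        # Hoeffding-style half-width for a mean of m values in [0, r_max]
        width = r_max * math.sqrt(math.log(2 / delta) / (2 * m))
        best_ucb = min(sums[i] / m + width for i in candidates)
        candidates = [i for i in candidates if sums[i] / m - width <= best_ucb]
    if len(candidates) == 1:
        return candidates[0]
    # Fall back to exact scoring of the remaining candidates.
    return min(candidates,
               key=lambda i: sum(dist(points[i], q) for q in points))


if __name__ == "__main__":
    # Outlier example: the mean is dragged toward 100; the L1 medoid is not.
    data = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]
    print(sum(p[0] for p in data) / len(data))  # 22.0, not a datapoint
    print(data[exact_medoid(data, l1)])         # (3.0,)

    line = [(float(i),) for i in range(51)]
    print(exact_medoid(line, l1), sampled_medoid(line, l1))  # 25 25
```

The sampled search only pays off when most candidates are eliminated after a few batches, which is more likely on large datasets than on this tiny demo; the exact fallback keeps the answer correct for whichever candidates survive.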
A special thanks to my collaborators on this project, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, and Ilan Shomorony, as well as to the author of the R package, Balasubramanian Narasimhan.

(This is my first time posting on HN; I've read the FAQ before posting, but please let me know if I broke any rules.)

April 5, 2023 at 01:46AM