quitefastmst: Euclidean and Mutual Reachability Minimum Spanning Trees

Keywords: Euclidean minimum spanning tree, MST, EMST, mutual reachability distance, nearest neighbours, k-nn, k-d tree, Borůvka, Prim, Jarník, Kruskal, Genie, HDBSCAN*, DBSCAN, clustering, outlier detection.

Package features:

  • Euclidean Minimum Spanning Trees using single-, sesqui-, and dual-tree Borůvka algorithms – quite fast in spaces of low intrinsic dimensionality,

  • minimum spanning trees with respect to mutual reachability distances based on the Euclidean metric (used in the definition of the HDBSCAN* algorithm; see [3]),

  • Euclidean nearest neighbours with nicely-optimised K-d trees,

  • relatively fast fallback algorithms for spaces of higher dimensionality,

  • supports multiprocessing via OpenMP (on selected platforms).

Possible applications in topological data analysis: clustering (HDBSCAN*, Lumbermark, Genie, Single linkage, etc.), outlier detection (Deadwood), density estimation, dimensionality reduction, and many more.

Contributing

quitefastmst is distributed under the open source GNU AGPL v3 license. Its source code can be downloaded from GitHub.

The Python version is available from PyPI. The R version can be fetched from CRAN.

The core functionality is implemented in the form of a C++ library. It can thus be easily adapted for use in other environments. New contributions are welcome, e.g., Julia, Matlab/GNU Octave wrappers.

Author and Maintainer: Marek Gagolewski