SuperMinHash - A New Minwise Hashing Algorithm for Jaccard Similarity Estimation

Data Structures and Algorithms, arXiv preprint arXiv:1706.05698, 2017

This paper presents a new algorithm for calculating hash signatures of sets that can be used directly for Jaccard similarity estimation. The new approach improves on the MinHash algorithm in two respects: it has better runtime behavior, and its signatures allow a more precise estimate of the Jaccard index.
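The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of the general SuperMinHash scheme as we understand it, not the author's reference implementation. Each element drives its own pseudo-random stream, draws values r + j with r uniform in [0, 1), and assigns them to signature slots through a random permutation built incrementally with Fisher-Yates swaps. A histogram of the slots' integer parts lets the inner loop stop early. The function names `superminhash` and `jaccard_estimate`, and the choice of Python's `random.Random` seeded with the element itself as the per-element generator, are our own assumptions. Consult the paper for the authoritative pseudocode and analysis.

```python
import math
import random
from typing import Iterable, List, Sequence, Union

# Element types that random.Random accepts as a seed (assumption of this sketch).
Element = Union[int, str, bytes]


def superminhash(elements: Iterable[Element], m: int) -> List[float]:
    """Return an m-component signature of the set `elements` (illustrative sketch)."""
    h = [math.inf] * m   # signature: running minimum per slot
    p = [0] * m          # permutation buffer shared across elements
    q = [-1] * m         # q[j] == i means p[j] was already reset for element i
    b = [0] * m          # b[x] = number of slots with min(floor(h), m - 1) == x
    b[m - 1] = m         # all slots start at infinity
    a = m - 1            # largest x with b[x] > 0; values r + j with j > a cannot win
    for i, d in enumerate(elements):
        rng = random.Random(d)  # hypothetical choice: deterministic stream per element
        j = 0
        while j <= a:
            r = rng.random()           # fractional part in [0, 1)
            k = rng.randint(j, m - 1)  # Fisher-Yates swap partner in [j, m - 1]
            # Lazily reset the permutation entries this element touches.
            if q[j] != i:
                q[j], p[j] = i, j
            if q[k] != i:
                q[k], p[k] = i, k
            p[j], p[k] = p[k], p[j]
            slot = p[j]
            if r + j < h[slot]:
                old = m - 1 if h[slot] == math.inf else min(int(h[slot]), m - 1)
                h[slot] = r + j
                if j < old:
                    # Move the slot's count to its new bucket, then shrink a.
                    b[old] -= 1
                    b[j] += 1
                    while b[a] == 0:
                        a -= 1
            j += 1
    return h


def jaccard_estimate(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Fraction of slots on which two equal-length signatures agree."""
    if len(s1) != len(s2):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


if __name__ == "__main__":
    A = superminhash(range(150), 1024)
    B = superminhash(range(50, 200), 1024)
    # True Jaccard index is 100 / 200 = 0.5; the estimate should be close to it.
    print(jaccard_estimate(A, B))
```

For a fixed slot, each element's value there is r plus a uniformly random position in the permutation, so it is uniform on [0, m), and independent across elements with distinct seeds. Two signatures therefore agree on a slot exactly when the minimizing element lies in the intersection, which happens with probability equal to the Jaccard index. The histogram `b` only allows the loop to stop early and does not change the resulting signature, which is also independent of the order in which elements are processed.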