TR2011-077

Secure Binary Embeddings for Privacy Preserving Nearest Neighbors



TR Image

We present a novel method to securely determine whether two signals are similar to each other, and apply it to approximate nearest neighbor clustering. The proposed method relies on a locality sensitive hashing scheme based on a secure binary embedding, computed using quantized random projections. Hashes extracted from the signals preserve information about the distance between the signals, provided this distance is small enough. If the distance between the signals is larger than a threshold, then no information about the distance is revealed. Theoretical and experimental justification is provided for this property. Further, when the randomized embedding parameters are unknown, then the mutual information between the hashes of any two signals decays to zero exponentially fast as a function of the distance between the signals. Taking advantage of this property, we suggest that these binary hashes can be used to perform privacy-preserving nearest neighbor search wit significantly lower complexity compared to protocols which use the actual signals.