Fast image similarity search by distributed locality sensitive hashing


Durmaz O., BİLGE H. Ş.

PATTERN RECOGNITION LETTERS, cilt.128, ss.361-369, 2019 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 128
  • Basım Tarihi: 2019
  • Doi Numarası: 10.1016/j.patrec.2019.09.025
  • Dergi Adı: PATTERN RECOGNITION LETTERS
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Sayfa Sayıları: ss.361-369
  • Anahtar Kelimeler: Fast retrieval, Hashing, Image search, Distributed processing, Locality sensitive hashing, LARGE-SCALE, BIG DATA, RETRIEVAL, QUANTIZATION
  • Gazi Üniversitesi Adresli: Evet

Özet

Approximate Nearest Neighbor (ANN) search approaches that use possible neighbors instead of exact neighbors are widely investigated by researchers in recent years. ANN approaches are usually applied in a centralized manner. However, in real world applications data is usually stored in a distributed manner. This situation led to the need for implementing ANN methods in a distributed way. In this study, our goal is to perform fast and accurate search on large size image datasets by using distributed environments. For this purpose, we propose an approach called as Randomized Distributed Hashing (RDH) which uses Locality Sensitive Hashing (LSH) in a distributed scheme. In this approach, we have randomly distributed data to different nodes on a cluster. After the distribution of data, in each node we have used same randomized hash function set for indexing the local data. Then at the query stage, the query sample is locally searched in different nodes. By exploiting from parallelism, the query time performance is significantly increased. We have a speed up of 8 for the query performance in the distributed scheme with 10 nodes. The level of Mean Average Precision (MAP) scores are quite high which are comparable to other methods. We have also investigated the usage of different and selected randomized hash functions in different nodes rather than using same indexing. We create selected hash functions according to their data division property before indexing. Since LSH is data independent method, we have obtained similar results with using same hash functions. We compared our experimental results with state-of-the-art methods given in a recent study. The proposed distributed scheme is promising for searching images in large datasets with multiple nodes. (C) 2019 Elsevier B.V. All rights reserved.