Performance-Aware Big Data Management for Remote Sensing Systems


Creative Commons License

Pekturk M. K., Ünal M., Gökçen H.

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, vol.49, no.3, pp.3845-3865, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 49 Issue: 3
  • Publication Date: 2024
  • DOI: 10.1007/s13369-023-08172-2
  • Journal Name: ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Aerospace Database, Communication Abstracts, Metadex, Pollution Abstracts, zbMATH, Civil Engineering Abstracts
  • Pages: pp.3845-3865
  • Keywords: Big data analytics, Distributed computing, Geo-distributed cloud, Lagrange relaxation, Optimization model, Remote sensing
  • Gazi University Affiliated: Yes

Abstract

Remote sensing data, whose size grows exponentially and reaches big data scale with new technologies, is difficult to transfer, store, and process because it consists of very large, coarse-grained files. This article proposes a novel two-phase big data management system for a geo-distributed private cloud that exploits network topology and resource utilization in a distributed manner. The system optimizes resource allocation and minimizes file fragmentation, which enables faster and more extensive data analysis for remote sensing applications. To simulate the proposed system, different network topologies are created using virtual machines. The proposed method, named performance-aware assignment, is compared with three well-known methods: random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, which is widely used in big data systems. The experimental results indicate that performance-aware assignment outperforms random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, storing 36%, 26%, and 71% more data, respectively, within the same time while also exhibiting lower IOPS values. In addition, it optimizes resource usage in data centers, which is particularly important for preventing resource exhaustion.
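
To make the baseline comparison concrete, the following minimal sketch contrasts two of the baselines named in the abstract on a toy file-to-node assignment problem. It is not the authors' performance-aware method, and it does not reproduce their experiments. The cost matrix, the function names, and the interpretation of cost as an estimated transfer time are all assumptions made for illustration. The minimum-cost assignment is found here by exhaustive search over permutations, which returns the same optimum the Hungarian algorithm computes in polynomial time but is only practical for very small inputs.

```python
import itertools
import random


def assignment_cost(cost, assignment):
    """Total cost when file i is placed on node assignment[i]."""
    return sum(cost[i][node] for i, node in enumerate(assignment))


def random_assignment(cost, seed=0):
    """Baseline: place each file on a distinct node chosen at random."""
    rng = random.Random(seed)
    nodes = list(range(len(cost[0])))
    rng.shuffle(nodes)
    return tuple(nodes[:len(cost)])


def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    This yields the optimum that the Hungarian algorithm would find,
    but tries every permutation, so it only suits tiny examples.
    """
    n_nodes = len(cost[0])
    candidates = itertools.permutations(range(n_nodes), len(cost))
    return min(candidates, key=lambda a: assignment_cost(cost, a))


# Hypothetical transfer-time estimates in seconds (assumed values):
# rows are files, columns are storage nodes.
COST = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

if __name__ == "__main__":
    best = optimal_assignment(COST)
    rand = random_assignment(COST)
    print("optimal:", best, "cost:", assignment_cost(COST, best))
    print("random: ", rand, "cost:", assignment_cost(COST, rand))
```

For the assumed matrix, the optimal assignment places files 0, 1, and 2 on nodes 1, 0, and 2 for a total cost of 5, and any random assignment costs at least as much. A one-to-one cost model like this ignores factors the abstract highlights, such as network topology, resource utilization, and file fragmentation.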