Big Data Anonymization with Spark


Canbay Y., SAĞIROĞLU Ş.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 833-838

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI Number: 10.1109/ubmk.2017.8093543
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp. 833-838

Abstract

Privacy is an important issue for big data that includes sensitive attributes. When such data are shared or published directly, privacy breaches occur. To overcome this problem, previous studies focused on developing big data anonymization techniques in the Hadoop environment. Compared with Hadoop, Spark makes it possible to develop faster applications by keeping data in memory instead of on hard disk. Although a number of projects were developed on Hadoop, the trend is now shifting to Spark. In addition, the problem of anonymizing big data streams for real-time applications can be solved with Spark technology. In sum, Spark is the main technology that facilitates developing both faster anonymization applications and big data stream anonymization solutions. In this study, anonymization techniques, big data technologies, and privacy-preserving big data publishing were reviewed, and a big data anonymization model based on Spark was proposed for the first time. The proposed model is expected to help researchers solve big data privacy issues and to provide solutions for new-generation privacy violation problems.