Big Data Anonymization with Spark

Canbay Y., SAĞIROĞLU Ş.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 833-838 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI: 10.1109/ubmk.2017.8093543
City of Publication: Antalya
Country of Publication: Türkiye
Pages: pp. 833-838
Keywords: big data, anonymization, privacy preserving, Hadoop, Spark, model, review, privacy
Gazi University Affiliation: Yes

Abstract

Privacy is an important issue for big data that include sensitive attributes. If such data are shared or published directly, a privacy breach occurs. To overcome this problem, previous studies focused on developing big data anonymization techniques in the Hadoop environment. Compared with Hadoop, Spark enables faster applications because it keeps data in memory instead of on hard disk. Although a number of projects were developed on Hadoop, the trend is now shifting to Spark. In addition, the problem of anonymizing big data streams for real-time applications can be solved with Spark. In short, Spark is the main technology that facilitates developing both faster anonymization applications and big data stream anonymization solutions. In this study, anonymization techniques, big data technologies, and privacy-preserving big data publishing were reviewed, and a Spark-based big data anonymization model was proposed for the first time. The proposed model is expected to help researchers solve big data privacy issues and to provide solutions to new-generation privacy violation problems.
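To make "anonymization" concrete, here is a minimal sketch of one widely used technique, k-anonymity via generalization of quasi-identifiers. It is an illustration under stated assumptions, not the Spark-based model proposed in the paper: the records, the choice of age and ZIP code as quasi-identifiers, and the helper names `generalize` and `is_k_anonymous` are all hypothetical, and the code runs in plain Python on a single machine rather than on Spark.

```python
from collections import Counter

# Hypothetical records: (age, zip_code, diagnosis).
# Age and ZIP code are treated as quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (23, "06500", "flu"),
    (27, "06510", "asthma"),
    (25, "06520", "flu"),
    (41, "06800", "diabetes"),
    (45, "06810", "flu"),
    (48, "06830", "asthma"),
]


def generalize(record, age_width=10, zip_digits=3):
    """Coarsen the quasi-identifiers: bucket age into a range and mask the ZIP suffix."""
    age, zip_code, sensitive = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, masked_zip, sensitive)


def is_k_anonymous(records, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter((age, zip_code) for age, zip_code, _ in records)
    return all(c >= k for c in counts.values())


anonymized = [generalize(r) for r in RECORDS]

if __name__ == "__main__":
    print(is_k_anonymous(RECORDS, 2))     # False: every raw (age, ZIP) pair is unique
    print(is_k_anonymous(anonymized, 3))  # True: two groups of three records each
    for row in anonymized:
        print(row)
```

The costly step at scale is the count per quasi-identifier group. In Spark this would correspond to a group-by on the quasi-identifier columns (for example, `DataFrame.groupBy(...).count()`), which is the kind of operation that benefits from the in-memory, distributed execution the abstract describes.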