A Mondrian-based Utility Optimization Model for Anonymization


CANBAY Y., SAĞIROĞLU Ş., VURAL Y.

2019 4th International Conference on Computer Science and Engineering (UBMK), Samsun, Türkiye, 11-15 September 2019

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/ubmk.2019.8907117
  • City of Publication: Samsun, Turkey
  • Country of Publication: Türkiye
  • Keywords: Anonymization, Mondrian, outliers, utility optimization
  • Affiliated with Gazi University: Yes

Abstract

Anonymization is a privacy-preserving approach that protects the identities of data subjects. Besides privacy, anonymization provides a level of data utility that is very important for data analytics. The accuracy of a data analytics model built on anonymized data depends on the utility preserved by anonymization. Several factors directly affect data utility, such as the privacy level, the anonymization operators, and the anonymization algorithm. Recent studies have shown that outliers are another factor that affects data utility. In the anonymization domain, outliers are regarded as "a group of data that decreases total data utility". In this paper, to maximize data utility, a new data utility model is proposed and applied to Mondrian for the first time. A general outlier-based utility evaluation function is also introduced for the first time to measure this utility. Experimental results show that the proposed model improves data utility and achieves higher utility than Mondrian. The proposed function might also be used as a reliable tool for outlier-based utility-aware anonymization models.
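
The abstract does not describe the internals of the proposed model. For readers unfamiliar with the baseline it builds on, strict Mondrian multidimensional partitioning can be sketched roughly as follows. This is only an illustration of the baseline greedy median split over numeric quasi-identifiers, not the outlier-based utility model or evaluation function from the paper. The function name `mondrian`, the sample records, and the use of raw (unnormalized) value ranges to pick the split dimension are assumptions made for the example.

```python
def mondrian(records, k):
    """Greedy median partitioning of numeric tuples into groups of size >= k.

    Simplified sketch of baseline Mondrian (strict partitioning); it does not
    implement the outlier-based utility model proposed in the paper.
    """
    dims = range(len(records[0]))
    # Assumption: rank dimensions by raw value range (no domain normalization).
    by_span = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_span:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        # A cut is allowable only if both halves still hold at least k records.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut: this group becomes one equivalence class


# Hypothetical (age, salary) quasi-identifier values.
records = [
    (25, 50000), (27, 52000), (31, 61000), (35, 58000),
    (42, 75000), (45, 80000), (52, 90000), (58, 95000),
]
partitions = mondrian(records, k=2)
```

Each returned partition would then be generalized, for example by replacing every value with the partition's minimum-maximum range per attribute, so that its records become indistinguishable on the quasi-identifiers.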