Apache Flink and clustering-based framework for fast anonymization of IoT stream data

Published in Intelligent Systems with Applications (Elsevier), 2023

Recommended citation: Sadeghi-Nasab, A., Ghaffarian, H., Rahmani, M. Apache Flink and clustering-based framework for fast anonymization of IoT stream data. Intelligent Systems with Applications (2023). https://doi.org/10.1016/j.iswa.2023.200267 https://www.sciencedirect.com/science/article/pii/S2667305323000923

In this paper, we present a novel framework that takes the expiration time of Internet of Things (IoT) stream data into account when anonymizing it. IoT is among the fastest-growing technologies in the world, and anonymization is one of the safeguards used to protect data privacy. Because data streams are dynamic, vast, and rapidly changing, traditional approaches cannot be used to anonymize IoT data. The proposed anonymization framework operates using a new clustering method and the Apache Flink stream processing engine. The framework first clusters the received data. Then, if the size of a cluster does not meet the K-anonymity threshold, the framework goes on to suppress and delete it; otherwise, the data are anonymized and published. In this way, the framework handles both numerical and categorical data. At the end of the stream, the remaining data are merged and anonymized. An implementation and evaluation of the framework using Scala and Apache Flink shows that the proposed approach reduces data delay by 12.33–66.62% compared with the other methods. Furthermore, merging the leftover clusters at the end of the stream avoids information loss: compared with similar methods, information loss is reduced by 5.68–18.26%. Overall, the evaluation results show better performance in terms of both data delay and information loss.
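The flow described in the abstract (cluster incoming records, publish a cluster once it reaches the K-anonymity threshold, suppress under-sized clusters, merge what remains at the end of the stream) can be illustrated with a minimal sketch. This is plain Python rather than the Scala and Apache Flink implementation evaluated in the paper, and it is not the paper's clustering method. The `Reading` fields, the `max_age_gap` distance rule, and the choice to suppress a cluster once its oldest record passes `expiry` are all assumptions made for illustration.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    ts: int    # arrival time in arbitrary ticks (hypothetical field)
    age: int   # numerical quasi-identifier (hypothetical field)
    city: str  # categorical quasi-identifier (hypothetical field)


def generalize(cluster):
    """Replace exact values with a numeric range and a set of categories."""
    ages = [r.age for r in cluster]
    return {
        "age": (min(ages), max(ages)),
        "city": sorted({r.city for r in cluster}),
        "size": len(cluster),
    }


def anonymize_stream(readings, k, max_age_gap, expiry):
    """Toy k-anonymity pass over a finite stream (assumed rules, see lead-in)."""
    open_clusters = []  # each cluster is a list of Reading
    published, suppressed = [], []

    for r in readings:
        # Assumption: a cluster whose oldest record has expired before
        # reaching k members is suppressed.
        fresh = []
        for c in open_clusters:
            if r.ts - c[0].ts > expiry:
                suppressed.extend(c)
            else:
                fresh.append(c)
        open_clusters = fresh

        # Assumption: join the first cluster whose first record is within
        # max_age_gap on the numerical attribute, else open a new cluster.
        target = next(
            (c for c in open_clusters if abs(c[0].age - r.age) <= max_age_gap),
            None,
        )
        if target is None:
            target = []
            open_clusters.append(target)
        target.append(r)

        # A cluster that reaches the threshold is generalized and published.
        if len(target) >= k:
            published.append(generalize(target))
            open_clusters = [c for c in open_clusters if c is not target]

    # End of stream: merge the leftover clusters and anonymize them together.
    leftovers = [r for c in open_clusters for r in c]
    if len(leftovers) >= k:
        published.append(generalize(leftovers))
    else:
        suppressed.extend(leftovers)
    return published, suppressed
```

In this sketch, merging the leftovers at the end lets records that would otherwise be suppressed still be released as one coarser group, which is the intuition behind the abstract's remark about information loss. A real streaming deployment would need keyed state and timers rather than an in-memory list.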


Cite as:

@article{SADEGHINASAB2023200267,
title = {Apache Flink and clustering-based framework for fast anonymization of IoT stream data},
journal = {Intelligent Systems with Applications},
volume = {20},
pages = {200267},
year = {2023},
issn = {2667-3053},
doi = {https://doi.org/10.1016/j.iswa.2023.200267},
url = {https://www.sciencedirect.com/science/article/pii/S2667305323000923},
author = {Alireza Sadeghi-Nasab and Hossein Ghaffarian and Mohsen Rahmani},
keywords = {Internet of Things, Data privacy, Streaming data, Data anonymity, Apache Flink, Data processing engine},
abstract = {In this paper, we present a novel framework that takes the expiration time of Internet of Things (IoT) stream data into account when anonymizing it. IoT is among the fastest-growing technologies in the world, and anonymization is one of the safeguards used to protect data privacy. Because data streams are dynamic, vast, and rapidly changing, traditional approaches cannot be used to anonymize IoT data. The proposed anonymization framework operates using a new clustering method and the Apache Flink stream processing engine. The framework first clusters the received data. Then, if the size of a cluster does not meet the K-anonymity threshold, the framework goes on to suppress and delete it; otherwise, the data are anonymized and published. In this way, the framework handles both numerical and categorical data. At the end of the stream, the remaining data are merged and anonymized. An implementation and evaluation of the framework using Scala and Apache Flink shows that the proposed approach reduces data delay by 12.33–66.62% compared with the other methods. Furthermore, merging the leftover clusters at the end of the stream avoids information loss: compared with similar methods, information loss is reduced by 5.68–18.26%. Overall, the evaluation results show better performance in terms of both data delay and information loss.}
}
