A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique

Standard

A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique. / Chelly Dagdia, Zaineb; Zarges, Christine.

In: Fundamenta Informaticae, Vol. 182, No. 2, 30.09.2021, p. 111-179.

Research output: Contribution to journal › Article › peer-review

Author

Chelly Dagdia, Zaineb ; Zarges, Christine. / A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique. In: Fundamenta Informaticae. 2021 ; Vol. 182, No. 2. pp. 111-179.

BibTeX

@article{145a2fc711e24fc28787ba9d454efc04,
title = "A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique",
abstract = "In the context of big data, granular computing has recently been implemented by some mathematical tools, especially Rough Set Theory (RST). As a key topic of rough set theory, feature selection has been investigated to adapt the related granular concepts of RST to deal with large amounts of data, leading to the development of the distributed RST version. However, despite its scalability, the distributed RST version faces a key challenge tied to the partitioning of the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to hash similar features into the same bucket and maps the generated buckets into partitions to enable the splitting of the universe in a more efficient way. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that our LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate that LSH-dRST ensures the partitioning of the high-dimensional feature search space in a more reliable way; hence better preserving data dependency in the distributed environment and ensuring a lower computational cost.",
keywords = "Granular Computing, Rough Set Theory, Big Data, Feature Selection, Locality Sensitive Hashing, Distributed Processing",
author = "{Chelly Dagdia}, Zaineb and Christine Zarges",
year = "2021",
month = sep,
day = "30",
doi = "10.3233/FI-2021-2069",
language = "English",
volume = "182",
pages = "111--179",
journal = "Fundamenta Informaticae",
issn = "0169-2968",
publisher = "IOS Press",
number = "2",

}
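
The abstract above describes the core idea of LSH-dRST: hash similar features into the same bucket, then map buckets to partitions. The following is a minimal, hypothetical sketch of that bucketing idea using signed random projections; it is not the authors' implementation, and all function names, parameters, and the toy data are illustrative only.

```python
import numpy as np

def lsh_feature_buckets(X, num_hyperplanes=4, seed=0):
    """Hash each feature (a column of X) to a bucket via signed random projections."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Random hyperplanes in sample space; two features share a bucket
    # when they lie on the same side of every hyperplane.
    planes = rng.standard_normal((num_hyperplanes, n_samples))
    signs = (planes @ X) >= 0  # shape: (num_hyperplanes, n_features)
    buckets = {}
    for j in range(n_features):
        buckets.setdefault(tuple(signs[:, j]), []).append(j)
    return buckets

def buckets_to_partitions(buckets, num_partitions):
    """Assign whole buckets to partitions round-robin, keeping similar features together."""
    partitions = [[] for _ in range(num_partitions)]
    for i, feats in enumerate(buckets.values()):
        partitions[i % num_partitions].extend(feats)
    return partitions

# Toy demo: 8 samples, 4 features; features 0 and 1 are near-duplicates.
rng = np.random.default_rng(1)
base = rng.standard_normal(8)
X = np.column_stack([base, base + 0.01 * rng.standard_normal(8),
                     -base, rng.standard_normal(8)])
buckets = lsh_feature_buckets(X, num_hyperplanes=3, seed=42)
partitions = buckets_to_partitions(buckets, num_partitions=2)
```

Because buckets are assigned to partitions as whole units, correlated features tend to land in the same partition, which is the property the paper argues preserves data dependency better than a random split.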

RIS (suitable for import to EndNote)

TY - JOUR

T1 - A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique

AU - Chelly Dagdia, Zaineb

AU - Zarges, Christine

PY - 2021/9/30

Y1 - 2021/9/30

N2 - In the context of big data, granular computing has recently been implemented by some mathematical tools, especially Rough Set Theory (RST). As a key topic of rough set theory, feature selection has been investigated to adapt the related granular concepts of RST to deal with large amounts of data, leading to the development of the distributed RST version. However, despite its scalability, the distributed RST version faces a key challenge tied to the partitioning of the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to hash similar features into the same bucket and maps the generated buckets into partitions to enable the splitting of the universe in a more efficient way. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that our LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate that LSH-dRST ensures the partitioning of the high-dimensional feature search space in a more reliable way; hence better preserving data dependency in the distributed environment and ensuring a lower computational cost.

AB - In the context of big data, granular computing has recently been implemented by some mathematical tools, especially Rough Set Theory (RST). As a key topic of rough set theory, feature selection has been investigated to adapt the related granular concepts of RST to deal with large amounts of data, leading to the development of the distributed RST version. However, despite its scalability, the distributed RST version faces a key challenge tied to the partitioning of the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to hash similar features into the same bucket and maps the generated buckets into partitions to enable the splitting of the universe in a more efficient way. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that our LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate that LSH-dRST ensures the partitioning of the high-dimensional feature search space in a more reliable way; hence better preserving data dependency in the distributed environment and ensuring a lower computational cost.

KW - Granular Computing

KW - Rough Set Theory

KW - Big Data

KW - Feature Selection

KW - Locality Sensitive Hashing

KW - Distributed Processing

U2 - 10.3233/FI-2021-2069

DO - 10.3233/FI-2021-2069

M3 - Article

VL - 182

SP - 111

EP - 179

JO - Fundamenta Informaticae

JF - Fundamenta Informaticae

SN - 0169-2968

IS - 2

ER -
