Fuzzy-Rough Intrigued Harmonic Discrepancy Clustering

Fuzzy clustering decomposes data into clusters using partial memberships, exploiting cluster structure information to deliver competitive performance for knowledge discovery under conditions of information incompleteness. In general, this scheme considers only the memberships of objects to cluster centroids and suits clusters with a spherical distribution. In addition, noise and outliers may significantly influence the clustering process; a common mitigation is to apply a separate noise-processing algorithm, but this usually introduces multiple parameters that are challenging to determine for different data types. This article proposes a new fuzzy-rough intrigued harmonic discrepancy clustering (HDC) algorithm, noting that fuzzy-rough sets offer a higher degree of uncertainty modeling for both the vagueness and the imprecision present in real-valued datasets. HDC is implemented by introducing the novel concept of harmonic discrepancy, which effectively indicates the dissimilarity between a data instance and foreign clusters while fully accounting for their distributions. The proposed HDC thus features a powerful ability to process complex data distributions, leading to enhanced clustering performance, particularly on noisy datasets, without explicit noise-handling parameters. The experimental results confirm the effectiveness of the proposed HDC, which generally outperforms popular representative clustering algorithms on both synthetic and benchmark datasets.


I. INTRODUCTION
CLUSTERING refers to dividing unlabeled data instances into a number of clusters according to the similarity between objects, without any prior information, such that intracluster similarity is high and intercluster similarity is low. Clustering analysis typically uses a precise similarity measure to gauge the similarity between instances and then determines the division of clusters according to specific clustering strategies [1]. A broad spectrum of clustering algorithms have been developed successfully using fuzzy sets and rough sets [2], [3]. These convey two crucial and mutually orthogonal aspects of imprecision implied by data and knowledge: the former qualifies that instances belong to a set to a certain degree, and the latter delivers approximations of concepts under circumstances of incomplete information [4], [5].
Fuzzy clustering offers a soft scheme against the conventional hard measurement methods and can generally be grouped into three categories according to the type of fuzzy sets used: Type-1, Type-2, and Intuitionistic [6]. Among the many existing fuzzy clustering solutions, fuzzy c-means (FCM) is the most representative Type-1 algorithm [7]. Compared with the popular partitional clustering algorithm k-means [8], FCM classifies instances into all clusters simultaneously by calculating their (partial) memberships with regard to each cluster, which gives the flexibility to consider full, partial, or no belonging of a data point to every cluster. Despite its success, FCM is sensitive to noise, outliers, and cluster sizes. Subsequent research has made various improvements, such as possibilistic c-means (PCM) [9] and possibilistic fuzzy c-means (PFCM) [10]. Representative Type-2 fuzzy clustering algorithms include T2FCM and kernelized T2FCM (KT2FCM) [11]. Due to the characteristics of Type-2 fuzzy sets, the data elements contributing more to the computation of appropriate cluster centroids yield an improvement over FCM, but processing nonspherical and more complex data remains a challenge. By introducing the tangent function and the Lagrangian method, KT2FCM further improves the performance of T2FCM. The third type, Intuitionistic fuzzy set-based clustering, merges the hesitation degree with the membership, leading to intuitionistic FCM (IFCM), IFCM-σ, and kernelized IFCM (KIFCM), among others [12], [13], [14], [15]. These methods extend conventional FCM by adding intuitionistic features to the memberships and objective functions, improving the computational efficiency and the clustering performance on nonspherically separable data.
Rough k-means (RKM) and its advancements enhance the traditional k-means algorithm using rough set theory [16]; examples include three-way k-means [17], interval Type-2 fuzzy local enhancement based rough k-means [16], and spatial rough k-means [18]. These algorithms divide the instances that definitely belong to a specific cluster into the lower approximation set and the instances that may or may not belong to it into the boundary set, which addresses the clustering of fuzzy and uncertain data well and demonstrates more efficient performance on overlapping datasets. However, RKM relies on artificially set fixed weights and thresholds, which may negatively affect the clustering performance, in addition to the challenge of determining these parameters. Fuzzy-rough k-means [19] and rough-fuzzy k-means [16] further integrate rough theory and fuzzy theory into FCM and RKM, respectively, to allow the algorithm to enjoy the advantages of both fuzzy clustering and rough clustering.
The occurrence of noisy data points can degrade a clustering algorithm significantly. There are two leading solutions to mitigate or address this. One solution uses a separate algorithm to process the noisy data before clustering, such as Gaussian-based statistical detection methods [20], kNN distance-based local outlier searching algorithms, and density-based detection methods [21]. The other focuses on reducing the negative influence of noisy data points during the clustering process. For example, possibilistic c-means (PCM) was proposed to process datasets containing noise and outliers [9], and this approach has guaranteed convergence [22]. Also, the FCM and PCM algorithms have been combined into the possibilistic fuzzy c-means (PFCM) algorithm; PFCM is supposed to handle noise and outliers well through its possibilistic terms while avoiding coincident clusters and sensitivity to initialization through its fuzzy terms [10]. However, the reported experiments do not show the expected improvement. In addition, an improved PFCM algorithm was presented for noisy data by modifying the objective function of PFCM [23]. Although this algorithm is more accurate than FCM, PCM, and PFCM according to the experimental results, it suffers from high computational complexity and thus long running times.
In addition to the aforementioned methods, which only consider the centroids of clusters, further improvements have been made to partitional clustering in view of the distributions of clusters, both intercluster and intracluster. In [24], a dissimilarity measure is recommended and incorporated for the benefit of considering the intercluster differences. In [25], a new scheme for scaling the membership degrees of chosen samples is suggested to boost the effect of in-cluster samples and to weaken the effect of out-of-cluster samples in the clustering process. This scheme not only accelerates the convergence of the algorithm but also maintains high clustering quality. Regarding the intracluster distribution, in [26], an elastic fuzzy c-means (EFCM) is proposed to better recognize the intrinsic cluster structure. EFCM provides a sparser description for reliable points and a fuzzier description for marginal points of clusters; thus, the roles of reliable and marginal points are better balanced. In [27], a Gaussian mixture model and collaborative technology are combined with FCM to enhance the ability to recognize the intracluster distribution. This approach is effective in dealing with noise, nonspherical clusters, and size-imbalanced clusters. In [28], the local densities of instances within clusters are considered in FCM, and the instances with the locally maximal density are used as initial centroids to improve the stability of FCM.
This article proposes a new concept of harmonic discrepancy to allow the full consideration of the distributions of clusters when evaluating the dissimilarity between a data instance and foreign clusters. In addition, a new cluster centroid updating scheme is proposed that ignores the abnormal data elements of a cluster during the centroid updating process. These jointly lead to a novel fuzzy-rough intrigued harmonic discrepancy clustering (HDC) algorithm, in an effort to address the aforementioned challenges. The proposed HDC algorithm is applied to a set of synthetic and benchmark datasets and subjected to a comparative study against existing popular clustering algorithms. The experimental results confirm the better stability of the proposed HDC algorithm on real-world datasets in comparison with its competitors. The contribution of the article is threefold: 1) proposing the novel concept of harmonic discrepancy through an innovative application of fuzzy-rough approximation, enabling the comprehension of cluster distributions during cluster centroid updating; 2) developing a nonparameterized noise and outlier processing method, which effectively reduces the negative impact of abnormal data instances in clustering and improves practical applicability; and 3) establishing the HDC algorithm, with its superiority confirmed through a comparative study and statistical analysis. The remainder of the article is structured as follows. Section II briefly reviews the preliminaries of rough sets and fuzzy-rough sets. The proposed harmonic discrepancy clustering algorithm is described in Section III. Results of comprehensive experiments are presented in Section IV, leading to conclusions in Section V.

II. PRELIMINARIES
This section reviews the concepts concerning rough sets and fuzzy-rough sets, which underpin the proposed discrepancy metric.

A. Rough Set
Rough set theory provides a methodology to extract knowledge from a domain in a concise way, minimizing information loss while reducing the amount of information involved [16]. Central to rough set theory is the concept of indiscernibility. Let (U, A) be an information system, where U is a set of objects and A is a set of attributes such that a : U → V_a for every a ∈ A; V_a is the set of values that attribute a may take. For each feature subset P ⊆ A, an associated P-indiscernibility relation can be determined by

IND(P) = {(x, y) ∈ U² | ∀a ∈ P, a(x) = a(y)}.  (1)

Obviously, IND(P) is an equivalence relation on U. The partition of U determined by IND(P) is herein denoted by U/P, which can be defined as

U/P = ⊗{U/IND({a}) | a ∈ P}  (2)

where ⊗ regarding the families of sets A and B is defined as follows:

A ⊗ B = {X ∩ Y | X ∈ A, Y ∈ B, X ∩ Y ≠ ∅}.  (3)

For any object x ∈ U, the equivalence class determined by IND(P) is denoted by [x]_P. Let X ⊆ U. X can be approximated using only the information contained in P by constructing the P-lower and P-upper approximations of X [29]:

P̲X = {x | [x]_P ⊆ X}  (4)
P̄X = {x | [x]_P ∩ X ≠ ∅}.  (5)

The pair (P̲X, P̄X) is called a rough set. Informally, the former is the set of objects that can be said with certainty to belong to the concept being approximated, and the latter is the set of objects that definitely or possibly belong to it. The difference between the upper and lower approximations is the area known as the boundary region and thus represents the area of uncertainty. When the boundary region is empty, there is no uncertainty regarding the concept being approximated, and all objects belong to the subset of objects of interest with full certainty.
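As an illustration, the crisp approximations above can be sketched in a few lines of Python; the function name and the data layout (one attribute-value list, objects identified by their indices) are assumptions for the sake of the example, not notation from the article:

```python
def rough_approximations(values, X):
    """Crisp lower/upper approximation of a concept X under the
    indiscernibility relation induced by a single attribute.

    values: list of attribute values, one per object (objects are indices)
    X: set of object indices (the concept to approximate)
    """
    # Equivalence classes: objects sharing the same attribute value
    classes = {}
    for i, v in enumerate(values):
        classes.setdefault(v, set()).add(i)
    lower, upper = set(), set()
    for eq in classes.values():
        if eq <= X:      # [x]_P is a subset of X -> certainly in X
            lower |= eq
        if eq & X:       # [x]_P intersects X -> possibly in X
            upper |= eq
    return lower, upper

# Example: six objects, concept X = {0, 1, 2}. Objects 0 and 1 share
# value 'a' and both lie in X; objects 2 and 3 share 'b' but only 2 is
# in X, so both fall in the boundary region.
low, up = rough_approximations(['a', 'a', 'b', 'b', 'c', 'c'], {0, 1, 2})
```

Here the boundary region is `up - low`, i.e., objects 2 and 3.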

B. Fuzzy-Rough Set
Fuzzy-rough sets encapsulate the related but distinct concepts of vagueness (usually addressed by fuzzy sets) and indiscernibility (usually addressed by rough sets) [30], both of which occur as a result of uncertainty in data or knowledge. Compared to rough sets, fuzzy-rough sets offer a higher degree of flexibility, enabling the vagueness and imprecision present in real-valued data to be simultaneously and effectively modeled. In fuzzy-rough sets, the fuzzy lower and upper approximations of a fuzzy concept X can be defined as

μ_{R_P X}(x) = inf_{y ∈ U} I(μ_{R_P}(x, y), μ_X(y))  (6)
μ_{R̄_P X}(x) = sup_{y ∈ U} T(μ_{R_P}(x, y), μ_X(y))  (7)

where I is a fuzzy implicator and T is a T-norm. R_P is a T-transitive fuzzy similarity relation induced by the subset of features P:

μ_{R_P}(x, y) = T_{a ∈ P} μ_{R_a}(x, y)  (8)

where μ_{R_a}(x, y) represents the degree to which objects x and y are similar to each other based on feature a. This degree may be defined in a number of ways, such as

μ_{R_a}(x, y) = 1 − |a(x) − a(y)| / |a_max − a_min|  (9)
μ_{R_a}(x, y) = exp(−(a(x) − a(y))² / (2δ_a²))  (10)

where δ_a² indicates the variance of feature a. The fuzzy lower and upper approximations express the same physical meaning as their crisp counterparts. In particular, μ_{R_P X}(x) shows the extent to which the object x must belong to the approximated fuzzy concept X, while μ_{R̄_P X}(x) represents the extent to which the object x may belong to the approximated fuzzy concept X.
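A minimal NumPy sketch of (6), (7), and (10) follows; the function names are illustrative, and the Łukasiewicz implicator and T-norm are chosen here as one admissible pair (the article does not fix this choice at this point):

```python
import numpy as np

def gaussian_similarity(a_vals, sigma2):
    """Per-attribute fuzzy similarity, one common choice, cf. (10):
    mu_a(x, y) = exp(-(a(x) - a(y))^2 / (2 * sigma2))."""
    d = a_vals[:, None] - a_vals[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma2))

def fuzzy_approximations(R, muX):
    """Fuzzy lower/upper approximation of a fuzzy concept muX under a
    similarity relation R, using the Lukasiewicz implicator
    I(p, q) = min(1 - p + q, 1) and the Lukasiewicz T-norm
    T(p, q) = max(p + q - 1, 0)."""
    I = np.minimum(1.0 - R + muX[None, :], 1.0)
    T = np.maximum(R + muX[None, :] - 1.0, 0.0)
    lower = I.min(axis=1)   # inf over y of I(R(x, y), X(y)), cf. (6)
    upper = T.max(axis=1)   # sup over y of T(R(x, y), X(y)), cf. (7)
    return lower, upper
```

With R as the identity relation and muX crisp, the lower and upper approximations coincide with the crisp concept, mirroring the rough-set case.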

III. FUZZY-ROUGH INTRIGUED HARMONIC DISCREPANCY CLUSTERING
The existing partitional clustering algorithms, e.g., k-means [8] and FCM [7], group an instance into a cluster if it has a full or the highest membership as induced by the nearest prototype or the expectation of the cluster. These types of clustering methods are usually performed by considering only the memberships of the objects to the cluster centroids while ignoring the distributions of the clusters; consequently, a small number of noisy data points or outliers can have a significant, and often negative, impact on the clustering results. A novel harmonic discrepancy clustering (HDC) strategy is presented in this section to ease this restriction of the partitional clustering algorithms through an innovative application of fuzzy-rough sets, for sound and robust clustering performance.

A. Discrepancy Inspired by Fuzzy-Rough Set
In this article, discrepancy refers to the degree of separation of an object from a cluster. Given an information system (U, A), suppose that there are n instances, i.e., U = {x_1, …, x_n}, and that k clusters C_1, …, C_k are to be partitioned. The degree to which a data instance x_i belongs to a cluster C_j with regard to attributes A can be gauged by the fuzzy lower approximation μ_{R_A C_j}(x_i).
Fuzzy implication I calculates the fulfillment degree of a fuzzy rule

IF p is X THEN q is Y  (11)

where the antecedent (p is X) and the consequence (q is Y) are fuzzy. For any fuzzy implication, it holds that

I(p, 1) = 1.  (12)

That is, if the consequence holds in any case (i.e., q = 1), the truth value of the fuzzy rule (11) is 1. Due to (12), only the objects outside C_j contribute to the fuzzy lower approximation, which reduces to

μ_{R_A C_j}(x_i) = min_{y ∉ C_j} I(μ_{R_A}(x_i, y), 0).  (13)

Moreover, let N be a strong negation (i.e., a continuous, strictly decreasing, involutive function such that N(0) = 1); I is contrapositive symmetric with respect to N if and only if

I(p, q) = I(N(q), N(p)).  (14)

As proved in [31], if I belongs to the S-implications, QL-implications, or R-implications that enjoy contrapositive symmetry, the equation I(x, 0) = N(x) holds, with N being the strong negator inducing I. By considering the classical strong negation N_C(x) = 1 − x, (13) can be further rewritten as

μ_{R_A C_j}(x_i) = min_{y ∉ C_j} (1 − μ_{R_A}(x_i, y)).  (15)

In particular, typical fuzzy implicators among the S-implications, QL-implications, and R-implications include, but are not limited to: 1) the Łukasiewicz implicator I_L(p, q) = min(1 − p + q, 1); and 2) the Kleene-Dienes implicator I_KD(p, q) = max(1 − p, q). By replacing "min" and "y ∉ C_j" in (15) with "max" and "y ∈ C_j", respectively, the discrepancy of a data instance x_i in reference to a cluster C_j with regard to attributes A can be expressed as

δ_{R_A C_j}(x_i) = max_{y ∈ C_j} (1 − μ_{R_A}(x_i, y)).  (16)

It makes intuitive sense that the discrepancy function indicates the degree to which the most dissimilar data instance y in cluster C_j differs from the referencing data instance x_i. In fact, the discrepancy function in (16) can be deemed the max-link distance [32] between x_i and C_j. However, if a cluster suffers from a decentralized distribution, the discrepancy of a data instance to it may be inaccurate with high probability. In this case, the discrepancy function in (16) can be improved by taking into account the distribution of cluster C_j. Let M = [m_ij]_{n×k} be the partition matrix of U, i.e.,

m_ij ∈ {0, 1},  Σ_{j=1}^{k} m_ij = 1,  i = 1, …, n.  (17)

Each element m_ij = 1 indicates that the ith instance is assigned to the jth cluster. Based on M, the centroid c_j of cluster C_j can be calculated by

c_j = (Σ_{i=1}^{n} m_ij x_i) / (Σ_{i=1}^{n} m_ij).  (18)

Since the
centroid of a cluster represents the expectation of the instances belonging to it, in this article, the distribution of cluster C_j is approximated via the membership of c_j to C_j. If cluster C_j enjoys a compact distribution, the extent to which c_j belongs to C_j is expected to be large accordingly. In light of the concept of fuzzy-rough sets, this membership can be represented by the fuzzy upper approximation μ_{R̄_A C_j}(c_j).
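The max-link discrepancy of (16) and the centroid of (18) can be sketched as follows; the function names are illustrative, and a precomputed instance-to-instance similarity matrix is assumed:

```python
import numpy as np

def max_link_discrepancy(sim, i, cluster):
    """Discrepancy of instance i to a cluster as in (16): one minus the
    similarity to the *least similar* member of the cluster.

    sim:     precomputed fuzzy similarity matrix mu_{R_A}(x, y)
    cluster: indices of the cluster's members
    """
    return max(1.0 - sim[i, y] for y in cluster)

def centroid(X, m, j):
    """Centroid of cluster j from the crisp partition matrix m, cf. (18):
    the mean of the instances assigned to that cluster."""
    members = m[:, j] == 1
    return X[members].mean(axis=0)
```

For instance, if instance 0 has similarities 0.9 and 0.4 to the two members of a cluster, its max-link discrepancy to that cluster is 0.6.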
The T-norm T generalizes logical conjunction to fuzzy logic. For two fuzzy variables p and q, T(p, q) acts as an "and" operator measuring the unified truth degree when p and q hold simultaneously. For any T-norm operator T, it holds that

T(p, 1) = p  (19)
T(p, 0) = 0.  (20)

Equation (19) indicates that if a term q holds in any case (i.e., q = 1), the degree of meeting both p and q depends on the value of p. Equation (20) indicates that if a term q acts as the null element (i.e., q = 0), the chance of meeting both p and q is 0. Due to (19) and (20), μ_{R̄_A C_j}(c_j) can be simplified to

μ_{R̄_A C_j}(c_j) = max_{y ∈ C_j} μ_{R_A}(c_j, y).  (21)

Based on (21), μ_{R̄_A C_j}(c_j) can be interpreted as the similarity of c_j to its nearest neighbor in cluster C_j. Therefore, it is rational to use this metric as an indicator of the cluster distribution.
To synthesize the roles of both (16) and (21), a representative object ỹ_ji is sought to measure the harmonic discrepancy (HD) of x_i to cluster C_j, designed as follows:

ỹ_ji = argmax_{y ∈ C_j} [2(1 − μ_{R_A}(x_i, y)) μ_{R_A}(c_j, y)] / [(1 − μ_{R_A}(x_i, y)) + μ_{R_A}(c_j, y)].  (22)

The harmonic average of 1 − μ_{R_A}(x_i, y) and μ_{R_A}(c_j, y) in (22) is illustrated in Fig. 1. It can be seen that, by maximizing this harmonic average, both 1 − μ_{R_A}(x_i, y) and μ_{R_A}(c_j, y) are guaranteed large values, consistent with (16) and (21), respectively. Therefore, (22) locates a sample y ∈ C_j that is distant from x_i (i.e., a large value of 1 − μ_{R_A}(x_i, y)) yet close to the centroid c_j of cluster C_j (i.e., a large value of μ_{R_A}(c_j, y)), from both separability and rationality perspectives.
With the support of (22), the HD value of x_i to cluster C_j can be expressed as

δ_{R_A C_j}(x_i) = [2(1 − μ_{R_A}(x_i, ỹ_ji)) μ_{R_A}(c_j, ỹ_ji)] / [(1 − μ_{R_A}(x_i, ỹ_ji)) + μ_{R_A}(c_j, ỹ_ji)].  (23)
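The search for the representative object in (22) and the resulting HD value can be sketched jointly as follows (the function name and argument layout are assumptions; the similarity of each member to the centroid is assumed to be precomputed):

```python
def harmonic_discrepancy(sim, i, cluster, c_sim):
    """Harmonic discrepancy of instance i to a cluster, a sketch of the
    combined effect of (22) and (23): scan the members for the one that
    maximizes the harmonic mean of (1 - sim[i, y]) (far from x_i) and
    c_sim[y] (close to the cluster centroid), and return that maximum.

    sim:   similarity matrix mu_{R_A}(x, y) between instances
    c_sim: similarity of each instance to the cluster centroid c_j
    """
    best = 0.0
    for y in cluster:
        a = 1.0 - sim[i, y]   # separability term, cf. (16)
        b = c_sim[y]          # distribution term, cf. (21)
        if a + b > 0.0:
            best = max(best, 2.0 * a * b / (a + b))
    return best
```

A member that is both dissimilar to x_i and near the centroid dominates the maximum, whereas a remote outlier of the cluster (large a but tiny b) yields a small harmonic mean and is effectively ignored.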

B. Anomaly Reduction
Despite the comprehensive strategy of HD to distinguish the degrees to which data instances belong to a cluster, its efficacy may still be compromised by anomalies associated with a decentralized cluster distribution. The misclustering of peripheral objects always triggers a chain of reactions in subsequent iterations, resulting in unexpected centroid deviation and thus poor clustering results. A novel cluster centroid updating strategy is therefore proposed with the aim of reducing the negative effects of peripheral objects. In particular, the core and peripheral objects are distinguished after each iteration, and the centroid update depends only on the core objects, ignoring the peripheral instances.
To identify the peripheral objects of cluster C_j, j ∈ {1, …, k}, the HD acceptance threshold of a data instance to cluster C_j is defined as

ε_j = ave(δ_{R_A C_j}(x)) + std(δ_{R_A C_j}(x))  (24)

where ave(δ_{R_A C_j}(x)) and std(δ_{R_A C_j}(x)) represent the average and the standard deviation of δ_{R_A C_j}(x) over all x ∈ C_j, respectively. With the use of this threshold ε_j, if a sample x ∈ C_j suffers from δ_{R_A C_j}(x) > ε_j, its affiliation with C_j is naturally considered unreliable. Therefore, all such instances are regarded as peripheral objects of C_j; otherwise, they are labeled as members of the core set. By omitting the peripheral objects, if any, during the calculation of each cluster centroid, the likelihood of an offset centroid occurring is mitigated. The peripheral object detection procedure is outlined in Algorithm 1. The main structure of the procedure is a loop over the k clusters, as shown between Lines 1 and 10. Within this loop, the current clustering is provided in Line 2, and the acceptance threshold ε_j of C_j is calculated using (24) in Line 3. The inner loop between Lines 4 and 9 compares the HD value of each instance in cluster C_j with the calculated threshold ε_j to determine whether the instance is a peripheral object. From this, the memberships m_ij of all identified peripheral objects are set to 0, to bypass these objects in the calculation of the centroid of C_j in the next iteration. After the main loop, the algorithm returns the updated partition matrix M.
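A minimal sketch of the per-cluster peripheral test follows, assuming the threshold of (24) is the mean plus one standard deviation of the members' HD values (the function name and data layout are illustrative):

```python
import numpy as np

def peripheral_objects(hd, cluster):
    """Flag peripheral members of one cluster, a sketch of Algorithm 1.

    hd:      mapping x -> HD value delta(x) for the cluster's members
    cluster: indices of the cluster's members
    returns: the set of members whose HD exceeds mean + std, cf. (24)
    """
    vals = np.array([hd[x] for x in cluster])
    eps = vals.mean() + vals.std()
    return {x for x in cluster if hd[x] > eps}
```

Members flagged this way would have their m_ij set to 0 before the next centroid update, while remaining in the cluster for the final assignment.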

C. Harmonic Discrepancy Clustering
To avoid the accident of centroids being initialized in the peripheral region of the data, the random partition (RP) algorithm [33] is used to initialize the clusters. As shown in Algorithm 2, rather than directly initializing centroids, the RP algorithm randomly assigns each instance to a cluster, thereby initializing the partition matrix. In so doing, the RP algorithm avoids selecting outliers from the border areas to act as centroids, and the centroids resulting from the initialized partition matrix are concentrated in the central area of the data due to averaging.
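The RP initialization can be sketched as follows; the non-empty-cluster guard is an assumption added for robustness, not a step stated in the article:

```python
import numpy as np

def random_partition(n, k, rng=None):
    """Random-partition initialization: assign every instance to a
    random cluster and build the n x k partition matrix M. Initial
    centroids are then averages over random subsets and land near the
    centre of the data rather than on border outliers."""
    rng = np.random.default_rng(rng)
    labels = rng.integers(0, k, size=n)
    labels[:k] = np.arange(k)  # guard: every cluster gets >= 1 instance
    M = np.zeros((n, k), dtype=int)
    M[np.arange(n), labels] = 1
    return M
```

Each row of M has exactly one 1, matching the crisp partition constraint of (17).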
Inspired by fuzzy-rough sets, the membership of an instance to a cluster can be obtained from the discrepancy expressing the degree of separation of the instance from the other clusters. Intuitively, the more significant the discrepancy of an instance in reference to the other clusters, the greater the membership of this instance to the cluster currently being computed. According to the definition of HD as expressed in (23), the membership u_ij of x_i to C_j is defined from its HD values in reference to the other clusters (25). With the support of (25), the proposed HDC algorithm is summarized as Algorithm 3. First, Line 1 initializes the iteration counter iter to 0. In Line 2, the partition matrix M is initialized by the RP algorithm. Next, in Line 3, a referencing partition matrix T, indicating the partition result of the current iteration, is prepared for future use (as the algorithm terminates when there is no change in the clusters between two consecutive iterations).
Lines 4-20 show the overall iterative process for clustering. The HD values of all instances in reference to all clusters are obtained by the inner loop in Lines 5-11. After computing the discrepancy values, the memberships of all instances to each cluster are calculated by applying the HD values to (25), as depicted in Line 12. Then, the partition matrix can be readily updated by

m_ij = 1 if j = argmax_{l ∈ {1, …, k}} u_il, and m_ij = 0 otherwise.  (26)

By using (26), both the core objects and the peripheral objects detected via Algorithm 1 are assigned to the clusters with the most significant memberships (25).
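The hard partition update of (26) amounts to a row-wise argmax over the membership matrix; a one-function sketch (the function name is illustrative):

```python
import numpy as np

def update_partition(U):
    """Hard partition update as in (26): each instance joins the
    cluster with its largest membership u_ij. U is the n x k
    membership matrix; returns the crisp n x k partition matrix."""
    M = np.zeros_like(U, dtype=int)
    M[np.arange(U.shape[0]), U.argmax(axis=1)] = 1
    return M
```

Because every instance, core or peripheral, receives exactly one cluster here, peripheral objects excluded from centroid updates are still assigned at the end.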
The algorithm jumps out of the loop either after reaching the maximum number of iterations or once the clusters are stabilized, as controlled in Lines 13 and 14; otherwise, the algorithm moves to Lines 16 and 17, which reset the referencing partition matrix T and invoke Algorithm 1 to detect peripheral objects. Correspondingly, the m_ij values of the detected peripheral objects in the partition matrix are all set to 0, so as to avoid the influence of such objects in the next iteration of the centroid update. In doing so, the peripheral objects are detected progressively over the iterations. Instances assigned as peripheral objects in a past iteration may become core objects in subsequent iterations; likewise, previously identified core objects may be transferred to the peripheral set. Regardless of these shifts, the centroid update always relies only on the core set, to maximally guarantee that the centroids resulting from each iteration have minimal influence from outliers or other peripheral instances.
Finally, the stabilization of the clusters is examined by comparing the partition matrices of the current iteration and the previous iteration. If the partition matrices of two consecutive iterations are exactly the same, the algorithm terminates.

TABLE II: INITIALIZATION OF THE PARTITION MATRIX M
The proposed algorithm is able to automatically identify outliers without any predefined parameters or prior knowledge about the outliers. Thus, it can effectively avoid the often negative influence of peripheral objects, though it requires additional computational resources for this extra functionality. In HDC, each data instance needs to traverse all elements of all clusters when finding the largest memberships; the time complexity of this part is O(n²). The algorithm is iterated t times, and denoising is performed in each iteration, which computationally contributes O(nk) per iteration. Note that k is usually much smaller than n, so the final time complexity of the proposed HDC algorithm is O(n²t).
To illustrate the proposed HDC algorithm, some exemplar instances are given in Table I and displayed in Fig. 2. Let the number of clusters be 2. By using the RP algorithm, the partition matrix M and the associated unlabeled instances are initialized as shown in Table II and Fig. 3, respectively. Specifically, in Fig. 3, the two pentacles represent the resulting centroids of the respective clusters. Due to the use of the RP algorithm, both of these centroids are located roughly in the central region of the data. Thus, the risk of the initialized centroids falling on the border is effectively reduced.
Given the initialization in Fig. 3, by taking Line 17, i.e., the peripheral object detection (POD) procedure of Algorithm 1, out of the proposed HDC algorithm, the resulting clusters are illustrated in Fig. 4. Here, the HDC algorithm is implemented using the algebraic T-norm T_P(a, b) = ab and the fuzzy similarity in (10). It can be observed that, without anomaly reduction, a small number of peripheral objects form an independent category and their underlying connections to the other instances are ignored.
On the contrary, when the complete HDC algorithm is used, C_2 in Fig. 4 is merged with the right part of C_1 in Fig. 4, as depicted in Fig. 5. The HD values of all instances at the last iteration are summarized in Table III. The acceptance thresholds ε_1 and ε_2 of the two clusters are obtained by applying (24). As shown in Table III, the HD value of x_1 is larger than ε_1, and those of x_19, x_20, x_21, and x_22 are larger than ε_2. Therefore, these instances are deemed peripheral objects of C_1 and C_2, respectively, and marked with squares in Fig. 5. The remaining samples are members of the core sets of C_1 and C_2. These results demonstrate the ability of HDC to detect peripheral objects and exploit latent cluster structures.

IV. EXPERIMENTAL EVALUATION
The experimental processes and results are reported in four parts in this section. The specification of the experiments is detailed first. Then, the proposed HDC algorithm is applied to lane segmentation, with the clustering results visually analyzed. This is followed by a comparative study of HDC against other competitive methods on a set of benchmark datasets. Finally, a statistical analysis is performed to show any statistical significance between the different approaches. The code of HDC can be downloaded from the GitHub release page.¹

A. Experimental Setup
All datasets used in this work are derived from Google Images and the UCI² repository, including three common lane images and the benchmark datasets Iris, Heart, Led7digit, Glass, Newthyroid, Seeds, Hepatitis, Breast, and Wine. Specifically, the lane images, which include a solid line image (Image1), a mixed solid and dashed line image (Image2), and a curved line image (Image3) as shown in Fig. 6, are employed to test the practical applicability of HDC in lane line segmentation. The lane lines in these images are extracted from their original locations and represented as 2-D datasets. The corresponding results are visualized as subfigures below the original images in Fig. 6. Overall, the details of the datasets used are summarized in Table IV.
Note that redundant features may be present in the original datasets. Principal component analysis (PCA) [42] is therefore applied to all datasets. For the lane image datasets, the first two principal components are extracted to identify the underlying dependencies and reduce the feature correlation of the two road line dimensions. For the benchmark datasets, the accumulative contribution rate is set to 90%. To ensure the fairness of the experiments, these settings are used for all compared algorithms. Also, all instances are randomized and standardized to ensure that the clustering results are not affected by the order of the data instances.
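Both PCA settings are directly available in scikit-learn, which the article does not name but which is one common way to reproduce them; the random data below stands in for the real datasets:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))  # placeholder for a real dataset

# Lane images: keep exactly the first two principal components.
X_img = PCA(n_components=2).fit_transform(X)

# Benchmarks: keep enough components for 90% cumulative
# explained-variance ratio (a float in (0, 1) selects this mode).
X_bench = PCA(n_components=0.90).fit_transform(X)
```

Standardizing before PCA (e.g., with `StandardScaler`) matches the stated standardization of all instances.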
The parameter settings of the compared algorithms follow the recommendations in the original publications or the optimal settings from a parameter pool. There are no extra parameters for k-means except the number of clusters. For MS, the maximum number of iterations is set to 300. The density peaks for DPC are selected automatically using the scheme reported in [38]. It is challenging for DBSCAN to deal with various datasets using fixed parameters; in this work, the value of minpts is set to ln|n| as recommended in [43], and the value of eps is set to the optimum from the pool [1, 5] with a step of 0.2. For PCM, the fuzzy parameter and the error are set to 1.2 and 0.001, respectively. For EPCM, the parameters m and θ are set to 2 and 3, respectively. In the case of IT2PFCM, the parameters m_1 and m_2 are set to 2 and 4, respectively. As for the ensemble approaches SEC and LWEA, the required parameters μ and θ are set to 1 and 0.4, respectively, in line with the recommendations reported in [40] and [41]. For the proposed HDC, the algebraic T-norm and (10) are used for the fuzzy similarity calculation, the similarity parameters are set to the optimal values of the standard deviation and variance, and the maximum number of iterations t is set to 15. Likewise, FCM, PCM, EPCM, IT2PFCM, and the traditional k-means all use 15 as the maximum number of iterations.
Each clustering algorithm was run 100 times on each dataset. Normalized mutual information (NMI) [41] and the homogeneity score (HS) [44] are used as the evaluation criteria on all datasets for a more objective comparison. More specifically, NMI provides a sound indication of the shared information between the real and predicted clusters; it is a normalization of the mutual information (MI) score that scales the results between 0 (no mutual information) and 1 (perfect correlation). As for HS, a clustering result satisfies homogeneity if each of its clusters contains only data points that are members of a single class; HS ranges from 0 to 1, where 1 represents perfectly homogeneous labeling. For clarity of description, the best results are highlighted in bold in Tables V, VI, and VII.
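Both criteria are available in scikit-learn (an assumed but standard implementation, not one named by the article); note that both are invariant to a permutation of cluster labels:

```python
from sklearn.metrics import normalized_mutual_info_score, homogeneity_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same grouping, labels permuted

nmi = normalized_mutual_info_score(y_true, y_pred)
hs = homogeneity_score(y_true, y_pred)
# Both equal 1.0 here: the grouping is perfect even though the
# label names differ, which is exactly what clustering metrics need.
```

This label-permutation invariance is why such external indices, rather than plain accuracy, are used to compare clusterings against ground truth.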

B. Performance on Road Line Segmentation
The best experimental results of each approach for the three types of lanes are illustrated in Figs. 7, 8, and 9. Note that MS and DBSCAN have no parameter for the number of clusters, so the cluster number may not be consistent with the number of lane separation lines when selecting the optimal parameters [such as in Fig. 9(b)]. It can be seen that only DBSCAN and HDC generated correct clustering results, while all other methods were unable to segment the lane separation lines correctly. More specifically, for Road1 and Road3, both HDC and DBSCAN successfully clustered the lane separation lines in the sampled roads. However, the lane segmentation problem is too challenging for the other clustering algorithms. For Road2, DBSCAN can separate each short line into a cluster when appropriate parameters are selected. Nonetheless, due to the intervals between the dotted lines, the two middle lane separation lines cannot be clustered into complete lines no matter how the parameters of DBSCAN are set. Interestingly, the proposed HDC is the only clustering approach producing correct results here, demonstrating superior clustering performance over the others.
Moreover, for each lane image dataset, four images are captured from the clustering process to show the changes in the centroids and noises. The respective results are recorded in Figs. 10, 11, and 12. Among these four images, the first is the moment of initialization; the last indicates the resulting clusters; and the second and third show the main segments in the clustering process. For each cluster, the pentacle indicates the centroid, and "×" indicates a peripheral object. It can be seen that, due to the use of the RP method, the centroid of each cluster is initialized in the central region of each image. During clustering, these centroids gradually move toward the intermediate area of the corresponding lane lines. During the segmentation of each lane line, it is interesting to note that the proposed HDC algorithm tends to treat the upper and lower ends of the lines as the peripheral objects of each cluster. In light of Algorithm 3, the noises are not used to update the centroids at each iteration; in the end, however, the noise objects are assigned to the clusters according to the maximum membership principle shown in (26).
The corresponding average results of NMI and HS over 100 experiments are detailed in Table V. HDC shows the best performance based on both metrics, given that a value of 1 indicates the method has successfully clustered all lane separation lines in all repeated experiments. The second best performer is DBSCAN, which does not show the best result for Road2 but ties with the proposed HDC for the other two roads. None of the remaining approaches demonstrates any close performance. This clearly shows the superiority of the proposed HDC clustering algorithm.

C. Performance on Benchmark Datasets
To verify the noise resistance of the studied algorithms, 5%, 10%, and 15% random Gaussian noise is added to each benchmark dataset, following the experiments reported in [45].
Each newly added noise point is assigned to the class whose centroid is closest to it. All of these noise samples are taken into account by the evaluation metrics NMI and HS. The average NMI and HS results (and the best NMI and HS results, shown in brackets) of these algorithms over 100 repeated experiments on different datasets are detailed in Tables VI and VII.
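The noise-injection protocol described above can be sketched as follows; the function name and the `scale` parameter are illustrative assumptions rather than the exact procedure of [45]:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(X, y, centroids, ratio=0.05, scale=1.0):
    """Append `ratio` * n random Gaussian noise points to dataset X and
    label each one with the class of its closest centroid, so that the
    injected points can be scored by NMI/HS alongside the originals."""
    n_noise = max(1, int(round(ratio * len(X))))
    # draw noise around the data mean with (optionally inflated) spread
    noise = rng.normal(X.mean(axis=0), scale * X.std(axis=0),
                       size=(n_noise, X.shape[1]))
    # nearest-centroid labelling of the injected points
    d = np.linalg.norm(noise[:, None, :] - centroids[None, :, :], axis=2)
    noise_labels = d.argmin(axis=1)
    return np.vstack([X, noise]), np.concatenate([y, noise_labels])

# toy example: two well-separated classes, 25% injected noise
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
Xn, yn = add_gaussian_noise(X, y, centroids, ratio=0.25)
```

The augmented labels `yn` can then be fed to the usual NMI/HS scorers together with each algorithm's predicted partition.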
Considering the NMI evaluation index, HDC shows an overall better clustering performance on most datasets. Especially for Newthyroid, the average results of HDC surpass those of all other competing methods under different noise conditions. However, as the noise scale increases, the performance of the noise-sensitive algorithms declines quickly. Taking Iris as an example, the average result of k-means decreases from 0.72 (5% added noise) to 0.46 (10% added noise) and 0.18 (15% added noise). For MS, SEC, and LWEA, the performance also drops as the noise grows; especially when the noise ratio reaches 15%, the evaluation values of these contrasting methods are reduced to their lowest. For the Newthyroid, Seeds, and Wine datasets, the performance of k-means, AC, SEC, and LWEA also degrades as the noise level rises. As for the robust clustering algorithms PCM, EPCM, and IT2PFCM, the obtained results are more steady no matter how much noise is included, and occasionally they surpass HDC even with 15% added noise. Nevertheless, without noise parameters, HDC still outperforms these robust clustering algorithms in most cases. It is worth noting that for the Led7digit and Glass datasets, the performance of the diverse approaches does not change significantly as the noise expands, which is related to the added random noise and the specific distribution of the data. Interestingly, the density-based algorithms DPC and DBSCAN seem to suffer less from the increased noise, although they are negatively affected by noise like the other compared approaches. Again, taking the Iris dataset as an example, the effect of noise on these two algorithms is marginal, and other datasets show similar trends, indicating that the density-based clustering approaches are more resistant to data noise.
In terms of the best NMI values, the results of k-means are rather different from the corresponding averages on many datasets, such as Iris, Hepatitis, and Wine, when the noise level is 15%. This shows its instability when dealing with noise. For the remaining comparison strategies and the proposed HDC, although there is a gap between the average and best values, it is not significant. Especially for MS and AC, their clustering performance is highly stable, and the same results can always be observed in repeated experiments. Overall, HDC has a stable ability to outperform the compared algorithms.
Regarding the HS metric, the performance of each algorithm is consistent with its NMI results under most circumstances. Again, HDC outperforms the other competitors. Especially for the Newthyroid and Wine datasets, HDC outperforms nearly all the compared methods in terms of both average and best results, and it exceeds more than half of the compared approaches for the remaining datasets in most cases. In general, many clustering algorithms are noise sensitive, which means they have difficulty handling datasets with noises and/or outliers. For the robust clustering algorithms, noise preprocessing helps provide more stable results, but determining the noise threshold can be challenging and often requires bespoke optimization for different datasets. Experimental results show that HDC can better deal with noisy data without the requirement of predefined noise parameters.

D. Statistical Analysis
A paired t-test is applied to all the experiments to explore any statistically significant differences between the proposed HDC algorithm and the referenced clustering approaches. The significance threshold is set to 0.05 for all experiments, which ensures that the results are not obtained by chance. The t-test results are summarized at the end of each subtable of Tables VI and VII by counting the number of statistically better, equivalent, or worse cases for HDC in comparison to the other algorithms; in particular, better and worse cases are indicated by "*" and "v" in the tables, while equivalent cases are represented by blank spaces. For example, (10/1/0) in the Iris column with 10% noise in Table VI expresses that the average clustering result of the proposed HDC algorithm is better than ten compared methods, ties with one, and is worse than none. It can be clearly seen from these tables that the statistical results of HDC are better than those of the other methods in most cases based on both NMI and HS. Especially based on NMI, the proposed HDC algorithm surpasses all other compared approaches on nearly half of the datasets. For the other datasets and the HS metric, HDC outperforms most of the compared algorithms as well. Statistical analysis based on 100 repeated experiments proves the better stability of the proposed HDC algorithm relative to the competitors employed in this work.
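As a minimal sketch of the per-competitor comparison behind entries such as (10/1/0), assuming SciPy's paired t-test and that "better/worse" is decided by the mean score once significance is established (the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def compare_runs(hdc_scores, other_scores, alpha=0.05):
    """Paired t-test over repeated runs (e.g. 100 NMI scores per method);
    returns 'better', 'worse', or 'equivalent' for HDC vs. a competitor."""
    _, p = stats.ttest_rel(hdc_scores, other_scores)
    if p >= alpha:
        return "equivalent"  # shown as a blank space in the tables
    return "better" if np.mean(hdc_scores) > np.mean(other_scores) else "worse"

# synthetic example: HDC scores clearly above a competitor's over 100 runs
rng = np.random.default_rng(0)
hdc = rng.normal(0.80, 0.02, 100)
other = rng.normal(0.60, 0.02, 100)
print(compare_runs(hdc, other))  # -> better
```

Counting the three outcomes over all competitors on one dataset yields the (better/equivalent/worse) triple reported in the tables.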

V. CONCLUSION
Inspired by fuzzy-rough set theory, this article proposes the concept of harmonic discrepancy and the associated harmonic discrepancy clustering (HDC) algorithm, which clusters data from the perspectives of both separability and rationality of clusters. This HDC algorithm also benefits from a nonparametric noise detection strategy for better applicability on noisy datasets. Experimental results demonstrate that HDC enjoys sound effectiveness and stability on both the lane segmentation and benchmark datasets.
While promising, the work also opens up avenues for further development. For instance, it would be interesting to investigate an extension of HDC for multidensity clusters. In addition, potential time efficiency improvements remain an active research direction. Moreover, HDC can deal with noise without the assistance of predefined noise parameters, which is promising for complex and large-scale real-world datasets. Therefore, further applications, such as medical image analysis [46], [47], would lay the foundation for a broader spectrum of future research.

TABLE III HD OF THE INSTANCES IN C1 AND C2

TABLE V NMI AND HS RESULTS OF DIFFERENT CLUSTERING ALGORITHMS ON EXTRACTED LANE SEPARATION LINES

TABLE VI NMI RESULTS OF DIFFERENT CLUSTERING ALGORITHMS ON BENCHMARK DATASETS WITH DIFFERENT PROPORTIONS OF NOISE

TABLE VII HOMOGENEITY SCORES OF DIFFERENT CLUSTERING ALGORITHMS ON BENCHMARK DATASETS WITH DIFFERENT PROPORTIONS OF NOISE