A New Approach to FuzzyRough Nearest
Neighbour Classification
Richard Jensen
1
and Chris Cornelis
2
1
Dept. of Comp. Sci., Aberystwyth University, Ceredigion, SY23 3DB, Wales, UK
rkj@aber.ac.uk
2
Dept. of Appl. Math. and Comp. Sci., Ghent University, Gent, Belgium
Chris.Cornelis@UGent.be
Abstract. In this paper, we present a new fuzzyrough nearest neigh
bour (FRNN) classification algorithm, as an alternative to Sarkar?s fuzzy
rough ownership function (FRNNO) approach. By contrast to the latter,
our method uses the nearest neighbours to construct lower and upper
approximations of decision classes, and classifies test instances based on
their membership to these approximations. In the experimental analysis,
we evaluate our approach with both classical fuzzyrough approxima
tions (based on an implicator and a tnorm), as well as with the recently
introduced vaguely quantified rough sets. Preliminary results are very
good, and in general FRNN outperforms both FRNNO, as well as the
traditional fuzzy nearest neighbour (FNN) algorithm.
1 Introduction
The Knearest neighbour (KNN) algorithm [6] is a wellknown classification
technique that assigns a test object to the decision class most common among
its K nearest neighbours, i.e., the K training objects that are closest to the
test object. An extension of the KNN algorithm to fuzzy set theory (FNN)
was introduced in [8]. It allows partial membership of an object to di?erent
classes, and also takes into account the relative importance (closeness) of each
neighbour w.r.t. the test instance. However, as Sarkar correctly argued in [11],
the FNN algorithm has problems dealing adequately with insu?cient knowledge.
In particular, when every training pattern is far removed from the test object,
and hence there are no suitable neighbours, the algorithm is still forced to make
clearcut predictions. This is because the predicted membership degrees to the
various decision classes always need to sum up to 1.
To address this problem, Sarkar [11] introduced a socalled fuzzyrough owner
ship function that, when plugged into the conventional FNN algorithm, produces
class confidence values that do not necessarily sum up to 1. However, this method
(called FRNNO throughout this paper) does not refer to the main ingredients of
rough set theory, i.e., lower and upper approximation. In this paper, therefore,
we present an alternative approach, which uses a test object?s nearest neigh
bours to construct the lower and upper approximation of each decision class,
and then computes the membership of the test object to these approximations.
C.C. Chan et al. (Eds.): RSCTC 2008, LNAI 5306, pp. 310?319, 2008.
c? SpringerVerlag Berlin Heidelberg 2008
A New Approach to FuzzyRough Nearest Neighbour Classification 311
The method is very flexible, as there are many options to define the fuzzyrough
approximations, including the traditional implicator/tnorm based model [10],
as well as the vaguely quantified rough set (VQRS) model [3], which is more
robust in the presence of noisy data.
This paper is structured as follows. Section 2 provides necessary details for
fuzzy rough set theory, while Section 3 is concerned with the existing fuzzy (
rough) NN approaches. Section 4 outlines our algorithm, while comparative ex
perimentation on a series of crisp classification problems is provided in Section 5.
The paper is concluded in section 6.
2 Hybridization of Rough Sets and Fuzzy Sets
Rough set theory (RST) [9] provides a tool by which knowledge may be extracted
from a domain in a concise way; it is able to retain the information content whilst
reducing the amount of knowledge involved. Central to RST is the concept of
indiscernibility. Let (U, A) be an information system
1
,whereU is a nonempty
set of finite objects (the universe of discourse) and A is a nonempty finite set
of attributes such that a : U ? V
a
for every a ? A. V
a
is the set of values
that attribute a may take. With any B ? A there is an associated equivalence
relation R
B
:
R
B
= {(x, y) ? U
2
?a ? B, a(x)=a(y)} (1)
If (x, y) ? R
B
,thenx and y are indiscernible by attributes from B. The equiv
alence classes of the Bindiscernibility relation are denoted [x]
B
.LetA ? U. A
can be approximated using the information contained within B by constructing
the Blower and Bupper approximations of A:
R
B
?A = {x ? U  [x]
B
? A} (2)
R
B
?A = {x ? U  [x]
B
? A negationslash= ?} (3)
The tuple ?R
B
?A, R
B
?A? is called a rough set.
The process described above can only operate e?ectively with datasets con
taining discrete values. As most datasets contain realvalued attributes, it is
necessary to perform a discretization step beforehand. A more intuitive and
flexible approach, however, is to model the approximate equality between ob
jects with continuous attribute values by means of a fuzzy relation R in U, i.e.,
a U ? [0, 1] mapping that assigns to each couple of objects their degree of simi
larity. In general, it is assumed that R is at least a fuzzy tolerance relation, that
is, R(x, x)=1andR(x, y)=R(y,x)forx and y in U.Giveny in U,itsforeset
Ry is defined by Ry(x)=R(x, y) for every x in U.
Given a fuzzy tolerance relation R and a fuzzy set A in U, the lower and
upper approximation of A by R can be constructed in several ways. A general
definition [4,10] is the following:
1
In the classification problems considered further on in this paper, A = C?{d},where
C represents the set of conditional attributes, and d is the decision or class attribute.
312 R. Jensen and C. Cornelis
(R?A)(x)=inf
y?U
I(R(x, y),A(y)) (4)
(R?A)(x)=sup
y?U
T(R(x, y),A(y)) (5)
Here, I is an implicator
2
and T atnorm
3
.WhenA is a crisp (classical) set and
R is an equivalence relation in U, the traditional lower and upper approximation
are recovered.
Just like their crisp counterparts, formulas (4) and (5) (henceforth called the
FRS approximations) are quite sensitive to noisy values. That is, a change in
a single object can result in drastic changes to the approximations (due to the
use of sup and inf, which generalize the existential and universal quantifier,
respectively). In the context of classification tasks, this behaviour may a?ect
accuracy adversely. Therefore, in [3], the concept of vaguely quantified rough sets
(VQRS) was introduced. It uses the linguistic quantifiers ?most? and ?some?,
as opposed to the traditionally used crisp quantifiers ?all? and ?at least one?, to
decide to what extent an object belongs to the lower and upper approximation.
Given a couple (Q
u
,Q
l
) of fuzzy quantifiers
4
that model ?most? and ?some?,
the lower and upper approximation of A by R are defined by
(R?A)(y)=Q
u
parenleftbigg
Ry ? A
Ry
parenrightbigg
= Q
u
?
?
summationtext
x?X
min(R(x, y),A(x))
summationtext
x?X
R(x, y)
?
?
(6)
(R?A)(y)=Q
l
parenleftbigg
Ry ? A
Ry
parenrightbigg
= Q
l
?
?
summationtext
x?X
min(R(x, y),A(x))
summationtext
x?X
R(x, y)
?
?
(7)
where the fuzzy set intersection is defined by the min tnorm and the fuzzy set
cardinality by the sigmacount operation. As an important di?erence to (4) and
(5), the VQRS approximations do not extend the classical rough set approxi
mations, in a sense that when A and R are crisp, R?A and R?A may still be
fuzzy.
3 Fuzzy Nearest Neighbour Classification
The fuzzy Knearest neighbour (FNN) algorithm [8] was introduced to classify
test objects based on their similarity to a given number K of neighbours (among
the training objects), and these neighbours? membership degrees to (crisp or
2
An implicator I is a [0, 1]
2
? [0, 1] mapping that is decreasing in its first and
increasing in its second argument, satisfying I(0, 0) = I(0,1) = I(1, 1) = 1 and
I(1, 0) = 0.
3
AtnormT is an increasing, commutative, associative [0, 1]
2
? [0, 1] mapping sat
isfying T(x,1) = x for x in [0, 1].
4
By a fuzzy quantifier, we mean an increasing [0, 1] ? [0, 1] mapping such that
Q(0) = 0 and Q(1) = 1.
A New Approach to FuzzyRough Nearest Neighbour Classification 313
fuzzy) class labels. For the purposes of FNN, the extent C(y) to which an un
classified object y belongs to a class C is computed as:
C(y)=
summationdisplay
x?N
R(x, y)C(x)(8)
where N is the set of object y?s K nearest neighbours, and R(x, y) is the [0,1]
valued similarity of x and y. In the traditional approach, this is defined in the
following way:
R(x, y)=
y ? x
?2/(m?1)
summationtext
j?N
y ? j
?2/(m?1)
(9)
where  ?  denotes the Euclidean norm, and m is a parameter that controls
the overall weighting of the similarity. Assuming crisp classes, Figure 1 shows
an application of the FNN algorithm that classifies a test object y to the class
with the highest resulting membership. The complexity of this algorithm for the
classification of one test pattern is O(U + K ?C).
FNN(U,C,y,K).
U, the training data; C, the set of decision classes;
y, the object to be classified; K, the number of nearest neighbours.
(1) N ? getNearestNeighbours(y,K);
(2) ?C ?C
(3) C(y)=
summationtext
x?N
R(x,y)C(x)
(4) output arg max
C?C
(C(y))
Fig.1. The fuzzy KNN algorithm
Initial attempts to combine the FNN algorithm with concepts from fuzzy
rough set theory were presented in [11,12]. In these papers, a fuzzyrough own
ership function is constructed that attempts to handle both ?fuzzy uncertainty?
(caused by overlapping classes) and ?rough uncertainty? (caused by insu?cient
knowledge, i.e., attributes, about the objects). The fuzzyrough ownership func
tion ?
C
of class C was defined as, for an object y,
?
C
(y)=
summationtext
x?U
R(x, y)C(x)
U
(10)
In this, the fuzzy relation R is determined by:
R(x, y)=exp
parenleftBigg
?
summationdisplay
a?C
?
a
(a(y) ? a(x))
2/(m?1)
parenrightBigg
(11)
where m controls the weighting of the similarity (as in FNN) and ?
a
is a para
meter that decides the bandwidth of the membership, defined as
314 R. Jensen and C. Cornelis
?
a
=
U
2
summationtext
x?U
a(y) ? a(x)
2/(m?1)
(12)
?
C
(y) is interpreted as the confidence with which y can be classified to class C.
The corresponding crisp classification algorithm, called FRNNO in this paper,
can be seen in Figure 2. Initially, the parameter ?
a
is calculated for each attribute
and all memberships of decision classes for test object y are set to 0. Next,
the weighted distance of y from all objects in the universe is computed and
used to update the class memberships of y via equation (10). Finally, when
all training objects have been considered, the algorithm outputs the class with
highest membership. The algorithm?s complexity is O(CU + U?(C + C)).
By contrast to the FNN algorithm, the fuzzyrough ownership function con
siders all training objects rather than a limited set of neighbours, and hence
no decision is required as to the number of neighbours to consider. The rea
soning behind this is that very distant training objects will not influence the
outcome (as opposed to the case of FNN). For comparison purposes, the K
nearest neighbours version of this algorithm is obtained by replacing line (3)
with N ? getNearestNeighbours(y,K).
FRNNO(U,C,C,y).
U, the training data; C, the set of conditional features;
C, the set of decision classes; y, the object to be classified.
(1) ?a ? C
(2) ?
a
= U/2
summationtext
x?U
a(y) ? a(x)
2/(m?1)
(3) N ?U
(4) ?C ?C, ?
C
(y)=0
(5) ?x ? N
(6) d =
summationtext
a?C
?
a
(a(y) ? a(x))
2
(7) ?C ?C
(8) ?
C
(y)+ =
C(x)?exp(?d
1/(m?1)
)
N
(9) output arg max
C?C
?
C
(y)
Fig.2. The fuzzyrough ownership nearest neighbour algorithm
It should be noted that the algorithm does not use fuzzy lower or upper
approximations to determine class membership. A very preliminary attempt to
do so was described in [1]. However, the authors did not state how to use the
upper and lower approximations to derive classifications.
4 FuzzyRough Nearest Neighbour (FRNN) Algorithm
Figure 3 outlines our proposed algorithm, combining fuzzyrough approximations
with the ideas of the classical FNN approach. In what follows, FRNNFRS and
FRNNVQRS denote instances of the algorithm where traditional, and VQRS,
A New Approach to FuzzyRough Nearest Neighbour Classification 315
approximations are used, respectively. The rationale behind the algorithm is that
the lower and the upper approximation of a decision class, calculated by means
of the nearest neighbours of a test object y, provide good clues to predict the
membership of the test object to that class.
In particular, if (R?C)(y) is high, it reflects that all (most) of y?s neighbours
belong to C, while a high value of (R?C)(y) means that at least one (some)
neighbour(s) belong(s) to that class, depending on whether the FRS or VQRS
approximations are used. A classification will always be determined for y due to
the initialisation of ?
1
(y)and?
2
(y) to zero in line (2). To perform crisp classi
fication, the algorithm outputs the decision class with the resulting best fuzzy
lower and upper approximation memberships, seen in line (4) of the algorithm.
This is only one way of utilising the information in the fuzzy lower and upper
approximations to determine class membership, other ways are possible (such as
combining them into a single measure) but are not investigated in this paper.
The complexity of the algorithm is O(C ? (2U)).
FRNN(U,C,y).
U, the training data; C, the set of decision classes;
y, the object to be classified.
(1) N ? getNearestNeighbors(y,K)
(2) ?
1
(y) ? 0, ?
2
(y) ? 0, Class ??
(3) ?C ?C
(4) if ((R?C)(y) ? ?
1
(y)&&(R?C)(y) ? ?
2
(y))
(5) Class ? C
(6) ?
1
(y) ? (R?C)(y), ?
2
(y) ? (R?C)(y)
(7) output Class
Fig.3. The fuzzyrough nearest neighbour algorithm
When using FRNNFRS, the use of K is not required in principle: as R(x, y)
gets smaller, x tends to have only have a minor influence on (R?C)(y)and
(R?C)(y). For FRNNVQRS, this may generally not be true, because R(x, y)
appears in the numerator as well as the denominator of (6) and (7).
Furthermore, the algorithm is dependent on the choice of the fuzzy toler
ance relation R A general way of constructing R is as follows: given the set of
conditional attributes C, R is defined by
R(x, y)=min
a?C
R
a
(x, y) (13)
in which R
a
(x, y) is the degree to which objects x and y are similar for attribute
a. Possible options include
R
1
a
(x, y)=exp
parenleftbigg
?
(a(x) ? a(y))
2
2?
a
2
parenrightbigg
(14)
R
2
a
(x, y)=1?
a(x) ? a(y)
a
max
? a
min

(15)
316 R. Jensen and C. Cornelis
where ?
a
2
is the variance of attribute a,anda
max
and a
min
are the maximal and
minimal occurring value of that attribute.
5 Experimentation
This section presents the initial experimental evaluation of the classification
methods FNN, FRNNO, FRNNFRS and FRNNVQRS for the task of pattern
classification, over nine benchmark datasets from [2] and [11]. The details of the
datasets used can be found in table 1. All of them have a crisp decision attribute.
Table 1. Dataset details
Dataset Objects Attributes
Cleveland 297 14
Glass 214 10
Heart 270 14
Ionosphere 230 35
Letterdgoq 3114 17
Olitos 120 26
Water 2 390 39
Water 3 390 39
Wine 178 14
5.1 Experimental Setup
K is initialized as U, the number of objects in the training dataset and then
decremented by 1/30
th
of U each time, resulting in 30 experiments for each
dataset. For each choice of parameter K,2? 10fold crossvalidation is per
formed. For FNN and FRNNO, m is set to 2. For the new approaches, the
fuzzy relation given in equation (15) was chosen. In the FRNNFRS approach,
we used the min tnorm and the KleeneDienes implicator I defined by I(x, y)=
max(1?x, y). The FRNNVQRS approach was implemented using Q
l
= Q
(0.1,0.6)
and Q
u
= Q
(0.2,1.0)
, according to the general formula
Q
(?,?)
(x)=
?
?
?
?
?
?
?
?
0,x? ?
2(x??)
2
(???)
2
,?? x ?
?+?
2
1 ?
2(x??)
2
(???)
2
,
?+?
2
? x ? ?
1,?? x
5.2 Comparative Investigation
The results of the experiments are shown in Figure 4. Several interesting observa
tions can be made from them. First, for all but one dataset (letterdgoq,which
was used in [11]), either FRNNFRS or FRNNVQRS yields the best results.
Overall, FRNNFRS produces the most consistent results. This is particularly
A New Approach to FuzzyRough Nearest Neighbour Classification 317
Fig.4. Classification accuracy for the four methods and di?erent values of K
318 R. Jensen and C. Cornelis
remarkable considering the inherent simplicity of the method. FRNNVQRS is
best for cleveland and heart, which might be attributed to the comparative
presence of noise in those datasets, but it performs rather disappointing for a
number of other datasets (glass, letterdgoq, wine, olitos).
It is also interesting to consider the influence of the number of nearest neigh
bours. Both FRNNFRS and FRNNO remain relatively una?ected by changes
in K. This could be explained in that, for FRNNFRS, an infimum and supre
mum are used which can be thought of as a worst case and best case respectively.
When more neighbours are considered, R(x, y) values decrease as these neigh
bours are less similar, hence I(R(x, y),C(x)) increases, and T(R(x, y),C(x))
decreases. In other words, the more distant a neighbour is, the more unlikely
it is to change the infimum and supremum value. For FRNNO, again R(x, y)
decreases when more neighbours are added, and hence the value R(x, y)C(x)
that is added to the numerator is also small. Since each neighbour has the same
weight in the denominator, the ratios stay approximately the same when adding
new neighbours.
For FNN and FRNNVQRS, increasing K can have a significant e?ect on clas
sification accuracy. This is most clearly observed in the results for the olitos
data, where there is a clear downward trend. For FRNNVQRS, the ratio Ry ?
C/Ry has to be calculated. Each neighbour has a di?erent weight in the de
nominator, so the ratios can fluctuate considerably even when adding distant
neighbours.
6 Conclusion and Future Work
This paper has presented two new techniques for fuzzyrough classification based
on the use of lower and upper approximations w.r.t. fuzzy tolerance relations.
The di?erence between them is in the definition of the approximations: while
FRNNFRS uses ?traditional? operations based on a tnorm and an implicator,
FRNNVQRS uses a fuzzy quantifierbased approach. The results show that
these methods are e?ective, and that they are competitive with existing methods
such as the fuzzy Knearest neighbour and the fuzzyrough ownership function
approach. Further investigation, however, is still needed, to adequately explain
the impact of the choice of fuzzy relations, connectives and quantifiers.
Also, the impact of a feature selection preprocessing step upon classification
accuracy needs to be investigated. It is expected that feature selectors that in
corporate fuzzy relations expressing closeness of objects (see e.g. [5,7]) should be
able to further improve the e?ectiveness of the classification methods presented
here.
Finally, an important challenge is to adapt the algorithms so that they can
deal with continuous decision attributes. In this case, we need to predict the
membership of a test object to di?erent, possibly overlapping classes. Such a
prediction can be based on the test object?s membership degrees to the lower
and/or upper approximation (e.g., on the average of these two values).
A New Approach to FuzzyRough Nearest Neighbour Classification 319
Acknowledgment
Chris Cornelis would like to thank the Research Foundation?Flanders for fund
ing his research.
References
1. Bian, H., Mazlack, L.: FuzzyRough NearestNeighbor Classification Approach. In:
Proceeding of the 22nd International Conference of the North American Fuzzy
Information Processing Society (NAFIPS), pp. 500?505 (2003)
2. Blake, C.L., Merz, C.J.: UCI Repository of machine learning databases. Irvine,
University of California (1998), http://archive.ics.uci.edu/ml/
3. Cornelis, C., De Cock, M., Radzikowska, A.M.: Vaguely Quantified Rough Sets. In:
An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds.)
RSFDGrC 2007. LNCS (LNAI), vol. 4482, pp. 87?94. Springer, Heidelberg (2007)
4. Cornelis, C., De Cock, M., Radzikowska, A.M.: Fuzzy Rough Sets: from Theory
into Practice. In: Pedrycz, W., Skowron, A., Kreinovich, V. (eds.) Handbook of
Granular Computing. Wiley, Chichester (2008)
5. Cornelis, C., Hurtado Mart??n, G., Jensen, R., Slezak, D.: Feature Selection with
Fuzzy Decision Reducts. In: Proceedings of 3rd International Conference on Rough
Sets and Knowledge Technology (RSKT2008) (2008)
6. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York
(1973)
7. Jensen, R., Shen, Q.: FuzzyRough Sets Assisted Attribute Selection. IEEE Trans
actions on Fuzzy Systems 15(1), 73?89 (2007)
8. Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy Knearest neighbor algorithm.
IEEE Trans. Systems Man Cybernet. 15(4), 580?585 (1985)
9. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer
Academic Publishing, Dordrecht (1991)
10. Radzikowska, A.M., Kerre, E.E.: A comparative study of fuzzy rough sets. Fuzzy
Sets and Systems 126(2), 137?155 (2002)
11. Sarkar, M.: FuzzyRough nearest neighbors algorithm. Fuzzy Sets and Systems 158,
2123?2152 (2007)
12. Wang, X., Yang, J., Teng, X., Peng, N.: FuzzyRough Set Based Nearest Neighbor
Clustering Classification Algorithm. In: Wang, L., Jin, Y. (eds.) FSKD 2005. LNCS
(LNAI), vol. 3613, pp. 370?373. Springer, Heidelberg (2005)