A New Approach to Fuzzy-Rough Nearest Neighbour Classification

Richard Jensen¹ and Chris Cornelis²

¹ Dept. of Comp. Sci., Aberystwyth University, Ceredigion, SY23 3DB, Wales, UK
rkj@aber.ac.uk
² Dept. of Appl. Math. and Comp. Sci., Ghent University, Gent, Belgium
Chris.Cornelis@UGent.be

Abstract. In this paper, we present a new fuzzy-rough nearest neighbour (FRNN) classification algorithm, as an alternative to Sarkar's fuzzy-rough ownership function (FRNN-O) approach. By contrast to the latter, our method uses the nearest neighbours to construct lower and upper approximations of decision classes, and classifies test instances based on their membership to these approximations. In the experimental analysis, we evaluate our approach both with classical fuzzy-rough approximations (based on an implicator and a t-norm) and with the recently introduced vaguely quantified rough sets. Preliminary results are very good, and in general FRNN outperforms both FRNN-O and the traditional fuzzy nearest neighbour (FNN) algorithm.

1 Introduction

The K-nearest neighbour (KNN) algorithm [6] is a well-known classification technique that assigns a test object to the decision class most common among its K nearest neighbours, i.e., the K training objects that are closest to the test object. An extension of the KNN algorithm to fuzzy set theory (FNN) was introduced in [8]. It allows partial membership of an object to different classes, and also takes into account the relative importance (closeness) of each neighbour w.r.t. the test instance. However, as Sarkar correctly argued in [11], the FNN algorithm has problems dealing adequately with insufficient knowledge. In particular, when every training pattern is far removed from the test object, and hence there are no suitable neighbours, the algorithm is still forced to make clear-cut predictions. This is because the predicted membership degrees to the various decision classes always need to sum up to 1.

To address this problem, Sarkar [11] introduced a so-called fuzzy-rough ownership function that, when plugged into the conventional FNN algorithm, produces class confidence values that do not necessarily sum up to 1. However, this method (called FRNN-O throughout this paper) does not refer to the main ingredients of rough set theory, i.e., lower and upper approximation. In this paper, therefore, we present an alternative approach, which uses a test object's nearest neighbours to construct the lower and upper approximation of each decision class, and then computes the membership of the test object to these approximations. The method is very flexible, as there are many options to define the fuzzy-rough approximations, including the traditional implicator/t-norm based model [10], as well as the vaguely quantified rough set (VQRS) model [3], which is more robust in the presence of noisy data.

This paper is structured as follows. Section 2 provides the necessary details of fuzzy rough set theory, while Section 3 is concerned with the existing fuzzy(-rough) NN approaches. Section 4 outlines our algorithm, while comparative experimentation on a series of crisp classification problems is provided in Section 5. The paper is concluded in Section 6.
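To ground the discussion, the following minimal Python sketch (illustrative only; it is not the pseudocode of [6], and all names are my own) implements the crisp KNN rule just described, assuming numeric feature vectors and Euclidean distance.

```python
from collections import Counter

def knn_classify(train, labels, y, K):
    """Assign y to the class most common among its K nearest neighbours."""
    # Rank training objects by (squared) Euclidean distance to y
    nearest = sorted(range(len(train)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], y)))
    # Majority vote among the K closest
    votes = Counter(labels[i] for i in nearest[:K])
    return votes.most_common(1)[0][0]
```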
2 Hybridization of Rough Sets and Fuzzy Sets

Rough set theory (RST) [9] provides a tool by which knowledge may be extracted from a domain in a concise way; it is able to retain the information content whilst reducing the amount of knowledge involved. Central to RST is the concept of indiscernibility. Let (U, A) be an information system¹, where U is a non-empty set of finite objects (the universe of discourse) and A is a non-empty finite set of attributes such that a: U → V_a for every a ∈ A. V_a is the set of values that attribute a may take. With any B ⊆ A there is an associated equivalence relation R_B:

    R_B = {(x, y) ∈ U² | ∀a ∈ B, a(x) = a(y)}    (1)

If (x, y) ∈ R_B, then x and y are indiscernible by attributes from B. The equivalence classes of the B-indiscernibility relation are denoted [x]_B. Let A ⊆ U. A can be approximated using the information contained within B by constructing the B-lower and B-upper approximations of A:

    R_B↓A = {x ∈ U | [x]_B ⊆ A}    (2)
    R_B↑A = {x ∈ U | [x]_B ∩ A ≠ ∅}    (3)

The tuple ⟨R_B↓A, R_B↑A⟩ is called a rough set.

The process described above can only operate effectively with datasets containing discrete values. As most datasets contain real-valued attributes, it is necessary to perform a discretization step beforehand. A more intuitive and flexible approach, however, is to model the approximate equality between objects with continuous attribute values by means of a fuzzy relation R in U, i.e., a U × U → [0, 1] mapping that assigns to each couple of objects their degree of similarity. In general, it is assumed that R is at least a fuzzy tolerance relation, that is, R(x, x) = 1 and R(x, y) = R(y, x) for x and y in U. Given y in U, its foreset Ry is defined by Ry(x) = R(x, y) for every x in U.

Given a fuzzy tolerance relation R and a fuzzy set A in U, the lower and upper approximation of A by R can be constructed in several ways. A general definition [4,10] is the following:

    (R↓A)(x) = inf_{y∈U} I(R(x, y), A(y))    (4)
    (R↑A)(x) = sup_{y∈U} T(R(x, y), A(y))    (5)

Here, I is an implicator² and T a t-norm³. When A is a crisp (classical) set and R is an equivalence relation in U, the traditional lower and upper approximation are recovered.

Just like their crisp counterparts, formulas (4) and (5) (henceforth called the FRS approximations) are quite sensitive to noisy values. That is, a change in a single object can result in drastic changes to the approximations (due to the use of sup and inf, which generalize the existential and universal quantifier, respectively). In the context of classification tasks, this behaviour may affect accuracy adversely. Therefore, in [3], the concept of vaguely quantified rough sets (VQRS) was introduced. It uses the linguistic quantifiers "most" and "some", as opposed to the traditionally used crisp quantifiers "all" and "at least one", to decide to what extent an object belongs to the lower and upper approximation. Given a couple (Q_u, Q_l) of fuzzy quantifiers⁴ that model "most" and "some", the lower and upper approximation of A by R are defined by

    (R↓A)(y) = Q_u(|Ry ∩ A| / |Ry|) = Q_u( Σ_{x∈X} min(R(x, y), A(x)) / Σ_{x∈X} R(x, y) )    (6)
    (R↑A)(y) = Q_l(|Ry ∩ A| / |Ry|) = Q_l( Σ_{x∈X} min(R(x, y), A(x)) / Σ_{x∈X} R(x, y) )    (7)

where the fuzzy set intersection is defined by the min t-norm and the fuzzy set cardinality by the sigma-count operation. As an important difference to (4) and (5), the VQRS approximations do not extend the classical rough set approximations, in the sense that when A and R are crisp, R↓A and R↑A may still be fuzzy.

¹ In the classification problems considered further on in this paper, A = C ∪ {d}, where C represents the set of conditional attributes, and d is the decision or class attribute.
² An implicator I is a [0, 1]² → [0, 1] mapping that is decreasing in its first and increasing in its second argument, satisfying I(0, 0) = I(0, 1) = I(1, 1) = 1 and I(1, 0) = 0.
³ A t-norm T is an increasing, commutative, associative [0, 1]² → [0, 1] mapping satisfying T(x, 1) = x for x in [0, 1].
⁴ By a fuzzy quantifier, we mean an increasing [0, 1] → [0, 1] mapping Q such that Q(0) = 0 and Q(1) = 1.
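As a minimal illustration (a sketch, not code from the paper), the Python functions below compute these approximation memberships, assuming the relation values R(x, ·) and the fuzzy set A are supplied as lists of membership degrees over the universe. The Kleene-Dienes implicator and min t-norm are the connectives later chosen in Section 5.1; any other implicator/t-norm pair could be substituted.

```python
def kleene_dienes(a, b):
    # Implicator I(a, b) = max(1 - a, b)
    return max(1.0 - a, b)

def t_min(a, b):
    # t-norm T(a, b) = min(a, b)
    return min(a, b)

def frs_lower(R_x, A, I=kleene_dienes):
    # Equation (4): (R down A)(x) = inf over y of I(R(x, y), A(y))
    return min(I(r, a) for r, a in zip(R_x, A))

def frs_upper(R_x, A, T=t_min):
    # Equation (5): (R up A)(x) = sup over y of T(R(x, y), A(y))
    return max(T(r, a) for r, a in zip(R_x, A))

def vqrs_approx(R_y, A, Q):
    # Equations (6)/(7): Q(|Ry intersect A| / |Ry|), with min-intersection
    # and sigma-count cardinality; pass Q_u for the lower approximation
    # and Q_l for the upper approximation
    num = sum(min(r, a) for r, a in zip(R_y, A))
    den = sum(R_y)
    return Q(num / den)
```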
3 Fuzzy Nearest Neighbour Classification

The fuzzy K-nearest neighbour (FNN) algorithm [8] was introduced to classify test objects based on their similarity to a given number K of neighbours (among the training objects), and these neighbours' membership degrees to (crisp or fuzzy) class labels. For the purposes of FNN, the extent C(y) to which an unclassified object y belongs to a class C is computed as:

    C(y) = Σ_{x∈N} R(x, y) C(x)    (8)

where N is the set of object y's K nearest neighbours, and R(x, y) is the [0,1]-valued similarity of x and y. In the traditional approach, this is defined in the following way:

    R(x, y) = ||y − x||^{−2/(m−1)} / Σ_{j∈N} ||y − j||^{−2/(m−1)}    (9)

where ||·|| denotes the Euclidean norm, and m is a parameter that controls the overall weighting of the similarity. Assuming crisp classes, Figure 1 shows an application of the FNN algorithm that classifies a test object y to the class with the highest resulting membership. The complexity of this algorithm for the classification of one test pattern is O(|U| + K·|𝒞|).

FNN(U, 𝒞, y, K).
U, the training data; 𝒞, the set of decision classes;
y, the object to be classified; K, the number of nearest neighbours.
(1) N ← getNearestNeighbours(y, K)
(2) ∀C ∈ 𝒞
(3)     C(y) = Σ_{x∈N} R(x, y) C(x)
(4) output arg max_{C∈𝒞} (C(y))

Fig. 1. The fuzzy KNN algorithm
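A sketch of Figure 1 in Python, under the assumption of crisp class labels and the similarity of equation (9); the helper names and the epsilon guard against zero distances are my additions, not part of the original algorithm.

```python
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fnn_classify(train, labels, classes, y, K, m=2):
    # N: indices of the K training objects closest to y
    N = sorted(range(len(train)), key=lambda i: euclid(train[i], y))[:K]
    # Equation (9): normalized inverse-distance similarities; eps avoids
    # division by zero when y coincides with a training object
    eps = 1e-12
    w = [(euclid(train[i], y) + eps) ** (-2.0 / (m - 1)) for i in N]
    total = sum(w)
    # Equation (8) with crisp labels: C(x) = 1 iff x is labelled C
    C_y = {C: sum(wi for wi, i in zip(w, N) if labels[i] == C) / total
           for C in classes}
    return max(C_y, key=C_y.get)
```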
Initial attempts to combine the FNN algorithm with concepts from fuzzy rough set theory were presented in [11,12]. In these papers, a fuzzy-rough ownership function is constructed that attempts to handle both "fuzzy uncertainty" (caused by overlapping classes) and "rough uncertainty" (caused by insufficient knowledge, i.e., attributes, about the objects). The fuzzy-rough ownership function τ_C of class C was defined as, for an object y,

    τ_C(y) = Σ_{x∈U} R(x, y) C(x) / |U|    (10)

In this, the fuzzy relation R is determined by:

    R(x, y) = exp(−Σ_{a∈C} κ_a (a(y) − a(x))^{2/(m−1)})    (11)

where m controls the weighting of the similarity (as in FNN) and κ_a is a parameter that decides the bandwidth of the membership, defined as

    κ_a = |U| / (2 Σ_{x∈U} ||a(y) − a(x)||^{2/(m−1)})    (12)

τ_C(y) is interpreted as the confidence with which y can be classified to class C. The corresponding crisp classification algorithm, called FRNN-O in this paper, can be seen in Figure 2. Initially, the parameter κ_a is calculated for each attribute and all memberships of decision classes for test object y are set to 0. Next, the weighted distance of y from all objects in the universe is computed and used to update the class memberships of y via equation (10). Finally, when all training objects have been considered, the algorithm outputs the class with highest membership. The algorithm's complexity is O(|C||U| + |U|·(|C| + |𝒞|)).

By contrast to the FNN algorithm, the fuzzy-rough ownership function considers all training objects rather than a limited set of neighbours, and hence no decision is required as to the number of neighbours to consider. The reasoning behind this is that very distant training objects will not influence the outcome (as opposed to the case of FNN). For comparison purposes, the K-nearest neighbours version of this algorithm is obtained by replacing line (3) with N ← getNearestNeighbours(y, K).

FRNN-O(U, C, 𝒞, y).
U, the training data; C, the set of conditional features;
𝒞, the set of decision classes; y, the object to be classified.
(1) ∀a ∈ C
(2)     κ_a = |U| / (2 Σ_{x∈U} ||a(y) − a(x)||^{2/(m−1)})
(3) N ← |U|
(4) ∀C ∈ 𝒞, τ_C(y) = 0
(5) ∀x ∈ N
(6)     d = Σ_{a∈C} κ_a (a(y) − a(x))²
(7)     ∀C ∈ 𝒞
(8)         τ_C(y) += (C(x) · exp(−d^{1/(m−1)})) / |N|
(9) output arg max_{C∈𝒞} τ_C(y)

Fig. 2. The fuzzy-rough ownership nearest neighbour algorithm

It should be noted that the algorithm does not use fuzzy lower or upper approximations to determine class membership. A very preliminary attempt to do so was described in [1]. However, the authors did not state how to use the upper and lower approximations to derive classifications.

4 Fuzzy-Rough Nearest Neighbour (FRNN) Algorithm

Figure 3 outlines our proposed algorithm, combining fuzzy-rough approximations with the ideas of the classical FNN approach. In what follows, FRNN-FRS and FRNN-VQRS denote instances of the algorithm where traditional, and VQRS, approximations are used, respectively. The rationale behind the algorithm is that the lower and the upper approximation of a decision class, calculated by means of the nearest neighbours of a test object y, provide good clues to predict the membership of the test object to that class. In particular, a high value of (R↓C)(y) reflects that all (for FRS) or most (for VQRS) of y's neighbours belong to C, while a high value of (R↑C)(y) means that at least one (FRS) or some (VQRS) of the neighbours belong to that class.

A classification will always be determined for y due to the initialisation of τ₁(y) and τ₂(y) to zero in line (2). To perform crisp classification, the algorithm outputs the decision class with the resulting best fuzzy lower and upper approximation memberships, seen in line (4) of the algorithm. This is only one way of utilising the information in the fuzzy lower and upper approximations to determine class membership; other ways are possible (such as combining them into a single measure), but they are not investigated in this paper. The complexity of the algorithm is O(|𝒞| · 2|U|).

FRNN(U, 𝒞, y).
U, the training data; 𝒞, the set of decision classes;
y, the object to be classified.
(1) N ← getNearestNeighbours(y, K)
(2) τ₁(y) ← 0, τ₂(y) ← 0, Class ← ∅
(3) ∀C ∈ 𝒞
(4)     if ((R↓C)(y) ≥ τ₁(y) && (R↑C)(y) ≥ τ₂(y))
(5)         Class ← C
(6)         τ₁(y) ← (R↓C)(y), τ₂(y) ← (R↑C)(y)
(7) output Class

Fig. 3. The fuzzy-rough nearest neighbour algorithm

When using FRNN-FRS, the use of K is not required in principle: as R(x, y) gets smaller, x tends to have only a minor influence on (R↓C)(y) and (R↑C)(y). For FRNN-VQRS, this may generally not be true, because R(x, y) appears in the numerator as well as the denominator of (6) and (7).
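The following Python sketch is one possible reading of Figure 3 for the FRNN-FRS variant, with the approximations of equations (4)/(5) evaluated over the K nearest neighbours only and crisp decision classes. Here R is any fuzzy tolerance relation, for instance the one constructed in equations (13)-(15) below, and the connectives mirror the choices of Section 5.1; none of the function or variable names come from the paper.

```python
def frnn_frs_classify(train, labels, classes, y, K, R):
    # Connectives used in the experiments of Section 5: Kleene-Dienes
    # implicator and min t-norm
    I = lambda a, b: max(1.0 - a, b)
    T = lambda a, b: min(a, b)
    # N: the K training objects most similar to y under R
    N = sorted(range(len(train)), key=lambda i: -R(train[i], y))[:K]
    best, tau1, tau2 = None, 0.0, 0.0
    for C in classes:
        # Crisp membership C(x): 1 if x is labelled C, else 0
        Cx = [1.0 if labels[i] == C else 0.0 for i in N]
        Rx = [R(train[i], y) for i in N]
        lower = min(I(r, c) for r, c in zip(Rx, Cx))   # (R down C)(y)
        upper = max(T(r, c) for r, c in zip(Rx, Cx))   # (R up C)(y)
        # Lines (4)-(6) of Figure 3: keep the class whose lower and upper
        # memberships are both at least as good as the best seen so far
        if lower >= tau1 and upper >= tau2:
            best, tau1, tau2 = C, lower, upper
    return best
```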
Furthermore, the algorithm is dependent on the choice of the fuzzy tolerance relation R. A general way of constructing R is as follows: given the set of conditional attributes C, R is defined by

    R(x, y) = min_{a∈C} R_a(x, y)    (13)

in which R_a(x, y) is the degree to which objects x and y are similar for attribute a. Possible options include

    R¹_a(x, y) = exp(−(a(x) − a(y))² / (2σ_a²))    (14)
    R²_a(x, y) = 1 − |a(x) − a(y)| / |a_max − a_min|    (15)

where σ_a² is the variance of attribute a, and a_max and a_min are the maximal and minimal occurring value of that attribute.

5 Experimentation

This section presents the initial experimental evaluation of the classification methods FNN, FRNN-O, FRNN-FRS and FRNN-VQRS for the task of pattern classification, over nine benchmark datasets from [2] and [11]. The details of the datasets used can be found in Table 1. All of them have a crisp decision attribute.

Table 1. Dataset details

Dataset      Objects  Attributes
Cleveland    297      14
Glass        214      10
Heart        270      14
Ionosphere   230      35
Letter-dgoq  3114     17
Olitos       120      26
Water 2      390      39
Water 3      390      39
Wine         178      14

5.1 Experimental Setup

K is initialized as |U|, the number of objects in the training dataset, and is then decremented by 1/30th of |U| each time, resulting in 30 experiments for each dataset. For each choice of parameter K, 2 × 10-fold cross-validation is performed. For FNN and FRNN-O, m is set to 2. For the new approaches, the fuzzy relation given in equation (15) was chosen. In the FRNN-FRS approach, we used the min t-norm and the Kleene-Dienes implicator I defined by I(x, y) = max(1 − x, y). The FRNN-VQRS approach was implemented using Q_l = Q_(0.1,0.6) and Q_u = Q_(0.2,1.0), according to the general formula

    Q_(α,β)(x) = 0                            if x ≤ α
                 2(x − α)² / (β − α)²         if α ≤ x ≤ (α + β)/2
                 1 − 2(x − β)² / (β − α)²     if (α + β)/2 ≤ x ≤ β
                 1                            if β ≤ x
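As a small illustration of this setup (a sketch under the stated parameter choices, not the authors' code), the quantifier family Q_(α,β) and the relation of equations (13)/(15) can be written as:

```python
def make_Q(alpha, beta):
    # The smooth piecewise-quadratic quantifier Q_(alpha, beta) above
    def Q(x):
        if x <= alpha:
            return 0.0
        if x <= (alpha + beta) / 2:
            return 2 * (x - alpha) ** 2 / (beta - alpha) ** 2
        if x <= beta:
            return 1 - 2 * (x - beta) ** 2 / (beta - alpha) ** 2
        return 1.0
    return Q

Q_l = make_Q(0.1, 0.6)   # models "some", used for the upper approximation
Q_u = make_Q(0.2, 1.0)   # models "most", used for the lower approximation

def make_relation(ranges):
    # Equations (13) and (15): per-attribute similarity combined by min;
    # `ranges` lists (a_min, a_max) per attribute, taken from the training
    # data (assumes a_max > a_min for every attribute)
    def R(x, y):
        return min(1 - abs(xa - ya) / (amax - amin)
                   for xa, ya, (amin, amax) in zip(x, y, ranges))
    return R
```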
5.2 Comparative Investigation

The results of the experiments are shown in Figure 4. Several interesting observations can be made from them. First, for all but one dataset (letter-dgoq, which was used in [11]), either FRNN-FRS or FRNN-VQRS yields the best results. Overall, FRNN-FRS produces the most consistent results. This is particularly remarkable considering the inherent simplicity of the method. FRNN-VQRS is best for cleveland and heart, which might be attributed to the comparative presence of noise in those datasets, but it performs rather disappointingly for a number of other datasets (glass, letter-dgoq, wine, olitos).

Fig. 4. Classification accuracy for the four methods and different values of K

It is also interesting to consider the influence of the number of nearest neighbours. Both FRNN-FRS and FRNN-O remain relatively unaffected by changes in K. For FRNN-FRS, this can be explained by the fact that an infimum and supremum are used, which can be thought of as a worst case and best case respectively. When more neighbours are considered, R(x, y) values decrease as these neighbours are less similar; hence I(R(x, y), C(x)) increases, and T(R(x, y), C(x)) decreases. In other words, the more distant a neighbour is, the less likely it is to change the infimum and supremum value. For FRNN-O, again R(x, y) decreases when more neighbours are added, and hence the value R(x, y)C(x) that is added to the numerator is also small. Since each neighbour has the same weight in the denominator, the ratios stay approximately the same when adding new neighbours.

For FNN and FRNN-VQRS, increasing K can have a significant effect on classification accuracy. This is most clearly observed in the results for the olitos data, where there is a clear downward trend. For FRNN-VQRS, the ratio |Ry ∩ C|/|Ry| has to be calculated. Each neighbour has a different weight in the denominator, so the ratios can fluctuate considerably even when adding distant neighbours.

6 Conclusion and Future Work

This paper has presented two new techniques for fuzzy-rough classification based on the use of lower and upper approximations w.r.t. fuzzy tolerance relations. The difference between them is in the definition of the approximations: while FRNN-FRS uses "traditional" operations based on a t-norm and an implicator, FRNN-VQRS uses a fuzzy quantifier-based approach. The results show that these methods are effective, and that they are competitive with existing methods such as the fuzzy K-nearest neighbour and the fuzzy-rough ownership function approach. Further investigation, however, is still needed to adequately explain the impact of the choice of fuzzy relations, connectives and quantifiers.

Also, the impact of a feature selection preprocessing step upon classification accuracy needs to be investigated. It is expected that feature selectors that incorporate fuzzy relations expressing closeness of objects (see e.g. [5,7]) should be able to further improve the effectiveness of the classification methods presented here.

Finally, an important challenge is to adapt the algorithms so that they can deal with continuous decision attributes. In this case, we need to predict the membership of a test object to different, possibly overlapping classes. Such a prediction can be based on the test object's membership degrees to the lower and/or upper approximation (e.g., on the average of these two values).

Acknowledgment

Chris Cornelis would like to thank the Research Foundation-Flanders for funding his research.

References

1. Bian, H., Mazlack, L.: Fuzzy-Rough Nearest-Neighbor Classification Approach. In: Proceedings of the 22nd International Conference of the North American Fuzzy Information Processing Society (NAFIPS), pp. 500-505 (2003)
2. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. Irvine, University of California (1998), http://archive.ics.uci.edu/ml/
3. Cornelis, C., De Cock, M., Radzikowska, A.M.: Vaguely Quantified Rough Sets. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds.) RSFDGrC 2007. LNCS (LNAI), vol. 4482, pp. 87-94. Springer, Heidelberg (2007)
4. Cornelis, C., De Cock, M., Radzikowska, A.M.: Fuzzy Rough Sets: from Theory into Practice. In: Pedrycz, W., Skowron, A., Kreinovich, V. (eds.) Handbook of Granular Computing. Wiley, Chichester (2008)
5. Cornelis, C., Hurtado Martín, G., Jensen, R., Ślęzak, D.: Feature Selection with Fuzzy Decision Reducts. In: Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology (RSKT 2008) (2008)
6. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
7. Jensen, R., Shen, Q.: Fuzzy-Rough Sets Assisted Attribute Selection. IEEE Transactions on Fuzzy Systems 15(1), 73-89 (2007)
8. Keller, J.M., Gray, M.R., Givens, J.A.: A fuzzy K-nearest neighbor algorithm. IEEE Trans. Systems Man Cybernet. 15(4), 580-585 (1985)
9. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishing, Dordrecht (1991)
10. Radzikowska, A.M., Kerre, E.E.: A comparative study of fuzzy rough sets. Fuzzy Sets and Systems 126(2), 137-155 (2002)
11. Sarkar, M.: Fuzzy-rough nearest neighbors algorithm. Fuzzy Sets and Systems 158, 2123-2152 (2007)
12. Wang, X., Yang, J., Teng, X., Peng, N.: Fuzzy-Rough Set Based Nearest Neighbor Clustering Classification Algorithm. In: Wang, L., Jin, Y. (eds.) FSKD 2005. LNCS (LNAI), vol. 3613, pp. 370-373. Springer, Heidelberg (2005)