Interval-valued Fuzzy-Rough Feature Selection and
Application for Handling Missing Values in Datasets
Richard Jensen and Qiang Shen
Dept. of Computer Science
Aberystwyth University,
Ceredigion, Wales, UK
{rkj,qqs}@aber.ac.uk
Abstract
One of the many successful applications of rough set theory has been to the area of feature selection. The rough set ideology of using only the supplied data and no other information has many benefits, whereas most other methods require supplementary knowledge. Fuzzy-rough set theory has recently been proposed as an extension of this, in order to better handle the uncertainty present in real data. However, following this approach, there has been no investigation (theoretical or otherwise) into how to deal effectively with missing values, another problem encountered when using real-world data. This paper proposes an extension of the fuzzy-rough feature selection methodology, based on interval-valued fuzzy sets, as a means to counter this problem via an intuitive representation of missing values.
1 Introduction
Lately there has been great interest in developing computational intelligence methodologies capable of dealing with imprecision and uncertainty, and the substantial amount of research currently being done in areas related to fuzzy and rough sets [12] is representative of this. The success of rough set theory is due in part to three aspects of the theory. Firstly, only the facts hidden in data are analysed. Secondly, no additional information about the data, such as thresholds or expert knowledge of a particular domain, is required for data analysis. Thirdly, it finds a minimal knowledge representation for data. As rough set theory handles only one type of imperfection found in data, it is complementary to other concepts for the purpose, such as fuzzy set theory. The two fields may be considered analogous in the sense that both can tolerate inconsistency and uncertainty; the difference lies in the type of uncertainty and the approach to it: fuzzy sets are concerned with vagueness, rough sets with indiscernibility.
Many deep relationships have been established and, moreover, most recent studies have concluded that the two methodologies are complementary in nature, especially in the context of granular computing [1]. It is therefore desirable to extend and hybridize the underlying concepts to deal with additional aspects of data imperfection. Such developments offer a high degree of flexibility and provide robust solutions and advanced tools for data analysis [9]. However, there has been no investigation into how such hybridizations may model and cope with missing values in datasets. This is a severely limiting factor for the application of these powerful techniques. In this paper, a further extension to fuzzy-rough set theory is proposed, interval-valued fuzzy-rough sets, in order to alleviate this problem. As a result, a new feature selection method is developed that not only handles missing values, but also alleviates the problem of defining overly-specific type-1 fuzzy similarity relations (i.e. those with crisp membership functions) through the use of interval-valued fuzzy sets.
The remainder of this paper is structured as follows. In Section 2, the fuzzy-rough hybridisation process is reviewed by briefly recalling its ingredients (fuzzy sets and rough sets) as well as its resulting end products (fuzzy-rough sets). Section 3 focuses on the proposed approach, interval-valued feature selection. Initial experimental results that demonstrate the potential of the approach are presented in Section 4. Finally, Section 5 concludes the paper and outlines some ideas for future work.
Proceedings of the 2008 UK Workshop on Computational Intelligence
Page 59
2 Theoretical Background
2.1 Fuzzy Sets
Recall that a fuzzy set [14] in $U$ is a $U \to [0,1]$ mapping, while a fuzzy relation in $U$ is a fuzzy set in $U \times U$. For all $y$ in $U$, the $R$-foreset of $y$ is the fuzzy set $Ry$ defined by $Ry(x) = R(x,y)$ for all $x$ in $U$. If $R$ is reflexive and symmetric, i.e., $R(x,x) = 1$ and $R(x,y) = R(y,x)$ hold for all $x$ and $y$ in $U$, then $R$ is called a fuzzy tolerance relation.
Fuzzy logic connectives play an important role in the hybridisation process. A triangular norm (t-norm for short) $T$ is any increasing, commutative and associative $[0,1]^2 \to [0,1]$ mapping satisfying $T(1,x) = x$ for all $x$ in $[0,1]$. Common examples of t-norms include the minimum, the product, and $T_L$ defined by $T_L(x,y) = \max(0, x+y-1)$ for $x, y$ in $[0,1]$. An implicator is any $[0,1]^2 \to [0,1]$ mapping $I$ that is decreasing in its first and increasing in its second component, and that satisfies $I(0,0) = 1$ and $I(1,x) = x$ for all $x$ in $[0,1]$.
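As an illustration, these connectives can be written directly in Python. This is a sketch; the function names are ours, and the Łukasiewicz implicator is one standard choice satisfying the stated boundary and monotonicity conditions (the text itself does not fix a particular implicator):

```python
def t_min(x, y):
    """Minimum t-norm."""
    return min(x, y)

def t_prod(x, y):
    """Product t-norm."""
    return x * y

def t_luk(x, y):
    """Lukasiewicz t-norm T_L(x, y) = max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)

def i_luk(x, y):
    """Lukasiewicz implicator I(x, y) = min(1, 1 - x + y).
    Decreasing in x, increasing in y; I(0,0) = 1 and I(1,x) = x hold."""
    return min(1.0, 1.0 - x + y)
```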
2.1.1 Interval-valued Fuzzy Sets
An interval-valued fuzzy set $\tilde{A}$ in $U$ is an ordered triple of the form

$$\tilde{A} = \{\langle x, \mu_A^-(x), \mu_A^+(x)\rangle \mid x \in U\} \qquad (1)$$

where $\mu_A^-(x), \mu_A^+(x) \in [0,1]$ are the lower and upper membership functions, which satisfy $0 \le \mu_A^-(x) \le \mu_A^+(x) \le 1$ for all $x \in U$. The lower and upper membership functions correspond to the lower and upper bounds of a closed interval describing the membership of $x$ to $\tilde{A}$.
2.2 Rough Sets
Let $I = (U, A)$ be an information system, where $U$ is a non-empty finite set of objects (the universe of discourse) and $A$ is a non-empty finite set of attributes such that $a : U \to V_a$ for every $a \in A$; $V_a$ is the set of values that attribute $a$ may take. With any $P \subseteq A$ there is an associated equivalence relation $IND(P)$:

$$IND(P) = \{(x,y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\} \qquad (2)$$

The partition of $U$ generated by $IND(P)$ is denoted $U/IND(P)$ (or $U/P$ for simplicity) and can be calculated as follows:

$$U/IND(P) = \otimes\{U/IND(\{a\}) \mid a \in P\}, \qquad (3)$$

where $\otimes$ is specifically defined as follows for sets $A$ and $B$:

$$A \otimes B = \{X \cap Y \mid X \in A,\ Y \in B,\ X \cap Y \neq \emptyset\} \qquad (4)$$

If $(x,y) \in IND(P)$, then $x$ and $y$ are indiscernible by attributes from $P$. The equivalence classes of the $P$-indiscernibility relation are denoted $[x]_P$.

Let $X \subseteq U$. $X$ can be approximated using only the information contained within $P$ by constructing the $P$-lower and $P$-upper approximations of $X$:

$$\underline{P}X = \{x \in U \mid [x]_P \subseteq X\} \qquad (5)$$

$$\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\} \qquad (6)$$

The tuple $\langle \underline{P}X, \overline{P}X \rangle$ is called a rough set.
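As an illustration, the partitioning of equation (2) and the approximations (5) and (6) can be sketched in Python on a toy information system; the data layout (a dictionary of attribute values per object) and the helper names are illustrative:

```python
def partition(universe, attrs, table):
    """U/IND(P): group objects that agree on every attribute in attrs."""
    classes = {}
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def lower_upper(universe, attrs, table, X):
    """Return the pair (lower, upper) approximating a target set X."""
    lower, upper = set(), set()
    for eq_class in partition(universe, attrs, table):
        if eq_class <= X:        # equation (5): class wholly inside X
            lower |= eq_class
        if eq_class & X:         # equation (6): class overlaps X
            upper |= eq_class
    return lower, upper
```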
2.2.1 Feature Selection
Let $P$ and $Q$ be sets of attributes inducing equivalence relations over $U$; then the positive region can be defined as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X \qquad (7)$$

The positive region contains all objects of $U$ that can be classified to classes of $U/Q$ using the information in attributes $P$. Based on this definition, dependencies between attributes can be determined. For $P, Q \subseteq A$, it is said that $Q$ depends on $P$ in a degree $k$ ($0 \le k \le 1$), denoted $P \Rightarrow_k Q$, if

$$k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|} \qquad (8)$$

If $k = 1$, $Q$ depends totally on $P$; if $0 < k < 1$, $Q$ depends partially (in a degree $k$) on $P$; and if $k = 0$ then $Q$ does not depend on $P$.
The reduction of attributes is achieved by comparing equivalence relations generated by sets of attributes. Attributes are removed so that the reduced set provides the same predictive capability of the decision attribute as the original. A reduct $R_{min}$ is defined as a minimal subset $R$ of the initial attribute set $C$ such that, for a given set of decision attributes $D$, $\gamma_R(D) = \gamma_C(D)$. From the literature, $R$ is a minimal subset if $\gamma_{R-\{a\}}(D) \neq \gamma_R(D)$ for all $a \in R$. This means that no attribute can be removed from the subset without affecting the dependency degree. Hence, a minimal subset by this definition may not be the global minimum (a reduct of smallest cardinality). The goal of rough set-based feature selection is to discover reducts of smallest cardinality.
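A minimal sketch of dependency-guided search in the crisp setting follows. The greedy scheme mirrors the QuickReduct-style hill-climbing common in the rough set literature; the helper names and the data layout are ours:

```python
def partition(universe, attrs, table):
    """U/IND(P) as before: objects grouped by their attribute values."""
    classes = {}
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def gamma(universe, P, Q, table):
    """Dependency degree k = |POS_P(Q)| / |U| from equation (8)."""
    pos = set()
    for X in partition(universe, Q, table):
        for eq in partition(universe, P, table):
            if eq <= X:              # eq class wholly inside a Q-class
                pos |= eq
    return len(pos) / len(universe)

def greedy_reduct(universe, C, D, table):
    """Add the attribute with the largest gamma gain until the
    dependency of the full attribute set C is matched."""
    target = gamma(universe, C, D, table)
    R = []
    while gamma(universe, R, D, table) < target:
        best = max((a for a in C if a not in R),
                   key=lambda a: gamma(universe, R + [a], D, table))
        R.append(best)
    return R
```

Note that, as the text warns, such a greedy search finds a reduct but not necessarily one of globally smallest cardinality.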
2.3 Fuzzy-Rough Sets
It is not possible in the original rough set theory to say whether two attribute values are similar, and to what extent they are the same; for example, two close values may differ only as a result of noise, yet in rough set theory they are considered to be as different as two values of a different order of magnitude. It is therefore desirable to develop techniques that provide the means of knowledge modelling for crisp and real-valued attribute datasets which utilise the extent to which values are similar. This can be achieved through the use of fuzzy-rough sets [6]. Fuzzy-rough sets encapsulate the related but distinct concepts of vagueness (for fuzzy sets) and indiscernibility (for rough sets), both of which occur as a result of uncertainty in knowledge.
Definitions for the fuzzy lower and upper approximations can be found in [13], where a $T$-transitive fuzzy similarity relation is used to approximate a fuzzy concept $X$:

$$\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I(\mu_{R_P}(x,y), \mu_X(y)) \qquad (9)$$

$$\mu_{\overline{R_P}X}(x) = \sup_{y \in U} T(\mu_{R_P}(x,y), \mu_X(y)) \qquad (10)$$

Here, $I$ is a fuzzy implicator and $T$ a t-norm. $R_P$ is the fuzzy similarity relation induced by the subset of features $P$:

$$\mu_{R_P}(x,y) = T_{a \in P}\{\mu_{R_a}(x,y)\} \qquad (11)$$

$\mu_{R_a}(x,y)$ is the degree to which objects $x$ and $y$ are similar for feature $a$, and may be defined in many ways, for example:

$$\mu_{R_a}(x,y) = 1 - \frac{|a(x) - a(y)|}{|a_{max} - a_{min}|} \qquad (12)$$

$$\mu_{R_a}(x,y) = \exp\left(-\frac{(a(x) - a(y))^2}{2\sigma_a^2}\right) \qquad (13)$$

$$\mu_{R_a}(x,y) = \max\left(\min\left(\frac{a(y) - (a(x) - \sigma_a)}{a(x) - (a(x) - \sigma_a)},\ \frac{(a(x) + \sigma_a) - a(y)}{(a(x) + \sigma_a) - a(x)}\right),\ 0\right) \qquad (14)$$
where $\sigma_a^2$ is the variance of feature $a$. As these relations do not necessarily display $T$-transitivity, the fuzzy transitive closure can be computed for each attribute. The choice of relation is largely determined by the intended application. For feature selection, a relation such as (14) may be appropriate, as it permits only small differences between attribute values of differing objects. For classification tasks, such as fuzzy-rough nearest neighbours [8], a more gradual and inclusive relation such as (12) should be used.
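For illustration, relations (12)-(14) can be sketched in Python. The function names are ours; each takes the raw attribute values of the two objects plus the feature's range, variance or standard deviation, which would be precomputed from the data:

```python
import math

def sim_linear(av_x, av_y, a_min, a_max):
    """Relation (12): complement of the range-normalised distance."""
    return 1.0 - abs(av_x - av_y) / abs(a_max - a_min)

def sim_gauss(av_x, av_y, var_a):
    """Relation (13): Gaussian similarity using the feature variance."""
    return math.exp(-((av_x - av_y) ** 2) / (2.0 * var_a))

def sim_tri(av_x, av_y, sigma_a):
    """Relation (14): triangular similarity, zero once the values
    differ by more than sigma_a (both denominators reduce to sigma_a)."""
    return max(min((av_y - (av_x - sigma_a)) / sigma_a,
                   ((av_x + sigma_a) - av_y) / sigma_a), 0.0)
```

The triangular relation (14) drops to zero outside a narrow band, which is why it suits feature selection, while (12) decays gradually over the whole range.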
2.3.1 Feature Selection
In a similar way to the original crisp rough set approach, the fuzzy positive region can be defined as [10]:

$$\mu_{POS_{R_P}(D)}(x) = \sup_{X \in U/D} \mu_{\underline{R_P}X}(x) \qquad (15)$$

An important issue in data analysis is discovering dependencies between attributes. The fuzzy-rough degree of dependency of $D$ on the attribute subset $P$ can be defined in the following way:

$$\gamma'_P(D) = \frac{\sum_{x \in U} \mu_{POS_{R_P}(D)}(x)}{|U|} \qquad (16)$$
A fuzzy-rough reduct $R$ can be defined as a minimal subset of features that preserves the dependency degree of the entire dataset, i.e. $\gamma'_R(D) = \gamma'_C(D)$. Based on this, a fuzzy-rough greedy hill-climbing algorithm can be constructed that uses equation (16) to gauge subset quality. In [10], it has been shown that the dependency function is monotonic and that fuzzy discernibility matrices may also be used to discover reducts. However, there is no mechanism for modelling missing values in this framework, which is therefore limited in its application to real-world datasets. This motivates the work proposed in the following section.
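A sketch of equations (9), (15) and (16) over a finite universe follows. The choice of the Kleene-Dienes implicator $I(x,y) = \max(1-x, y)$ is ours (any admissible implicator could be substituted), and decision concepts are passed as membership functions:

```python
def implicator(x, y):
    """Kleene-Dienes implicator: one admissible choice of I."""
    return max(1.0 - x, y)

def lower_approx(x, X, universe, R):
    """Equation (9): inf over y of I(R(x,y), mu_X(y)).
    X maps an object to its membership in the concept; R(x, y) is the
    fuzzy similarity degree."""
    return min(implicator(R(x, y), X(y)) for y in universe)

def fuzzy_pos(x, universe, R, decision_classes):
    """Equation (15): sup over decision concepts of the lower approx."""
    return max(lower_approx(x, X, universe, R) for X in decision_classes)

def gamma_prime(universe, R, decision_classes):
    """Equation (16): positive-region memberships summed over |U|."""
    return sum(fuzzy_pos(x, universe, R, decision_classes)
               for x in universe) / len(universe)
```

As a sanity check, with a crisp equality relation and crisp decision classes this reduces to the classical dependency degree.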
3 Interval-valued FRFS
Central to traditional fuzzy-rough feature selection is the fuzzy tolerance relation. From this, the fuzzy-rough lower approximations are constructed, which then form the fuzzy positive regions utilised in the degree-of-dependency measure. Thus, the starting point for the process, type-1 fuzzy tolerance, is critical for its success. It is recognised that type-1 approaches are unable to address particular types of uncertainty due to their requirement of totally crisp membership functions [11]. An interval-valued approach may therefore be able to better handle this uncertainty and at the same time model the uncertainty inherent in missing values. Currently, there is no way to handle such values in fuzzy-rough set theory. Thus, the starting point for the proposed work is the interval-valued tolerance relation $\tilde{R}_a(x,y)$. The constituent fuzzy relations for individual attributes can be defined via an upper ($\mu_{R_a}^+$) and lower ($\mu_{R_a}^-$) membership function, for example:
$$\mu_{R_a}^+(x,y) = \left(1 - \frac{|a(x) - a(y)|}{|a_{max} - a_{min}|}\right)^m \qquad (17)$$

$$\mu_{R_a}^-(x,y) = 1 - \frac{|a(x) - a(y)|}{|a_{max} - a_{min}|} \qquad (18)$$

for $m \in (0,1]$. If $m = 1$, equation (17) degenerates to a standard type-1 fuzzy tolerance relation. As with type-1 fuzzy-rough feature selection, composition of relations is achieved by conjunctively combining the individual fuzzy relations $\tilde{R}_a$ with a t-norm $T$:

$$\mu_{\tilde{R}_P}(x,y) = T_{a \in P}\{\mu_{\tilde{R}_a}(x,y)\} = [T_{a \in P}\{\mu_{R_a}^-(x,y)\},\ T_{a \in P}\{\mu_{R_a}^+(x,y)\}] \qquad (19)$$
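Equations (17)-(19) can be sketched as follows, with intervals represented as (lower, upper) pairs and min serving as the t-norm; the names are illustrative:

```python
def iv_sim(av_x, av_y, a_min, a_max, m=0.9):
    """Return [R_a^-(x,y), R_a^+(x,y)] per equations (17)-(18).
    For base in [0,1] and m <= 1, base**m >= base, so the pair is a
    valid interval; at m = 1 it collapses to a type-1 degree."""
    base = 1.0 - abs(av_x - av_y) / abs(a_max - a_min)
    return (base, base ** m)

def iv_compose(intervals):
    """Equation (19): componentwise t-norm (here min) over attributes."""
    lowers, uppers = zip(*intervals)
    return (min(lowers), min(uppers))
```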
Based on the definitions above, the interval-valued $P$-lower and $P$-upper approximations of a concept $\tilde{X}$ are here defined as

$$\mu_{\underline{\tilde{R}_P}X}(x) = \inf_{y \in U} I(\mu_{\tilde{R}_P}(x,y), \mu_{\tilde{X}}(y)) \qquad (20)$$

$$\mu_{\overline{\tilde{R}_P}X}(x) = \sup_{y \in U} T(\mu_{\tilde{R}_P}(x,y), \mu_{\tilde{X}}(y)) \qquad (21)$$

where $\mu_{\tilde{R}_P}(x,y)$ is an interval-valued fuzzy tolerance relation. The tuple $\langle \underline{\tilde{R}_P}X, \overline{\tilde{R}_P}X \rangle$ is called an interval-valued fuzzy-rough set.
3.1 Feature Selection
For the purposes of feature selection, the work presented in this paper considers only the interval-valued lower approximation:

$$\mu_{\underline{\tilde{R}_P}X}(x) = \inf_{y \in U} I(\mu_{\tilde{R}_P}(x,y), \mu_{\tilde{X}}(y)) = \inf_{y \in U} [I(\mu_{R_P}^+(x,y), \mu_X^-(y)),\ I(\mu_{R_P}^-(x,y), \mu_X^+(y))]$$

This provides a measure of the uncertainty of the membership of an object to a given concept $\tilde{X}$ as a result of the underlying uncertainty in the similarity between this object and others in the universe. Note that the roles of $\mu_{R_P}^-$ and $\mu_{R_P}^+$ are reversed because a fuzzy implicator is decreasing in its first argument: the upper similarity bound yields the lower membership bound. The resulting interval is a lower and upper bound on the true membership degree. Based on this, the interval-valued positive region is defined:

$$\mu_{\widetilde{POS}_P(D)}(x) = \sup_{\tilde{X} \in U/D} \mu_{\underline{\tilde{R}_P}X}(x) \qquad (22)$$
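A sketch of the interval-valued lower approximation and positive region of equation (22) follows, for the special case of crisp decision classes (so a single membership function serves as both bounds of the concept). The Kleene-Dienes implicator is our choice, and the bound reversal in the code reflects the implicator's decreasingness in its first argument:

```python
def implicator(x, y):
    """Kleene-Dienes implicator, as one admissible choice of I."""
    return max(1.0 - x, y)

def iv_lower_approx(x, X, universe, R):
    """R(x, y) returns a (lower, upper) similarity interval; X maps an
    object to its (crisp) membership in the decision concept. The lower
    membership bound uses the UPPER similarity bound, and vice versa."""
    lo = min(implicator(R(x, y)[1], X(y)) for y in universe)
    up = min(implicator(R(x, y)[0], X(y)) for y in universe)
    return (lo, up)

def iv_pos(x, universe, R, decision_classes):
    """Equation (22): componentwise sup over decision concepts."""
    bounds = [iv_lower_approx(x, X, universe, R) for X in decision_classes]
    return (max(b[0] for b in bounds), max(b[1] for b in bounds))
```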
From this, the interval-valued degree of dependency of decision features $D$ on a feature subset $P$ is defined as:

$$\tilde{\gamma}_P(D) = \frac{|\widetilde{POS}_P(D)|}{|U|} = \left[\frac{\sum_{y \in U} \mu_{POS_P(D)}^-(y)}{|U|},\ \frac{\sum_{y \in U} \mu_{POS_P(D)}^+(y)}{|U|}\right] \qquad (23)$$
In [5], a normalised version of the dependency was introduced for FRFS. The equivalent interval-valued normalised version is as follows:

$$\tilde{\gamma}_P(D) = \frac{|\widetilde{POS}_P(D)|}{|\widetilde{POS}_C(D)|} = \left[\frac{1}{|U|}\sum_{y \in U} \frac{\mu_{POS_P(D)}^-(y)}{\mu_{POS_C(D)}^-(y)},\ \frac{1}{|U|}\sum_{y \in U} \frac{\mu_{POS_P(D)}^+(y)}{\mu_{POS_C(D)}^+(y)}\right] \qquad (24)$$
Here, $\tilde{\gamma}_P(D)$ is an interval $[\gamma_P^-(D), \gamma_P^+(D)]$ that describes the extent to which the features in $P$ are predictive of the decision feature(s). $P$ is called a fuzzy decision superreduct to degree $\alpha$ if $\tilde{\gamma}_P(D) \geq \alpha$, and a fuzzy decision reduct to degree $\alpha$ if, moreover, for all $P' \subset P$, $\tilde{\gamma}_{P'}(D) < \alpha$. Note that if a type-1 fuzzy tolerance relation is used, then these definitions degenerate to their traditional fuzzy-rough counterparts. Core features (i.e. those that cannot be removed without introducing inconsistencies) may be determined by considering the change in dependency of the full set of conditional features when individual attributes are removed:

$$Core(C) = \{a \in C \mid \tilde{\gamma}_{C-\{a\}}(D) \neq \tilde{\gamma}_C(D)\} \qquad (25)$$
Subset search can be conducted by whichever mechanism is appropriate; for example, greedy hill-climbing, genetic algorithms [2], ant colony optimization [7], etc. Here, the standard greedy hill-climbing approach is adopted. The dependency degree is monotonic in both the lower and upper bounds, and so the search continues until $[1,1]$ is reached (no uncertainty) or there is no improvement in dependency.
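The search loop itself might be sketched as follows. Note that the text does not prescribe how to rank two dependency intervals against each other, so the lexicographic (lower, upper) ordering used here is an assumption, and `iv_gamma` stands for any routine returning the dependency interval of a subset:

```python
def greedy_iv_reduct(features, iv_gamma):
    """Greedy hill-climbing: iv_gamma(subset) -> (lower, upper)
    dependency interval. Stops at [1,1] or when no candidate improves
    the current interval under lexicographic comparison."""
    subset, best = [], iv_gamma([])
    while best < (1.0, 1.0):
        gains = {a: iv_gamma(subset + [a])
                 for a in features if a not in subset}
        a_star = max(gains, key=lambda a: gains[a])
        if gains[a_star] <= best:    # no improvement: terminate
            break
        subset.append(a_star)
        best = gains[a_star]
    return subset
```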
3.2 Missing Values
The underlying interval-valued tolerance relation can be modified in order to model the uncertainty resulting from missing values. If an object contains a missing value for a particular feature, then the resulting degree of similarity with other objects is unknown. In an interval-valued context, this can be modelled by returning the unit interval when an attribute value is missing for one or both objects:

$$\mu_{\tilde{R}_a}(x,y) = \begin{cases} \mu_{\tilde{R}_a}(x,y) & \text{if } a(x) \neq *,\ a(y) \neq * \\ [0,1] & \text{otherwise} \end{cases} \qquad (26)$$

where missing values are denoted by $*$. Again, relations are composed via a t-norm. The resulting interval-valued lower approximations, positive regions and dependency can then be used to gauge subset quality in an identical manner to that of the approach in Section 3.1. The resulting algorithm is called FRFS-II.
4 Initial Experimentation
To test the robustness of the proposed approach in the presence of missing values, four benchmark datasets were used, obtained from [3] and containing no missing values initially. Values were randomly corrupted to become missing based on a supplied probability of mutation, and the FRFS-II algorithm was run, using $m = 0.9$ for the tolerance relation. This was carried out ten times for each dataset. Table 1 shows the dataset details as well as the probabilities of value corruption and the resulting ranges of numbers of missing values; e.g. for the water 2 dataset and probability of mutation 0.0005%, datasets were produced with 4 to 14 missing values present. For comparison, the type-1 FRFS algorithm was run on the original, uncorrupted data. As the corruption probability increases, the amount of missing data greatly increases, with the final column showing a degree of missing values not often seen in real data but useful for testing the robustness of the approach.

The resulting reduct sizes (averaged over repeated runs) can be seen in Table 2. All discovered reducts produced a dependency degree of 1 for the uncorrupted data and hence were all fuzzy-rough reducts. This shows the resilience of the approach when faced with corrupted data, though this may not necessarily be the case in general, as it will depend on the extent of missing values and the similarity relation chosen. For small numbers of missing values, FRFS-II manages to locate identical or equivalent reducts to the original FRFS approach (which was applied to the uncorrupted data). As the extent of data corruption increases, the task becomes more difficult, with FRFS-II selecting more features to compensate for the lack of information but still producing valid fuzzy-rough reducts.
5 Conclusion
This paper proposed an interval-valued approach to fuzzy-rough feature selection that successfully handles uncertainty that cannot be modelled by a type-1 approach. In particular, the method can handle missing values effectively and in an intuitive way.

Future work will involve further experimental investigations, particularly to see whether the trend observed in this paper is maintained for other datasets. This will include an analysis of the impact of the choice of similarity relation and parameter $m$. There will also need to be an evaluation of the effectiveness of the discovered reducts for the task of classification. Recent work in the area of type-1 FRFS for this purpose has shown that such reducts improve performance [10], and a similar improvement is expected here. Additionally, the work in [4] proposed novel t-norms and implicators for interval-valued fuzzy sets; these should be of great benefit for interval-valued FRFS.
References
[1] A. Bargiela, W. Pedrycz, Granular Computing: An Introduction, Kluwer Academic Publishers, 2002.

[2] A.T. Bjorvand, J. Komorowski, Practical Applications of Genetic Algorithms for Efficient Reduct Computation, Proc. 15th IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, A. Sydow (ed.), vol. 4, pp. 601-606, 1997.

[3] C.L. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, University of California, Irvine, 1998. http://archive.ics.uci.edu/ml/

[4] C. Cornelis, G. Deschrijver, E.E. Kerre, Advances and challenges in interval-valued fuzzy logic, Fuzzy Sets and Systems, vol. 157, no. 5, pp. 622-627, 2006.

[5] C. Cornelis, G. Hurtado Martín, R. Jensen, D. Ślęzak, Feature Selection with Fuzzy Decision Reducts, Proc. 3rd Int. Conf. on Rough Sets and Knowledge Technology (RSKT 2008), pp. 284-291, 2008.

[6] D. Dubois, H. Prade, Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems, vol. 17, pp. 191-209, 1990.
Table 1: Number of missing values (range) per corruption probability

Dataset   Objects  Features  0.0005%  0.001%  0.005%  0.01%    0.1%
Heart     270      14        1-3      1-6     12-24   26-43    321-387
Water 2   390      39        4-14     9-30    62-88   123-175  1446-1533
Water 3   390      39        3-13     12-23   65-98   128-163  1462-1531
Wine      178      14        1-5      1-4     7-15    20-30    206-246
Table 2: Average reduct size for various data corruption probabilities

Dataset   Features  Original FRFS  0.0005%  0.001%  0.005%  0.01%  0.1%
Heart     14        8              8        8       9.1     10.3   14
Water 2   39        7              7        7       7       7.2    11.5
Water 3   39        7              7        7       7.2     7.9    11.8
Wine      14        6              6        6       6       6      9.1
[7] R. Jensen, Performing Feature Selection with ACO, in Swarm Intelligence and Data Mining, A. Abraham, C. Grosan and V. Ramos (eds.), Studies in Computational Intelligence, vol. 34, pp. 45-73, 2006.

[8] R. Jensen, C. Cornelis, A New Approach to Fuzzy-Rough Nearest Neighbour Classification, Proc. 6th Int. Conf. on Rough Sets and Current Trends in Computing, to appear, 2008.

[9] R. Jensen, Q. Shen, Computational Intelligence and Feature Selection: Rough and Fuzzy Approaches, IEEE Press/Wiley & Sons, 2008.

[10] R. Jensen, Q. Shen, New approaches to fuzzy-rough feature selection, IEEE Transactions on Fuzzy Systems, to appear, 2008.

[11] J.M. Mendel, R.I. John, Type-2 Fuzzy Sets Made Simple, IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 117-127, 2002.

[12] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341-356, 1982.

[13] A.M. Radzikowska, E.E. Kerre, A comparative study of fuzzy rough sets, Fuzzy Sets and Systems, vol. 126, pp. 137-156, 2002.

[14] L.A. Zadeh, Fuzzy sets, Information and Control, vol. 8, pp. 338-353, 1965.