Clus-DOWA: A New Dependent OWA Operator

Tossapon Boongoen and Qiang Shen

Tossapon Boongoen and Qiang Shen are with the Department of Computer Science, Aberystwyth University, UK (phone: +44 1970 621787; email: {tsb,qqs}@aber.ac.uk).

Abstract: Aggregation operators are crucial to integrating the diverse opinions of decision makers. While minimum and maximum represent the pessimistic and optimistic extremes, an Ordered Weighted Aggregation (OWA) operator can reflect the varied human attitudes lying between the two through distinct weight vectors. Several weight determination techniques ignore the characteristics of the data being aggregated. In contrast, data-oriented operators such as the centered OWA and the dependent OWA exploit the centralized structure of the data to generate reliable weights: values near the center of the group receive higher weights than those further away. Despite its general applicability, this perspective entirely neglects any local data structure representing strong agreement or consensus. This paper presents a new dependent OWA operator (Clus-DOWA) that uses the distributed structure of the data, i.e. data clusters, to determine its weight vector. The reliability of the weights created by the DOWA and Clus-DOWA operators is experimentally compared on the tasks of classification and unsupervised feature selection.

I. INTRODUCTION

To aggregate valuable pieces of information, several aggregation operators have been developed to deliver a reasonable outcome upon which an intelligent decision can be made. These operators range from the simple arithmetic mean to fuzzy-oriented ones like minimum/maximum and t-norm/t-conorm (more details in [1]). In addition, Yager [17] introduced a parameterized mean-like aggregation operator, the ordered weighted aggregation (OWA) operator. Essentially, by selecting an appropriate weight vector, an OWA operator can reflect the uncertain nature of human judgment, generating an aggregated result that lies between the two extremes of minimum and maximum. OWA operators have been applied in different areas [18], such as information fusion [19], multi-criteria decision making [17] and fuzzy system modelling [20]. Furthermore, several variations of the OWA operator have been invented for linguistic-oriented environments [8] [12], while others are applicable to a wide range of domains using different weight-determining algorithms: maximum entropy [11], Gaussian distribution [15] [21], recursive formulation [13], and weight learning [6]. Recently, Beliakov et al. [2] emphasized another variant that utilizes absorbent tuples to model situations in which certain decision makers may decide the outcome irrespective of the opinion of the others.

A necessary precaution when combining multiple arguments is that unduly high or low values may be given by false or biased judgments. In such cases, a typical OWA operator would suffer drastically from giving the highest priority to either the highest or the lowest value. To achieve a more reliable outcome, Xu [15] [16] introduced the dependent OWA (DOWA) operator, in which the distribution of the argument values is used to determine the weight vector. In particular, a high weight is given to an argument whose value is close to the center of all arguments (i.e. the arithmetic mean), while lower weights are assigned to those further away. This centralized interpretation has also been adopted in the centered OWA operator [21], where weights are high around the middle and decay symmetrically towards the boundary ends.
Despite their general applicability, these trustworthy weight generation methods share an identical drawback, which originates from their underlying centralized assumption. In particular, argument values are viewed as members of one large cluster (i.e. a global consensus of the decision makers' opinions) and their arithmetic mean is considered sufficient to grade their reliability. This approach completely discards the significance of any possible trend emerging from local data structure, such as a subset of values tightly clustered together. Regardless of the cluster density, low weights are assigned to its members if the cluster does not lie near the center of the value range. In light of such a shortcoming, this paper presents a new cluster-based DOWA operator (Clus-DOWA) whose weight determination is based on a distributed structural interpretation of the values being aggregated. Values very far from the group center (i.e. the mean) are not assigned low weights, provided that they are hardly different from their local neighbors. For this purpose, the basic technique of agglomerative hierarchical clustering [5] is applied to create the clustering structure of the studied values. In essence, the distance to the nearest cluster is employed to evaluate the reliability of each argument value and hence its assigned weight.

The rest of this paper is organized as follows. Section II introduces the main theoretical concepts of the OWA and DOWA operators upon which the present research is developed. Section III presents the clustering-based DOWA operator, including its complexity and a worked example. Applications of the DOWA and Clus-DOWA operators to classification and unsupervised feature selection tasks are detailed in the fourth section. Specifically, class-specific fuzzy sets built from weight vectors are used to determine the class to which an unknown instance belongs. In addition, weight vectors are also used to judge the reliability of attributes in order to reduce the size of a feature set. The paper is concluded in Section V, with a perspective on further work.

II. DEPENDENT OWA OPERATOR

The process of information aggregation appears in many applications related to the development of decision support systems. Although computationally simple, neither minimum nor maximum is appropriate for most applications. Accordingly, Yager [17] pioneered a new family of aggregation techniques, the ordered weighted averaging (OWA) operators. This mean-type operator provides the flexibility to utilize the entire range between the "and" (minimum) and "or" (maximum) connectives, associated with a decision maker's attitude towards aggregation.

A. OWA Operator

An OWA operator of dimension n is a mapping R^n -> R with an associated weighting vector w = (w_1, w_2, ..., w_n)^T, where w_j \in [0,1] and \sum_{j=1}^{n} w_j = 1. A set of n input arguments, given as the vector (B_1, B_2, ..., B_n), is aggregated as follows:

OWA(B_1, B_2, ..., B_n) = \sum_{j=1}^{n} w_j b_j    (1)

where b_j is the j-th largest element of (B_1, B_2, ..., B_n), so that b_1 >= b_2 >= ... >= b_n. A wide range of OWA operators can be formulated between the two extremes of minimum and maximum through the degree of orness (α), defined as:

α = \frac{1}{n-1} \sum_{i=1}^{n} w_i (n - i)    (2)

This measure ranges from 0 to 1 and estimates the degree to which an OWA operator behaves like the logical connective OR (i.e. maximum). When α = 0, the OWA operator becomes the minimum, with weight vector (0, 0, ..., 1); in contrast, when α = 1, it exhibits the maximum connective type, with weight vector (1, 0, ..., 0).
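To make the behaviour of Equations (1) and (2) concrete, the following minimal sketch (not part of the original paper; function names are illustrative) computes an OWA aggregation and its degree of orness.

```python
# Minimal sketch of the OWA aggregation (Equation (1)) and the orness measure
# (Equation (2)). Function names are illustrative, not from the paper.
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Aggregate values with an OWA operator; weights apply to the sorted values."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    ordered = sorted(values, reverse=True)               # b_1 >= b_2 >= ... >= b_n
    return sum(w * b for w, b in zip(weights, ordered))


def orness(weights: Sequence[float]) -> float:
    """Degree to which the operator resembles the maximum (OR) connective."""
    n = len(weights)
    return sum(w * (n - i) for i, w in enumerate(weights, start=1)) / (n - 1)


print(owa([3, 7, 5], [1, 0, 0]), orness([1, 0, 0]))      # 7 1.0  (maximum-like)
print(owa([3, 7, 5], [0, 0, 1]), orness([0, 0, 1]))      # 3 0.0  (minimum-like)
print(round(owa([3, 7, 5], [1/3, 1/3, 1/3]), 2))         # 5.0, the arithmetic mean
```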
Weight distributions at any given degree of orness can be differentiated with the dispersion measure:

disp(w) = - \sum_{i=1}^{n} w_i \ln(w_i)    (3)

where 0 <= disp(w) <= ln(n). This measure indicates the degree to which the information in the arguments is used during aggregation. The dispersion is zero when the orness (α) is either zero or one; in such cases only one argument, either the lowest or the highest, contributes to the outcome. In contrast, the dispersion reaches its maximum of ln(n) when all arguments are considered equally important, each with weight 1/n, in which case the orness is 0.5.

B. Dependent OWA Operator

Many techniques have been proposed for obtaining OWA weights. For instance, O'Hagan [11] used maximal entropy as the primary criterion for formulating a set of weights at a given level of orness. Attempts also exist to generate a weight vector from a set of samples [6]. Weight determining approaches generally fall into two categories: argument-independent and argument-dependent. Weights derived by the former are not related to the argument values, whereas with an argument-dependent approach the weights are determined from the values of the input arguments. The centered OWA [21] and dependent OWA [15] [16] operators are examples of argument-dependent approaches that employ a centralized weight distribution: arguments whose values are in the middle of the group, i.e. near the group average, are regarded as more reliable and acquire higher weights than those further from the center. Specifically, the reliability of an argument reflects the appropriateness of using that argument as a group representative (i.e. as the aggregated outcome).

The following equations give an overview of the weight determination process of the DOWA operator. Let (a_1, a_2, ..., a_n) be the argument vector and μ be the average value of this argument set, where μ = \frac{1}{n} \sum_{j=1}^{n} a_j. The similarity between any argument a_j and the average value μ is calculated as:

s(a_j, μ) = 1 - \frac{|a_j - μ|}{\sum_{i=1}^{n} |a_i - μ|}    (4)

From this, a weight vector w = (w_1, w_2, ..., w_n)^T is generated by:

w_j = \frac{s(a_j, μ)}{\sum_{i=1}^{n} s(a_i, μ)},  j = 1, 2, ..., n    (5)

DOWA(a_1, a_2, ..., a_n) = \sum_{i=1}^{n} w_i a_i    (6)

A convenient property of the DOWA operator is that each weight is attached to a specific argument, regardless of that argument's rank within the argument vector. Hence, the reordering step normally required by an OWA operator is not needed in this case.
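The sketch below (again illustrative, with my own function names) implements the DOWA weights of Equations (4)-(5), the aggregation of Equation (6) and the dispersion of Equation (3); applied to the ten preference values used later in the worked example of Section III-B, it returns the same aggregated value of 74.1125.

```python
# Hedged sketch of the DOWA weights (Equations (4)-(5)), the DOWA aggregation
# (Equation (6)) and the dispersion measure (Equation (3)). Names are illustrative.
import math
from typing import List, Sequence


def dowa_weights(args: Sequence[float]) -> List[float]:
    mean = sum(args) / len(args)                              # group average
    total_dev = sum(abs(a - mean) for a in args)
    if total_dev == 0:                                        # identical arguments
        return [1.0 / len(args)] * len(args)
    sims = [1.0 - abs(a - mean) / total_dev for a in args]    # Equation (4)
    total = sum(sims)
    return [s / total for s in sims]                          # Equation (5)


def dowa(args: Sequence[float]) -> float:
    return sum(w * a for w, a in zip(dowa_weights(args), args))   # Equation (6)


def dispersion(weights: Sequence[float]) -> float:
    return -sum(w * math.log(w) for w in weights if w > 0)    # Equation (3)


prefs = [60, 62, 63, 66, 70, 75, 79, 85, 89, 94]              # values of Section III-B
print(round(dowa(prefs), 4))                                  # 74.1125, as in the worked example
print(round(dispersion(dowa_weights(prefs)), 2))              # 2.3, close to ln(10): near-uniform weights
```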
III. CLUSTER-BASED DOWA OPERATOR

Like the centered OWA and DOWA operators, the proposed Clus-DOWA aims to decrease the effect of false or biased judgments in group decision making. The former approaches use a centralized method in which the argument set is viewed as one cluster, whose center alone determines the weight vector. In contrast, the Clus-DOWA operator is based on the distributed clusters of arguments. Instead of interpreting a set of arguments as one large cluster, it is easy to see that local clusters (i.e. local consensus) can appear within the global space. In particular, each local cluster represents agreement or consensus among arguments in close proximity. Intuitively, the reliability of an argument can be evaluated from the difference (in terms of distance) between that argument and its nearest local cluster. The magnitude of the difference from the surrounding neighbors dictates how difficult it is for an argument to come to agreement with the others: conceptually, a greater difference signifies greater difficulty and hence smaller reliability, whereas reliable arguments are those with small differences from their neighbors. Figure 1 graphically depicts this distributed approach, in which arguments (a_1 and a_2) very far from the global center are still reliable provided that they are close to the centers of local clusters.

Fig. 1. Centralized and distributed interpretations.

A. Cluster-Based Algorithm

With this distributed approach, the first task in measuring the reliability of arguments is to determine their cluster structure. Agglomerative hierarchical clustering [5] is modified so that its iterative merging process stops as soon as all arguments (a_i, i = 1, ..., n) have been merged with their nearest clusters. For each argument, the distance (d_i) at which such a merge occurs, as well as the corresponding cluster center (Reference_i), are recorded for the evaluation of its reliability. With the center-linkage distance measure (i.e. using the average value of a group as its representative in a distance evaluation), this modified clustering algorithm is summarized in Figure 2. Note that a singleton cluster contains only one member.

Fig. 2. Modified agglomerative hierarchical clustering.

Having applied this clustering algorithm to a set of values (a_1, a_2, ..., a_n), the reliability of each value (r_i) can be estimated directly from the distance to its nearest cluster (d_i) recorded during the clustering process:

r_i = 1 - \frac{d_i}{\sum_{j=1}^{n} d_j}    (7)

Similar to Equation (5), the weight vector can then be calculated from the resulting vector of reliability measurements (r_1, r_2, ..., r_n) as:

w_i = \frac{r_i}{\sum_{j=1}^{n} r_j},  i = 1, 2, ..., n    (8)

The Clus-DOWA operator is then defined as:

Clus-DOWA(a_1, a_2, ..., a_n) = \sum_{j=1}^{n} w_j a_j    (9)

It is worth noting that, owing to the clustering procedure, the Clus-DOWA operator is fairly expensive computationally, with time and space complexity of O(n^3) and O(n^2) respectively, where n is the number of arguments; the simple DOWA operator requires only O(n) for both. Despite this disadvantage, the weight vectors generated by the Clus-DOWA operator are truly data-oriented and reliable, as will be illustrated in Section IV.
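The following sketch is one possible reading of the modified clustering of Figure 2 combined with Equations (7)-(8): clusters are merged greedily by the distance between their means (center linkage), each argument records the distance of its first merge, and the process stops once every argument has been merged at least once. It is illustrative only, but under this reading it reproduces the aggregated value of the worked example in the next subsection.

```python
# Hedged sketch of the Clus-DOWA weights (Equations (7)-(8)) under one reading of
# the modified agglomerative clustering of Figure 2. Names are illustrative only.
from typing import List, Sequence


def clus_dowa_weights(args: Sequence[float]) -> List[float]:
    """Assumes at least two arguments."""
    def center(c: List[int]) -> float:
        return sum(args[i] for i in c) / len(c)              # center linkage

    clusters = [[i] for i in range(len(args))]               # clusters of argument indices
    merge_dist = [None] * len(args)                          # d_i: distance of each argument's first merge

    # keep merging the two closest clusters until every argument has been merged once
    while any(d is None for d in merge_dist):
        dist, ia, ib = min(
            (abs(center(a) - center(b)), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters) if ia < ib
        )
        for c in (clusters[ia], clusters[ib]):               # record d_i for arguments merged
            if len(c) == 1 and merge_dist[c[0]] is None:     # for the first time (singletons)
                merge_dist[c[0]] = dist
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]

    total = sum(merge_dist)
    reliab = [1.0 - d / total for d in merge_dist]           # Equation (7)
    return [r / sum(reliab) for r in reliab]                 # Equation (8)


prefs = [60, 62, 63, 66, 70, 75, 79, 85, 89, 94]
w = clus_dowa_weights(prefs)
print(round(sum(wi * a for wi, a in zip(w, prefs)), 4))      # Equation (9); 73.8263 under this reading
```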
B. Worked Example

For illustration, both the Clus-DOWA and DOWA operators are used to aggregate the preference values of ten decision makers (a_1, ..., a_10): 60, 62, 63, 66, 70, 75, 79, 85, 89 and 94. With the DOWA operator and the global average of 74.3, the following weights are obtained from Equations (4) and (5): w_1 = 0.0953, w_2 = 0.0975, w_3 = 0.0986, w_4 = 0.1019, w_5 = 0.1063, w_6 = 0.1103, w_7 = 0.1059, w_8 = 0.0993, w_9 = 0.0949 and w_10 = 0.0894. With Equation (6), the aggregated outcome is:

DOWA(a_1, ..., a_10) = (60 × 0.0953) + (62 × 0.0975) + (63 × 0.0986) + (66 × 0.1019) + (70 × 0.1063) + (75 × 0.1103) + (79 × 0.1059) + (85 × 0.0993) + (89 × 0.0949) + (94 × 0.0894) = 74.1125

To obtain the result of this example with the Clus-DOWA operator, the clustering algorithm of Figure 2 is first applied to these preference values using the center-linkage distance measure. Figure 3 presents the distance and the nearest cluster's center of each value after clustering. According to Equations (7) and (8), the following weights are obtained: w_1 = 0.1033, w_2 = 0.1080, w_3 = 0.1080, w_4 = 0.0985, w_5 = 0.0985, w_6 = 0.0985, w_7 = 0.0985, w_8 = 0.0985, w_9 = 0.0985 and w_10 = 0.0892. With Equation (9), the aggregation result is:

Clus-DOWA(a_1, ..., a_10) = (60 × 0.1033) + (62 × 0.1080) + (63 × 0.1080) + (66 × 0.0985) + (70 × 0.0985) + (75 × 0.0985) + (79 × 0.0985) + (85 × 0.0985) + (89 × 0.0985) + (94 × 0.0892) = 73.8263

Fig. 3. Results of center-linkage clustering.

Fig. 4. Weight distribution with the DOWA and Clus-DOWA operators.

As Figure 4 shows, the weights given by the DOWA approach are high in the middle and decay towards both ends. The weights estimated by the Clus-DOWA method, on the other hand, reflect the closeness of each value to its neighbors, and hence the difficulty of agreeing with the others. Evidently, the weights given to the values 60, 62 and 63 are higher with Clus-DOWA than with DOWA, because these values form a tight local cluster and their distances to agreement (d_i) are small compared with those of the other values. In contrast, the value 94 is assigned a similarly low weight by both methods, simply because it differs greatly from its neighbors, with d_i = 7, the highest of the entire group. Note that the values in the middle (70, 75 and 79), which are normally assigned high weights by the DOWA approach, do not necessarily receive the same treatment with the new method.

IV. APPLICATION STUDIES

This section presents an experimental evaluation of the DOWA and Clus-DOWA operators on the tasks of classification and unsupervised feature selection. Their performance is assessed on a collection of well-known datasets obtained from [3].

A. Classification

This task is to find the correct class, from n possible classes (C_j, j = 1 ... n), for an unclassified instance characterized by m attributes (a_i, i = 1 ... m). The DOWA operator can be used for classification by first building, for each attribute domain, fuzzy sets of the classes from the training instances:

- The values of each attribute in the training instances are divided into subsets, each belonging to one class.
- The DOWA operator is applied to each subset to generate a weight vector.
- The resulting weight vectors are normalized into the range [0,1] using (w_i - w_min) / (w_max - w_min), where w_i is any weight and w_max and w_min are the maximum and minimum weight values in the vector.

Essentially, the weight vectors are now interpreted as fuzzy sets representing the class properties of one specific attribute; see Figure 5 for examples.

Fig. 5. Representing classes in one attribute domain as (a) weight and (b) membership vectors, using the DOWA operator.

For each attribute value (a_i^k, i = 1 ... m) of a given unclassified instance k, membership vectors such as those shown in Figure 5(b) can be used to linearly estimate its membership value in each class, μ_Cj(a_i^k), j = 1 ... n. The membership degree is: (i) zero if the attribute value lies outside the range of values present in the training instances; (ii) μ_Cj(a_i^t) when t is a training instance and a_i^k = a_i^t; (iii) otherwise, the linear interpolation of the membership values of the two training values a_i^g and a_i^h that are most similar to a_i^k, with a_i^g < a_i^k < a_i^h:

μ_Cj(a_i^k) = \frac{|a_i^h - a_i^k|}{|a_i^g - a_i^h|} μ_Cj(a_i^g) + \frac{|a_i^g - a_i^k|}{|a_i^g - a_i^h|} μ_Cj(a_i^h)    (10)

Then, the total membership value μ_Cj of each class j is calculated as shown below; ultimately, the given instance is classified to the class with the highest total membership value.

μ_Cj = \sum_{i=1}^{m} μ_Cj(a_i^k)    (11)
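As a concrete illustration of the procedure above (a minimal, single-attribute sketch, not the authors' implementation), the normalized weight vector of one class is turned into a membership function and Equation (10) is used to interpolate the membership of an unseen attribute value; summing such memberships over all attributes gives the class score of Equation (11). The toy values are hypothetical.

```python
# Hedged, single-attribute sketch of the membership construction and the linear
# interpolation of Equation (10); names and data are illustrative only.
from bisect import bisect_left
from typing import List, Tuple


def membership_function(values: List[float], weights: List[float]) -> List[Tuple[float, float]]:
    """Min-max normalise a class weight vector into [0,1] memberships over sorted values."""
    lo, hi = min(weights), max(weights)
    norm = [(w - lo) / (hi - lo) if hi > lo else 1.0 for w in weights]
    return sorted(zip(values, norm))


def interpolate(fn: List[Tuple[float, float]], x: float) -> float:
    """Equation (10): membership of x from its two closest training values."""
    vals = [v for v, _ in fn]
    if x < vals[0] or x > vals[-1]:
        return 0.0                                        # case (i): outside the training range
    k = bisect_left(vals, x)
    if vals[k] == x:
        return fn[k][1]                                   # case (ii): an exact training value
    (g, mu_g), (h, mu_h) = fn[k - 1], fn[k]               # case (iii): a_g < x < a_h
    return (abs(h - x) * mu_g + abs(g - x) * mu_h) / abs(g - h)


# toy example: weights of one class for one attribute (e.g. from dowa_weights above)
fn = membership_function([60, 62, 63, 66, 70], [0.19, 0.21, 0.22, 0.20, 0.18])
print(interpolate(fn, 64.5))                              # 0.75, interpolated between 63 and 66
# Equation (11): a class score is the sum of such memberships over all m attributes
```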
This classification process can also be used with the Clus-DOWA operator, with an additional adjustment. First, both the weight and membership functions are built in the same way, but using the Clus-DOWA operator. The class membership values μ_Cj(a_i^k) of a given unclassified instance k are then estimated identically for cases (i) and (ii) above. However, case (iii) and Equation (10) cannot be reused directly, since the weight and membership values are cluster-oriented and do not necessarily decay towards both ends as those generated by the DOWA operator do. To overcome this, it is necessary to find the attribute value a_i^f among the training instances that is most similar to a_i^k. The cluster center Reference_i^f, with which the attribute value a_i^f was merged, can then be used to calculate μ_Cj(a_i^k) as follows:

μ_Cj(a_i^k) = \frac{a_i^f - Reference_i^f}{a_i^k - Reference_i^f} μ_Cj(a_i^f)    (12)

Equation (11) is then employed again to find the class with the highest total membership value, which becomes the preferred class for the given instance k.
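A minimal sketch of this cluster-referenced estimate (Equation (12)) follows; it assumes that the cluster center recorded for each training value during clustering is available, and the names and toy numbers are illustrative only.

```python
# Hedged sketch of Equation (12): scale the membership of the closest training value
# a_f by the ratio of its offset and x's offset from a_f's recorded cluster centre.
from typing import Sequence


def clus_membership(x: float, train_vals: Sequence[float], train_mu: Sequence[float],
                    references: Sequence[float]) -> float:
    f = min(range(len(train_vals)), key=lambda i: abs(train_vals[i] - x))   # most similar a_f
    a_f, ref, mu_f = train_vals[f], references[f], train_mu[f]
    if x == ref:                                   # guard against division by zero in this sketch
        return mu_f
    return (a_f - ref) / (x - ref) * mu_f


# toy example: training value 63 was merged towards cluster centre 62 with membership 1.0
print(round(clus_membership(64.5, [60, 62, 63], [0.25, 0.75, 1.0], [62.5, 63.0, 62.0]), 3))   # 0.4
```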
Both classification methods are evaluated on three benchmark datasets from the UCI repository [3]: glass, iris and wine. Since, as discussed earlier, the complexity of the Clus-DOWA algorithm is higher than that of the DOWA approach, the evaluation concentrates solely on the accuracy criterion to judge their quality. Table I presents the average accuracies of classifying these datasets with the DOWA and Clus-DOWA methods, using 10-fold cross-validation. The performance of the Clus-DOWA approach is consistently superior to that of the DOWA approach, which indicates that the weights generated by the Clus-DOWA operator are more reliable, albeit at a higher computational cost.

TABLE I
CLASSIFICATION ACCURACIES

  Dataset   Average accuracy (%)
            DOWA     Clus-DOWA
  glass     45.45    59.09
  iris      85.71    90.47
  wine      61.11    77.78

B. Unsupervised Feature Selection

To further assess the reliability, and to reveal the usefulness, of the weights generated by the two operators, they are also used for feature selection, which aims to reduce the number of features (i.e. attributes) for more efficient data analysis. The benefits of this task include reducing measurement and storage requirements, reducing training time, and defying the curse of dimensionality to improve prediction performance. Much of the work on feature selection has concentrated on the supervised category, where methods rely on class labels and their correlation with feature values. Guyon and Elisseeff [7] point out that unsupervised feature selection techniques are extremely useful for real-world data analysis, in which class labels are not available and thorough data interpretation is not feasible. Works in this category base their judgments on particular characteristics of the data values, such as entropy and density. Dash and Liu [4] specifically emphasize that entropy is generally low for data containing tight clusters, and thus is a good criterion for determining feature relevance.

Intuitively, the dispersion measure (see Equation (3)) of the weight vectors generated by the DOWA and Clus-DOWA operators can be used as an evaluation criterion. This is because the higher the dispersion value, the more reliable the data become, reflecting the fact that data values are reliable when they are close to their neighbors (i.e. form a tight cluster) and their weights are therefore nearly uniform. To estimate the reliability of each feature (a_i, i = 1 ... m), the DOWA and Clus-DOWA operators are applied to generate weight vectors (w_i^DOWA and w_i^Clus-DOWA) from its values. The dispersion measurements of these weight vectors, disp(w_i^DOWA) and disp(w_i^Clus-DOWA), are then found using Equation (3). Note that the most reliable feature has the highest possible dispersion value of ln(n), where n is the number of dataset instances. A simple heuristic-based algorithm, outlined in Figure 6 and sketched in code at the end of this subsection, is used to deliver a subset of features whose dispersion (i.e. reliability) remains competitive with that of the original feature set.

Fig. 6. Heuristic-based feature selection algorithm.

To evaluate the performance of both operators on this task, the dispersion-oriented algorithm of Figure 6 is first applied to find reduced feature sets for the glass, iris and wine datasets. Table II presents the sizes of these reduced feature sets, compared with those reported in [9] [10] using the FRFS (fuzzy-rough feature selection) method. Note that FRFS is a supervised technique with state-of-the-art theory and results.

TABLE II
SIZE OF REDUCED FEATURE SETS

  Dataset   Instances   Features   Size of reduced set
                                   DOWA   Clus-DOWA   FRFS
  glass     214         10         8      7           9
  iris      150         5          3      3           5
  wine      178         14         7      9           10

The reduced feature sets are then assessed with three different learning classifiers: J48, JRip and PART (from [14]). J48 generates decision trees by choosing the most informative features and recursively partitioning the data into subtables based on their values. Each node in the tree represents a feature, with the branches from a node representing the alternative values this feature can take in the current subtable. Partitioning stops when all data items in a subtable have the same classification; a leaf node is then created and this classification assigned. JRip learns propositional rules by repeatedly growing rules and pruning them. During the growth phase, features are added greedily until a termination condition is satisfied; features are then pruned in the next phase subject to a pruning metric. Once the ruleset is generated, a further optimization is performed in which rules are evaluated and deleted based on their performance on randomized data. PART generates rules by repeatedly creating partial decision trees from the data. The algorithm adopts a divide-and-conquer strategy, removing instances covered by the current ruleset during processing. Essentially, a classification rule is created by building a pruned tree for the current set of instances, and the leaf with the highest coverage is promoted to a rule.

Table III summarizes the accuracies achieved on the reduced feature sets with these classifiers, using 10-fold cross-validation. The reduced feature sets obtained with the Clus-DOWA approach consistently reach better accuracy figures than those generated by the DOWA method. Accordingly, these results indicate that a weight vector of the Clus-DOWA operator is more reliable than one of the DOWA operator.

TABLE III
CLASSIFICATION ACCURACIES OF REDUCED FEATURE SETS

  Dataset   Method       Classifier accuracy (%)
                         J48     JRip    PART
  glass     Unreduced    67.29   71.49   67.76
            DOWA         69.16   67.29   66.82
            Clus-DOWA    71.96   68.22   70.56
  iris      Unreduced    96.00   95.33   94.00
            DOWA         72.67   74.00   73.33
            Clus-DOWA    96.00   92.67   95.33
  wine      Unreduced    94.38   92.70   93.82
            DOWA         89.89   85.39   93.26
            Clus-DOWA    89.89   91.57   87.64

Moreover, the reliability of the Clus-DOWA operator is further emphasized by comparing its accuracy results with those of the FRFS method, as shown in Table IV. The Clus-DOWA method performs better than FRFS on the glass and wine datasets, while the opposite holds for the iris dataset. However, it is crucial to note that the FRFS method is unable to reduce the size of the feature set of the iris dataset at all, whereas almost half of the original set is removed using the Clus-DOWA approach.

TABLE IV
ACCURACIES OF THE FRFS AND CLUS-DOWA METHODS

  Dataset   Method       Classifier accuracy (%)
                         JRip    PART
  glass     FRFS         67.76   68.22
            Clus-DOWA    68.22   70.56
  iris      FRFS         95.33   94.00
            Clus-DOWA    92.67   95.33
  wine      FRFS         89.33   93.82
            Clus-DOWA    91.57   87.64
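The feature scoring idea above can be sketched as follows. This is an illustrative simplification, not the Figure 6 heuristic itself (which is not reproduced in the text): features are scored by the dispersion of their weight vectors and retained if the score stays within an assumed fraction of the maximum ln(n); the keep_ratio threshold is my own assumption.

```python
# Hedged sketch of dispersion-based feature scoring: each feature's values are turned
# into a weight vector (DOWA or Clus-DOWA) and scored by Equation (3). The retention
# rule below is an assumed simplification, not the Figure 6 heuristic itself.
import math
from typing import Callable, List, Sequence


def feature_dispersions(data: Sequence[Sequence[float]],
                        weight_fn: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """data[i][j] is the value of feature j in instance i; returns one score per feature."""
    scores = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        w = weight_fn(column)
        scores.append(-sum(wi * math.log(wi) for wi in w if wi > 0))   # Equation (3)
    return scores


def select_features(data: Sequence[Sequence[float]],
                    weight_fn: Callable[[Sequence[float]], List[float]],
                    keep_ratio: float = 0.95) -> List[int]:
    """Keep features whose dispersion is within keep_ratio of the maximum ln(n)."""
    n = len(data)
    scores = feature_dispersions(data, weight_fn)
    return [j for j, s in enumerate(scores) if s >= keep_ratio * math.log(n)]


# usage, reusing the weight sketches above:
#   selected = select_features(instances, clus_dowa_weights)
```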
V. CONCLUSIONS

This paper has presented a new dependent Ordered Weighted Aggregation (OWA) operator whose weight vector is tightly related to the structural characteristics of the values being aggregated. Unlike the centralized assumption used in previous dependent OWA operators, the reliability of a value is not determined solely by its difference from the group average, but rather by its difference from its neighbors. Values that are hardly different from others in close proximity are considered reliable and assigned high weights. This distributed approach is better able to capture the underlying data characteristics and deliver trustworthy weights, as illustrated experimentally through its superior performance compared with the centralized method on both classification and unsupervised feature selection tasks. However, its applicability is to be further examined in other problem domains, especially decision making with multiple experts and criteria, and its complexity should be improved by using more efficient clustering techniques. In addition, the classification algorithm presented in this paper is to be further developed so that its performance becomes competitive with well-known classification methods such as decision trees and nearest neighbors.

ACKNOWLEDGMENT

This work is sponsored by the UK EPSRC grant no. EP/D057086. The authors are grateful to the members of the project team for their contribution, but take full responsibility for the views expressed in this paper.

REFERENCES

[1] G. Beliakov, A. Pradera and T. Calvo, Aggregation Functions: A Guide for Practitioners, Springer: Heidelberg, Berlin, New York, 2007.
[2] G. Beliakov, T. Calvo and A. Pradera, "Absorbent tuples of aggregation operators," Fuzzy Sets and Systems, Vol. 158, No. 15, pp. 1675-1691, 2007.
[3] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Irvine, University of California, 1998. http://www.ics.uci.edu/~mlearn/.
[4] M. Dash and H. Liu, "Unsupervised feature selection and ranking," New Trends in Knowledge Discovery for Business Information Systems, Kluwer Publishers, 2000.
[5] M. B. Eisen, P. T. Spellman, P. O. Brown and D. Botstein, "Cluster analysis and display of genome-wide expression patterns," in Proceedings of the National Academy of Sciences USA, pp. 14863-14868, 1998.
[6] D. P. Filev and R. R. Yager, "On the issue of obtaining OWA operator weights," Fuzzy Sets and Systems, Vol. 94, pp. 157-169, 1998.
[7] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, Vol. 3, pp. 1157-1182, 2003.
[8] F. Herrera, E. Herrera-Viedma and J. I. Verdegay, "Direct approach processes in group decision making using linguistic OWA operators," Fuzzy Sets and Systems, Vol. 79, pp. 175-190, 1996.
[9] R. Jensen and Q. Shen, "Fuzzy-rough attribute reduction with application to web categorization," Fuzzy Sets and Systems, Vol. 141, pp. 469-485, 2004.
[10] R. Jensen and Q. Shen, "New approaches to fuzzy-rough feature selection," to appear in IEEE Transactions on Fuzzy Systems.
[11] M. O'Hagan, "Aggregating template rule antecedents in real-time expert systems with fuzzy set logic," in Proceedings of the Annual IEEE Conference on Signals, Systems, and Computers, pp. 681-689, 1988.
[12] V. Torra, "The weighted OWA operator," International Journal of Intelligent Systems, Vol. 12, pp. 153-166, 1997.
[13] L. Troiano and R. R. Yager, "Recursive and iterative OWA operators," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 13, No. 6, pp. 579-599, 2005.
[14] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools with Java Implementations, Morgan Kaufmann Publishers, San Francisco, 2000.
[15] Z. S. Xu, "An overview of methods for determining OWA weights," International Journal of Intelligent Systems, Vol. 20, pp. 843-865, 2005.
[16] Z. S. Xu, "Dependent OWA operators," in Proceedings of Modeling Decisions for Artificial Intelligence (MDAI 2006), pp. 172-178, 2006.
[17] R. R. Yager, "Ordered weighted averaging aggregation operators in multi-criteria decision making," IEEE Transactions on Systems, Man and Cybernetics, Vol. 18, pp. 183-190, 1988.
[18] R. R. Yager and J. Kacprzyk, The Ordered Weighted Averaging Operators: Theory and Applications, Kluwer Academic Publishers, Boston, 1997.
[19] R. R. Yager, "New modes of OWA information fusion," International Journal of Intelligent Systems, Vol. 13, pp. 661-681, 1998.
[20] R. R. Yager, "Including importances in OWA aggregations using fuzzy systems modeling," IEEE Transactions on Fuzzy Systems, Vol. 6, No. 2, 1998.
[21] R. R. Yager, "Centered OWA operators," Soft Computing, Vol. 11, pp. 631-639, 2007.