Datasets used for classification: comparison of results


Computational Intelligence Laboratory | Department of Informatics | Nicolaus Copernicus University


Before using any new dataset it should be described here!
Results from the Statlog project are here.
Logical rules derived for data are here.

Medical:
Appendicitis |
Breast cancer (Wisconsin) |
Breast Cancer (Ljubljana) |
Diabetes (Pima Indian) |
Heart disease (Cleveland) |
Heart disease (Statlog version) |
Hepatitis |
Hypothyroid |
Hepatobiliary disorders |

Other datasets:
Ionosphere |
Satellite image dataset (Statlog version) |
Sonar |
Telugu Vowel |
Vowel |
Wine |
Other data: Glass, DNA |

More results for Statlog datasets.



A note of caution: comparison of different classifiers is not an easy task. Before you get into ranking of methods using the numbers presented in tables below please note the following facts.
Many results we have collected give only a single number (even results from the StatLog project!), without standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions single numbers do not mean much.
Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that crossvalidation is safer, cf:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.
Crossvalidation tests (CV) are also not ideal. Theoretically about 2/3 of results should be within a single standard deviation from the average, and 95% of results should be within two standard deviations, so in 10-fold crossvalidation you should only rarely see results that are better or worse than the average by more than two standard deviations. Running CV several times may also give you different answers. The search for the best estimator continues. Cf:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO; J. Machine Learning (Kluwer, in print).
Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves; combining ROC curves with variance estimation would be ideal.
Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
Our results are usually obtained with the GhostMiner package, developed in our group.
Some publications with results are on my page.
TuneIT, Testing Machine Learning & Data Mining Algorithms - Automated Tests, Repeatable Experiments, Meaningful Results.
Results of hand-written signs and numbers classification are here.


Appendicitis.

106 vectors, 8 attributes, two classes (85 acute appendicitis + 21 other, or 80.2% + 19.8%), data from Shalom Weiss;
Results obtained with the leave-one-out test, % of accuracy given
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA

Method | Accuracy % | Reference
PVM (logical rules) | 89.6 | Weiss, Kapouleas
C-MLP2LN (logical rules) | 89.6±? | our
k-NN, stand. Manhattan, k=8,9,22-25; k=4,5, stand. Euclid, f2+f4 removed | 88.7 | our (WD/KG)
9-NN, stand. Euclidean | 87.7 | our (KG)
RIAC (prob. inductive) | 86.9 | Hamilton et al.
1-NN, stand. Euclidean, f2+f4 rem | 86.8 | our (WD/KG)
MLP+backpropagation | 85.8 | Weiss, Kapouleas
CART, C4.5 (dec. trees) | 84.9 | Weiss, Kapouleas
FSM | 84.9 | our (RA)
Bayes rule (statistical) | 83.0 | Weiss, Kapouleas
For 90% accuracy and p=0.95 confidence level 2-tailed bounds are: [82.8%,94.4%]
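The two-tailed bounds quoted here (and in the analogous lines for the other datasets below) are consistent with the Wilson score interval for a binomial proportion; a minimal sketch, assuming only the stated accuracy and the 106 test cases:

```python
from math import sqrt

def wilson_interval(acc, n, z=1.96):
    """Two-tailed Wilson score interval for an observed accuracy acc on n test cases (z=1.96 for p=0.95)."""
    center = (acc + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * sqrt(acc * (1 - acc) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(0.90, 106)           # 90% accuracy on the 106 appendicitis cases
print(f"[{100 * lo:.1f}%, {100 * hi:.1f}%]")  # -> [82.8%, 94.4%], matching the bounds above
```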
S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
For C-MLP2LN (logical rules) the leave-one-out result is only an estimate, since the rules are similar to those of PVM.
3 crisp logical rules, overall 91.5% accuracy
Results for 10-fold stratified crossvalidation

Method | Accuracy % | Reference
NBC+WX+G(WX) | ??.5±7.7 | TM-GM
NBC+G(WX) | ??.2±6.7 | TM-GM
kNN auto+G(WX) Eukl | ??.2±6.7 | TM-GM
C-MLP2LN | 89.6 | our logical rules
20-NN, stand. Eukl f 4,1,7 | 89.3±8.6 | our (KG); feature sel. from CV on the whole data set
SSV beam leaves | 88.7±8.5 | WD
SVM linear C=1 | 88.1±8.6 | WD
6-NN, stand. Eukl. | 88.0±7.9 | WD
SSV default | 87.8±8.7 | WD
SSV beam pruning | 86.9±9.8 | WD
kNN, k=auto, Eucl | 86.7±6.6 | WD
FSM, a=0.9, Gauss, cluster | 86.1±8.8 | WD-GM
NBC | 85.9±10.2 | TM-GM
VSS 1 neuron, 4 it | 84.9±7.4 | WD/MK
SVM Gauss C=32, s=0.1 | 84.4±8.2 | WD
MLP+BP (Tooldiag) | 83.9 | Rafał Adamczak
RBF (Tooldiag) | 80.2 | Rafał Adamczak
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).
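Most of the ±std figures above are the mean and standard deviation of the accuracy over the 10 folds (sometimes averaged over several CV repetitions). A minimal sketch of that protocol for a standardized Manhattan k-NN, as used in several rows; scikit-learn is an assumption here, the original numbers come from GhostMiner and similar packages:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cv_report(X, y, k=6, n_splits=10, seed=0):
    """Stratified k-fold CV accuracy, reported as mean +/- std over the folds (in %)."""
    model = make_pipeline(StandardScaler(),  # standardization is fitted on each training fold only
                          KNeighborsClassifier(n_neighbors=k, metric="manhattan"))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    return 100 * scores.mean(), 100 * scores.std()

# Usage, with X, y holding the 106 appendicitis vectors and binary labels loaded elsewhere:
# mean, std = cv_report(X, y, k=6); print(f"{mean:.1f} +/- {std:.1f} %")
```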

Wisconsin breast cancer.

From UCI repository, 699 cases, 9 attributes, two classes, 458 (65.5%) & 241 (34.5%).
Results obtained with the leave-one-out test, % of accuracy given.

F6 has 16 missing values, removing these vectors leaves 683 examples.

Method | Accuracy % | Reference
FSM | 98.3 | our (RA)
3-NN stand. Manhattan | 97.1 | our (KG)
21-NN stand. Euclidean | 96.9 | our (KG)
C4.5 (decision tree) | 96.0 | Hamilton et al.
RIAC (prob. inductive) | 95.0 | Hamilton et al.
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
Results obtained with the 10-fold crossvalidation, 16 vectors with F6 values missing removed, 683 samples left, % of accuracy given.

Method | Accuracy % | Reference
Naive MFT | 97.1 | Opper, Winther; L-1-O est. 97.3
SVM Gauss, C=1, s=0.1 | 97.0±2.3 | WD-GM
SVM (10xCV) | 96.9 | Opper, Winther
SVM lin, opt C | 96.9±2.2 | WD-GM, same with Minkowski kernel
Cluster means, 2 prototypes | 96.5±2.2 | MB
Default, majority | 65.5 | --
Results obtained with the 10-fold crossvalidation, % of accuracy given, all data, missing values handled in different ways.

Method | Accuracy % | Reference
NB + kernel est | 97.5±1.8 | WD, WEKA, 10x10CV
SVM (5xCV) | 97.2 | Bennet and Blue
kNN with DVDM distance | 97.1 | our (KG)
GM k-NN, k=3, raw, Manh | 97.0±2.1 | WD, 10x10CV
GM k-NN, k=opt, raw, Manh | 97.0±1.7 | WD, 10CV only
VSS, 8 it/2 neurons | 96.9±1.8 | WD/MK; 98.1% train
FSM - Feature Space Mapping | 96.9±1.4 | RA/WD, a=.99 Gaussian
Fisher linear discr. anal. | 96.8 | Ster, Dobnikar
MLP+BP | 96.7 | Ster, Dobnikar
MLP+BP (Tooldiag) | 96.6 | Rafał Adamczak
LVQ | 96.6 | Ster, Dobnikar
kNN, Euclidean/Manhattan f. | 96.6 | Ster, Dobnikar
SNB, semi-naive Bayes (pairwise dependent) | 96.6 | Ster, Dobnikar
SVM lin, opt C | 96.4±1.2 | WD-GM, 16 missing with -10
VSS, 8 it/1 neuron! | 96.4±2.0 | WD/MK, train 98.0%
GM IncNet | 96.4±2.1 | NJ/WD; FKF, max. 3 neurons
NB - naive Bayes (completely independent) | 96.4 | Ster, Dobnikar
SSV opt nodes, 3CV int | 96.3±2.2 | WD/GM; training 96.6±0.5
IB1 | 96.3±1.9 | Zarndt
DB-CART (decision tree) | 96.2 | Shang, Breiman
GM SSV Tree, opt nodes BFS | 96.0±2.9 | WD/KG (beam search 94.0)
LDA - linear discriminant analysis | 96.0 | Ster, Dobnikar
OC1 DT (5xCV) | 95.9 | Bennet and Blue
RBF (Tooldiag) | 95.9 | Rafał Adamczak
GTO DT (5xCV) | 95.7 | Bennet and Blue
ASI - Assistant I tree | 95.6 | Ster, Dobnikar
MLP+BP (Weka) | 95.4±0.2 | TW/WD
OCN2 | 95.2±2.1 | Zarndt
IB3 | 95.0±4.0 | Zarndt
MML tree | 94.8±1.8 | Zarndt
ASR - Assistant R (RELIEF criterion) tree | 94.7 | Ster, Dobnikar
C4.5 tree | 94.7±2.0 | Zarndt
LFC, Lookahead Feature Constr. binary tree | 94.4 | Ster, Dobnikar
CART tree | 94.4±2.4 | Zarndt
ID3 | 94.3±2.6 | Zarndt
C4.5 (5xCV) | 93.4 | Bennet and Blue
C4.5 rules | 86.7±5.9 | Zarndt
Default, majority | 65.5 | --
QDA - quadratic discr. anal. | 34.5 | Ster, Dobnikar
For 97% accuracy and p=0.95 confidence level 2-tailed bounds are: [95.5%,98.0%]
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
N. Shang, L. Breiman, ICONIP'96, p.133
B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995


Breast Cancer (Ljubljana data)

From UCI repository (restricted): 286 instances, 201 no-recurrence-events (70.3%), 85 recurrence-events (29.7%);
9 attributes, between 2-13 values each, 9 missing values
Results are mostly from 10-fold crossvalidation, but sometimes the methodology was unclear;
this is difficult, noisy data, and some methods fall below the base rate (70.3%).

Method | Accuracy % test | Reference
C-MLP2LN/SSV single rule | 76.2±0.0 | WD/K. Grabczewski, stable rule
SSV Tree rule | 75.7±1.1 | WD, av. from 10x10CV
MML Tree | 75.3±7.8 | Zarndt
SVM Gauss, C=1, s=0.1 | 73.8±4.3 | WD, GM
MLP+backprop | 73.5±9.4 | Zarndt
SVM Gauss, C, s opt | 72.4±5.1 | WD, GM
IB1 | 71.8±7.5 | Zarndt
CART | 71.4±5.0 | Zarndt
ODT trees | 71.3±4.2 | Blanchard
SVM lin, C=opt | 71.0±4.7 | WD, GM
UCN 2 | 70.7±7.8 | Zarndt
SFC, Stack filters | 70.6±4.2 | Porter
Default, majority | 70.3±0.0 | --
SVM lin, C=1 | 70.0±5.6 | WD, GM
C 4.5 rules | 69.7±7.2 | Zarndt
Bayes rule | 69.3±10.0 | Zarndt
C 4.5 | 69.2±4.9 | Blanchard
Weighted networks | 68-73.5 | Tan, Eshelman
IB3 | 67.9±7.7 | Zarndt
ID3 rules | 66.2±8.5 | Zarndt
AQ15 | 66-72 | Michalski e.a.
Inductive | 65-72 | Clark, Niblett
For 78% accuracy and p=0.95 confidence level 2-tailed bounds are: [72.9%,82.4%]

  • Assistant-86 achieved 78%, but this seems to be the best result obtained in some crossvalidation runs, not the average.
  • Cestnik G., Kononenko I., & Bratko I. (1987). Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In I. Bratko & N. Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.
  • Blanchard G., Schafer C., Rozenholc Y., & Muller K.-R. (2007) Optimal dyadic decision trees. Machine Learning 66: 709-717.
  • Clark,P. & Niblett,T. (1987). Induction in Noisy Domains. In: Progress in Machine Learning (from the Proceedings of the 2nd European Working Session on Learning), 11-30, Bled, Yugoslavia: Sigma Press.
  • Porter R.B., G. Beate Zimmer, Don R. Hush: Stack Filter Classifiers. ISMM 2009: 282-294
  • Michalski,R.S., Mozetic,I., Hong,J., & Lavrac,N. (1986). The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041-1045, Philadelphia, PA: Morgan Kaufmann.
  • Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning, 121-134, Ann Arbor, MI.
  • F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995
  • S.M. Weiss, I. Kapouleas. An empirical comparison of pattern recognition, neural nets and machine learning classification methods, in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990

Weiss and Kapouleas used leave-one-out tests and obtained:
MLP+backprop: 75.7% train, 71.5% test;
Bayes: 75.9% train, 71.8% test;
CART & PVM: 77.4% train, 77.1% test;
k-NN: 65.3% test.

Hepatitis.

From UCI repository, 155 vectors, 19 attributes,
Two classes, die with 32 (20.6%), live with 123 (79.4%).
Many missing values! F18 has 67 missing values, F15 has 29, F17 has 16 and other features between 0 and 11.
Results obtained with the leave-one-out test, % of accuracy given
Method | Accuracy % | Reference
21-NN, stand. Manhattan | 90.3 | our (KG)
FSM | 90.0 | our (RA)
14-NN, stand. Euclid | 89.0 | our (KG)
LDA | 86.4 | Weiss & K
CART (decision tree) | 82.7 | Weiss & K
MLP+backprop | 82.1 | Weiss & K
MLP, CART and LDA results are most likely from: S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990.
Other results - our own;
Results obtained with the 10-fold crossvalidation, % of accuracy given; our results use stratified crossvalidation, for other results the type of crossvalidation is not known. Differences for this dataset are rather small, 0.1-0.2%.

Method | Accuracy % | Reference
Weighted 9-NN | 92.9±? | Karol Grudziński
18-NN, stand. Manhattan | 90.2±0.7 | Karol Grudziński
FSM with rotations | 89.7±? | Rafał Adamczak
15-NN, stand. Euclidean | 89.0±0.5 | Karol Grudziński
VSS 4 neurons, 5 it | 86.5±8.8 | WD/MK, train 97.1
FSM without rotations | 88.5 | Rafał Adamczak
LDA, linear discriminant analysis | 86.4 | Ster & Dobnikar
Naive Bayes and Semi-NB | 86.3 | Ster & Dobnikar
IncNet | 86.0 | Norbert Jankowski
QDA, quadratic discriminant analysis | 85.8 | Ster & Dobnikar
1-NN | 85.3±5.4 | Ster & Dobnikar, std added by WD
VSS 2 neurons, 5 it | 85.1±7.4 | WD/MK, train 95.0
ASR | 85.0 | Ster & Dobnikar
Fisher discriminant analysis | 84.5 | Ster & Dobnikar
LVQ | 83.2 | Ster & Dobnikar
CART (decision tree) | 82.7 | Ster & Dobnikar
MLP with BP | 82.1 | Ster & Dobnikar
ASI | 82.0 | Ster & Dobnikar
LFC | 81.9 | Ster & Dobnikar
RBF (Tooldiag) | 79.0 | Rafał Adamczak
MLP+BP (Tooldiag) | 77.4 | Rafał Adamczak
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Our good results may reflect better handling of missing values.
Duch W, Grudziński K (1998) A framework for similarity-based methods. Second Polish Conference on Theory and Applications of Artificial Intelligence, Lodz, 28-30 Sept. 1998, pp. 33-60
Weighted kNN: Duch W, Grudzinski K and Diercksen G.H.F (1998) Minimal distance neural methods. World Congress of Computational Intelligence, May 1998, Anchorage, Alaska, IJCNN'98 Proceedings, pp. 1299-1304


Statlog version of Cleveland Heart disease.

13 attributes (extracted from 75), no missing values.
270=150+120 observations selected from the 303 cases (Cleveland Heart).
Attribute Information:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
Attribute types: Real: 1, 4, 5, 8, 10, 12; Ordered: 11; Binary: 2, 6, 9; Nominal: 3, 7, 13
Classes: Absence (1) or presence (2) of heart disease.
In the StatLog experiments on the heart data a cost (risk) matrix was used with 9-fold crossvalidation, and only cost values are given.
Results below are obtained with the 10-fold crossvalidation, % of accuracy given, no risk matrix.

Method | Accuracy % | Reference
Lin SVM 2D QCP | 85.9±5.5 | MG, 10xCV
kNN auto+WX | ??.8±5.6 | TM GM 10xCV
SVM Gauss+WX+G(WX), C=1 s=2-5 | ??.8±6.4 | TM GM 10xCV
SVM lin, C=0.01 | 84.9±7.9 | WD, GM 10x(9xCV)
SFM, G(WX), default C=1 | ??±5.1 | TM, GM 10xCV
Naive-Bayes | 84.5±6.3 | TM, GM 10xCV
Naive-Bayes | 83.6 | RA, WEKA
SVML default C=1 | 82.5±6.4 | TM, GM 10xCV
K* | 76.7 | WEKA, RA
IB1c | 74.0 | WEKA, RA
1R | 71.4 | WEKA, RA
T2 | 68.1 | WEKA, RA
MLP+BP | 65.6 | ToolDiag, RA
FOIL | 64.0 | WEKA, RA
RBF | 60.0 | ToolDiag, RA
InductH | 58.5 | WEKA, RA
Base rate (majority classifier) | 55.7 | --
IB1-4 | 50.0 | ToolDiag, RA
Results for Heart and other Statlog datasets are collected here.


Cleveland heart disease.

From UCI repository, 303 cases, 13 attributes (4 cont, 9 nominal), 7 vectors with missing values ?
2 (no, yes) or 5 classes (no, degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degree 1-4.
Results obtained with the leave-one-out test, % of accuracy given, 2 classes used.

Method | Accuracy % | Reference
LDA | 84.5 | Weiss ?
25-NN, stand., Euclid | 83.6±0.5 | WD/KG repeat??
C-MLP2LN | 82.5 | RA, estimated?
FSM | 82.2 | Rafał Adamczak
MLP+backprop | 81.3 | Weiss ?
CART | 80.8 | Weiss ?
MLP, CART and LDA - the source of these results is uncertain (Weiss?).
Other results - our own.
Results obtained with the 10-fold crossvalidation, % of accuracy given.
Ster & Dobnikar reject 6 vectors (leaving 297) with missing values.
We use all 303 vectors, replacing missing values by the means for their class; in kNN we have used the Statlog convention, 297 vectors.
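A minimal sketch of this class-conditional mean imputation (a numpy illustration, not the original preprocessing code); note that it uses the class labels, so applying it to the whole dataset before crossvalidation is slightly optimistic:

```python
import numpy as np

def impute_class_means(X, y):
    """Replace NaNs in X by the mean of that feature computed over the samples of the same class."""
    X = X.astype(float).copy()
    for c in np.unique(y):
        rows = (y == c)
        means = np.nanmean(X[rows], axis=0)              # per-feature means of class c, ignoring NaNs
        missing = np.where(np.isnan(X) & rows[:, None])  # missing entries belonging to class c
        X[missing] = means[missing[1]]
    return X
```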

Method | Accuracy % | Reference
IncNet+transformations | 90.0 | Norbert Jankowski; check again!
28-NN, stand., Euclid, 7 features | 85.1±0.5 | WD/KG
LDA | 84.5 | Ster & Dobnikar
Fisher discriminant analysis | 84.2 | Ster & Dobnikar
k=7, Euclid, std | 84.2±6.6 | WD, GhostMiner
16-NN, stand., Euclid | 84±0.6 | WD/KG
FSM, 82.4-84% on test only | 84.0 | Rafał Adamczak
k=1:10, Manhattan, std | 83.8±5.3 | WD, GhostMiner
Naive Bayes | 82.5-83.4 | Rafał; Ster, Dobnikar
SNB | 83.1 | Ster & Dobnikar
LVQ | 82.9 | Ster & Dobnikar
GTO DT (5xCV) | 82.5 | Bennet and Blue
kNN, k=19, Euclidean | 82.1±0.8 | Karol Grudziński
k=7, Manhattan, std | 81.8±10.0 | WD, GhostMiner
SVM (5xCV) | 81.5 | Bennet and Blue
kNN (k=1? raw data?) | 81.5 | Ster & Dobnikar
MLP+BP (standardized) | 81.3 | Ster, Dobnikar, Rafał Adamczak
Cluster means, 2 prototypes | 80.8±6.4 | MB
CART | 80.8 | Ster & Dobnikar
RBF (Tooldiag, standardized) | 79.1 | Rafał Adamczak
Gaussian EM, 60 units | 78.6 | Stensmo & Sejnowski
ASR | 78.4 | Ster & Dobnikar
C4.5 (5xCV) | 77.8 | Bennet and Blue
IB1c (WEKA) | 77.6 | Rafał Adamczak
QDA | 75.4 | Ster & Dobnikar
LFC | 75.1 | Ster & Dobnikar
ASI | 74.4 | Ster & Dobnikar
K* (WEKA) | 74.2 | Rafał Adamczak
OC1 DT (5xCV) | 71.7 | Bennet and Blue
1R (WEKA) | 71.0 | Rafał Adamczak
T2 (WEKA) | 69.0 | Rafał Adamczak
FOIL (WEKA) | 66.4 | Rafał Adamczak
InductH (WEKA) | 61.3 | Rafał Adamczak
Default, majority | 54.1 | base rate
C4.5 rules | 53.8±5.9 | Zarndt
IB1-4 (WEKA) | 46.2 | Rafał Adamczak
For 85% accuracy and p=0.95 confidence level 2-tailed bounds are: [80.5%,88.6%]
Results obtained with BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Magnus Stensmo and Terrence J. Sejnowski, A Mixture Model System for Medical and Machine Diagnosis, Advances in Neural Information Processing Systems 7 (1995) 1077-1084

Kristin P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
Other results for this dataset (methodology sometimes uncertain):
D. Wettschereck, averaging 25 runs with 70% train and 30% test, variants of k-NN with different metric functions and scaling.
David Aha & Dennis Kibler - From UCI repository past usage

Method | Accuracy % | Reference
k-NN, Value Distance Metric (VDM) | 82.6 | D. Wettschereck
k-NN, Euclidean | 82.4±0.8 | D. Wettschereck
k-NN, Variable Similarity Metric | 82.4 | D. Wettschereck
k-NN, Modified VDM | 83.1 | D. Wettschereck
Other k-NN variants | < 82.4 | D. Wettschereck
k-NN, Mutual Information | 81.8 | D. Wettschereck
CLASSIT (hierarchical clustering) | 78.9 | Gennari, Langley, Fisher
NTgrowth (instance-based) | 77.0 | Aha & Kibler
C4 | 74.8 | Aha & Kibler
Naive Bayes | 82.8±1.3 | Friedman et al., 5xCV, 296 vectors
Gennari J.H., Langley P., Fisher D. (1989). Models of incremental concept formation. Artificial Intelligence 40, 11-61.
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163.

Diabetes.

From the UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances: 500 (65.1%) negative (class 1) and 268 (34.9%) positive (class 2) tests for diabetes.
All patients were females at least 21 years old of Pima Indian heritage.
Attributes used:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
Results obtained with the 10-fold crossvalidation, % of accuracy given; Statlog results are with 12-fold crossvalidation

Method | Accuracy % | Reference
Logdisc | 77.7 | Statlog
IncNet | 77.6 | Norbert Jankowski
DIPOL92 | 77.6 | Statlog
Linear Discr. Anal. | 77.5-77.2 | Statlog; Ster & Dobnikar
SVM, linear, C=0.01 | 77.5±4.2 | WD-GM, 10xCV averaged 10x
SVM, Gauss, C, sigma opt | 77.4±4.3 | WD-GM, 10xCV averaged 10x
SMART | 76.8 | Statlog
GTO DT (5xCV) | 76.8 | Bennet and Blue
kNN, k=23, Manh, raw, W | 76.7±4.0 | WD-GM, feature weighting 3CV
kNN, k=1:25, Manh, raw | 76.6±3.4 | WD-GM, most cases k=23
ASI | 76.6 | Ster & Dobnikar
Fisher discr. analysis | 76.5 | Ster & Dobnikar
MLP+BP | 76.4 | Ster & Dobnikar
MLP+BP | 75.8±6.2 | Zarndt
LVQ | 75.8 | Ster & Dobnikar
LFC | 75.8 | Ster & Dobnikar
RBF | 75.7 | Statlog
NB | 75.5-73.8 | Ster & Dobnikar; Statlog
kNN, k=22, Manh | 75.5 | Karol Grudziński
MML | 75.5±6.3 | Zarndt
SNB | 75.4 | Ster & Dobnikar
BP | 75.2 | Statlog
SSV DT | 75.0±3.6 | WD-GM, SSV BS, node 5CV MC
kNN, k=18, Euclid, raw | 74.8±4.8 | WD-GM
CART DT | 74.7±5.4 | Zarndt
CART DT | 74.5 | Statlog
DB-CART | 74.4 | Shang & Breiman
ASR | 74.3 | Ster & Dobnikar
ODT, dyadic trees | 74.0±2.3 | Blanchard
Cluster means, 2 prototypes | 73.7±3.7 | MB
SSV DT | 73.7±4.7 | WD-GM, SSV BS, node 10CV strat
SFC, stacking filters | 73.3±1.9 | Porter
C4.5 DT | 73.0 | Statlog
C4.5 DT | 72.7±6.6 | Zarndt
Bayes | 72.2±6.9 | Zarndt
C4.5 (5xCV) | 72.0 | Bennet and Blue
CART | 72.8 | Ster & Dobnikar
Kohonen | 72.7 | Statlog
C4.5 DT | 72.1±2.6 | Blanchard (averaged over 100 runs)
kNN | 71.9 | Ster & Dobnikar
ID3 | 71.7±6.6 | Zarndt
IB3 | 71.7±5.0 | Zarndt
IB1 | 70.4±6.2 | Zarndt
kNN, k=1, Euclidean, raw | 69.4±4.4 | WD-GM
kNN | 67.6 | Statlog
C4.5 rules | 67.0±2.9 | Zarndt
OCN2 | 65.1±1.1 | Zarndt
Default, majority | 65.1 | --
QDA | 59.5 | Ster, Dobnikar
For 77.7% accuracy and p=0.95 confidence level 2-tailed bounds are: [74.6%,80.5%]
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Other results (with different tests):

Method | Accuracy % | Reference
SVM (5xCV) | 77.6 | Bennet and Blue
C4.5 | 76.0±0.9 | Friedman, 5xCV
Semi-Naive Bayes | 76.0±0.8 | Friedman, 5xCV
Naive Bayes | 74.5±0.9 | Friedman, 5xCV
Default, majority | 65.1 | --
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163.
Opper/Winther use 200 training and 332 test examples (following Ripley), with TAP MFT results on the test at 81%, SVS at 80.1% and the best NN at 77.4%.


Hypothyroid.

Thyroid, from the UCI repository, dataset "ann-train.data": a thyroid database suited for training ANNs.
3772 learning and 3428 testing examples; primary hypothyroid, compensated hypothyroid, normal.
Training: 93+191+3488 or 2.47%, 5.06%, 92.47%
Test: 73+177+3178 or 2.13%, 5.16%, 92.71%
21 attributes (15 binary, 6 continuous); 3 classes
The problem is to determine whether a patient referred to the clinic is hypothyroid. Three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. Because about 92 percent of the patients are not hypothyroid, a good classifier must be significantly better than 92%.
Note: this is the data Quinlan used in the case study of his article "Simplifying Decision Trees" (International Journal of Man-Machine Studies (1987) 221-234).
Names: I (W.D.) have investigated this issue in a mail exchange with Chris Merz, who maintains the UCI repository; here is the conclusion:

1 age: continuous
2 sex: {M, F}
3 on thyroxine: logical
4 maybe on thyroxine: logical
5 on antithyroid medication: logical
6 sick - patient reports malaise: logical
7 pregnant: logical
8 thyroid surgery: logical
9 I131 treatment: logical
10 test hypothyroid: logical
11 test hyperthyroid: logical
12 on lithium: logical
13 has goitre: logical
14 has tumor: logical
15 hypopituitary: logical
16 psychological symptoms: logical
17 TSH: continuous
18 T3: continuous
19 TT4: continuous
20 T4U: continuous
21 FTI: continuous

Results:


Method | % training | % test | Reference
C-MLP2LN rules+ASA | 99.90 | 99.36 | Rafał/Krzysztof/Grzegorz
CART | 99.80 | 99.36 | Weiss
PVM | 99.80 | 99.33 | Weiss
SSV beam search | 99.80 | 99.33 | WD
IncNet | 99.68 | 99.24 | Norbert
SSV opt leaves or pruning | 99.7 | 99.1 | WD
MLP init+ a,b opt. | 99.5 | 99.1 | Rafał
C-MLP2LN rules | 99.7 | 99.0 | Rafał/Krzysztof
Cascade correlation | 100.0 | 98.5 | Schiffmann
Local adapt. rates | 99.6 | 98.5 | Schiffmann
BP+genetic opt. | 99.4 | 98.4 | Schiffmann
Quickprop | 99.6 | 98.3 | Schiffmann
RPROP | 99.6 | 98.0 | Schiffmann
3-NN, Euclidean, with 3 features | 98.7 | 97.9 | W.D./Karol
1-NN, Euclidean, with 3 features | 98.4 | 97.7 | W.D./Karol
Best backpropagation | 99.1 | 97.6 | Schiffmann
1-NN, Euclidean, 8 features used | -- | 97.3 | Karol/W.D.
SVM Gauss, C=8 s=0.1 | 98.3 | 96.1 | WD
Bayesian classif. | 97.0 | 96.1 | Weiss?
SVM Gauss, C=1 s=0.1 | 95.4 | 94.7 | WD
BP+conj. gradient | 94.6 | 93.8 | Schiffmann
1-NN Manhattan, std data | -- | 93.8 | Karol G./WD
SVM lin, C=1 | 94.1 | 93.3 | WD
SVM Gauss, C=8 s=5 | 100 | 92.8 | WD
Default, majority (250 test errors) | -- | 92.7 | --
1-NN Manhattan, raw data | -- | 92.2 | Karol G./WD
For 99.90% accuracy on training and p=0.95 confidence level 2-tailed bounds are: [99.74%,99.96%]
Most NN results from W. Schiffmann, M. Joost, R. Werner, 1993; MLP2LN and Init+a,b ours.
k-NN, PVM and CART from S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990
SVM with linear and Gaussian kernels gives quite poor results on this data.
3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4 give 99.3% of accuracy on the test set.
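The crisp rules mentioned above are simple threshold conditions on a few of the listed features; a purely illustrative sketch of that rule format is given below. The threshold values are placeholders, not the published rules (those are given on the logical-rules page linked at the top):

```python
def classify_thyroid(case):
    """Illustrative crisp-rule classifier for the hypothyroid data; all thresholds are hypothetical."""
    TSH_HI, TSH_LO, FTI_LO = 30.0, 6.0, 64.0   # placeholder cut-offs, not the extracted values
    if case["TSH"] >= TSH_HI and case["FTI"] < FTI_LO:
        return "primary hypothyroid"
    if (TSH_LO <= case["TSH"] < TSH_HI and case["FTI"] < FTI_LO
            and not case["on_thyroxine"] and not case["thyroid_surgery"]):
        return "compensated hypothyroid"
    return "normal"
```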


Hepatobiliary disorders

Contains medical records of 536 patients admitted to a university-affiliated Tokyo-based hospital, with four types of hepatobiliary disorders: alcoholic liver damage, primary hepatoma, liver cirrhosis and cholelithiasis. The records include the results of 9 biochemical tests and the sex of the patient. The same 163 cases as in [Hayashi et al.] were used as the test data.
FSM gives about 60 Gaussian or triangular membership functions, achieving accuracy of 75.5-75.8%. Rotation of these functions (i.e. introducing linear combinations of inputs to the rules) does not improve this accuracy. 10-fold crossvalidation tests on the mixed, training plus test data, give similar results. The best results were obtained with the K* method, based on algorithmic complexity optimization, giving 78.5% on the test set, and with kNN using the Manhattan distance function, k=1 and selection of features (using the leave-one-out method on the training data, features 2, 5, 6 and 9 were removed), giving 80.4% accuracy. Simulated annealing optimization of the scaling factors for the remaining 5 features gives 81.0%, and optimizing scaling factors using all input features gives 82.8%. The scaling factors are: 0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30. Similar accuracy is obtained using the multisimplex method for optimization of the scaling factors.
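A feature-weighted 1-NN of the kind described above can be sketched as a Manhattan distance with per-feature scaling factors (the values quoted in the text); arrays holding the nine biochemical tests in the listed order are an assumption:

```python
import numpy as np

# Scaling factors for the 9 biochemical tests, as quoted above.
SCALES = np.array([0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30])

def predict_1nn_scaled(X_train, y_train, X_test, scales=SCALES):
    """1-NN with a feature-weighted Manhattan distance d(a, b) = sum_i s_i * |a_i - b_i|."""
    preds = []
    for x in X_test:
        d = np.abs(X_train - x) @ scales   # weighted Manhattan distances to all training cases
        preds.append(y_train[np.argmin(d)])
    return np.array(preds)
```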

Method | Training set % | Test set % | Reference
IB2-IB4 | 81.2-85.5 | 43.6-44.6 | WEKA, our calculation
Naive Bayes | -- | 46.6 | WEKA, our calculation
1R (rules) | 58.4 | 50.3 | WEKA, our calculation
T2 (rules from decision tree) | 67.5 | 53.3 | WEKA, our calculation
FOIL (inductive logic) | 99 | 60.1 | WEKA, our calculation
FSM, initial 49 crisp logical rules | 83.5 | 63.2 | FSM, our calculation
LDA (statistical) | 68.4 | 65.0 | our calculation
DLVQ (38 nodes) | 100 | 66.0 | our calculation
C4.5 decision rules | 64.5 | 66.3 | our calculation
Best fuzzy MLP model | 75.5 | 66.3 | Mitra et al.
MLP with RPROP | -- | 68.0 | our calculation
Cascade Correlation | -- | 71.0 | our calculation
Fuzzy neural network | 100 | 75.5 | Hayashi
C4.5 decision tree | 94.4 | 75.5 | our calculation
FSM, Gaussian functions | 93 | 75.6 | our calculation
FSM, 60 triangular functions | 93 | 75.8 | our calculation
IB1c (instance-based) | -- | 76.7 | WEKA, our calculation
kNN, k=1, Canberra, raw | 76.1 | 80.4 | WD/SBL
K* method | -- | 78.5 | WEKA, our calculation
1-NN, 4 features removed, Manhattan | 76.9 | 80.4 | our calculation, KG
1-NN, Canberra, raw, removed f2, 6, 8, 9 | 77.2 | 83.4 | our calculation, KG
Y. Hayashi, A. Imura, K. Yoshida, "Fuzzy neural expert system and its application to medical diagnosis", in: 8th International Congress on Cybernetics and Systems, New York City 1990, pp. 54-61
S. Mitra, R. De, S. Pal, "Knowledge based fuzzy MLP for classification and rule generation", IEEE Transactions on Neural Networks 8, 1338-1350, 1997; a knowledge-based fuzzy MLP system gives results on the test set in the range from 33% to 66.3%, depending on the actual fuzzy model used.
W. Duch and K. Grudzinski, "Prototype Based Rules - New Way to Understand the Data", Int. Joint Conference on Neural Networks, Washington D.C., pp. 1858-1863, 2001. Contains the best results with 1-NN, Canberra distance and feature selection, 83.4% on the test.

Other, non-medical data


Landsat Satellite image dataset (STATLOG version)

Training 4435, test 2000 cases, 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in a neighbourhood) and 6 decision classes: 1, 2, 3, 4, 5 and 7 (class 6 has been removed because of doubts about the validity of this class).
The StatLog database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.

Method | % training | % test | Time train / notes | Time test / notes
MLP+SCG | 96.0 | 91.0 | reg alpha=0.5, 36 hidden nodes, 1400 it | fast; WD
k-NN | -- | 90.9 | auto-k=3, Manhattan, std data | GM 2.0
k-NN | 91.1 | 90.6 | 2105, Statlog | 944; parameters?
k-NN | -- | 90.4 | auto-k=5, Euclidean, std data | GM 2.0
k-NN | -- | 90.0 | k=1, Manhattan, std data, no training | fast, GM 2.0
FSM | 95.1 | 89.7 | std data, a=0.95 | fast, GM 2.0; best NN result
LVQ | 95.2 | 89.5 | 1273 | 44
k-NN | -- | 89.4 | k=1, Euclidean, std data, no training | fast, GM 2.0
Dipol92 | 94.9 | 88.9 | 746 | 111
MLP+SCG | 94.4 | 88.5 | 5000 it; active learning + reg a=0.5, 8-12 hidden | fast; WD
SVM | 91.6 | 88.4 | std data, Gaussian kernel | fast, GM 2.0; unclassified 4.3%
Radial | 88.9 | 87.9 | 564 | 74
Alloc80 | 96.4 | 86.8 | 63840 | 28757
IndCart | 97.7 | 86.2 | 2109 | 9
CART | 92.1 | 86.2 | 330 | 14
MLP+BP | 88.8 | 86.1 | 72495 | 53
Bayesian Tree | 98.0 | 85.3 | 248 | 10
C4.5 | 96.0 | 85.0 | 434 | 1
New ID | 93.3 | 85.0 | 226 | 53
QuaDisc | 89.4 | 84.5 | 157 | 53
SSV | 90.9 | 84.3 | default par. | very fast, GM 2.0
Cascade | 88.8 | 83.7 | 7180 | 1
Log DA, Disc | 88.1 | 83.7 | 4414 | 41
LDA, Discrim | 85.1 | 82.9 | 68 | 12
Kohonen | 89.9 | 82.1 | 12627 | 129
Bayes | 69.2 | 71.3 | 75 | 17
The original database was generated from Landsat Multi-Spectral Scanner image data. The sample database was generated by taking a small section (82 rows and 100 columns) from the original data. One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to the green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.
The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel. In each line of data the four spectral values for the top-left pixel are given first followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17,18,19 and 20. If you like you can use only these four attributes, while ignoring the others. This avoids the problem which arises when a 3x3 neighbourhood straddles a boundary.
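Since attributes 17-20 hold the four spectral values of the central pixel, restricting the data to the central pixel, as suggested above, is a single slice; a small sketch assuming the usual space-separated sat.trn / sat.tst files with the class label in the last column:

```python
import numpy as np

def load_sat(path, central_only=False):
    """Load a StatLog satimage file: 36 spectral values plus the class label per line."""
    data = np.loadtxt(path)
    X, y = data[:, :36], data[:, 36].astype(int)
    if central_only:
        X = X[:, 16:20]   # attributes 17-20: the four spectral bands of the central pixel
    return X, y

# X_train, y_train = load_sat("sat.trn", central_only=True)
```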
All results from Statlog book, except GM - GhostMiner calculations, W. Duch.

N | Description | Train | Test
1 | red soil | 1072 (24.17%) | 461 (23.05%)
2 | cotton crop | 479 (10.80%) | 224 (11.20%)
3 | grey soil | 961 (21.67%) | 397 (19.85%)
4 | damp grey soil | 415 (09.36%) | 211 (10.55%)
5 | veg. stubble | 470 (10.60%) | 237 (11.85%)
6 | mixture class | 0 | 0
7 | very damp grey soil | 1038 (23.40%) | 470 (23.50%)
Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds) - the Statlog project book.

Ionosphere

351 data records, with class division 225 (64.1%) + 126 (35.9%). Usually the first 200 vectors are taken for training and the last 151 for the test, but this split is very unbalanced: in the training set 101 (50.5%) and 99 (49.5%) vectors are from class 1/2, while in the test set 124 (82.1%) and 27 (17.9%) are from class 1/2.
34 attributes, but f2=0 always and should be removed; f1 is binary, the remaining 32 attributes are continuous.
2 classes - different types of radar signals reflected from the ionosphere.
Some vectors: 8, 18, 20, 22, 24, 30, 38, 52, 76, 78, 80, 82, 103, 163, 169, 171, 183, 187, 189, 191, 201, 215, 219, 221, 223, 225, 227, 229, 231, 233, 249, are either binary 0, 1 or have only 3 values -1, 0, +1.
For example, vector 169 has only one component = 1, all others are 0.
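A minimal sketch of the preprocessing implied above (drop the constant feature f2, take the first 200 cases for training and the remaining 151 for testing); the file name and the 34-features-plus-g/b-label layout of the UCI file are assumptions:

```python
import numpy as np

def load_ionosphere(path="ionosphere.data"):
    """Load the ionosphere data, drop the constant second feature, split 200 train / 151 test."""
    raw = np.genfromtxt(path, delimiter=",", dtype=str)
    X = raw[:, :34].astype(float)
    y = (raw[:, 34] == "g").astype(int)   # 1 = good radar return, 0 = bad
    X = np.delete(X, 1, axis=1)           # f2 is always 0 and carries no information
    return (X[:200], y[:200]), (X[200:], y[200:])
```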

Method | Accuracy % | Reference
3-NN + simplex | 98.7 | our own weighted kNN
VSS 2 epochs | 96.7 | MLP with numerical gradient
3-NN | 96.7 | KG, GM with or without weights
IB3 | 96.7 | Aha, 5 errors on test
1-NN, Manhattan | 96.0 | GM kNN (our)
MLP+BP | 96.0 | Sigillito
SVM Gaussian | 94.9±2.6 | GM (our), defaults, similar for C=1-100
C4.5 | 94.9 | Hamilton
3-NN Canberra | 94.7 | GM kNN (our)
RIAC | 94.6 | Hamilton
C4 (no windowing) | 94.0 | Aha
C4.5 | 93.7 | Bennet and Blue
SVM | 93.2 | Bennet and Blue
Non-lin perceptron | 92.0 | Sigillito
FSM + rotation | 92.8 | our
1-NN, Euclidean | 92.1 | Aha, GM kNN (our)
DB-CART | 91.3 | Shang, Breiman
Linear perceptron | 90.7 | Sigillito
OC1 DT | 89.5 | Bennet and Blue
CART | 88.9 | Shang, Breiman
SVM linear | 87.1±3.9 | GM (our), defaults
GTO DT | 86.0 | Bennet and Blue
Perceptron+MLP results:
Sigillito, V. G., Wing, S. P., Hutton, L. V., & Baker, K. B. (1989) Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest, 10, 262-266.
N. Shang, L. Breiman, ICONIP'96, p.133
David Aha: k-NN+C4+IB3, from Aha, D. W., & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 794-799). Detroit, MI: Morgan Kaufmann.
IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
The training/test division is not too good in this case; the distributions are a bit different.
In 10xCV the results are:

Method | Accuracy % | Reference
SFM+G+G(WX) | ??±2.6 | GM (our), C=1, s=2-5
kNN auto+WX+G(WX) | ??.4±3.6 | GM (our)
SVM Gaussian | 94.6±4.3 | GM (our), C=1, s=2-5
VSS-MKNN | 91.5±4.3 | MK, 12 neurons (similar 8-17)
SVM lin | 89.5±3.8 | GM (our), C=1, s=2-5
SSV tree | 87.8±4.5 | GM (our), default
1-NN | 85.8±4.9 | GM std, Euclid
3-NN | 84.0±5.4 | GM std, Euclid
VSS is an MLP with search, implemented by Mirek Kordos, used with 3 epochs; neurons may be sigmoidal or step-wise (64 values).
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).


Sonar: Mines vs Rocks

208 cases, 60 continuous attributes, 2 classes, 111 metal, 97 rock.
From the CMU benchmark repository
This dataset has been used in two kinds of experiments:
1. The "aspect-angle independent" experiments use all 208 cases with 13-fold crossvalidation, averaged over 10 runs to get the std.
2. The "aspect-angle dependent" experiments use training / test sets with 104 vectors each. The class distribution in training is 49 + 55, in the test 62 + 42.
Estimation of L1O on the whole dataset (Opper and Winther) gives only 78.2%; is the test so easy? Some of these results were obtained without standardization of the data, which is very important here!
The "aspect-angle dependent" experiments with training / test sets.

Method | Train % | Test % | Reference
1-NN, 5D from MDS, Euclid, std | -- | 97.1 | our, GM (WD)
1-NN, Manhattan, std | -- | 97.1 | our, GM (WD)
1-NN, Euclid, std | -- | 96.2 | our, GM (WD)
TAP MFT Bayesian | -- | 92.3 | Opper, Winther
Naive MFT Bayesian | -- | 90.4 | Opper, Winther
SVM | -- | 90.4 | Opper, Winther
MLP+BP, 12 hidden, best MLP | -- | 90.4 | Gorman, Sejnowski
1-NN, Manhattan, raw | -- | 92.3 | our, GM (WD)
1-NN, Euclid, raw | -- | 91.3 | our, GM (WD)
FSM - methodology ? | -- | 83.6 | our (RA)
The "angle dependent experiments" with 13 CV on all data.

Method | Train % | Test % | Reference
1-NN Euclid on 5D MDS input | -- | 87.5±0.8 | our GM (WD)
1-NN Euclidean, std data | -- | 86.8±1.2 | our GM (WD)
1-NN Manhattan, std data | -- | 86.3±0.3 | our GM (WD)
MLP+BP, 12 hidden | 99.8±0.1 | 84.7±5.7 | Gorman, Sejnowski
1-NN Manhattan, raw data | -- | 84.5±0.4 | our GM (WD)
MLP+BP, 24 hidden | 99.8±0.1 | 84.5±5.7 | Gorman, Sejnowski
MLP+BP, 6 hidden | 99.7±0.2 | 83.5±5.6 | Gorman, Sejnowski
SVM linear, C=0.1 | -- | 82.7±8.5 | our GM (WD), std data
1-NN Euclidean, raw data | -- | 82.1±0.9 | our GM (WD)
SVM Gauss, C=1, s=0.1 | -- | 77.4±10.1 | our GM (WD), std data
SVM linear, C=1 | -- | 76.9±11.9 | our GM (WD), raw data
SVM linear, C=1 | -- | 76.0±9.8 | our GM (WD), std data
DB-CART, 10xCV | -- | 81.8 | Shang, Breiman
CART, 10xCV | -- | 67.9 | Shang, Breiman
M. Opper and O. Winther, Gaussian Processes and SVM: Mean Field Results and Leave-One-Out. In: Advances in Large Margin Classifiers, Eds. A. J. Smola, P. Bartlett, B. Schölkopf, D. Schuurmans, MIT Press, 311-326, 2000; same methodology as Gorman with Sejnowski.
N. Shang, L. Breiman, ICONIP'96, p.133, 10xCV
Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets", Neural Networks 1, pp. 75-89, 13xCV
Our results: kNN results from 10xCV and from 13xCV are quite similar, so the Shang and Breiman 10xCV results should not differ much from 13-fold CV.
WD Leave-one-out (L1O) estimations on std data:
L1O with k=1, Euclidean distance, for all data gives 87.50%, other k and distance function do not give significant improvement.
SVM linear, C=1, L1O 75.0%, for Gaussian kernel, C=1, L1O is 78.8%
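The leave-one-out estimates quoted here can be reproduced with scikit-learn's LeaveOneOut splitter (an assumption; the original numbers come from GhostMiner):

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def loo_accuracy(X, y):
    """Leave-one-out accuracy (in %) of 1-NN with Euclidean distance on standardized data."""
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
    return 100 * cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

# print(f"{loo_accuracy(X, y):.2f} %")   # expected to be around 87.5% on the 208 sonar cases
```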
Other L1O results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Method | L1O accuracy %
Discriminant Adaptive NN, DANN | 92.3
Adaptive metric NN | 90.9
kNN | 87.5
SVM Gauss C=1 | 78.8
C4.5 | 76.9
SVM linear C=1 | 75.0


Vowel

528 training, 462 test cases, 10 continuous attributes, 11 classes
From the UCI benchmark repository.
Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios.
Results on the total set
Method | Train | Test | Reference
CART-DB, 10xCV on total set !!! | -- | 90.0 | Shang, Breiman
CART, 10xCV on total set | -- | 78.2 | Shang, Breiman

Method | Train | Test | Reference
Square node network, 88 units | -- | 54.8 | UCI
Gaussian node network, 528 units | -- | 54.6 | UCI
1-NN, Euclidean, raw | 99.24 | 56.3 | WD/KG
Radial Basis Function, 528 units | -- | 53.5 | UCI
Gaussian node network, 88 units | -- | 53.5 | UCI
FSM Gauss, 10CV on the training set | 92.60 | 51.94 | our (RA)
Square node network, 22 | -- | 51.1 | UCI
Multi-layer perceptron, 88 hidden | -- | 50.6 | UCI
Modified Kanerva Model, 528 units | -- | 50.0 | UCI
Radial Basis Function, 88 units | -- | 47.6 | UCI
Single-layer perceptron, 88 hidden | -- | 33.3 | UCI
N. Shang, L. Breiman, ICONIP'96, p.133; they used 10xCV instead of the designated test set.


Telugu Vowel

871 patterns, 6 overlapping vowel classes (Indian Telugu vowel sounds), 3 features (formant frequencies).

10xCV tests:
Method | Test | Reference
3-NN, Manhattan | 87.8±4.0 | Kosice
3-NN, Canberra | 87.8±4.2 | WD/GM
FSM, 65 Gaussian nodes | 87.4±4.5 | Kosice
3-NN, Euclid | 87.3±3.9 | WD/GM
SSV dec. tree, 22 rules | 86.0±?? | Kosice
SVM Gauss opt C~1000, s~1 | 85.0±4.0 | WD, Ghostminer
SVM Gauss C=1000, s=1 | 83.5±4.1 | WD, Ghostminer
SVM, Gauss, C=1, s=0.1 | 76.6±2.5 | WD, Ghostminer

2xCV tests:
Method | Test | Reference
3-NN, Euclidean | 86.1±0.6 | Kosice
FSM, 40 Gaussian nodes | 85.2±1.2 | Kosice
MLP | 84.6 | Pal
Fuzzy MLP | 84.2 | Pal
SSV dec. tree, beam search | 83.3±0.9 | Kosice
SSV dec. tree, best first | 83.0±1.0 | Kosice
Bayes Classifier | 79.2 | Pal
Fuzzy SOM | 73.5 | Pal
Parameters of the SVM were optimized, that is, in each CV fold different parameters were used, so only an approximate value can be quoted. If they are fixed to C=1000, s=1 the results are a bit worse.
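Optimizing C and the kernel width separately inside each crossvalidation fold, as described above, amounts to nested crossvalidation; a minimal sketch with scikit-learn (an assumption, the original used GhostMiner) and a small hypothetical parameter grid:

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def nested_cv_svm(X, y, seed=0):
    """Outer 10-fold CV; an inner 5-fold grid search picks C and gamma separately in each outer fold."""
    grid = {"svc__C": [1, 10, 100, 1000], "svc__gamma": [0.1, 0.5, 1.0]}   # hypothetical grid
    inner = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")), grid, cv=5)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(inner, X, y, cv=outer)
    return 100 * scores.mean(), 100 * scores.std()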
Papers using this data:

  • S. K. Pal and D. Dutta Majumder, ``Fuzzy sets and decision making approaches in vowel and speaker recognition'', IEEE Transactions on Systems, Man, and Cybernetics, Vol. 7, pp. 625-629, 1977.
  • S. Mitra, M. Banerjee and S. K. Pal, Rough knowledge-based network, fuzziness and classification, Neural Computing & Applications 7, 17-25, 1998.
  • Duch W and Hayashi Y, Computational intelligence methods and data understanding. In: Quo Vadis computational Intelligence? New trends and approaches in computational intelligence. Eds. P. Sincak, J. Vascak, Springer studies in fuzziness and soft computing, Vol. 54 (2000), pp. 256-270.
  • Chaoshun Li, Jianzhong Zhou, Qingqing Li and Xiuqiao Xiang, A Fuzzy Cluster Algorithm Based on Mutative Scale Chaos Optimization, LNCS 5264, 259-267, 2008.


Wine data

Source: UCI, described in Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
Class distribution: 178 cases = [59, 71, 48] in Class 1-3;
13 continuous attributes: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315, proline.

Leave-one-out test results:
Method | Test | Reference
RDA | 100 | [1]
QDA | 99.4 | [1]
LDA | 98.9 | [1]
kNN, Manhattan, k=1 | 98.7 | GM-WD, std data
1NN | 96.1 | [1] z-transformed data
kNN, Euclidean, k=1 | 95.5 | GM-WD, std data
kNN, Chebyshev, k=1 | 93.3 | GM-WD, std data

10xCV tests:
Method | Test | Reference
kNN, Manhattan, auto k=1-10 | 98.9±2.3 | GM-WD, 2D data, after MDS/PCA
IncNet, 10CV, def, Gauss | 98.9±2.4 | GM-WD, std data, up to 3 neurons
10 CV SSV, opt prune | 98.3±2.7 | GM-WD, 2D data, after MDS/PCA
10 CV SSV, node count 7 | 98.3±2.7 | GM-WD, 2D data, after MDS/PCA
kNN, Euclidean, k=1 | 97.8±2.8 | GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, k=1 | 97.8±2.9 | GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, auto k=1-10 | 97.8±3.9 | GM-WD
kNN, Euclidean, k=3, weighted features | 97.8±4.7 | GM-WD
IncNet, 10CV, def, bicentral | 97.2±2.9 | GM-WD, std data, up to 3 neurons
kNN, Euclidean, auto k=1-10 | 97.2±4.0 | GM-WD
10 CV SSV, opt node | 97.2±5.4 | GM-WD, 2D data, after MDS/PCA
FSM a=.99, def | 96.1±3.7 | GM-WD, 2D data, after MDS/PCA
FSM 10CV, Gauss, a=.999 | 96.1±4.7 | GM-WD, std data, 8-11 neurons
FSM 10CV, triang, a=.99 | 96.1±5.9 | GM-WD, raw data
kNN, Euclidean, k=1 | 95.5±4.4 | GM-WD
10 CV SSV, opt node, BFS | 92.8±3.7 | GM-WD
10 CV SSV, opt node, BS | 91.6±6.5 | GM-WD
10 CV SSV, opt prune, BFS | 90.4±6.1 | GM-WD
UCI past usage:
[1] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Technometrics).
[2] S. Aeberhard, D. Coomans and O. de Vel, "The classification performance of RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Journal of Chemometrics).


Other Data


Glass identification

Shang, Breiman CART 71.4% accuracy, DB-CART 70.6%.
Leave-one-out results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Method | L1O accuracy %
Adaptive metric NN | 75.2
Discriminant Adaptive NN, DANN | 72.9
kNN | 72.0
C4.5 | 68.2


DNA-Primate splice-junction gene sequences, with associated imperfect domain theory.

Statlog data: splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out).
This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)
Number of Instances: 3190. Class distribution:
Class | Train | Test
1 | 464 (23.20%) | 303 (25.55%)
2 | 485 (24.25%) | 280 (23.61%)
3 | 1051 (52.55%) | 603 (50.84%)
All | 2000 (100%) | 1186 (100%)
Number of attributes: originally 60 attributes {a,c,t,g}, usually converted to 180 binary indicator variables {(0,0,0), (0,0,1), (0,1,0), (1,0,0)}, or 240 binary variables.
Much better performance is generally observed if attributes closest to the junction are used (middle). In the StatLog version (180 variables), this means using attributes A61 to A120 only.
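A minimal sketch of this encoding: each of the 60 positions becomes a group of indicator bits, and optionally only the middle window around the junction is kept. Which 3-bit pattern maps to which base, and the plain-character input, are assumptions; the StatLog files are distributed already encoded:

```python
import numpy as np

# 3-bit indicator coding per nucleotide (60 x 3 = 180 variables); a full one-hot coding gives 240.
CODE3 = {"a": (0, 0, 0), "c": (0, 0, 1), "g": (0, 1, 0), "t": (1, 0, 0)}

def encode(sequences, middle_only=False):
    """Turn 60-character DNA strings into binary feature vectors (ambiguity codes fall back to (0,0,0))."""
    X = np.array([[b for ch in seq.lower() for b in CODE3.get(ch, (0, 0, 0))] for seq in sequences],
                 dtype=float)
    if middle_only:
        X = X[:, 60:120]   # attributes A61-A120, i.e. sequence positions 21-40 around the junction
    return X
```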

Method | % training | % test | Time train | Time test
RBF, 720 nodes | 98.5 | 95.9 | -- | --
k-NN GM, p(X|C), k=6, Euclid, raw | 96.8 | 95.5 | 0 | short
Dipol92 | 99.3 | 95.2 | 213 | 10
Alloc80 | 93.7 | 94.3 | 14394 | --
QuaDisc | 100.0 | 94.1 | 1581 | 809
LDA, Discrim | 96.6 | 94.1 | 929 | 31
FSM, 8 Gaussians, 180 binary | 95.4 | 94.0 | -- | --
Log DA, Disc | 99.2 | 93.9 | 5057 | 76
SSV Tree, p(X|C), opt node, 4CV | 94.8 | 93.4 | short | short
Naive Bayes | 94.8 | 93.2 | 52 | 15
Castle, middle 90 binary var | 93.9 | 92.8 | 397 | 225
IndCart, 180 binary | 96.0 | 92.7 | 523 | 516
C4.5, on 60 features | 96.0 | 92.4 | 9 | 2
CART, middle 90 binary var | 92.5 | 91.5 | 615 | 9
MLP+BP | 98.6 | 91.2 | 4094 | 9
Bayesian Tree | 99.9 | 90.5 | 82 | 11
CN2 | 99.8 | 90.5 | 869 | 74
New ID | 100.0 | 90.0 | 698 | 1
Ac2 | 100.0 | 90.0 | 12378 | 87
Smart | 96.6 | 88.5 | 79676 | 16
Cal5 | 89.6 | 86.9 | 1616 | 8
Itrule | 86.9 | 86.5 | 2212 | 6
k-NN | 91.1 | 85.4 | 2428 | 882
Kohonen | 89.6 | 66.1 | - | -
Default, majority | 52.5 | 50.8 | -- | --
kNN GM - GhostMiner version of kNN (our group)
SSV Decision Tree - our results


Włodzisław Duch, last modification 26.08.2012