# Discriminant analysis

This module provides discriminant analysis for two or more groups (the multi-group case is sometimes called Canonical Variates Analysis). Group membership must be specified with a group column.

The specimens are shown in a scatter plot along the first two canonical axes, which give the maximal and second-to-maximal separation between all groups. The axes are linear combinations of the original variables, as in PCA, and the eigenvalues indicate the amount of variation explained by each axis. If only two groups are given, a histogram is plotted instead.
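As an illustration of how such axes can be obtained, the sketch below uses the textbook formulation: the eigenvectors of the within-group scatter matrix inverted and multiplied by the between-group scatter matrix. This is an assumption about the general technique, not a description of this module's internals. The function name `canonical_axes` and the example data are hypothetical.

```python
import numpy as np

def canonical_axes(X, labels):
    """Return eigenvalues, axes and specimen scores for a canonical variates analysis.

    Assumes the textbook formulation (eigenvectors of inv(W) @ B); the module
    itself may compute or scale the axes differently.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for g in np.unique(labels):
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        centered = Xg - mean_g
        W += centered.T @ centered
        d = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    # Solve the generalized problem B v = lambda W v through a Cholesky
    # factor of W, so that a symmetric eigen-solver can be used.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    vals, vecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(vals)[::-1]            # largest separation first
    vals = vals[order]
    axes = L_inv.T @ vecs[:, order]           # columns are canonical axes
    scores = (X - grand_mean) @ axes          # specimen coordinates for plotting
    return vals, axes, scores

# Hypothetical data: three groups of three specimens, two variables.
X = np.array([[0, 0], [1, 0], [0, 1],
              [5, 5], [6, 5], [5, 6],
              [10, 0], [11, 0], [10, 1]], dtype=float)
labels = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)
vals, axes, scores = canonical_axes(X, labels)
explained = vals / vals.sum()                 # share attributed to each axis
```

With only two groups there is at most one non-zero eigenvalue, which is consistent with the single-axis histogram mentioned above.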

Missing data are supported by column average substitution.
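A minimal sketch of column average substitution follows, assuming missing values are encoded as NaN and that the average is taken over all non-missing values in the column regardless of group. Both are assumptions. The function name `impute_column_means` is hypothetical.

```python
import numpy as np

def impute_column_means(data):
    """Replace each NaN with the mean of the non-missing values in its column."""
    filled = np.array(data, dtype=float)       # copy, so the input is untouched
    col_means = np.nanmean(filled, axis=0)     # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))    # positions of the missing values
    filled[rows, cols] = col_means[cols]
    return filled

example = np.array([[1.0, 10.0],
                    [np.nan, 14.0],
                    [3.0, np.nan]])
filled = impute_column_means(example)
```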

#### Classifier

Classifies the data, assigning each point to the group that gives minimal Mahalanobis distance to the group mean. The Mahalanobis distance is calculated from the pooled within-group covariance matrix, giving a linear discriminant classifier. The given and estimated group assignments are listed for each point. In addition, group assignment is cross-validated by a leave-one-out (jackknifing) procedure.
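The classification rule and its leave-one-out validation can be sketched as follows. This is an illustrative implementation under stated assumptions, not the module's own code: the pooled covariance is divided by n minus the number of groups, and the function names (`pooled_covariance`, `classify`, `leave_one_out`) and data are hypothetical.

```python
import numpy as np

def pooled_covariance(X, labels):
    """Pooled within-group covariance (assumed divisor: n minus number of groups)."""
    groups = np.unique(labels)
    n, p = X.shape
    W = np.zeros((p, p))
    for g in groups:
        Xg = X[labels == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
    return W / (n - len(groups))

def classify(X_train, labels, X_new):
    """Assign each row of X_new to the group with the nearest mean (Mahalanobis)."""
    groups = np.unique(labels)
    inv_cov = np.linalg.inv(pooled_covariance(X_train, labels))
    means = np.array([X_train[labels == g].mean(axis=0) for g in groups])
    diffs = X_new[:, None, :] - means[None, :, :]            # (points, groups, vars)
    d2 = np.einsum("mkp,pq,mkq->mk", diffs, inv_cov, diffs)  # squared distances
    return groups[np.argmin(d2, axis=1)]

def leave_one_out(X, labels):
    """Classify each point using a model fitted to all the other points."""
    n = len(labels)
    predicted = np.empty(n, dtype=labels.dtype)
    for i in range(n):
        keep = np.arange(n) != i
        predicted[i] = classify(X[keep], labels[keep], X[i:i + 1])[0]
    return predicted

# Hypothetical data: two well-separated groups of four specimens.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [10, 10], [11, 10], [10, 11], [11, 11]], dtype=float)
labels = np.array(["A"] * 4 + ["B"] * 4)
estimated = classify(X, labels, X)
jackknifed = leave_one_out(X, labels)
```

Because the left-out point does not contribute to the group means or the covariance matrix, the jackknifed assignments give a less optimistic estimate of classification success than the plain assignments.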

#### Mystery specimens

Rows with unknown group, i.e. ‘?’ in the group column, are not included in the discriminant analysis itself, but will be classified. In this way, it is possible to classify new specimens that are not part of the training set.
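The separation of training rows from mystery rows can be sketched as below. The helper name `split_known_unknown` and the example data are hypothetical. The training part would then be used to fit the classifier, and the mystery part would only be passed through it.

```python
import numpy as np

def split_known_unknown(X, groups, unknown="?"):
    """Split rows into a training set (known group) and mystery rows (group '?')."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    known = groups != unknown
    return X[known], groups[known], X[~known]

# Hypothetical data: four labelled specimens and one mystery specimen.
X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0], [9.5, 10.5]])
groups = ["A", "A", "B", "B", "?"]
X_train, y_train, X_mystery = split_known_unknown(X, groups)
```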

#### Confusion matrix

A table with the number of points in each given group (rows) that are assigned to each of the groups (columns) by the classifier. Ideally, each point should be assigned to its given group, giving a diagonal confusion matrix. Off-diagonal counts are misclassified points and indicate the degree of failure of classification.
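A confusion matrix of this kind can be tallied as in the following sketch. The function name `confusion_matrix` and the example assignments are hypothetical. Rows are given groups and columns are assigned groups, as described above.

```python
import numpy as np

def confusion_matrix(given, assigned):
    """Count points per (given group, assigned group) pair."""
    given = np.asarray(given)
    assigned = np.asarray(assigned)
    groups = np.unique(np.concatenate([given, assigned]))  # sorted group names
    index = {g: i for i, g in enumerate(groups)}
    table = np.zeros((len(groups), len(groups)), dtype=int)
    for g, a in zip(given, assigned):
        table[index[g], index[a]] += 1                      # row: given, column: assigned
    return groups, table

# Hypothetical assignments: one specimen of group A is misclassified as B.
given = ["A", "A", "A", "B", "B", "C"]
assigned = ["A", "A", "B", "B", "B", "C"]
groups, table = confusion_matrix(given, assigned)
correct_fraction = np.trace(table) / table.sum()            # diagonal share
```

The share of counts on the diagonal gives a single summary figure for the proportion of correctly classified points.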