Software Manual
The software is implemented as a multi-window application, which makes it possible to work with different datasets, algorithms, and results simultaneously. The Main Window (panel) contains the Menu and indicates the current working dataset.
To load a sample dataset, select File -> Open from the Menu. Machaon CVE supports tab-delimited text and XML-based files (see the File Format section). Before the dataset is loaded, the Parameter Window appears so that the Row Contain parameter (Objects or Features) can be selected. To demonstrate the features of the program, the leukemia dataset from Golub et al. is used.
After the dataset has been loaded, the Data Set Window is displayed. The Data Set Window contains the expression Table and the Results Tree, which lists all clustering and validation results obtained. The Table shows the gene expression values and, if a clustering has been obtained, indicates the partitioning into the Cluster Sets (the rightmost column(s) of the Table).
Several datasets (ready for use in Machaon) are available for download:
Machaon reads tab-delimited text and XML-based files. The tab-delimited text format is described below. Such text files can be created in and exported from any standard spreadsheet program, such as Microsoft Excel. All files should have the following format:
The "Number of rows" and "Number of columns" entries give the numbers of rows and columns in the expression table. The terms Si, 1 ≤ i ≤ Ns, are the names or descriptions of the experimental samples, conditions, strains, or specimens (Ns is the number of samples in the dataset); Gj, 1 ≤ j ≤ Ng, are the names or descriptions of the genes (Ng is the number of genes in the dataset); NCk, 1 ≤ k ≤ Nnc, are the names or descriptions of the natural classes (Nnc is the number of natural classes in the dataset). The terms Vij are the data values for the ith sample/experiment and the jth gene. The terms Cn, 1 ≤ n ≤ Nc, are the names of the clusters to which the sample/gene is assigned (Nc is the number of clusters in the dataset). Bold entries indicate required records.
The program can read files that already contain cluster numbers (that is, datasets that have already been clustered by other software tools). Thus, the validation techniques can be applied to data files produced by other systems.
Here are examples derived from the leukemia data:
Example 1
5	3
U22376	X59417	U05259
sample_12	ALL	551	846	2504	0
sample_25	ALL	1872	3878	5070	1
sample_34	AML	1126	782	711	1
sample_35	AML	880	490	654	0
sample_36	AML	473	1648	-14	1
Example 2
3	4
sample_12	sample_25	sample_34	sample_35
U22376	-	551	1872	1126	880	0
X59417	-	846	3878	782	490	1
U05259	-	2504	5070	711	654	0
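The layout shown in Examples 1 and 2 can be read with a few lines of code. The following Python sketch is only an illustration and is not part of Machaon: the function name parse_expression_table and the keys of the returned dictionary are hypothetical. It assumes that the first line holds the row and column counts, the second line holds the column names, and each data row holds a name, a natural class, the data values, and any trailing cluster labels, all separated by tabs.

```python
def parse_expression_table(text):
    """Parse a tab-delimited table laid out like Examples 1 and 2 (assumed layout)."""
    # Skip blank lines; split every remaining line into tab-separated cells.
    lines = [line.split("\t") for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a size line and a column-name line")

    # Line 1: number of rows and number of columns of the expression table.
    n_rows, n_cols = (int(cell) for cell in lines[0] if cell.strip())

    # Line 2: column names; leading empty cells, if any, are ignored.
    column_names = [cell.strip() for cell in lines[1] if cell.strip()]
    if len(column_names) != n_cols:
        raise ValueError("column-name count does not match the size line")

    data_lines = lines[2:]
    if len(data_lines) != n_rows:
        raise ValueError("row count does not match the size line")

    rows = []
    for cells in data_lines:
        cells = [cell.strip() for cell in cells]
        if len(cells) < 2 + n_cols:
            raise ValueError("row is too short: %r" % cells)
        rows.append({
            "name": cells[0],
            "natural_class": cells[1],       # "-" in Example 2 (no class given)
            "values": [float(v) for v in cells[2:2 + n_cols]],
            "clusters": cells[2 + n_cols:],  # zero or more cluster labels
        })
    return {"column_names": column_names, "rows": rows}


# Example 1 from the text, written with explicit tab separators.
EXAMPLE_1 = "\n".join([
    "5\t3",
    "U22376\tX59417\tU05259",
    "sample_12\tALL\t551\t846\t2504\t0",
    "sample_25\tALL\t1872\t3878\t5070\t1",
    "sample_34\tAML\t1126\t782\t711\t1",
    "sample_35\tAML\t880\t490\t654\t0",
    "sample_36\tAML\t473\t1648\t-14\t1",
])

table = parse_expression_table(EXAMPLE_1)
print(table["column_names"])
print(table["rows"][0])
```

Cluster labels are kept as strings because the format describes them as cluster names; a row without trailing labels simply yields an empty list.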
The description of the XML-based format may be found here.
Currently, two types of data transformation are provided: Log Normalization and Row Normalization (which normalizes the intensities in a given table to mean zero and variance 1 across all genes). The transformations are offered as a convenience to the user. To apply a data transformation to the current dataset, select the Menu item Transformation -> Row Normalization or Transformation -> Log Normalization. A Data Set Window with the transformed dataset will appear.
Machaon applies the clustering algorithms to both the rows and the columns of the table.
To start the Hierarchical Clustering calculation, select the Menu item Clustering -> Hierarchical. The Parameter Window will appear for selecting parameters such as:
To start the calculation, click Next. As soon as the calculation has been completed, a new entry is added to the Results Tree. The result of the clustering is also indicated in the expression Table. To see the Hierarchical Dendrogram, select (with the left mouse button) the desired hierarchical clustering in the Results Tree and choose the Menu item View -> Dendrogram.
To start the K-Means Clustering calculation, select the Menu item Clustering -> K-Means. The Parameter Window will appear for selecting parameters such as:
To start the calculation, click Next. As soon as the calculation has been completed, a new entry is added to the Results Tree. The result of the clustering is also indicated in the expression Table.
To start the K-Medoids Clustering calculation, select the Menu item Clustering -> K-Medoids. The Parameter Window will appear for selecting parameters such as:
To start the calculation, click Next. As soon as the calculation has been completed, a new entry is added to the Results Tree. The result of the clustering is also indicated in the expression Table.
To start the Weak Clustering calculation, select the Menu item Clustering -> Weak Clustering. The Parameter Window will appear for selecting parameters such as:
To start the calculation, click Next. As soon as the calculation has been completed, a new entry is added to the Results Tree. The result of the clustering is also indicated in the expression Table.
Machaon supports ensemble clustering, which combines a collection of multiple "base" clusterings to produce an improved partition of a dataset. The ensemble clustering process involves two stages:
To start the Ensemble Clustering procedure, select the Menu item Clustering -> Ensemble Clustering. The first Parameter Window will appear for selecting the basic ensemble parameters, such as:
After the basic parameters have been selected, a second Parameter Window will appear for selecting the parameters of the generation procedure. The list of available parameters depends on the generation method chosen previously. To start the clustering process, click Next. As soon as the calculation has been completed, a new entry is added to the Results Tree. The result of the clustering is also indicated in the expression Table.
To apply any validation technique, first select the Cluster Set and then choose the validation method from the Menu.
To start the C-index calculation for the current Cluster Set, select the Menu item Validation -> C-index. The Parameter Window will appear for selecting the C-index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. Because of its high computational complexity, calculating the C-index for large datasets can be very time-consuming. See "How to interpret the results".
To start the Davies-Bouldin index calculation for the current Cluster Set, select the Menu item Validation -> Davies-Bouldin index. The Parameter Window will appear for selecting the Davies-Bouldin index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Dunn's index calculation for the current Cluster Set, select the Menu item Validation -> Dunn index. The Parameter Window will appear for selecting the Dunn's index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Goodman-Kruskal index calculation for the current Cluster Set, select the Menu item Validation -> Goodman-Kruskal index. The Parameter Window will appear for selecting the Goodman-Kruskal index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. Because of its high computational complexity, calculating the Goodman-Kruskal index for large datasets can be very time-consuming. See "How to interpret the results".
To start the Silhouette index calculation for the current Cluster Set, select the Menu item Validation -> Silhouette. The Parameter Window will appear for selecting the Silhouette index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Isolation index calculation for the current Cluster Set, select the Menu item Validation -> Isolation. The Parameter Window will appear for selecting the Isolation index parameters, such as:
To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Jaccard index calculation for the current Cluster Set, select the Menu item Validation -> Jaccard index. The Parameter Window will appear and indicate that no parameters are required for this procedure. To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Rand index calculation for the current Cluster Set, select the Menu item Validation -> Rand index. The Parameter Window will appear and indicate that no parameters are required for this procedure. To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
To start the Class Accuracy calculation for the current Cluster Set, select the Menu item Validation -> Class Accuracy. The Parameter Window will appear and indicate that no parameters are required for this procedure. To calculate the index, click Validate. As soon as the calculation has been completed, a new entry is added to the Results Tree. The validation result is attached to the clustering result in the tree. See "How to interpret the results".
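To give a feeling for what the parameter-free Rand and Jaccard indices measure, the following Python sketch computes both from two label lists using the standard pair-counting definitions. It is an illustration under assumptions, not a description of Machaon's internals: the manual does not state how the indices are computed, the function names are hypothetical, and the sketch assumes that the cluster assignments are compared with the natural classes stored in the data file.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count object pairs by whether each partition puts them together."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1        # together in both partitions
        elif same_a:
            only_a += 1      # together only in the first partition
        elif same_b:
            only_b += 1      # together only in the second partition
        else:
            neither += 1     # separated in both partitions
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(labels_a, labels_b):
    """Like the Rand index, but ignoring pairs separated in both partitions."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denominator = both + only_a + only_b
    # Convention chosen for this sketch: if no pair is together in either
    # partition, both partitions consist of singletons and are identical.
    return both / denominator if denominator else 1.0


# Natural classes and cluster labels of the five samples in Example 1.
natural_classes = ["ALL", "ALL", "AML", "AML", "AML"]
clusters = ["0", "1", "1", "0", "1"]

print(rand_index(natural_classes, clusters))
print(jaccard_index(natural_classes, clusters))
```

For the five samples of Example 1, one pair is grouped together in both partitions, three pairs only by natural class, three pairs only by cluster, and three pairs in neither, which gives a Rand index of 4/10 and a Jaccard index of 1/7. Both indices reach 1 when the two partitions are identical.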