GenMiner: Non-Redundant Association Rules Mining from Genomic Data

Ricardo Martinez, Nicolas Pasquier and Claude Pasquier
Submitted to Bioinformatics

Supplementary materials

GenMiner program

The GenMiner program can be downloaded here.

Data pre-processing

Gene expression measures are those used by Eisen et al. (1998). This dataset was discretized using the NorDi algorithm at a 95% confidence level.
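To illustrate what discretizing expression values against two cutoffs looks like, here is a minimal sketch. It is not the NorDi algorithm itself: it assumes the values are approximately normally distributed, that the 95% confidence level is applied two-sided, and it omits any outlier handling. The function names `cutoffs` and `discretize` and the codes -1/0/+1 are hypothetical choices made for this example.

```python
from statistics import NormalDist, mean, stdev


def cutoffs(values, confidence=0.95):
    """Return (low, high) thresholds from a normal fit of the values.

    Assumption: two-sided interval, so 0.95 gives a z-score near 1.96.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mu, sigma = mean(values), stdev(values)
    return mu - z * sigma, mu + z * sigma


def discretize(values, low, high):
    """Map each value to -1 (under-expressed), +1 (over-expressed) or 0."""
    return [-1 if v < low else 1 if v > high else 0 for v in values]


if __name__ == "__main__":
    # Made-up expression ratios, for illustration only.
    ratios = [-3.0, -0.2, 0.0, 0.1, 0.3, -0.1, 0.2, -0.3, 0.0, 3.0]
    low, high = cutoffs(ratios)
    print(low, high)
    print(discretize(ratios, low, high))
```

The actual thresholds used for this dataset are those provided in the Cutoffs file, not values computed by this sketch.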

Gene annotations were collected from the following sources:

Available data files

File: Description
Eisen dataset: Expression ratios of 2465 yeast genes under 79 biological conditions.
Microarray Experiments: Description of the 79 experiments.
Cutoffs: Under-expressed and over-expressed cutoff thresholds computed by NorDi.
Discretized expression measures: Discretization of the expression measures performed by NorDi.
Data mining context: Data matrix of 2465 rows (genes) and 737 columns (discretized expression levels and annotations). Each row contains the gene's expression profile over the 79 biological conditions (values discretized by NorDi) and at most 658 gene annotations (24 GOSlim terms, 14 pathways, 25 transcriptional regulators, 14 phenotypes and 581 PubMed IDs).
Equivalence classes: Frequent closed itemsets and their generators extracted by Close with a minsupport of 0.005. Each class is represented by a line of the form
[Generator] [Closed itemset] n
where n is the number of items in the class.
Exact association rules: All exact association rules, displayed in the form
[antecedent] => [consequent] supp=s conf=c
where s and c are the support and the confidence of the rule, respectively.
Approximate association rules: All approximate association rules with a confidence greater than or equal to 0.5, displayed in the form
[antecedent] -> [consequent] supp=s conf=c
where s and c are the support and the confidence of the rule, respectively.
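The two rule formats above can be read with a short script. The following is a minimal sketch rather than a tool shipped with GenMiner. It assumes that items inside the brackets are separated by whitespace or commas and that supp and conf are plain decimal numbers; the files themselves should be checked before relying on this. The names `Rule`, `split_items` and `parse_rule`, and the item labels in the example line, are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: Tuple[str, ...]
    consequent: Tuple[str, ...]
    exact: bool          # True for "=>" (exact), False for "->" (approximate)
    support: float
    confidence: float


# [antecedent] => [consequent] supp=s conf=c   (or "->" for approximate rules)
RULE_RE = re.compile(
    r"^\[(.*?)\]\s*(=>|->)\s*\[(.*?)\]\s+supp=(\S+)\s+conf=(\S+)\s*$"
)


def split_items(text: str) -> Tuple[str, ...]:
    """Split a bracket body into items (assumed separators: spaces, commas)."""
    return tuple(tok for tok in re.split(r"[,\s]+", text.strip()) if tok)


def parse_rule(line: str) -> Rule:
    """Parse one rule line; raise ValueError if it does not match the format."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a rule line: {line!r}")
    ante, arrow, cons, supp, conf = m.groups()
    return Rule(split_items(ante), split_items(cons),
                arrow == "=>", float(supp), float(conf))


if __name__ == "__main__":
    # Made-up line with placeholder item names, for illustration only.
    rule = parse_rule("[itemA itemB] -> [itemC] supp=0.01 conf=0.75")
    print(rule)
```

Once parsed, rules can be filtered with ordinary comparisons, for example keeping only those with `rule.confidence >= 0.9`.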