AlmaTextMiner's Help


Choose the species used in the DNA array experiment from the available options. More organisms will be available soon.

File of Clusters

Enter the complete path and name of the file containing the clusters generated by the clustering process. Each cluster is defined in one row of the file, with its genes separated by tabs. The first element of the row should be the name of the cluster. Names must not contain spaces. Any line starting with '#' will be ignored.


# file of clusters generated in July, 2001.
# Organism: E. coli. Experiment X4ERY
NODE1 gene1 gene2 gene3 gene4
NODE2 gene8 gene9 gene10
NODE3 gene21 gene5 gene33 gene46 gene95
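The format described above can be read with a few lines of Python. This is only a minimal sketch of one way to parse such a file, not the code AlmaTextMiner itself uses; the function name `parse_clusters` is hypothetical, and the sketch assumes that fields are separated by real tab characters, as the description states.

```python
def parse_clusters(text):
    """Return {cluster_name: [gene, ...]} from a tab-separated cluster file."""
    clusters = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on tabs and drop empty fields left by trailing tabs.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        # Names must not contain spaces (assumption: this is treated as an error).
        for field in fields:
            if " " in field:
                raise ValueError(f"name contains a space: {field!r}")
        # The first field is the cluster name, the rest are its genes.
        clusters[fields[0]] = fields[1:]
    return clusters


sample = "# Organism: E. coli\nNODE1\tgene1\tgene2\tgene3\nNODE2\tgene8\tgene9\n"
print(parse_clusters(sample))
```

Rejecting names with spaces early makes it easier to spot a file that was saved with spaces instead of tabs.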

Newick File

Instead of supplying AlmaTextMiner with a plain text file describing the composition of the clusters (as described above), it is possible to take one of the output files of the Sota Server and use it to specify the clusters and the genes that form each of them.

Quantification Table

This is the location of the table of quantifications from the DNA array experiment. It should be a tab-delimited text file containing the intensity ratios of each gene in each experimental condition. All these ratios must be in logarithmic form, so that the scale is symmetrical. The first column is interpreted as the name of the gene. These names should be unambiguous biological symbols of the gene or its accession number in GenBank. Empty values (low quality data) can be indicated by words (NULL, ND...) or simply left blank. The first row must contain short descriptions of the experimental conditions. Any line starting with '#' will be ignored.


# header line
# another header line
#GENE Condition1vsControl Condition2vsControl Condition3vsControl Condition4vsControl
gene1 -3.1 0.3 ND 1.0
gene2 -1.8 2.3 -0.2 3.2
gene3 4.1 ND 4.2 1.1
gene4 1.2 6.1 0.3 -2.9
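As an illustration of how such a table might be read, here is a minimal Python sketch. It is not AlmaTextMiner's own parser; the names `parse_value` and `parse_quantifications` are hypothetical. It assumes tab-separated fields, treats the first line that is not a comment as the row of condition descriptions, and maps any token that is not a finite number (NULL, ND, an empty field) to `None`. Note that a header line beginning with '#' would be skipped as a comment under this reading.

```python
import math


def parse_value(token):
    """Convert one field to a float, or None if it marks a missing value."""
    try:
        value = float(token.strip())
    except ValueError:
        return None  # words such as NULL or ND, or an empty field
    return value if math.isfinite(value) else None


def parse_quantifications(text):
    """Return (conditions, {gene: [ratio or None, ...]})."""
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    if not rows:
        raise ValueError("no header row found")
    header, data_rows = rows[0], rows[1:]
    conditions = [h.strip() for h in header[1:]]
    table = {}
    for row in data_rows:
        values = [parse_value(t) for t in row[1:]]
        # Pad rows whose trailing empty values were left out entirely.
        values += [None] * (len(conditions) - len(values))
        table[row[0].strip()] = values
    return conditions, table


sample = "# note\nGENE\tC1vsControl\tC2vsControl\ngene1\t-3.1\tND\ngene2\t2.3\n"
print(parse_quantifications(sample))
```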

Project Name

Enter a name to identify your analysis. All the results will be stored in a directory with this name.

Maximum number of clusters per page

This is the number of clusters to show in a page of results.

Minimum number of units of information per cluster

This is the minimum number of units of information needed to include a given cluster in the statistical analysis. Clusters with fewer units will be left out of the analysis in order to avoid erroneous results.
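The effect of this threshold can be pictured with a short Python sketch. The function `filter_clusters` and the dictionary layout are hypothetical and only illustrate the rule stated above: a cluster is kept when it has at least the minimum number of units.

```python
def filter_clusters(units_by_cluster, min_units):
    """Keep only clusters with at least min_units units of information."""
    return {name: units for name, units in units_by_cluster.items()
            if len(units) >= min_units}


units_by_cluster = {
    "NODE1": ["unit a", "unit b", "unit c"],
    "NODE2": ["unit d"],
}
# With a minimum of 2, NODE2 is left out of the analysis.
print(filter_clusters(units_by_cluster, 2))
```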

Units of information

If abstracts is selected, the system will associate with a cluster of genes all complete abstracts that include the name of any gene from the cluster.
If sentences is selected, the system will associate with a cluster of genes all complete sentences that include the name of any gene from the cluster.
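The difference between the two options can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's actual procedure: the function names are hypothetical, sentences are split naively at '.', '!' or '?' followed by whitespace, and gene names are matched as whole words without regard to case.

```python
import re


def mentions_any(text, genes):
    """True if text contains any gene name as a whole word (case-insensitive)."""
    return any(
        re.search(r"(?<!\w)" + re.escape(g) + r"(?!\w)", text, re.IGNORECASE)
        for g in genes
    )


def units_for_cluster(abstracts, genes, unit="abstracts"):
    """Collect the abstracts or sentences that mention a gene of the cluster."""
    if unit == "abstracts":
        candidates = list(abstracts)
    elif unit == "sentences":
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        candidates = [s for a in abstracts
                      for s in re.split(r"(?<=[.!?])\s+", a.strip()) if s]
    else:
        raise ValueError("unit must be 'abstracts' or 'sentences'")
    return [c for c in candidates if mentions_any(c, genes)]


abstracts = [
    "The lacZ gene encodes beta-galactosidase. It is widely used as a reporter.",
    "Unrelated text about ribosomes.",
]
print(units_for_cluster(abstracts, ["lacZ"], "abstracts"))
print(units_for_cluster(abstracts, ["lacZ"], "sentences"))
```

With sentences, only the text close to the gene name is kept, so fewer unrelated words enter the analysis than with whole abstracts.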

Ignore words that just appear once in the whole corpus

If this option is chosen, AlmaTextMiner will not consider in its analysis those words that appear just once in the whole corpus.
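A minimal Python sketch of this kind of filtering is shown below. The function names and the tokenizer (lowercase runs of letters and digits) are assumptions made for the illustration; the way AlmaTextMiner actually splits text into words is not described here.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def drop_single_occurrence_words(units):
    """Tokenize each unit and remove words seen only once in the whole corpus."""
    counts = Counter(word for unit in units for word in tokenize(unit))
    return [[w for w in tokenize(unit) if counts[w] > 1] for unit in units]


units = ["heat shock protein", "heat stress response", "shock response"]
# 'protein' and 'stress' appear once in the corpus and are dropped.
print(drop_single_occurrence_words(units))
```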

Number of words to show

This is the number of relevant words to show in the results page, ordered by their Z scores.
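As a rough picture of what this option controls, the following Python sketch picks the top words from a table of scores. The function `top_words` and the example scores are hypothetical, and the sketch assumes that the highest Z scores are listed first.

```python
def top_words(z_scores, n):
    """Return the n (word, score) pairs with the highest Z scores."""
    ranked = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up scores, for illustration only.
z_scores = {"membrane": 1.0, "ribosomal": 3.5, "transport": 2.0}
print(top_words(z_scores, 2))
```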

Do Analysis offline

A large analysis can take several minutes to finish. If you do not want to stay connected the whole time, you can check this option and enter your e-mail address in the box. You will receive an e-mail when the analysis is finished, indicating the URL where you can see the results.

Download some examples

Here are some input files to test the system.
