High-throughput experiments, such as with DNA microarrays, typically result in a list of hundreds of genes
deemed potentially relevant to the process under study. A growing number of methods are being developed to
extract and use information about these large sets of genes efficiently.
Anni is an innovative approach to finding functional relations between genes and other biomedical concepts in the free-text literature. For each gene, a profile of
related concepts is constructed that summarizes the context in which the gene is mentioned in the literature.
An advantage of these concept profiles is that they can easily be compared and patterns of similarity can be found efficiently, for
instance with clustering approaches. An important issue is the choice of measure used to weight the association of a
concept in a profile. The challenge is to distinguish a concept that co-occurs with the concept of interest
by chance from one that has a genuine semantic relationship with it. With this in mind, we adopted a method based on
likelihood ratios, which has been successfully used for the identification of interesting collocations. This method
does not require the data to be normally distributed and is known to yield good results even on small samples. With
Anni, genes with similar functions are identified by hierarchical clustering.
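The likelihood-ratio weighting mentioned above can be illustrated with a small sketch. It assumes the common log-likelihood ratio (G²) statistic for a 2x2 co-occurrence table, as used in collocation analysis; the function name, the document-count inputs, and the example numbers are hypothetical and are not taken from Anni itself.

```python
import math


def log_likelihood_ratio(both, only_a, only_b, neither):
    """G^2 statistic for a 2x2 co-occurrence table (illustrative sketch).

    both    -- documents mentioning concept A and concept B
    only_a  -- documents mentioning A but not B
    only_b  -- documents mentioning B but not A
    neither -- documents mentioning neither concept
    """
    observed = ((both, only_a), (only_b, neither))
    total = both + only_a + only_b + neither
    row_sums = (both + only_a, only_b + neither)
    col_sums = (both + only_b, only_a + neither)

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: a strongly associated pair and an independent pair.
strong = log_likelihood_ratio(50, 5, 5, 940)
independent = log_likelihood_ratio(10, 10, 10, 10)
```

A higher score means the observed co-occurrence is harder to explain by chance. Under this assumption, the score could serve as the weight of a concept in a gene's profile.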
For a cluster, Anni provides a coherence measure together with a complete annotation of the underlying overlap of the concept profiles, a p-value to indicate
how exceptional the cluster is, and a link-out to the literature behind the concept associations.
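How a coherence measure and its p-value might be obtained can be sketched as follows. This is only an illustration under stated assumptions: coherence is taken to be the mean pairwise cosine similarity of the concept profiles in a cluster, and the p-value is estimated by comparing against randomly drawn gene groups of the same size. The text does not specify Anni's actual definitions, and all gene names, concepts, and weights below are invented.

```python
import itertools
import math
import random


def cosine(p, q):
    """Cosine similarity between two sparse profiles (concept -> weight)."""
    dot = sum(p[c] * q[c] for c in set(p) & set(q))
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(
        sum(w * w for w in q.values())
    )
    return dot / norm if norm else 0.0


def coherence(profiles, genes):
    """Mean pairwise similarity of the profiles of the given genes."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(cosine(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def empirical_p_value(profiles, cluster, n_random=1000, seed=0):
    """Share of random gene groups at least as coherent as the cluster."""
    rng = random.Random(seed)
    observed = coherence(profiles, cluster)
    all_genes = sorted(profiles)
    hits = sum(
        coherence(profiles, rng.sample(all_genes, len(cluster))) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)


# Invented toy profiles for illustration only.
profiles = {
    "geneA": {"apoptosis": 3.0, "caspase": 2.0},
    "geneB": {"apoptosis": 2.5, "caspase": 2.5},
    "geneC": {"glycolysis": 4.0},
    "geneD": {"glycolysis": 1.0, "insulin": 2.0},
    "geneE": {"dna repair": 3.0},
    "geneF": {"cell cycle": 2.0},
}
cluster = ["geneA", "geneB"]
cluster_coherence = coherence(profiles, cluster)
cluster_p = empirical_p_value(profiles, cluster)
```

The concepts shared by the genes in a cluster (here "apoptosis" and "caspase") are the kind of overlap an annotation of the cluster would list.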
For more information, visit the Anni website of the Biosemantics Group Rotterdam.