Helpful Abstraction using Minimum Spanning Trees for Expression Relations (Plus)

HAMSTER builds a set of minimum spanning trees (MSTs) from the experiments of a microarray data set in GEO SOFT format. Each node in an MST is an experiment, and the length of an edge between two nodes indicates how dissimilar those two experiments are. This overview looks at two images produced by following the tutorial (available by clicking on Documentation in the left menu). The MSTs that make up this example are available here.

The data set GDS596 is entitled "Large-scale analysis of the human transcriptome (HG-U133A)". It contains 158 experiments that form two complete sets of replicates. The aim of the data set is to examine expression levels in 79 physiologically normal tissues.

We select the first set of replicates and, instead of using all 22,283 probes, focus on the 2,002 probes that are assigned to the following Gene Ontology categories:

  • Extracellular region (GO:0005576)
  • Extracellular region part (GO:0044421)
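The probe selection described above can be sketched as a simple filter. This is a minimal illustration, not HAMSTER's own code: the `probe_to_go` mapping, the `select_probes` helper, and the probe names are hypothetical, and only the two target GO identifiers come from the text.

```python
# The two Gene Ontology categories named in the text
TARGET_GO = {"GO:0005576", "GO:0044421"}

def select_probes(probe_to_go, targets=TARGET_GO):
    """Keep probes annotated with at least one of the target GO terms.

    probe_to_go: dict mapping a probe ID to an iterable of GO term IDs
    (hypothetical input format, assumed for this sketch).
    """
    return sorted(p for p, terms in probe_to_go.items() if targets & set(terms))

# Toy annotations, invented for illustration only
toy_annotations = {
    "probe_1": ["GO:0005576"],                # extracellular region
    "probe_2": ["GO:0005634"],                # a different category, dropped
    "probe_3": ["GO:0044421", "GO:0005634"],  # kept: one term matches
    "probe_4": [],                            # no annotation, dropped
}
selected = select_probes(toy_annotations)
```

In this toy input, only `probe_1` and `probe_3` pass the filter.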

Applying Euclidean distance as our dissimilarity measure produces the image below:

Color code:
red = cerebral
blue = epithelial
yellow = immune
green = muscle

The coloring scheme is assigned automatically because all 79 of these experiments were manually curated beforehand. The text is small, but the overall view is the primary concern here (HAMSTER+ can also generate an SVG image or a higher-resolution PNG). The figure shows that, in this data set, the cerebral and immune tissues are closely correlated with each other, while the epithelial and muscle tissues are not.
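To make the construction concrete, the following is a minimal sketch of building one MST from expression profiles with Euclidean distance as the dissimilarity measure. It assumes Prim's algorithm on the complete graph of experiments; the text does not say which MST algorithm HAMSTER uses, and the function names and toy data are invented for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of experiments.

    profiles: dict mapping experiment name -> list of expression values.
    Returns a list of (node_u, node_v, distance) edges.
    """
    names = list(profiles)
    if not names:
        return []
    start = names[0]
    # best[v] = (distance, tree node): cheapest known link from v into the tree
    best = {v: (euclidean(profiles[start], profiles[v]), start) for v in names[1:]}
    edges = []
    while best:
        # Attach the experiment that is currently closest to the tree
        v = min(best, key=lambda k: best[k][0])
        dist, u = best.pop(v)
        edges.append((u, v, dist))
        # The new tree node may offer a shorter link to the remaining ones
        for w in best:
            d = euclidean(profiles[v], profiles[w])
            if d < best[w][0]:
                best[w] = (d, v)
    return edges

# Toy profiles, invented for illustration only
toy = {
    "A": [0.0, 0.0],
    "B": [3.0, 4.0],
    "C": [3.0, 5.0],
    "D": [0.0, 1.0],
}
mst = minimum_spanning_tree(toy)
total_length = sum(d for _, _, d in mst)
```

A tree over n experiments always has n - 1 edges, so the four toy profiles yield three edges.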

The most similar experiments in the original data set are recursively merged and the minimum spanning tree algorithm repeated. At the 52nd MST, the following image results:

In this image, most nodes remain colored, indicating that they are not composite nodes containing experiments of different colors (that is, different manually curated labels). If we expand node 51 in the center, we can see that most of the experiments it contains are immune or cerebral tissue samples:

Experiments:
GSM18865  epithelial
GSM18867  immune
GSM18869  immune
GSM18871  immune
GSM18873  immune
GSM18875  immune
GSM18877  immune
GSM18879  immune
GSM18881  immune
GSM18883  immune
GSM18885  immune
GSM18887  immune
GSM18889  immune
GSM18891  immune
GSM18893  immune
GSM18895  immune
GSM18897  immune
GSM18899  immune
GSM18901  immune
GSM18903  immune
GSM18907  immune
GSM18911  cerebral
GSM18913  cerebral
GSM18915  cerebral
GSM18917  cerebral
GSM18919  cerebral
GSM18921  cerebral
GSM18923  cerebral
GSM18925  cerebral
GSM18927  cerebral
GSM18929  cerebral
GSM18931  cerebral
GSM18933  cerebral
GSM18935  cerebral
GSM18937  cerebral
GSM18939  cerebral
GSM18941  cerebral
GSM18943  cerebral
GSM18945  cerebral
GSM18959  epithelial
GSM18973  epithelial
GSM18997  epithelial
GSM18999  epithelial
GSM19001  epithelial
GSM19013  muscle
GSM19015  epithelial
GSM19017  muscle
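
The merging of similar experiments and the coloring of composite nodes can be sketched together. This is only an illustration under stated assumptions: the text does not specify how a composite node's profile is computed, so the size-weighted mean used here is an assumption, as are the function names and the toy expression values. The four experiment IDs and their labels are taken from the list above, and the colors from the color code. An MST would be rebuilt from the node set after each merge; that step is omitted here.

```python
import math

# Color code from the legend in the text
COLORS = {"cerebral": "red", "epithelial": "blue", "immune": "yellow", "muscle": "green"}

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_closest(profiles, members):
    """Merge the two most similar nodes into one composite node.

    Assumption: the composite profile is the size-weighted mean of its parts.
    Returns new (profiles, members) dicts; the inputs are left unchanged.
    """
    names = sorted(profiles)
    best = None
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            d = euclidean(profiles[u], profiles[v])
            if best is None or d < best[0]:
                best = (d, u, v)
    _, u, v = best
    nu, nv = len(members[u]), len(members[v])
    merged = [(nu * x + nv * y) / (nu + nv) for x, y in zip(profiles[u], profiles[v])]
    name = u + "+" + v
    new_profiles = {k: p for k, p in profiles.items() if k not in (u, v)}
    new_members = {k: m for k, m in members.items() if k not in (u, v)}
    new_profiles[name] = merged
    new_members[name] = members[u] + members[v]
    return new_profiles, new_members

def node_color(member_names, labels):
    """A node keeps a color only if all of its experiments share one label."""
    distinct = {labels[m] for m in member_names}
    if len(distinct) == 1:
        return COLORS[distinct.pop()]
    return None  # mixed labels: the composite node is left uncolored

# Expression values are invented; IDs and labels come from the node 51 list
profiles = {
    "GSM18867": [0.0, 0.0],
    "GSM18869": [0.0, 1.0],
    "GSM18911": [3.0, 4.0],
    "GSM19013": [3.0, 5.5],
}
labels = {"GSM18867": "immune", "GSM18869": "immune",
          "GSM18911": "cerebral", "GSM19013": "muscle"}
members = {name: [name] for name in profiles}

for _ in range(2):  # two merge steps
    profiles, members = merge_closest(profiles, members)
colors = {name: node_color(m, labels) for name, m in members.items()}
```

With these toy values, the two immune experiments merge into a node that stays yellow, while the cerebral and muscle experiments merge into a node with no color.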

As a final observation, note that in both images, experiments that are similar to many other experiments are located in the center of the MST, while those that are similar to only one other experiment lie along the outside of the MST.
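
One rough way to quantify this observation is to count how many edges touch each node: nodes with many neighbors tend to sit centrally, and nodes with a single neighbor are the leaves on the outside. The sketch below assumes the tree is given as a list of (node, node, distance) edges; this format, the function names, and the toy tree are hypothetical, and node degree is only a proxy for where a layout actually places a node.

```python
from collections import defaultdict

def node_degrees(edges):
    """Count the number of tree edges touching each node."""
    degree = defaultdict(int)
    for u, v, _dist in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def leaves(edges):
    """Nodes with exactly one neighbor, i.e. the outer tips of the tree."""
    return sorted(n for n, d in node_degrees(edges).items() if d == 1)

# Toy tree, invented for illustration only
toy_edges = [
    ("hub", "a", 1.0),
    ("hub", "b", 2.0),
    ("hub", "c", 1.5),
    ("c", "d", 0.5),
]
degrees = node_degrees(toy_edges)
outer = leaves(toy_edges)
```

Here `hub` has three neighbors and would sit centrally, while `a`, `b`, and `d` are leaves.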

To determine the steps required to generate this output, select Documentation in the left menu and then click on the Tutorial link.


Last update: April 29, 2010
Copyright (C) National Institute of Advanced Industrial Science and Technology (AIST), Computational Biology Research Center (CBRC) and Bioinformatics Center, Kyoto University. All Rights Reserved.