ohnosequences!

era7 bioinformatics R&D group

Playing with Gephi, Bio4j and GO

It had been a while since I last had some fun with Gephi, so today I asked myself: why not try visualizing the whole Gene Ontology and see what happens?

First of all, I had to generate a file in GEXF format containing all the terms and relationships of the ontology. For that I wrote a small program (GenerateGexfGo.java) that uses Bio4j to retrieve the term and relationship data, plus a couple of GEXF XML wrapper classes from the GitHub project BioinfoXML.
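To give an idea of what such a generator has to produce, here is a minimal sketch in plain Java. It is not the actual GenerateGexfGo.java: the `Term` and `Relation` classes are hypothetical stand-ins for whatever Bio4j returns, the JDK's StAX writer replaces the BioinfoXML wrapper classes, and the GEXF 1.2 namespace is an assumption (the original file may well target an older draft).

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class GexfSketch {

    /** Hypothetical stand-in for a GO term retrieved from Bio4j. */
    static final class Term {
        final String id;
        final String name;
        Term(String id, String name) { this.id = id; this.name = name; }
    }

    /** Hypothetical stand-in for a directed relationship between two terms. */
    static final class Relation {
        final String sourceId;
        final String targetId;
        final String type;
        Relation(String sourceId, String targetId, String type) {
            this.sourceId = sourceId;
            this.targetId = targetId;
            this.type = type;
        }
    }

    /** Serializes the terms as GEXF nodes and the relationships as GEXF edges. */
    static String toGexf(List<Term> terms, List<Relation> relations) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("gexf");
            xml.writeDefaultNamespace("http://www.gexf.net/1.2draft"); // assumed GEXF version
            xml.writeAttribute("version", "1.2");
            xml.writeStartElement("graph");
            xml.writeAttribute("defaultedgetype", "directed");

            xml.writeStartElement("nodes");
            for (Term t : terms) {
                xml.writeEmptyElement("node");
                xml.writeAttribute("id", t.id);
                xml.writeAttribute("label", t.name); // the writer escapes special characters
            }
            xml.writeEndElement(); // nodes

            xml.writeStartElement("edges");
            int edgeId = 0;
            for (Relation r : relations) {
                xml.writeEmptyElement("edge");
                xml.writeAttribute("id", String.valueOf(edgeId++));
                xml.writeAttribute("source", r.sourceId);
                xml.writeAttribute("target", r.targetId);
                xml.writeAttribute("label", r.type);
            }
            xml.writeEndElement(); // edges

            xml.writeEndElement(); // graph
            xml.writeEndElement(); // gexf
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialize GEXF", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two real GO terms, hard-coded here instead of being read from Bio4j.
        List<Term> terms = Arrays.asList(
                new Term("GO:0008150", "biological_process"),
                new Term("GO:0009987", "cellular process"));
        List<Relation> relations = Arrays.asList(
                new Relation("GO:0009987", "GO:0008150", "is_a"));
        System.out.println(toGexf(terms, relations));
    }
}
```

Using a stream writer rather than string concatenation keeps memory use flat and takes care of XML escaping, which matters once the graph has tens of thousands of nodes.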

Once I had my GEXF file (~17 MB), I tried opening it with Gephi on my laptop, with no success: Gephi froze indefinitely while importing the file. After a quick Google search I found that the amount of memory available to Gephi is really easy to change: just open the file ‘etc/gephi07beta.conf’ and increase the -Xmx value.
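For reference, Gephi is built on the NetBeans Platform, whose launcher reads its JVM flags from the `default_options` line of that conf file, each one prefixed with `-J`. The fragment below is only an illustrative sketch, not a copy of the shipped file: the remaining options are left out, and the 2048m value is an assumption that you should adapt to the RAM of your machine.

```
# etc/gephi07beta.conf (illustrative; leave the other options on this line untouched)
default_options="--branding gephi -J-Xms64m -J-Xmx2048m"
```

Only the number after `-J-Xmx` needs to change; Gephi has to be restarted for the new limit to take effect.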

With the file imported, I first applied the OpenOrd layout algorithm (which is designed for large graphs), and once the graph had an acceptable distribution I ran a few iterations of the Fruchterman Reingold algorithm for a better visualization. This is what I got:

Color correspondence:

  • Green: Cellular component
  • Blue: Molecular function
  • Orange: Biological process

UPDATE: zoomable, independent visualizations of each ontology using the Gephi SeaDragon plugin.

Molecular function

Cellular component

Biological process

Here you can download the gexf file in case you want to experiment a bit with it.