A summary to my network motif research
This post contains some of my short articles and reading list about network motif and its application.
Network motif can be seen as a subgraph pattern of a complex network (Milo 2002). For example, in a citation network, if we consider three adjacent nodes, we can find the patterns where two nodes point to the other one or one nodes points to the other two quite frequently. In biology system studies, researchers care about such statistically significant motifs. Systems such as gene transcription network or ecological food web exhibits a significant motifs count (for some motifs) compared to a random network with the same number nodes and links (Milo 2002). The correlation (and whether it is causation) of a statistical significant motif and the fucntionality of the network is an active area of research. It’s worth to mention that a motif usually contains 3 to 5 nodes in most studies.
Being extensively studied in biology, network motif is assumed to be the result of a biological system’s functionality (ref Milo, Alon). Recently, network motifs are also considered into social network and graph-structured data analysis (ref Leskovec). With the same assumption as in previous researches, the community structure of a complex network (or graph-structured data) is assumed to be the result of some underlying mesoscopic scale interaction between nodes. Personally, I am very excited about this approach. As mentioned in (Milo 2002), the motifs can be the building blocks for complex systems and understanding network motifs may show us insightful information about the complex systems.
mesoscopic? While microscopic analysis considers node properties and pair-wise interations and macroscopic gives us the overall quantitative view, mesoscopic analysis studies the network structures on an intermediate scale (subgraphs, motifs interactions). (Barabasi 2016)
With the motif scheme in mind, the very first thing we want to know is given a motif (one of the figure below), how can we tell this motif is significant?. One approach to this problem is to compare the motif count between the given network and a random network of the same order (same number of nodes). I wonder which type of motif statistic or random network is the best for comparision…
z-score measures the different between the number of a motif type found in the network
we need to analyse and the mean number of that motif in random networks of the
same order (i.e. number of nodes and edges). The tool I used for graph motif analysis
is a Python package called
graph-tool. Since motif analysis is a demanding task,
running undirected-size-4 motif z-score on Blogcatalog3 (~300k edges) takes almost 2
weeks on my lab machine (single thread). However, there is a trick to force
to use multiple-cores processing mentioned here.
The selection of the graph random-rewire algorithm is also an open question in
complex network research. In my work, I settle with the configuration model for
random graph generation. However, other random graph models such as block model could
be better for a certain type of motif. Professor Barabasi also mentioned about
this matter in his slide (Barabasi 2016).
graph-tool also provides the implementations of
some popular random graph rewiring functions.