2013年6月24日 星期一

GraphLite project

I am going to start a new project. I call it GraphLite project. GraphLite project aims to provide lite weight data structure and algorithms with focus on tasks connected with graph theory. It will be a part of the MCLinker project.

GraphLite comes out from the demand of garbage collection in the MCLinker project. Garbage collection is a linker optimization to remove the unused fragments and symbols in a software module. For example, if you write a function `foo` in your program and you never use it, then linker can remove the redundant function at link-time. Garbage collection is a kind of solution to the reachability problem in graph theory. From linkers perspective, the modules is a kind of graph whose nodes are fragments, and edges are relocation entries. Linkers traverse the graph and remove all nodes that are unable to visit.

I did some study. Here are some great graph implementations: Boost Graph library, LEMON, and OGDF. I personally like LEMON most due to its clear architecture. LEMON separate the graph theory problems into three loose-coupled components: graph data structures, maps and algorithms. Graph data structures provides pure structures. That means the data structures do not contain associated data (node labels, weight etc.). The associated data is stored in `map`s. Take max/min flow problem for example. Arcs do not contain weight information; Developers must use ArcMap to attach weight information to all arcs.

There are two reasons drive me the develop a new graph library. First, MCLinker must run well even on limited mobile devices. That constraint drives me to implement a smaller graph library. Second, garbage collection process requires adjacency matrix of a graph, and LEMON does not provide it.

Garbage Collection Process

Garbage collection process has four steps. First, linkers transform modules into a fragment-reference graph. Then, build up the adjacency matrix of the graph. Adjacency matrix is a kind of boolean matrix. Every element in the matrix represents the connectivity of two nodes. If there is an edge between two nodes, then the corresponding element should be `1`. The third step is straightforward. Linkers self-multiply the adjacency matrix over and over until the result of multiplication doesn't change. After the self-multiplication, `0` means there is no path between these two nodes. That is, if the program is started from one fragment, then the program never execute the paired fragments whose value of the corresponding element of the self-multiplied adjacency matrix is `0`. Final step, linkers remove all `0` paired fragments of all fragments who represent the entrances of the module.


沒有留言:

張貼留言