2013年6月24日 星期一

GraphLite project

I am going to start a new project. I call it GraphLite project. GraphLite project aims to provide lite weight data structure and algorithms with focus on tasks connected with graph theory. It will be a part of the MCLinker project.

GraphLite comes out from the demand of garbage collection in the MCLinker project. Garbage collection is a linker optimization to remove the unused fragments and symbols in a software module. For example, if you write a function `foo` in your program and you never use it, then linker can remove the redundant function at link-time. Garbage collection is a kind of solution to the reachability problem in graph theory. From linkers perspective, the modules is a kind of graph whose nodes are fragments, and edges are relocation entries. Linkers traverse the graph and remove all nodes that are unable to visit.

I did some study. Here are some great graph implementations: Boost Graph library, LEMON, and OGDF. I personally like LEMON most due to its clear architecture. LEMON separate the graph theory problems into three loose-coupled components: graph data structures, maps and algorithms. Graph data structures provides pure structures. That means the data structures do not contain associated data (node labels, weight etc.). The associated data is stored in `map`s. Take max/min flow problem for example. Arcs do not contain weight information; Developers must use ArcMap to attach weight information to all arcs.

There are two reasons drive me the develop a new graph library. First, MCLinker must run well even on limited mobile devices. That constraint drives me to implement a smaller graph library. Second, garbage collection process requires adjacency matrix of a graph, and LEMON does not provide it.

Garbage Collection Process

Garbage collection process has four steps. First, linkers transform modules into a fragment-reference graph. Then, build up the adjacency matrix of the graph. Adjacency matrix is a kind of boolean matrix. Every element in the matrix represents the connectivity of two nodes. If there is an edge between two nodes, then the corresponding element should be `1`. The third step is straightforward. Linkers self-multiply the adjacency matrix over and over until the result of multiplication doesn't change. After the self-multiplication, `0` means there is no path between these two nodes. That is, if the program is started from one fragment, then the program never execute the paired fragments whose value of the corresponding element of the self-multiplied adjacency matrix is `0`. Final step, linkers remove all `0` paired fragments of all fragments who represent the entrances of the module.


2013年6月20日 星期四

mind set

Materialization is just smoke and mirrors. Partition, Allocation and Scheduling are essential.

2013年6月13日 星期四

An old article: "A Machine-Independent Linker"

Fraser and Hanson gives a mathematical definition as a termination condition of symbol resolution process. In this paper, they classify symbols as follows.
  • R - a set of symbols referenced in object codes
  • D - a set of symbols whose values are defined in some object code
  • S - a set  of symbols that identify segments
An object file for which

R is a subset of the union of D and S

is said to be resolved.

The definition means that every undefined symbol has a corresponding defined symbol.

In MCLinker, we treat symbol resolution in the other way. MCLinker IR is a reduced graph of the use-define chain in typical compilers. An undefined symbol with relocation identifies the location of a `use`, and a defined symbol identifies the location of a `define`. Symbol resolution is a process to finish the topology of the use-define graph by connecting all defines to all uses. The termination condition of symbol resolution process in MCLinker is set if and only if every `use`s has its corresponding `define`.

Reference
@article{FraserH82,
  title = {A Machine-Independent Linker},
  author = {Christopher W. Fraser and David R. Hanson},
  year = {1982},
  researchr = {http://researchr.org/publication/FraserH82},
  cites = {0},
  citedby = {0},
  journal = {Software: Practice and Experience},
  volume = {12},
  number = {4},
  pages = {351-366},
}

2013年6月11日 星期二

Global Data Mechanism in RISC Architecture

Some architectures for which the size of their machine instructions cannot contain the full immediate absolute memory addresses as operands, use offsets from a certain base address held in a register to refer the memory addresses.

For such architectures, reference to global variables, functions and data is usually via a global table - ToC (Table of Contents). ToC is a table of offsets. The concept of ToC is very similar to GOT (global offset table) in ARM and X86 architecture. Both of they stores the offset of the global variables and functions. Compilers and linkers must know the start addresses of these tables to calculate the absolute address of the global variables and functions.

The difference between ToC and GOT is the mechanism to extract the absolute addresses. The architectures using GOT usually use a special symbol to hold the address of GOT. For each GOT entry, there is a corresponding dynamic relocation entry to tell loaders how to calculate the correct absolute address. Such architectures usually have a single, centralized GOT and loaders can easily access GOT via special segment.

However, the architectures using ToC usually have special purpose register to hold the address of ToC. The start address of ToC can be re-assigned at run time. This allows such architectures to have multiple ToCs in the same program and simplify the compiler implementation. However, assemblers or linkers thus have additional overhead during instruction relaxing.

In ARM architecture, constant pool (aka address pool) is another mechanism to access variables that is far away. ARM does not want to sacrifice a general-purpose register for holding the start address, so they do not have ToC. Moreover, the immediate operands in ARM are very small, the GOT would not be able to store much addresses. For these two reasons, ARM architecture stores addresses in between the code and extracts these addresses through PC-relative loads.

Reference

Gadi Haber, et al., "Optimization Opportunities Created by Global Data Reordering," 2003.
@inproceedings{Haber:2003:OOC:776261.776286,
 author = {Haber, Gadi and Klausner, Moshe and Eisenberg, Vadim and Mendelson, Bilha and Gurevich, Maxim},
 title = {Optimization opportunities created by global data reordering},
 booktitle = {Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization},
 series = {CGO '03},
 year = {2003},
 isbn = {0-7695-1913-X},
 location = {San Francisco, California},
 pages = {228--237},
 numpages = {10},
 url = {http://dl.acm.org/citation.cfm?id=776261.776286},
 acmid = {776286},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA},
}