2013年12月14日 星期六

初學者指南:如何客製化自己的 streambu

This article is revised from http://www.mr-edd.co.uk/blog/beginners_guide_streambuf

這篇廢話很多,所以我砍掉廢話,寫了精簡版。

Background of Stream Buffer

在 C++ STL 中,stream 只是一種操作界面,其實並不負責真正的資料搬移動作。真正的資料搬移,是在一個特別的類別 - streambuf 當中完成。舉個例子來說,ofstream 並不負責把資料搬到檔案當中,真正負責搬資料動作的,是 filebuf,但是ofstream決定資料搬移的時機;類似的是 stringtream 和 stringbuf 的關係,stringbuf 負責真正的資料搬移,而stringstream決定了搬移的時機。
上圖是從 cplusplus.com 所剪出來的圖,從這圖中,我們可以看到很清楚的概念。ostream與istream提供了所有 stream 的抽象界面,而streambuf提供了buffer的抽象界面。ostream/istream與buffer的關係,是所謂的strategy,一旦更換buffer,則ostream/istream就彷彿換了個行為一般。fstream與stringstream就是利用這個方式,藉由filebuf與stringbuf來修改ostream與istream的行為。

std::streambuf 在概念上,就是一個 array,被 stream 物件所使用。input stream 會 re-filled 這個 array;或者 output stream 會 flush 且清空這個 array。
當插入一筆資料到 ostream,這筆資料就會被寫進 buffer 的 array。當這 array overflows,那比資料就會被 flush 到目的地。並且此 array 的狀態會被重新設定,好寫入更多的資料。

重點提示

  • 所有的 stream buffer 都是繼承 std::streambuf base class。衍生物件必須要透過 override std::streambuf 的 virtual functions 來識做自己的特殊行為。
  • stream buffer 就是一個 array,當資料都被移出 buffer 時,istream 會重新 fill 這個 array;而當資料都被移入 buffer 之後,ostream 會呼叫 flush 並且清空這個 array。

Internal Array

Streambuf 使用了六個 pointer 來維持 array的資料 - 三個 for input,三個 for output。

給 ostream 使用的 pointer 有
  • pbase() - 指向 array 的第一個 element
  • pptr() - put pointer,指向下一個被 ostream 寫入的 element
  • eppter() - the end of array
outbuf_pointers.png
一般來說,pbase() 和 epptr() 不會改變,只有 pptr() 會隨著 ostream 不斷地寫入而改變。

Input pointer 有:
  • eback() - 指向 array 的最後一個 element。
  • gptr() - the get pointer - 指向 buffer 下一個應該被送入 istream 的 element
  • epgtr() -the end of array
inbuf_pointers.png
相同的,eback() 與 epgtr() 一般來說不會改變。只有 gptr 會隨著 istream 不斷地讀取資料而改變。

Example 1 - New OStream

這例子當中,我們的 buffer 是要給 ostream 使用。每次 ostream 呼叫 operator << 時,我們便將資料搬到 buffer 當中。當 buffer 已經滿了,無法再寫入時,會呼叫 overflow()。

我們只需要 override overflow() 和 sync()。當 pptr() == epptr() 的時候,overflow 會被執行。overflow() 應該要把 contents (包含發生 overflow 的那個 CharT) 寫入資料當中,並且 return 一個不是 traits_type::eof() 的直回去。

sync() 負責把目前的 data 寫入 target 當中,即使 buffer 尚未 full。舉例來說,如果 std::flush 被寫入了,應該要立即寫入 target。如果 sync() 失敗,應該要 return -1。

setg用來設定前述的 ostream buffer,我們只需要設定 pbase() 和 epptr()。系統會自動設定 pptr() 為 pbase()。

2013年6月24日 星期一

GraphLite project

I am going to start a new project. I call it GraphLite project. GraphLite project aims to provide lite weight data structure and algorithms with focus on tasks connected with graph theory. It will be a part of the MCLinker project.

GraphLite comes out from the demand of garbage collection in the MCLinker project. Garbage collection is a linker optimization to remove the unused fragments and symbols in a software module. For example, if you write a function `foo` in your program and you never use it, then linker can remove the redundant function at link-time. Garbage collection is a kind of solution to the reachability problem in graph theory. From linkers perspective, the modules is a kind of graph whose nodes are fragments, and edges are relocation entries. Linkers traverse the graph and remove all nodes that are unable to visit.

I did some study. Here are some great graph implementations: Boost Graph library, LEMON, and OGDF. I personally like LEMON most due to its clear architecture. LEMON separate the graph theory problems into three loose-coupled components: graph data structures, maps and algorithms. Graph data structures provides pure structures. That means the data structures do not contain associated data (node labels, weight etc.). The associated data is stored in `map`s. Take max/min flow problem for example. Arcs do not contain weight information; Developers must use ArcMap to attach weight information to all arcs.

There are two reasons drive me the develop a new graph library. First, MCLinker must run well even on limited mobile devices. That constraint drives me to implement a smaller graph library. Second, garbage collection process requires adjacency matrix of a graph, and LEMON does not provide it.

Garbage Collection Process

Garbage collection process has four steps. First, linkers transform modules into a fragment-reference graph. Then, build up the adjacency matrix of the graph. Adjacency matrix is a kind of boolean matrix. Every element in the matrix represents the connectivity of two nodes. If there is an edge between two nodes, then the corresponding element should be `1`. The third step is straightforward. Linkers self-multiply the adjacency matrix over and over until the result of multiplication doesn't change. After the self-multiplication, `0` means there is no path between these two nodes. That is, if the program is started from one fragment, then the program never execute the paired fragments whose value of the corresponding element of the self-multiplied adjacency matrix is `0`. Final step, linkers remove all `0` paired fragments of all fragments who represent the entrances of the module.


2013年6月20日 星期四

mind set

Materialization is just smoke and mirrors. Partition, Allocation and Scheduling are essential.

2013年6月13日 星期四

An old article: "A Machine-Independent Linker"

Fraser and Hanson gives a mathematical definition as a termination condition of symbol resolution process. In this paper, they classify symbols as follows.
  • R - a set of symbols referenced in object codes
  • D - a set of symbols whose values are defined in some object code
  • S - a set  of symbols that identify segments
An object file for which

R is a subset of the union of D and S

is said to be resolved.

The definition means that every undefined symbol has a corresponding defined symbol.

In MCLinker, we treat symbol resolution in the other way. MCLinker IR is a reduced graph of the use-define chain in typical compilers. An undefined symbol with relocation identifies the location of a `use`, and a defined symbol identifies the location of a `define`. Symbol resolution is a process to finish the topology of the use-define graph by connecting all defines to all uses. The termination condition of symbol resolution process in MCLinker is set if and only if every `use`s has its corresponding `define`.

Reference
@article{FraserH82,
  title = {A Machine-Independent Linker},
  author = {Christopher W. Fraser and David R. Hanson},
  year = {1982},
  researchr = {http://researchr.org/publication/FraserH82},
  cites = {0},
  citedby = {0},
  journal = {Software: Practice and Experience},
  volume = {12},
  number = {4},
  pages = {351-366},
}

2013年6月11日 星期二

Global Data Mechanism in RISC Architecture

Some architectures for which the size of their machine instructions cannot contain the full immediate absolute memory addresses as operands, use offsets from a certain base address held in a register to refer the memory addresses.

For such architectures, reference to global variables, functions and data is usually via a global table - ToC (Table of Contents). ToC is a table of offsets. The concept of ToC is very similar to GOT (global offset table) in ARM and X86 architecture. Both of they stores the offset of the global variables and functions. Compilers and linkers must know the start addresses of these tables to calculate the absolute address of the global variables and functions.

The difference between ToC and GOT is the mechanism to extract the absolute addresses. The architectures using GOT usually use a special symbol to hold the address of GOT. For each GOT entry, there is a corresponding dynamic relocation entry to tell loaders how to calculate the correct absolute address. Such architectures usually have a single, centralized GOT and loaders can easily access GOT via special segment.

However, the architectures using ToC usually have special purpose register to hold the address of ToC. The start address of ToC can be re-assigned at run time. This allows such architectures to have multiple ToCs in the same program and simplify the compiler implementation. However, assemblers or linkers thus have additional overhead during instruction relaxing.

In ARM architecture, constant pool (aka address pool) is another mechanism to access variables that is far away. ARM does not want to sacrifice a general-purpose register for holding the start address, so they do not have ToC. Moreover, the immediate operands in ARM are very small, the GOT would not be able to store much addresses. For these two reasons, ARM architecture stores addresses in between the code and extracts these addresses through PC-relative loads.

Reference

Gadi Haber, et al., "Optimization Opportunities Created by Global Data Reordering," 2003.
@inproceedings{Haber:2003:OOC:776261.776286,
 author = {Haber, Gadi and Klausner, Moshe and Eisenberg, Vadim and Mendelson, Bilha and Gurevich, Maxim},
 title = {Optimization opportunities created by global data reordering},
 booktitle = {Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization},
 series = {CGO '03},
 year = {2003},
 isbn = {0-7695-1913-X},
 location = {San Francisco, California},
 pages = {228--237},
 numpages = {10},
 url = {http://dl.acm.org/citation.cfm?id=776261.776286},
 acmid = {776286},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA},
}