A pinboard by
Congduan Li

Postdoctoral Research Fellow, City University of Hong Kong


This research is in distributed storage systems, which are used in many places in this big data era. We know that the simplest way to protect from failure of disks is duplication. But this is a waste of resources. By coding (mix the data together), one can save the storage spaces. In conventional distributed storage exact repair problems, all sources are reconstructed when the decoder has access to a certain number of encoders (disks). So, the underlying reconstruction network is equivalent to a single-source problem. This paper considers a variant of the exact repair problem, where the underlying reconstruction network is the independent distributed source coding problem, a type of multi-source problem. As the first non-trivial case with two sources and three encoders, the storage-repair tradeoff regions are proved for all the 33 instances, and it is shown that binary codes are optimal.


On Multi-source Networks: Enumeration, Rate Region Computation, and Hierarchy

Abstract: This paper investigates the enumeration, rate region computation, and hierarchy of general multi-source multi-sink hyperedge networks under network coding, which includes multiple network models, such as independent distributed storage systems and index coding problems, as special cases. A notion of minimal networks and a notion of network equivalence under group action are defined. An efficient algorithm capable of directly listing single minimal canonical representatives from each network equivalence class is presented and utilized to list all minimal canonical networks with up to 5 sources and hyperedges. Computational tools are then applied to obtain the rate regions of all of these canonical networks, providing exact expressions for 744,119 newly solved network coding rate regions corresponding to more than 2 trillion isomorphic network coding problems. In order to better understand and analyze the huge repository of rate regions through hierarchy, several embedding and combination operations are defined so that the rate region of the network after operation can be derived from the rate regions of networks involved in the operation. The embedding operations enable the definition and determination of a list of forbidden network minors for the sufficiency of classes of linear codes. The combination operations enable the rate regions of some larger networks to be obtained as the combination of the rate regions of smaller networks. The integration of both the combinations and embedding operators is then shown to enable the calculation of rate regions for many networks not reachable via combination operations alone.

Pub.: 21 Jul '15, Pinned: 25 Aug '17

Network Coding for Distributed Storage Systems

Abstract: Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a node failure is for a new node to download subsets of data stored at a number of surviving nodes, reconstruct a lost coded block using the downloaded data, and store it at the new node. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to download \emph{functions} of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff.

Pub.: 05 Mar '08, Pinned: 25 Aug '17