A pinboard by
Jian Du

Postdoc Research Associate, Carnegie Mellon University


My research area is distributed data processing for big data in large-scale networks. We design and evaluate efficient algorithms that scale with the network size to analyze and process big data, in domains ranging from engineering science to social networks.

The need to process, analyze, and extract information from raw data generated in large-scale networks is now pervasive. For example, in transportation networks, as more and more cameras and sensors are deployed, processing city-scale transportation data collected at different locations can support monitoring and early warning across the network. Another example is the massive data generated by smart meters in the power grid: as smart meters are widely adopted, more and more data becomes available. Processing and analyzing city-scale power grid data can help us understand current demand profiles and how best to support future demand growth, as well as monitor grid behavior to avoid or minimize blackouts. Recommendation systems such as those of Amazon and Netflix, which generate massive data, likewise depend on data processing and analysis to serve their customers.

When extracting information from raw data, traditional centralized data processing is undesirable: it requires not only knowledge of the network topology but also heavy communication from peripheral nodes to a central processing unit, and computation at that central unit cannot scale indefinitely with the network size. What is needed instead is distributed data processing, which involves only local computation at each node in the network and limited information exchange between immediate neighboring nodes.

The main challenge in realizing distributed data processing is the convergence of the distributed algorithm. We proposed a distributed algorithm that converges to the optimal point at an exponential rate. Moreover, it is robust to packet drops and still converges under asynchronous updates.
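As an illustrative sketch only (not the proposed algorithm itself), the classic distributed computation with this flavor is average consensus: each node repeatedly nudges its local value toward those of its immediate neighbors, and all values converge geometrically (i.e., at an exponential rate) to the network-wide average without any central coordinator. The ring topology, step count, and weight below are hypothetical choices made for the example.

```python
# Illustrative sketch: synchronous average consensus on a ring network.
# Each node sees only its two immediate neighbors, yet every local value
# converges geometrically to the global average.

def ring_neighbors(i, n):
    """Immediate neighbors of node i on a ring of n nodes."""
    return [(i - 1) % n, (i + 1) % n]

def consensus(values, steps=200, weight=0.3):
    """Run synchronous updates x_i <- x_i + w * sum_j (x_j - x_i)."""
    n = len(values)
    x = list(values)
    for _ in range(steps):
        x = [x[i] + weight * sum(x[j] - x[i] for j in ring_neighbors(i, n))
             for i in range(n)]
    return x

readings = [1.0, 5.0, 3.0, 7.0, 2.0, 6.0, 4.0, 8.0]  # one value per node
final = consensus(readings)
# Every entry of `final` is numerically the global average, 4.5.
```

The update matrix here is doubly stochastic, which is what guarantees convergence to the exact average; the per-step contraction factor is governed by the network's spectral gap, hence the exponential rate.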


Anytime Exact Belief Propagation

Abstract: Statistical Relational Models and, more recently, Probabilistic Programming, have been making strides towards an integration of logic and probabilistic reasoning. A natural expectation for this project is that a probabilistic logic reasoning algorithm reduces to a logic reasoning algorithm when provided a model that only involves 0-1 probabilities, exhibiting all the advantages of logic reasoning such as short-circuiting, intelligibility, and the ability to provide proof trees for a query answer. In fact, we can take this further and require that these characteristics be present even for probabilistic models with probabilities near 0 and 1, with graceful degradation as the model becomes more uncertain. We also seek inference that has amortized constant time complexity on a model's size (even if still exponential in the induced width of a more directly relevant portion of it) so that it can be applied to huge knowledge bases of which only a relatively small portion is relevant to typical queries. We believe that, among the probabilistic reasoning algorithms, Belief Propagation is the most similar to logic reasoning: messages are propagated among neighboring variables, and the paths of message-passing are similar to proof trees. However, Belief Propagation is either only applicable to tree models, or approximate (and without guarantees) for precision and convergence. In this paper we present work in progress on an Anytime Exact Belief Propagation algorithm that is very similar to Belief Propagation but is exact even for graphical models with cycles, while exhibiting soft short-circuiting, amortized constant time complexity in the model size, and which can provide probabilistic proof trees.

Pub.: 27 Jul '17, Pinned: 25 Aug '17
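For readers new to the algorithm this abstract builds on, here is a minimal sum-product Belief Propagation example on a three-variable chain, a tree, where (as the abstract notes) BP is exact. The potentials are invented for illustration, and exactness is checked against brute-force enumeration.

```python
import itertools

# Toy sketch (potentials invented): sum-product BP on the chain
# x0 - x1 - x2 with binary variables. On a tree, BP's messages
# yield exact marginals, verified here by enumeration.

def pair(a, b):
    return 2.0 if a == b else 1.0  # attractive pairwise potential

unary0 = [3.0, 1.0]                # x0 prefers state 0

# Message from x0 to x1: absorb x0's unary potential, sum x0 out.
m01 = [sum(unary0[x0] * pair(x0, x1) for x0 in (0, 1)) for x1 in (0, 1)]
# Message from x2 to x1 (x2 carries no unary potential).
m21 = [sum(pair(x2, x1) for x2 in (0, 1)) for x1 in (0, 1)]

# Belief at x1: product of incoming messages, normalized.
b1 = [m01[x1] * m21[x1] for x1 in (0, 1)]
bp_marginal = [v / sum(b1) for v in b1]

# Brute-force marginal of x1 for comparison.
raw = [0.0, 0.0]
for x0, x1, x2 in itertools.product((0, 1), repeat=3):
    raw[x1] += unary0[x0] * pair(x0, x1) * pair(x1, x2)
bf_marginal = [v / sum(raw) for v in raw]
# bp_marginal and bf_marginal agree exactly: [7/12, 5/12].
```

On a graph with cycles this simple two-message scheme no longer suffices, which is precisely the gap the paper's Anytime Exact Belief Propagation aims to close.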

Belief Propagation, Bethe Approximation and Polynomials

Abstract: Factor graphs are important models for succinctly representing probability distributions in machine learning, coding theory, and statistical physics. Several computational problems, such as computing marginals and partition functions, arise naturally when working with factor graphs. Belief propagation is a widely deployed iterative method for solving these problems. However, despite its significant empirical success, not much is known about the correctness and efficiency of belief propagation. Bethe approximation is an optimization-based framework for approximating partition functions. While it is known that the stationary points of the Bethe approximation coincide with the fixed points of belief propagation, in general, the relation between the Bethe approximation and the partition function is not well understood. It has been observed that for a few classes of factor graphs, the Bethe approximation always gives a lower bound to the partition function, which distinguishes them from the general case, where neither a lower bound, nor an upper bound holds universally. This has been rigorously proved for permanents and for attractive graphical models. Here we consider bipartite normal factor graphs and show that if the local constraints satisfy a certain analytic property, the Bethe approximation is a lower bound to the partition function. We arrive at this result by viewing factor graphs through the lens of polynomials. In this process, we reformulate the Bethe approximation as a polynomial optimization problem. Our sufficient condition for the lower bound property to hold is inspired by recent developments in the theory of real stable polynomials. We believe that this way of viewing factor graphs and its connection to real stability might lead to a better understanding of belief propagation and factor graphs in general.

Pub.: 08 Aug '17, Pinned: 25 Aug '17
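To make the abstract's central object concrete: for a tiny factor graph, the partition function can be computed exactly by exhaustive enumeration, and this is the quantity the Bethe approximation estimates (and, per the results the abstract cites for attractive models, lower-bounds). The model below, an attractive pairwise model on a 3-cycle, is invented for illustration.

```python
import itertools
import math

# Invented toy model: attractive pairwise potentials on a 3-cycle, the
# smallest graph with a loop (where BP and the Bethe approximation are
# no longer exact). The partition function Z and a marginal are computed
# exactly by enumerating all 2^3 configurations.

edges = [(0, 1), (1, 2), (2, 0)]

def potential(a, b):
    # exp(+J) on agreement, exp(-J) on disagreement; J = 0.5 (attractive)
    return math.exp(0.5 if a == b else -0.5)

Z = 0.0
marg0 = [0.0, 0.0]  # unnormalized marginal of variable 0
for x in itertools.product((0, 1), repeat=3):
    w = 1.0
    for i, j in edges:
        w *= potential(x[i], x[j])
    Z += w
    marg0[x[0]] += w
marg0 = [m / Z for m in marg0]
# Z = 2*exp(1.5) + 6*exp(-0.5); by symmetry, marg0 == [0.5, 0.5].
```

Enumeration is exponential in the number of variables, which is why approximations such as Bethe, and iterative methods such as belief propagation, matter for anything beyond toy models.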