Ph.D. Candidate, George Mason University
Network-based methodologies for time series that provide new insights into causality and forecasting
High-dimensional time series analysis is concerned with data characterized by small sample sizes and a large number of features with time trends. For example, when forecasting national GDP (which we refer to as a response variable), data on several economic features such as income, inflation rate, and unemployment rate are collected over time. Many of these features are correlated with one another over time, so the features that actually affect the response variable are unknown. Identifying important features in time series data is therefore a first step in the analysis. Recently developed statistical methods for identifying relevant features tend to fail in these situations because of time lags, trends, and their interactions. In my thesis, we develop new methodologies for identifying relevant features, and their time effects, for response variables of interest. In addition to identifying relevant features, we identify clusters of features that have a similar effect on the response variables; these clusters account for time effects and time lags. This is accomplished with new network-based methodologies that compute network-wide metrics in a multilayer network, where the metrics represent the importance of each feature in each layer. To this end, we carry out a detailed analysis of multilayer networks and provide software, based on machine learning algorithms, for routine use by practitioners.
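A minimal sketch of the kind of multilayer-network computation described above, not the thesis methodology itself: here each layer is assumed to correspond to one time lag, edges are absolute lagged correlations above an arbitrary threshold, and the network-wide importance metric is taken to be weighted in-degree per layer. The toy data, lags, and threshold are all illustrative assumptions.

```python
# Illustrative sketch: one network layer per time lag, edges from absolute lagged
# correlation above a threshold, importance = weighted in-degree per layer.
# All parameter choices here are assumptions for demonstration purposes.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
T, p = 120, 6                                    # time points, number of features
X = rng.standard_normal((T, p)).cumsum(axis=0)   # toy feature series with trends

def lagged_corr(X, lag):
    """Absolute correlation between feature i at time t and feature j at time t - lag."""
    a = X[lag:] if lag else X                    # X_t
    b = X[:len(X) - lag] if lag else X           # X_{t-lag}
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return np.abs(a.T @ b) / len(a)              # [i, j] = |corr(X_i(t), X_j(t-lag))|

layers = {}
for lag in range(3):                             # one layer per lag 0, 1, 2
    C = lagged_corr(X, lag)
    G = nx.DiGraph()
    G.add_nodes_from(range(p))
    for i in range(p):
        for j in range(p):
            if i != j and C[i, j] > 0.3:         # arbitrary edge threshold
                G.add_edge(j, i, weight=C[i, j]) # feature j (lagged) -> feature i
    layers[lag] = G

# Network-wide importance per layer (weighted in-degree), averaged over layers.
importance = np.zeros(p)
for G in layers.values():
    strength = np.array([sum(d["weight"] for _, _, d in G.in_edges(v, data=True))
                         for v in range(p)])
    importance += strength / len(layers)
print("feature importance across layers:", np.round(importance, 3))
```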
Abstract: Suppose that we have a historical time series with samples taken at a slow rate, e.g. quarterly. The paper proposes a new method to answer the question: is it worth sampling the series at a faster rate, e.g. monthly? Our contention is that classical time series methods are designed to analyse a series at a single and given sampling rate with the consequence that analysts are not often encouraged to think carefully about what an appropriate sampling rate might be. To answer the sampling rate question we propose a novel Bayesian method that incorporates the historical series, cost information and small amounts of pilot data sampled at the faster rate. The heart of our method is a new Bayesian spectral estimation technique that is capable of coherently using data sampled at multiple rates and is demonstrated to have superior practical performance compared with alternatives. Additionally, we introduce a method for hindcasting historical data at the faster rate. A freeware R package, regspec, is available that implements our methods. We illustrate our work by using official statistics time series including the UK consumer price index and counts of UK residents travelling abroad, but our methods are general and apply to any situation where time series data are collected.
Pub.: 18 Dec '16, Pinned: 03 Jul '17
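The following sketch is not the paper's Bayesian multi-rate estimator and does not use the regspec package; it only illustrates, with plain periodograms, why a slower sampling rate can hide high-frequency structure, which is the question the paper's method addresses. The toy monthly series and its cycles are assumptions for demonstration.

```python
# Illustrative only: why quarterly sampling can hide structure visible at a
# monthly rate. Not the paper's Bayesian method and not the regspec package.
import numpy as np

rng = np.random.default_rng(1)
n_months = 240
t = np.arange(n_months)
# Toy monthly series: annual cycle (period 12 months) plus a faster cycle
# (period 4 months) plus noise.
monthly = (np.sin(2 * np.pi * t / 12)
           + 0.8 * np.sin(2 * np.pi * t / 4)
           + rng.normal(0, 0.3, n_months))
quarterly = monthly[::3]                 # what we observe when sampling every 3rd month

def periodogram(x):
    """Crude periodogram: squared FFT magnitude at nonnegative frequencies (cycles/observation)."""
    f = np.fft.rfftfreq(len(x))
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2 / len(x)
    return f, p

f_m, p_m = periodogram(monthly)
f_q, p_q = periodogram(quarterly)
# The 4-month cycle sits at 0.25 cycles/month, which exceeds the quarterly
# Nyquist limit (0.5 cycles/quarter, about 0.167 cycles/month), so in the
# quarterly periodogram its power is folded back onto a lower frequency.
print("monthly peak freq:  ", f_m[p_m.argmax()], "cycles/month")
print("quarterly peak freq:", f_q[p_q.argmax()], "cycles/quarter")
```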
Abstract: In one embodiment, a request to make a prediction regarding one or more service level agreements (SLAs) in a network is received. A network traffic parameter and an SLA requirement associated with the network traffic parameter according to the one or more SLAs are also determined. In addition, a performance metric associated with traffic in the network that corresponds to the determined network traffic parameter is estimated. It may then be predicted whether the SLA requirement would be satisfied based on the estimated performance metric.
Pub.: 10 May '16, Pinned: 03 Jul '17
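A minimal, generic sketch of the idea in the abstract above: estimate a performance metric from observed traffic and predict whether an SLA requirement would be satisfied. The EWMA estimator, safety margin, and numbers are illustrative assumptions, not the specific embodiment described in the patent.

```python
# Generic sketch: estimate a performance metric from traffic samples and check
# it against an SLA requirement. All details are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SLARequirement:
    metric: str            # e.g. "latency_ms"
    threshold: float       # maximum acceptable value

def ewma(samples, alpha=0.2):
    """Exponentially weighted moving average of the observed metric."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

def predict_sla_ok(samples, sla, safety_margin=1.1):
    """Predict whether the SLA would hold, with a crude safety margin."""
    estimate = ewma(samples)
    return estimate * safety_margin <= sla.threshold

latency_samples = [18.0, 22.5, 19.3, 30.1, 24.8]          # observed latencies (ms)
sla = SLARequirement(metric="latency_ms", threshold=35.0)
print("SLA predicted to hold:", predict_sla_ok(latency_samples, sla))
```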
Abstract: Machine learning (ML) is believed to be an effective and efficient tool for building reliable prediction models and extracting useful structure from an avalanche of data. However, ML is also criticized for its difficulty of interpretation and its complicated parameter tuning. In contrast, visualization can organize and visually encode the entangled information in data and guide audiences toward simpler perceptual inferences and analytic thinking, but large-scale and high-dimensional data usually cause many visualization methods to fail. In this paper, we close the loop between ML and visualization via interaction between the ML algorithm and users, so that machine intelligence and human intelligence can cooperate and improve each other in a mutually rewarding way. In particular, we propose the "transparent boosting tree (TBT)", which visualizes both the model structure and the prediction statistics of each step in the learning process of a gradient boosting tree, and incorporates the user's feedback operations on the trees into the learning process. In TBT, ML is in charge of updating the weights in the learning model and filtering the information shown to the user from the big data, while visualization is in charge of providing a visual understanding of the ML model to facilitate user exploration. The method combines the advantages of ML in big-data statistics with those of humans in decision making based on domain knowledge. We develop a user-friendly interface for this novel learning method and apply it to two datasets collected from real applications. Our study shows that making ML transparent through interactive visualization can significantly improve the exploration of ML algorithms, give rise to novel insights into ML models, and integrate machine and human intelligence.
Pub.: 18 Oct '16, Pinned: 03 Jul '17
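A sketch of the kind of per-step statistics a tool like TBT could expose, using scikit-learn's gradient boosting and staged predictions as a stand-in; the interactive visualization and the user feedback loop of the actual TBT system are not reproduced here, and the dataset and hyperparameters are illustrative assumptions.

```python
# Stand-in sketch: per-iteration statistics of a gradient boosting tree model,
# the raw material a TBT-style visualization would surface to the user.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, max_depth=3, learning_rate=0.1,
                                  random_state=0).fit(X, y)

# Per-iteration training error: one statistic a user could inspect step by step.
for step, y_hat in enumerate(model.staged_predict(X), start=1):
    if step % 10 == 0:
        mse = np.mean((y - y_hat) ** 2)
        print(f"after {step:3d} trees: training MSE = {mse:8.1f}")

# Per-feature importance of the final ensemble, another per-model statistic.
print("feature importances:", np.round(model.feature_importances_, 3))
```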
Abstract: The construction of complex networks from current financial data does not take into account the energy characteristics shared by similar stocks. Most existing research does not treat the stock network as an asymmetric directed network; it measures the importance of nodes only by their degree, without comprehensively considering structure and function. Moreover, many community detection algorithms assume that the number of communities is known and that the network is unweighted and undirected. Based on entropy analysis from information theory, in this paper we study the complexity and information flow of stock time series and construct a stock influence network on the basis of transfer entropy. By combining the structural features and functional characteristics of complex networks, we uncover the relationships between stocks. Finally, we design a long- and short-memory forecasting model and verify its predictive ability.
Pub.: 20 Jun '17, Pinned: 30 Jun '17
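A minimal sketch of building a directed influence network from pairwise transfer entropy between discretized return series, the core construction mentioned in the abstract above. The binning scheme, lag of one step, and toy data are illustrative assumptions; the paper's entropy analysis and its long- and short-memory forecasting model are not reproduced here.

```python
# Illustrative sketch: pairwise transfer entropy on binned return series, used
# as directed edge weights of an "influence" network. Parameters are assumptions.
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=3):
    """Transfer entropy T(x -> y) in bits, lag 1, from symbolized (binned) series."""
    xs = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    ys = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    triples = Counter(zip(ys[1:], ys[:-1], xs[:-1]))       # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(ys[:-1], xs[:-1]))               # (y_t, x_t)
    pairs_yy = Counter(zip(ys[1:], ys[:-1]))                # (y_{t+1}, y_t)
    singles_y = Counter(ys[:-1])                            # y_t
    n = len(ys) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n                                      # p(y_{t+1}, y_t, x_t)
        p_cond_full = c / pairs_yx[(y0, x0)]                 # p(y_{t+1} | y_t, x_t)
        p_cond_marg = pairs_yy[(y1, y0)] / singles_y[y0]     # p(y_{t+1} | y_t)
        te += p_joint * np.log2(p_cond_full / p_cond_marg)
    return te

rng = np.random.default_rng(2)
returns = {"AAA": rng.normal(size=500)}
returns["BBB"] = 0.6 * np.roll(returns["AAA"], 1) + 0.4 * rng.normal(size=500)  # BBB lags AAA

edges = {(src, dst): transfer_entropy(returns[src], returns[dst])
         for src in returns for dst in returns if src != dst}
print(edges)   # expect TE(AAA -> BBB) > TE(BBB -> AAA)
```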
Abstract: We consider the problem of power demand forecasting in residential micro-grids. Several approaches using ARMA models, support vector machines, and recurrent neural networks that perform one-step-ahead predictions have been proposed in the literature. Here, we extend them to perform multi-step-ahead forecasting and compare their performance. Toward this end, we implement a parallel and efficient training framework, using power demand traces from real deployments to gauge the accuracy of the considered techniques. Our results indicate that machine learning schemes achieve smaller prediction errors, in both mean and variance, than ARMA, but there is no clear algorithm of choice among them. The pros and cons of these approaches are discussed, and the solution of choice is found to depend on the specific use-case requirements. A hybrid approach, driven by the prediction interval, the target error, and its uncertainty, is then recommended.
Pub.: 29 Jun '17, Pinned: 30 Jun '17
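A small sketch of one standard way to extend a one-step-ahead learner to multi-step-ahead forecasting, namely recursive (iterated) prediction, which is the kind of extension the abstract above alludes to. The random-forest model, lag window, horizon, and synthetic demand trace are illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch: recursive multi-step-ahead forecasting with a one-step
# learner. Model choice and data are assumptions for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
t = np.arange(1000)
demand = 5 + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, len(t))  # toy daily-cycle load

def make_supervised(series, n_lags):
    """Turn a series into (lag-window, next-value) pairs for one-step learning."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

n_lags, horizon = 24, 6
X, y = make_supervised(demand[:900], n_lags)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Recursive multi-step forecast: feed each prediction back in as the newest lag.
window = list(demand[900 - n_lags:900])
forecast = []
for _ in range(horizon):
    y_hat = model.predict(np.array(window[-n_lags:]).reshape(1, -1))[0]
    forecast.append(y_hat)
    window.append(y_hat)

print("forecast:", np.round(forecast, 2))
print("actual:  ", np.round(demand[900:900 + horizon], 2))
```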
Abstract: In this paper we focus on exploiting the information contained in financial news to enhance the performance of a classifier of bank distress. This information must be analyzed and inserted into the predictive model efficiently, which raises all the issues of text analysis and, specifically, of analysing news media. Among the models proposed for this purpose, we investigate a deep learning approach based on a doc2vec representation of the textual data, a neural network able to map sequential, symbolic text input onto a reduced latent semantic space. A second, supervised neural network is then trained on news data combined with standard financial figures to classify banks as distressed or tranquil, based on a small set of known distress events. The final aim is not only to improve the predictive performance of the classifier but also to assess the importance of news data in the classification process: do news data really bring useful information not contained in standard financial variables? Our results seem to confirm this hypothesis.
Pub.: 29 Jun '17, Pinned: 30 Jun '17
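A rough sketch of the two-stage pipeline described in the abstract above: news text is embedded with doc2vec and concatenated with standard financial figures, then a supervised classifier separates distressed from tranquil banks. The tiny toy corpus, the two financial features, and the logistic-regression classifier (used here instead of the paper's second neural network) are illustrative assumptions.

```python
# Sketch of the pipeline shape only: doc2vec embeddings of news + financial
# figures -> supervised classifier. Data and classifier are assumptions.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

news = [
    "regulator reviews capital shortfall at lender",
    "bank reports record quarterly profit and strong liquidity",
    "rating agency downgrades bank amid loan losses",
    "lender expands retail branch network after solid results",
]
labels = np.array([1, 0, 1, 0])                 # 1 = distress event, 0 = tranquil
financial = np.array([[0.04, 1.1],              # toy financial figures per bank,
                      [0.12, 0.6],              # e.g. capital ratio and NPL ratio
                      [0.05, 1.3],
                      [0.11, 0.7]])

docs = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(news)]
d2v = Doc2Vec(docs, vector_size=16, window=3, min_count=1, epochs=100)

text_vecs = np.array([d2v.infer_vector(text.split()) for text in news])
X = np.hstack([text_vecs, financial])           # news embedding + financial figures
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("in-sample predictions:", clf.predict(X))
```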