A pinboard by
Sana Siddiqui

Master's Student (Starting Ph.D. in Fall 2017), University of Manitoba


Cognitive detection of advanced and hidden cyber threats using multiscale complexity analysis

To decide a hypothesis using machine learning, a decision boundary (the decision threshold in hypothesis testing) is required, and it can be linear or nonlinear. In cyber security, machine learning is mainly used to create a decision boundary between threats and normal data flows, and it is better at detecting new (previously unknown) threats. However, cyber threats are evolving into more sophisticated vectors with overlapping or inseparable behaviour, for which existing methods cannot find any type of decision boundary. I am developing new machine learning algorithms that construct separation boundaries for these threats using multiscale analysis tools, e.g. wavelets and fractal analysis. These tools are applicable not only to cyber security but also to other domains, e.g. detecting an overlapping object in a 2D image or reading handwritten numbers.

The primary challenge this thesis addresses is the identification of stealth threats in network data, e.g. packet-based communications, where the features once used to identify a specific category of threat are no longer unique. Such threats appear indistinguishable from normal data because they occupy the same feature-space coordinates as legitimate instances, so traditional machine intelligence approaches fail to find a unique decision boundary. This research shows that contemporary internet data sets containing threats and normal samples yield an overlapping and indistinguishable feature space. The thesis also proposes modifying existing machine intelligence methods with multiscale analysis, which is akin to the cognitive analytical methods humans use to find hidden patterns through complexity analysis. Empirical results show promising detection performance, and the approach is tractable owing to the elegance of the multiscale tools of fractal and wavelet analysis.
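
As a rough illustration of what "multiscale analysis" with wavelets can mean in practice, the sketch below computes the energy of a signal at successive scales using an orthonormal Haar wavelet transform. It is not the method developed in the thesis; the function name `haar_multiscale_energy` and the choice of the Haar wavelet are assumptions made only for this example.

```python
import numpy as np

def haar_multiscale_energy(signal, levels):
    """Return per-scale detail energies and the final coarse approximation.

    Hypothetical helper for illustration only: an orthonormal Haar
    decomposition, applied level by level to the running approximation.
    """
    approx = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        # Stop early if the current approximation cannot be split into pairs.
        if len(approx) < 2 or len(approx) % 2:
            break
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # fine-scale differences
        approx = (even + odd) / np.sqrt(2.0)   # coarse-scale averages
        energies.append(float(np.sum(detail ** 2)))
    return energies, approx

# Usage: compare how energy is distributed across scales for two signals.
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)                      # energy spread over all scales
smooth = np.sin(np.linspace(0.0, 2.0 * np.pi, 256))   # energy mostly at coarse scales
print(haar_multiscale_energy(noise, levels=4)[0])
print(haar_multiscale_energy(smooth, levels=4)[0])
```

Because the transform is orthonormal, the detail energies plus the energy of the final approximation add up to the energy of the original signal, so the per-scale energies can serve as a simple multiscale feature vector.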


Accelerated Stochastic Power Iteration

Abstract: Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal O(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal O(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity $\mathcal O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and also require $\mathcal O(\sigma^2/\Delta^2)$ iterations, far from the $\mathcal O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, $\mathcal O(1/\sqrt{\Delta})$. We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline settings, that achieve an accelerated iteration complexity $\mathcal O(1/\sqrt{\Delta})$. Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.

Pub.: 09 Jul '17, Pinned: 11 Jul '17
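
The full-pass idea in the "Accelerated Stochastic Power Iteration" abstract above can be sketched as a power iteration with a momentum term, $w_{t+1} = A w_t - \beta w_{t-1}$, followed by normalization. This is a minimal sketch and not the authors' code: the function name, the test matrix, and the choice $\beta = \lambda_2^2/4$ (which assumes the second eigenvalue is known) are assumptions for illustration. Setting $\beta = 0$ recovers the plain power iteration.

```python
import numpy as np

def power_iteration_momentum(A, beta=0.0, iters=100, seed=0):
    """Power iteration with a momentum term (beta=0 gives the plain method).

    Hypothetical helper for illustration; full-pass (non-stochastic) setting only.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(A.shape[0])
    w /= np.linalg.norm(w)
    w_prev = np.zeros_like(w)
    for _ in range(iters):
        w_next = A @ w - beta * w_prev
        norm = np.linalg.norm(w_next)
        # Rescale both iterates by the same factor so the recurrence is preserved.
        w_prev = w / norm
        w = w_next / norm
    return w

# Illustrative symmetric matrix with eigenvalues 1.0, 0.9, ... (eigen-gap 0.1).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
eigs = np.concatenate(([1.0, 0.9], np.linspace(0.8, 0.1, 18)))
A = (Q * eigs) @ Q.T          # Q diag(eigs) Q^T
v1 = Q[:, 0]                  # true principal component

beta = 0.9 ** 2 / 4           # assumed setting: lambda_2^2 / 4
for name, b in [("plain", 0.0), ("momentum", beta)]:
    w = power_iteration_momentum(A, beta=b, iters=40)
    print(name, "alignment error:", 1.0 - abs(w @ v1))
```

The stochastic and variance-reduced variants described in the abstract are not covered by this sketch.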

STDP allows close-to-optimal spatiotemporal spike pattern detection by single coincidence detector neurons.

Abstract: Repeating spatiotemporal spike patterns exist and carry information. How this information is extracted by downstream neurons is unclear. Here we theoretically investigate to what extent a single cell could detect a given spike pattern and what the optimal parameters to do so are, in particular the membrane time constant τ. Using a leaky integrate-and-fire (LIF) neuron with homogeneous Poisson input, we computed this optimum analytically. We found that a relatively small τ (at most a few tens of ms) is usually optimal, even when the pattern is much longer. This is somewhat counter-intuitive as the resulting detector ignores most of the pattern, due to its fast memory decay. Next, we wondered if spike-timing-dependent plasticity (STDP) could enable a neuron to reach the theoretical optimum. We simulated a LIF equipped with additive STDP, and repeatedly exposed it to a given input spike pattern. As in previous studies, the LIF progressively became selective to the repeating pattern with no supervision, even when the pattern was embedded in Poisson activity. Here we show that, using certain STDP parameters, the resulting pattern detector is optimal. These mechanisms may explain how humans learn repeating sensory sequences. Long sequences could be recognized thanks to coincidence detectors working at a much shorter timescale. This is consistent with the fact that recognition is still possible if a sound sequence is compressed, played backward, or scrambled using 10ms bins. Coincidence detection is a simple yet powerful mechanism, which could be the main function of neurons in the brain.

Pub.: 03 Jul '17, Pinned: 11 Jul '17
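
To make the setup in the STDP abstract above more concrete, here is a minimal discrete-time sketch of a leaky integrate-and-fire (LIF) neuron driven by homogeneous Poisson inputs. It does not reproduce the paper's model or parameters, and it omits STDP entirely: the function name `simulate_lif`, the instantaneous synaptic jumps, the reset to zero, and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_lif(rates_hz, weights, tau_s=0.01, threshold=np.inf,
                 duration_s=1.0, dt_s=1e-3, seed=0):
    """Simulate a LIF neuron with Poisson afferents (hypothetical helper).

    Each afferent spikes with probability rate * dt per step (a Bernoulli
    approximation of a Poisson process). The potential decays with time
    constant tau_s, jumps by the synaptic weight on each input spike, and
    is reset to zero after crossing the threshold.
    """
    rng = np.random.default_rng(seed)
    rates_hz = np.asarray(rates_hz, dtype=float)
    weights = np.asarray(weights, dtype=float)
    decay = np.exp(-dt_s / tau_s)
    n_steps = int(round(duration_s / dt_s))
    v = 0.0
    trace = np.empty(n_steps)
    spike_times = []
    for step in range(n_steps):
        active = rng.random(rates_hz.shape) < rates_hz * dt_s
        v = v * decay + float(weights[active].sum())
        if v >= threshold:
            spike_times.append(step * dt_s)
            v = 0.0
        trace[step] = v
    return np.array(spike_times), trace

# Usage: 100 afferents at 10 Hz, unit weights, membrane time constant 10 ms.
spikes, trace = simulate_lif(np.full(100, 10.0), np.ones(100),
                             tau_s=0.01, threshold=12.0, duration_s=5.0)
print("output spikes:", len(spikes))
```

With a short `tau_s`, the potential reflects only the inputs of the last few milliseconds, which is the sense in which such a neuron acts as a coincidence detector.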