A pinboard by
George Ng

I have a Doctorate in Biotechnology and I'm a machine learning expert based in Hong Kong.


Recently, DeepMind's AlphaGo beat the current world number one at the game of Go.

The Problem With AI 1.0

The self-learning AI algorithm may have won at the ancient two-player board game, but don't ask it to win at Pac-Man. It can't. This is known as narrow or weak AI: a system built specifically to solve one narrow task.

The term Strong AI was popularised by the Chinese Room thought experiment, in which a person inside a room follows the instructions of a software program to reply to questions posed by a Chinese speaker outside the room, and convinces that speaker that the room contains a live Chinese speaker. Strong AI claims that such a system genuinely reasons and understands. Weak AI, on the other hand, only aims to simulate human reasoning.

It helps to understand the problem by visualising the elements of Artificial General Intelligence (AGI) as parts of a body: the eyes (computer vision), the ears and mouth (text and natural language processing), and touch (haptic robots). The missing piece of the puzzle is the command centre, the brain - AGI.

AI 2.0 - AGI

The goal of AGI is to create this brain: one that resembles human thinking and infers across a wide spectrum of situations, so that an AGI system displays human-like common sense as opposed to possessing a narrow skill.

The holy grail of AGI is the unification of computer vision, text and natural language processing, and robotics, all commanded by a single brain that possesses human-like skill over a comprehensive range of tasks. We call AGI AI 2.0.

How Does AI 2.0 Work?

Imagine a software program that successfully learns to win at Pong and then moves on to learn how to win at Minecraft, using what it learnt from Pong to spend less time learning Minecraft, even though the two games are different. That's called Transfer Learning - the ability of an AI to learn from diverse tasks and apply its accumulated education to a novel task. The advantage is that this method drastically reduces the time needed to learn a new task compared with learning it from scratch. DeepMind has developed PathNet - a network of neural networks trained using both stochastic gradient descent and a genetic selection method.

PathNet demonstrates successful transfer learning in this way. In terms of the Pong and Minecraft example: fixing the parameters along a path learned while winning at Pong, and re-evolving a new population of paths for Minecraft, would allow winning at Minecraft to be learned faster than it could be learned from scratch.
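To make the idea of "fix what was learned on the first task, learn only the rest on the second" concrete, here is a minimal sketch. It is not PathNet and it does not play games: it assumes two toy regression tasks (`task_a` and `task_b`, both invented for illustration) that share a slope but differ in their offset, and compares plain gradient descent from scratch against gradient descent with the slope frozen at the value learned on the first task. All names and numbers are assumptions.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, w, b, train_w=True, lr=0.05, tol=1e-6, max_steps=10_000):
    """Plain gradient descent; returns (w, b, steps_used).

    With train_w=False the slope w is frozen and only the offset b is updated.
    """
    n = len(xs)
    for step in range(max_steps):
        if mse(xs, ys, w, b) < tol:
            return w, b, step
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        if train_w:
            w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


xs = [0.0, 1.0, 2.0, 3.0]
task_a = [2 * x + 1 for x in xs]  # toy "first task" (assumption)
task_b = [2 * x + 5 for x in xs]  # toy "second task": same slope, new offset

# Learn task A thoroughly, then reuse its slope on task B.
w_a, b_a, _ = train(xs, task_a, 0.0, 0.0, tol=1e-10)

# Task B from scratch: both parameters start at zero and are trained.
_, _, scratch_steps = train(xs, task_b, 0.0, 0.0)

# Task B with transfer: slope frozen at the task A value, offset relearned.
w_t, b_t, transfer_steps = train(xs, task_b, w_a, 0.0, train_w=False)

print("steps from scratch:", scratch_steps)
print("steps with transfer:", transfer_steps)
```

In this toy setting the transferred run needs fewer gradient steps because the frozen slope is already correct for the second task. When the tasks share little structure, freezing parameters can hurt instead, which is why PathNet searches for which parts to reuse.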


PathNet: Evolution Channels Gradient Descent in Super Neural Networks

Abstract: For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C).

Pub.: 30 Jan '17, Pinned: 17 Jun '17
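The tournament selection over pathways described in the PathNet abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: a pathway is represented as a few module indices per layer, the sizes `M`, `L` and `N` are toy values, the mutation rule is one plausible choice, and `toy_fitness` is a hypothetical stand-in for "train the parameters along this path and measure performance with the cost function".

```python
import random

M, L, N = 10, 3, 3  # modules per layer, layers, active modules per layer (toy sizes)


def random_path(rng):
    """A pathway: for each layer, the indices of the modules it uses."""
    return [[rng.randrange(M) for _ in range(N)] for _ in range(L)]


def mutate(path, rng):
    """Copy a pathway, shifting each module index with a small probability."""
    rate = 1.0 / (L * N)
    return [
        [(g + rng.randint(-2, 2)) % M if rng.random() < rate else g for g in layer]
        for layer in path
    ]


def toy_fitness(path):
    """Hypothetical fitness: counts indices equal to their layer number.

    In a real system this would be the performance of the network
    restricted to this pathway, after some training.
    """
    return sum(1 for depth, layer in enumerate(path) for g in layer if g == depth)


def evolve(pop_size=16, rounds=500, seed=0):
    """Binary tournaments: the loser is overwritten by a mutated copy of the winner."""
    rng = random.Random(seed)
    population = [random_path(rng) for _ in range(pop_size)]
    history = [max(map(toy_fitness, population))]
    for _ in range(rounds):
        i, j = rng.sample(range(pop_size), 2)
        if toy_fitness(population[i]) < toy_fitness(population[j]):
            i, j = j, i  # make i the winner
        population[j] = mutate(population[i], rng)
        history.append(max(map(toy_fitness, population)))
    return population, history


population, history = evolve()
print("best fitness at start:", history[0], "at end:", history[-1])
```

Because the winner of each tournament is left untouched and only the loser is replaced, the best fitness in the population never decreases in this sketch. For transfer, the abstract describes fixing the parameters along the best path found on task A before evolving a fresh population of paths on task B.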

Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

Abstract: In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, having recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogenous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents.

Pub.: 05 Apr '17, Pinned: 16 Jun '17

Minimally Naturalistic Artificial Intelligence

Abstract: The rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. The position of this article is that only by aligning our agents' abilities and environments with those of humans do we stand a chance at developing general artificial intelligence (GAI). A broad reading of the famous 'No Free Lunch' theorem is that there is no universally optimal inductive bias or, equivalently, bias-free learning is impossible. This follows from the fact that there are an infinite number of ways to extrapolate data, any of which might be the one used by the data generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias; accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But, we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing what humans consider useful. One common version of this is the need for 'common-sense reasoning', which implicitly appeals to the statistics of the physical universe as perceived by humans.

Pub.: 13 Jan '17, Pinned: 17 Jun '17

Confidence-based progress-driven self-generated goals for skill acquisition in developmental robots.

Abstract: A reinforcement learning agent that autonomously explores its environment can utilize a curiosity drive to enable continual learning of skills, in the absence of any external rewards. We formulate curiosity-driven exploration, and eventual skill acquisition, as a selective sampling problem. Each environment setting provides the agent with a stream of instances. An instance is a sensory observation that, when queried, causes an outcome that the agent is trying to predict. After an instance is observed, a query condition, derived herein, tells whether its outcome is statistically known or unknown to the agent, based on the confidence interval of an online linear classifier. Upon encountering the first unknown instance, the agent "queries" the environment to observe the outcome, which is expected to improve its confidence in the corresponding predictor. If the environment is in a setting where all instances are known, the agent generates a plan of actions to reach a new setting, where an unknown instance is likely to be encountered. The desired setting is a self-generated goal, and the plan of action, essentially a program to solve a problem, is a skill. The success of the plan depends on the quality of the agent's predictors, which are improved as mentioned above. For validation, this method is applied to both a simulated and real Katana robot arm in its "blocks-world" environment. Results show that the proposed method generates sample-efficient curious exploration behavior, which exhibits developmental stages, continual learning, and skill acquisition, in an intrinsically-motivated playful agent.

Pub.: 11 Dec '13, Pinned: 17 Jun '17
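The "known versus unknown instance" query condition in the abstract above can be illustrated with a much simpler stand-in. The sketch below does not use the paper's online linear classifier or its derived condition. Instead it assumes discrete instances, keeps a count of queries per instance, and calls an instance "known" once a Hoeffding-style confidence interval on its outcome rate is narrower than `eps`. The names `is_known`, `explore`, `eps` and `delta`, and the simulated outcomes, are all assumptions made for illustration.

```python
import math
import random


def is_known(n_queries, eps=0.2, delta=0.05):
    """True once the confidence interval half-width drops below eps."""
    if n_queries == 0:
        return False
    half_width = math.sqrt(math.log(2 / delta) / (2 * n_queries))
    return half_width < eps


def explore(stream, outcome_fn):
    """Query the outcome only for instances that are still statistically unknown."""
    counts, successes = {}, {}
    for instance in stream:
        if is_known(counts.get(instance, 0)):
            continue  # outcome already predictable enough: no query needed
        outcome = outcome_fn(instance)  # the "query" to the environment
        counts[instance] = counts.get(instance, 0) + 1
        successes[instance] = successes.get(instance, 0) + outcome
    return counts, successes


rng = random.Random(0)
stream = [i % 3 for i in range(600)]  # three instance types, seen in turn


def simulated_outcome(instance):
    # Hypothetical environment: each instance type has its own success rate.
    return int(rng.random() < 0.3 + 0.2 * instance)


counts, successes = explore(stream, simulated_outcome)
print("queries per instance:", counts)
print("total queries:", sum(counts.values()), "of", len(stream), "observations")
```

Once every instance in the current setting is known, the agent in the paper plans actions to reach a setting where unknown instances are likely. In this sketch that would correspond to switching to a stream that contains instance types not yet in `counts`.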