
I am a researcher in the Physics and Chemistry Departments at the University of Cambridge

I am interested in the physical and biological applications of molecular photophysics. My particular interests are in solar energy conversion (OPV), advanced displays (OLED), and super-resolution imaging in biology.


The latest in computer vision and smart image recognition

The headline of this year's Google I/O conference was the unveiling of Google's new image recognition system, Google Lens. The high profile of this product reflects the growing value of the image recognition industry. Indeed, a recent report by Stratistics MRC (see below) values the global image recognition market at $16 billion, forecast to grow to $43 billion by 2022.

But why is there so much excitement about the ability of computers to make simple inferences from images when humans can do this easily? And, more to the point, what is the science behind how computers are trained to do this?

Image recognition and computer vision are forms of artificial intelligence and are integral to many emerging AI technologies, such as self-driving cars, facial recognition, and process-line optimisation in factories. The promise of more autonomous machines in the future means that computer vision must be highly accurate and reliable.

Computer scientists are developing computer vision using neural networks, which can be described, simplistically, as a chain of nodes used to model a complex function. Neural networks enable computers to receive information and make decisions.
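To make the "chain of nodes" idea concrete, here is a minimal sketch of a two-layer feedforward network in NumPy. The weights are random placeholders (in practice they are learned from labelled training data), and the layer sizes are purely illustrative:

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(0.0, x)

# Each layer is a weight matrix feeding the next set of nodes in the chain.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # 3 inputs  -> 4 hidden nodes
W2 = rng.normal(size=(2, 4))   # 4 hidden  -> 2 output nodes

def forward(x):
    """Chain the nodes: input -> hidden layer -> output layer."""
    hidden = relu(W1 @ x)
    return W2 @ hidden

print(forward(np.array([0.5, -1.0, 2.0])))  # two output values
```

Stacking more such layers, and adjusting the weights by gradient descent on labelled examples, is what lets the chain approximate a complex function such as "pixels in, object label out".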

For image recognition, neural networks are trained on unstructured image data that has been labelled with metadata. In particular, convolutional neural networks (CNNs), which pool groups of neighbouring pixels into a single neuron, enable more efficient image recognition. More on this topic and other technical advances in computer vision are covered in this pinboard of research papers.
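The pooling of neighbouring pixels into one neuron happens through convolution: a small filter slides over the image, and each output value summarises one patch. A minimal sketch (the toy image and filter are illustrative, not from any paper in this pinboard):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output value (one 'neuron')
    summarises a whole patch of neighbouring pixels."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 3x3 horizontal-gradient filter applied to a toy 5x5 image: the 5x5
# input is reduced to a 3x3 grid of neurons, each pooling a 3x3 patch.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)
print(conv2d(image, kernel))  # 3x3 feature map
```

Because one filter is reused across the whole image, a CNN needs far fewer weights than a fully connected network over the same pixels, which is where the efficiency gain comes from.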


CNN-based malicious user detection in social networks

Abstract: Following the advances in various smart devices, there are increasing numbers of users of social network services (SNS), which allows communication and information sharing in real time without limitations on distance or space. Although personal information leakage can occur through SNS, where an individual's personal details or online activities are leaked, and various financial crimes such as phishing and smishing are also possible, there are currently no countermeasures. Consequently, malicious activities are being conducted through messages toward the users who are in follow or friend relationships on SNS. Therefore, in this paper, we propose a method of assessing follow suggestions from users with less likelihood of committing malicious activities through an information-driven follow suggestion based on a categorical classification of interests using both the images and text of user posts. We ensure the objectiveness of interest categories by defining these based on DMOZ, which is established by the Open Directory Project. The images and text are learnt using a convolutional neural network, which is one of the machine learning techniques developed with a biological inspiration, and the interests are classified into categories. Users with a large number of posts are defined as certified users, and a database of certified users is established. Users with similar interests are classified, and the similarity distances between certified users and users are measured, and a follow suggestion is generated to the certified user with the most similar interest. Using the method proposed in this paper to classify the interest categories of certified users and users, precisions of 80% and 79.8% were obtained, respectively, and the overall precision was 79.93%, indicating a good classification performance overall. 
It is expected that the method proposed in this paper can be used to provide follow suggestions of users with less likelihood of malicious activities based on the information posted by the user.

Pub.: 15 May '17, Pinned: 19 May '17
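As a rough illustration of the similarity step the authors describe, users can be represented as vectors of per-category interest counts (categories from a DMOZ-style taxonomy), with the follow suggestion going to the certified user at the smallest similarity distance. The names, vectors, and use of cosine distance here are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def cosine_distance(a, b):
    """Distance between two interest vectors: 0 = identical direction."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical certified users with per-category post counts,
# e.g. [sports, finance, art].
certified = {
    "alice": [10, 0, 2],
    "bob":   [1, 8, 1],
}
new_user = [9, 1, 3]

best = min(certified, key=lambda name: cosine_distance(certified[name], new_user))
print(best)  # certified user with the most similar interests
```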

Deceiving Google's Cloud Video Intelligence API Built for Summarizing Videos

Abstract: Despite the rapid progress of the techniques for image classification, video annotation has remained a challenging task. Automated video annotation would be a breakthrough technology, enabling users to search within the videos. Recently, Google introduced the Cloud Video Intelligence API for video analysis. As per the website, the system "separates signal from noise, by retrieving relevant information at the video, shot or per frame." A demonstration website has been also launched, which allows anyone to select a video for annotation. The API then detects the video labels (objects within the video) as well as shot labels (description of the video events over time). In this paper, we examine the usability of the Google's Cloud Video Intelligence API in adversarial environments. In particular, we investigate whether an adversary can manipulate a video in such a way that the API will return only the adversary-desired labels. For this, we select an image that is different from the content of the Video and insert it, periodically and at a very low rate, into the video. We found that if we insert one image every two seconds, the API is deceived into annotating the entire video as if it only contains the inserted image. Note that the modification to the video is hardly noticeable as, for instance, for a typical frame rate of 25, we insert only one image per 50 video frames. We also found that, by inserting one image per second, all the shot labels returned by the API are related to the inserted image. We perform the experiments on the sample videos provided by the API demonstration website and show that our attack is successful with different videos and images.

Pub.: 26 Mar '17, Pinned: 19 May '17
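The attack's insertion schedule is simple to sketch: replace one frame out of every `period` video frames with the adversarial image (at 25 fps, a period of 50 gives one inserted image every two seconds, as in the paper). Frames are stood in for by strings here; the function name is illustrative:

```python
def insert_periodically(frames, adversarial_image, period=50):
    """Replace every `period`-th frame with the adversarial image."""
    return [adversarial_image if i % period == 0 else f
            for i, f in enumerate(frames)]

video = [f"frame{i}" for i in range(100)]  # 4 seconds at 25 fps
doctored = insert_periodically(video, "ADV_IMG")
print(sum(f == "ADV_IMG" for f in doctored))  # 2 of 100 frames replaced
```

The striking finding is that this 2% modification is enough to steer all of the API's video labels toward the inserted image.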

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Abstract: Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.

Pub.: 23 Apr '15, Pinned: 19 May '17
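The core trick, pooling an arbitrarily sized feature map into a fixed-length vector, can be sketched for a single channel as follows. The pyramid levels (1x1, 2x2, 4x4) follow the general scheme described in the abstract; the use of max-pooling and the specific input sizes are illustrative assumptions:

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool the feature map into an n x n grid at each pyramid level,
    then concatenate, yielding a fixed-length vector for any input size."""
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges split the map into n x n roughly equal regions.
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                region = feature_map[ys[i]:ys[i+1], xs[j]:xs[j+1]]
                pooled.append(region.max())
    return np.array(pooled)

# Two inputs of different sizes map to the same 1+4+16 = 21-dim vector.
print(spatial_pyramid_pool(np.random.rand(13, 9)).shape)   # (21,)
print(spatial_pyramid_pool(np.random.rand(32, 40)).shape)  # (21,)
```

Because the output length no longer depends on the input size, the network behind it can accept images at their natural scale rather than a forced 224x224 crop.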