A pinboard by
this curator

Research Fellow, Deakin University


Efficient Identification of Arbitrarily Shaped and Varied Density Clusters in High-dimensional Data.

Clustering has become one of the most important processes of knowledge discovery from data in the era of big data. It explores and reveals the hidden patterns in the data, and provides insight into the natural groupings in the data. I am solving two existing problems of density-based clustering in order to efficiently identify the arbitrarily shaped and varied density clusters in high-dimensional data. I have investigated and designed different approaches for each problem. The effectiveness of these proposed approaches has been verified with extensive empirical evaluations on synthetic and real-world datasets.


Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples

Abstract: Neural networks are known to be vulnerable to adversarial examples, inputs that have been intentionally perturbed to remain visually similar to the source input, but cause a misclassification. Until now, black-box attacks against neural networks have relied on transferability of adversarial examples. White-box attacks are used to generate adversarial examples on a substitute model and then transferred to the black-box target model. In this paper, we introduce a direct attack against black-box neural networks, that uses another attacker neural network to learn to craft adversarial examples. We show that our attack is capable of crafting adversarial examples that are indistinguishable from the source input and are misclassified with overwhelming probability - reducing accuracy of the black-box neural network from 99.4% to 0.77% on the MNIST dataset, and from 91.4% to 6.8% on the CIFAR-10 dataset. Our attack can adapt and reduce the effectiveness of proposed defenses against adversarial examples, requires very little training data, and produces adversarial examples that can transfer to different machine learning models such as Random Forest, SVM, and K-Nearest Neighbor. To demonstrate the practicality of our attack, we launch a live attack against a target black-box model hosted online by Amazon: the crafted adversarial examples reduce its accuracy from 91.8% to 61.3%. Additionally, we show attacks proposed in the literature have unique, identifiable distributions. We use this information to train a classifier that is robust against such attacks.

Pub.: 17 Aug '17, Pinned: 21 Aug '17

Machine Learning Methods to Predict Diabetes Complications.

Abstract: One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. Such pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we have used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models easy to translate to the clinical practice.

Pub.: 13 May '17, Pinned: 21 Aug '17