Ph.D. in Biotechnology who has joined an eCommerce startup in Hong Kong
We review Machine Learning models that have shown promise in the early detection of breast cancer
Breast Cancer And Machine Learning Breast cancer is cancer that starts in breast tissue; 5-10% of cases are hereditary, caused by mutations of the BRCA1 and BRCA2 genes. Artificial intelligence is when a computer exhibits intelligence by mimicking 'cognitive' functions that humans associate with learning and problem solving. Breast cancer detection using Machine Learning models, a sub-field of Artificial Intelligence, has shown promise. Feature extraction based on a new hybrid feature selection scheme, used as the evaluation criterion for classifying malignant breast tumors in ultrasound images, has achieved a classification accuracy of 96.6% with a back-propagation artificial neural network classifier, outperforming existing feature selection techniques.
Breast Cancer Cluster You can't miss the ABC News Studio and its transmission tower as you drive from Brisbane city to the suburb of Toowong. The 'Armstrong review' found that the risk of breast cancer in women working at the ABC at Toowong from January 1, 1994, to June 30, 2006, was “six times higher than that in the general population of women in Queensland”. Twelve years, 18 breast cancer diagnoses, 1 death and 3 review reports later, ABC management decided evacuation was the only option. In 2016 the Toowong ABC Studio, which had become synonymous with the world's first breast cancer cluster, closed for good.
Abstract: Breast and pectoral muscle segmentation is an essential pre-processing step for the subsequent processes in computer aided diagnosis (CAD) systems. Estimating the breast and pectoral boundaries is a difficult task especially in mammograms due to artifacts, homogeneity between the pectoral and breast regions, and low contrast along the skin-air boundary. In this paper, a breast boundary and pectoral muscle segmentation method in mammograms is proposed. For breast boundary estimation, we determine the initial breast boundary via thresholding and employ Active Contour Models without edges to search for the actual boundary. A post-processing technique is proposed to correct the overestimated boundary caused by artifacts. The pectoral muscle boundary is estimated using Canny edge detection and a pre-processing technique is proposed to remove noisy edges. Subsequently, we identify five edge features to find the edge that has the highest probability of being the initial pectoral contour and search for the actual boundary via contour growing. The segmentation results for the proposed method are compared with manual segmentations using 322, 208 and 100 mammograms from the Mammographic Image Analysis Society (MIAS), INBreast and Breast Cancer Digital Repository (BCDR) databases, respectively. Experimental results show that the breast boundary and pectoral muscle estimation methods achieved dice similarity coefficients of 98.8% and 97.8% (MIAS), 98.9% and 89.6% (INBreast) and 99.2% and 91.9% (BCDR), respectively.
Pub.: 14 Jun '17, Pinned: 15 Jun '17
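The Dice similarity coefficients reported above compare an automatically estimated binary mask against a manual segmentation. A minimal NumPy sketch of the metric (the 4x4 toy masks are invented for illustration, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks.

    DSC = 2 * |P intersect T| / (|P| + |T|), from 0 (no overlap) to 1.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:          # both masks empty: define as perfect agreement
        return 1.0
    return 2.0 * intersection / total

# Toy masks: the predicted boundary slightly overestimates the true region.
pred  = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
print(round(dice_coefficient(pred, truth), 3))   # → 0.923
```

The single overestimated pixel costs only a little, which is why Dice is the usual choice for comparing segmentations of unequal-sized regions.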
Abstract: Selection of relevant and appropriate features to characterize breast patterns is of paramount importance in breast tissue representation and classification in the machine learning paradigm. Feature selection based on a single evaluation criterion has shown limited capability in breast tumor detection and classification due to its bias towards that criterion. In this paper, a new hybrid feature selection scheme is used to determine the most relevant features for classification of benign and malignant tumors in breast ultrasound images. The proposed approach uses ten different evaluation criteria to decide the relevance of a particular feature. The existing feature selection techniques are also reviewed. A new database of 178 breast ultrasound images consisting of 88 benign and 90 malignant cases is used in the experiments. The performance of the proposed approach is compared with that of existing feature selection techniques using back-propagation artificial neural network (BPANN) and support vector machine (SVM) based classifiers. The results demonstrate that the proposed feature selection approach outperformed traditional methods, achieving significantly higher classification accuracies of 96.6% and 94.4% with the BPANN and SVM classifiers, respectively.
Pub.: 11 Jan '17, Pinned: 19 Apr '17
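The paper combines ten evaluation criteria into one hybrid relevance decision. One common way to aggregate heterogeneous criteria, shown here as a hedged sketch rather than the authors' exact scheme, is to rank the features under each criterion and keep those with the best mean rank (the two criteria and the synthetic data below are illustrative stand-ins for the paper's ten criteria and ultrasound features):

```python
import numpy as np

def rank_features(scores):
    """Convert per-feature scores (higher = better) to ranks (0 = best)."""
    order = np.argsort(-np.asarray(scores))
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

def hybrid_select(X, y, criteria, k):
    """Keep the k features with the best (lowest) mean rank across criteria."""
    all_ranks = np.vstack([rank_features(c(X, y)) for c in criteria])
    mean_rank = all_ranks.mean(axis=0)
    return np.argsort(mean_rank)[:k]

# Two illustrative criteria (stand-ins for the paper's ten):
variance = lambda X, y: X.var(axis=0)
abs_corr = lambda X, y: np.abs([np.corrcoef(X[:, j], y)[0, 1]
                                for j in range(X.shape[1])])

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)        # synthetic benign/malignant labels
X = rng.normal(size=(100, 5))      # five synthetic features
X[:, 2] += y                       # feature 2 carries the class signal
selected = hybrid_select(X, y, [variance, abs_corr], k=2)
print(selected)
```

Rank aggregation sidesteps the problem that the raw criterion scores live on incomparable scales, which is the bias toward a single criterion that the abstract mentions.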
Abstract: It is estimated that 7% of women in the western world will develop palpable breast cysts in their lifetime. Even though cysts have been correlated with risk of developing breast cancer, many of them are benign and do not require follow-up. We develop a method to discriminate benign solitary cysts from malignant masses in digital mammography. We think a system like this can have merit in the clinic as a decision aid or complementary to specialised modalities. We employ a deep Convolutional Neural Network (CNN) to classify cyst and mass patches. Deep CNNs have been shown to be powerful classifiers, but need a large amount of training data which for medical problems is often difficult to come by. The key contribution of this paper is that we show good performance can be obtained on a small dataset by pretraining the network on a large dataset of a related task. We subsequently investigate the following: (1) when a mammographic exam is performed, two different views of the same breast are recorded. We investigate the merit of combining the output of the classifier from these two views. (2) We evaluate the importance of the resolution of the patches fed to the network. (3) A method dubbed tissue augmentation is subsequently employed, where we extract normal tissue from normal patches and superimpose this onto the actual samples aiming for a classifier invariant to occluding tissue. (4) We combine the representation extracted using the deep CNN with our previously developed features. We show that using the proposed deep learning method, an Area Under the ROC Curve (AUC) value of 0.80 can be obtained on a set of benign solitary cysts and malignant mass findings recalled in screening. We find that it works significantly better than our previously developed approach by comparing the AUC of the ROC using bootstrapping. By combining views, the results can be further improved, though this difference was not found to be significant.
We find no significant difference between using a resolution of 100 versus 200 micron. The proposed tissue augmentations give a small improvement in performance, but this improvement was also not found to be significant. The final system obtained an AUC of 0.80 with 95% confidence interval [0.78, 0.83], calculated using bootstrapping. The system works best for lesions larger than 27 mm, where it obtains an AUC value of 0.87. We have presented a Computer Aided Diagnosis (CADx) method to discriminate cysts from solid lesions in mammography using features from a deep Convolutional Neural Network (CNN) trained on a large set of mass candidates, obtaining an AUC of 0.80 on a set of diagnostic exams recalled from screening. We believe the system shows great potential and comes close to the performance of recently developed spectral mammography. We think the system can be further improved when more data and computational power become available.
Pub.: 18 Jan '17, Pinned: 18 Apr '17
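The 95% confidence interval above was "calculated using bootstrapping". A from-scratch sketch of the standard percentile-bootstrap procedure for the AUC, with the AUC itself computed via the Mann-Whitney statistic (a generic illustration, not the authors' code; the labels and scores below are invented):

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive is scored above a random negative (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(labels), len(labels))
        if labels[idx].min() == labels[idx].max():
            continue                 # resample must contain both classes
        stats.append(auc(labels[idx], scores[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.5, 0.7, 0.8, 0.9]
print(auc(labels, scores))                       # → 0.92
lo, hi = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

The same resampling also yields the significance comparison mentioned in the abstract: bootstrap the AUC difference between two systems and check whether the interval excludes zero.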
Abstract: Predicting tumor proliferation scores is an important biomarker indicative of breast cancer patients' prognosis. In this paper, we present a unified framework to predict tumor proliferation scores from whole slide images in breast histopathology. The proposed system offers a fully automated solution for predicting both a molecular data based and a mitosis counting based tumor proliferation score. The framework integrates three modules, each fine-tuned to maximize the overall performance: an image processing component for handling whole slide images, a deep learning based mitosis detection network, and a proliferation scores prediction module. We have achieved 0.567 quadratic weighted Cohen's kappa in mitosis counting based score prediction and 0.652 F1-score in mitosis detection. On Spearman's correlation coefficient, which evaluates prediction of the molecular data based score, the system obtained 0.6171. Our system won first place in all three tasks in the Tumor Proliferation Assessment Challenge at MICCAI 2016, outperforming all other approaches.
Pub.: 21 Dec '16, Pinned: 18 Apr '17
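The mitosis-counting score above is evaluated with quadratic weighted Cohen's kappa, which penalises disagreement between ordinal grades by the square of their distance. A small NumPy implementation of the metric (the toy grade lists are made up for illustration):

```python
import numpy as np

def quadratic_weighted_kappa(a, b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal agreement."""
    a, b = np.asarray(a), np.asarray(b)
    O = np.zeros((n_classes, n_classes))            # observed agreement matrix
    for i, j in zip(a, b):
        O[i, j] += 1
    # Quadratic penalty: disagreeing by 2 grades costs 4x disagreeing by 1.
    w = np.subtract.outer(np.arange(n_classes), np.arange(n_classes)) ** 2
    w = w / (n_classes - 1) ** 2
    # Expected agreement under chance, from the marginal grade frequencies.
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()

print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 2], 3))   # → 1.0
```

A kappa of 1 means perfect agreement, 0 means chance-level agreement, so the reported 0.567 sits well above chance but well below pathologist consensus.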
Abstract: Current analysis of tumor proliferation, the most salient prognostic biomarker for invasive breast cancer, is limited to subjective mitosis counting by pathologists in localized regions of tissue images. This study presents the first data-driven integrative approach to characterize the severity of tumor growth and spread on a categorical and molecular level, utilizing multiple biologically salient deep learning classifiers to develop a comprehensive prognostic model. Our approach achieves pathologist-level performance on three-class categorical tumor severity prediction. It additionally pioneers prediction of molecular expression data from a tissue image, obtaining a Spearman's rank correlation coefficient of 0.60 with ex vivo mean calculated RNA expression. Furthermore, our framework is applied to identify over two hundred unprecedented biomarkers critical to the accurate assessment of tumor proliferation, validating our proposed integrative pipeline as the first to holistically and objectively analyze histopathological images.
Pub.: 11 Oct '16, Pinned: 18 Apr '17
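Spearman's rank correlation coefficient, used above to compare predicted and measured RNA expression, is the Pearson correlation computed on ranks, so it rewards any monotone relationship. A minimal sketch (assuming no tied values; a full implementation would assign tied observations their average rank):

```python
import numpy as np

def ranks(v):
    """0-based ranks of v (assumes no ties, for simplicity of the sketch)."""
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(len(v))
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

# Monotone but non-linear relationship: rho is 1 even though Pearson r is not.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3
print(round(spearman_rho(x, y), 6))   # → 1.0
```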
Abstract: Background and Objective: Feature reduction is an essential stage in computer aided breast cancer diagnosis systems. Multilayer neural networks can be trained to extract relevant features by encoding high-dimensional data into low-dimensional codes. Optimizing traditional auto-encoders works well only if the initial weights are close to a proper solution. They are also trained to only reduce the mean squared reconstruction error (MRE) between the encoder inputs and the decoder outputs, but do not address the classification error. The goal of the current work is to test the hypothesis that extending traditional auto-encoders (which only minimize reconstruction error) to multi-objective optimization for finding Pareto-optimal solutions provides more discriminative features that will improve classification performance when compared to single-objective and other multi-objective approaches (i.e. scalarized and sequential).
Pub.: 13 Apr '17, Pinned: 18 Apr '17
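Pareto-optimality, the key idea in the multi-objective formulation above, means no other candidate is at least as good on every objective and strictly better on one. A small sketch that filters candidate (reconstruction error, classification error) pairs down to the Pareto front (the numbers are made up for illustration):

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    q dominates p if q is <= p in every objective and < in at least one
    (both objectives here are errors, so lower is better).
    """
    front = []
    for p in points:
        dominated = any(
            all(o <= v for o, v in zip(q, p)) and
            any(o < v for o, v in zip(q, p))
            for q in points if q != p
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (reconstruction MRE, classification error) per candidate:
candidates = [(0.10, 0.30), (0.20, 0.10), (0.15, 0.25),
              (0.30, 0.05), (0.25, 0.20)]
print(pareto_front(candidates))   # (0.25, 0.20) is dominated by (0.20, 0.10)
```

Each surviving point is a defensible trade-off, which is why the paper searches for the whole front instead of scalarizing the two errors into a single weighted loss.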
Abstract: Microcalcification is an effective indicator of early breast cancer. To improve the diagnostic accuracy of microcalcifications, this study evaluates the performance of deep learning-based models on large datasets for discriminating them. A semi-automated segmentation method was used to characterize all microcalcifications. A discrimination classifier model was constructed to assess the accuracies of microcalcifications and breast masses, either in isolation or combination, for classifying breast lesions. Performance was compared to that of benchmark models. Our deep learning model achieved a discriminative accuracy of 87.3% if microcalcifications were characterized alone, compared to 85.8% with a support vector machine. The accuracies were 61.3% for both methods with masses alone and improved to 89.7% and 85.8% after the combined analysis with microcalcifications. Image segmentation with our deep learning model yielded 15, 26 and 41 features for the three scenarios, respectively. Overall, deep learning based on large datasets was superior to standard methods for the discrimination of microcalcifications. Accuracy was increased by adopting a combinatorial approach to detect microcalcifications and masses simultaneously. This may have clinical value for early detection and treatment of breast cancer.
Pub.: 09 Jun '16, Pinned: 18 Apr '17
Abstract: Diagnosis of breast carcinomas has so far been limited to the morphological interpretation of epithelial cells and the assessment of epithelial tissue architecture. Consequently, most of the automated systems have focused on characterizing the epithelial regions of the breast to detect cancer. In this paper, we propose a system for classification of hematoxylin and eosin (H&E) stained breast specimens based on convolutional neural networks that primarily targets the assessment of tumor-associated stroma to diagnose breast cancer patients. We evaluate the performance of our proposed system using a large cohort containing 646 breast tissue biopsies. Our evaluations show that the proposed system achieves an area under ROC of 0.92, demonstrating the discriminative power of previously neglected tumor-associated stroma as a diagnostic biomarker.
Pub.: 19 Feb '17, Pinned: 18 Apr '17
Abstract: Recent advances in deep learning for object recognition in natural images have prompted a surge of interest in applying a similar set of techniques to medical images. Most of the initial attempts largely focused on replacing the input to such a deep convolutional neural network from a natural image to a medical image. This, however, does not take into consideration the fundamental differences between these two types of data. More specifically, detection or recognition of an anomaly in medical images depends significantly on fine details, unlike object recognition in natural images where coarser, more global structures matter more. This difference makes it inadequate to use the existing deep convolutional neural network architectures, which were developed for natural images, because they rely on heavily downsampling an image to a much lower resolution to reduce the memory requirements. This hides details necessary to make accurate predictions for medical images. Furthermore, a single exam in medical imaging often comes with a set of different views which must be seamlessly fused in order to reach a correct conclusion. In our work, we propose to use a multi-view deep convolutional neural network that handles a set of more than one high-resolution medical image. We evaluate this network on large-scale mammography-based breast cancer screening (BI-RADS prediction) using 103 thousand images. We focus on investigating the impact of training set sizes and image sizes on the prediction accuracy. Our results highlight that performance clearly increases with the size of training set, and that the best performance can only be achieved using the images in the original resolution. This suggests the future direction of medical imaging research using deep neural networks is to utilize as much data as possible with the least amount of potentially harmful preprocessing.
Pub.: 21 Mar '17, Pinned: 18 Apr '17
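The paper fuses the views inside the network, but the simplest baseline for combining a multi-view exam is score-level fusion: average the per-view predicted probabilities. A hedged sketch with invented scores for the two standard mammographic views (CC and MLO):

```python
import numpy as np

def fuse_views(view_scores):
    """Late fusion of per-view malignancy scores: average the predicted
    probabilities for each exam (rows = exams, columns = views)."""
    return np.mean(view_scores, axis=1)

# Hypothetical scores for 3 exams, two views (CC and MLO) each:
scores = np.array([[0.80, 0.60],
                   [0.10, 0.30],
                   [0.55, 0.45]])
print(np.round(fuse_views(scores), 2))   # → [0.7 0.2 0.5]
```

Averaging independent view scores cannot model cross-view correspondences, which is exactly the motivation for fusing the views inside the network as the abstract describes.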
Abstract: The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating characteristic curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
Pub.: 18 Jun '16, Pinned: 18 Apr '17
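The "approximately 85 percent reduction in human error rate" quoted above follows directly from the reported AUCs, if 1 - AUC is treated as the error rate:

```python
# Reproducing the error-reduction arithmetic from the abstract above.
pathologist_auc = 0.966
combined_auc = 0.995

error_before = 1 - pathologist_auc      # 0.034
error_after = 1 - combined_auc          # 0.005
reduction = (error_before - error_after) / error_before
print(f"{reduction:.0%}")               # → 85%
```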
Abstract: Survivability rates vary widely among various stages of breast cancer. Although machine learning models built in the past to predict breast cancer survivability were given stage as one of the features, they were not trained or evaluated separately for each stage.
Pub.: 09 Nov '16, Pinned: 18 Apr '17
Abstract: Recent advances in machine learning yielded new techniques to train deep neural networks, which resulted in highly successful applications in many pattern recognition tasks such as object detection and speech recognition. In this paper we provide a head-to-head comparison between a state-of-the-art mammography CAD system, relying on a manually designed feature set, and a Convolutional Neural Network (CNN), aiming for a system that can ultimately read mammograms independently. Both systems are trained on a large data set of around 45,000 images, and results show the CNN outperforms the traditional CAD system at low sensitivity and performs comparably at high sensitivity. We subsequently investigate to what extent features such as location and patient information and commonly used manual features can still complement the network, and see improvements at high specificity over the CNN, especially with location and context features, which contain information not available to the CNN. Additionally, a reader study was performed in which the network was compared to certified screening radiologists on a patch level, and we found no significant difference between the network and the readers.
Pub.: 09 Aug '16, Pinned: 18 Apr '17