Use of data mining decision tree algorithm to predict occurrence of community diseases.

Information technology is everywhere: in government, in academia, in the military, in our social lives and in health. Technological advancement in the field of health has accelerated, especially with the occurrence of highly contagious diseases that have already killed thousands. I.T. is being employed in the search for cures and in maintenance and monitoring. This research focuses on the prediction of the top 5 diseases in the City of Caloocan, Philippines. It aims to determine the extent of a disease in the community and to use actual historical hospital medical records to create a model that predicts the occurrence of diseases using a data mining decision tree.

C. Objectives of the Study

The general objective of this study is to develop a framework to be used in predicting diseases in the barangays (communities) of Caloocan City, Philippines. The specific objectives are:

  1. To determine the important attributes in predicting diseases in the different barangays of Caloocan City.
  2. To develop a decision tree model that will be used for predicting diseases in the different barangays of Caloocan City.
  3. To evaluate the accuracy of the Community Diseases Monitoring Information System (CDMIS) prototype in implementing the prediction model.

The study focuses on the medical history of the residents of Caloocan City collected from DJNR Memorial Hospital. The data were used to generate a decision tree model that will predict the future occurrence of disease and describe the rate of occurrence of disease per barangay using a color-coded map that indicates the extent of a specific disease. This model will be useful for the city government of Caloocan City in the timely and accurate delivery of health-related services. It will identify the exact barangay or community where medical services and goods are needed. The rule set from the model will be used to develop the Community Disease Monitoring Information System (CDMIS). For the residents, the generated model will be used to predict the possible occurrence of diseases such as dengue, hypertension, heart-related diseases, TB and pneumonia.
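
As a minimal sketch of how a decision tree can rank candidate attributes (the first objective above), the snippet below computes information gain, one common split criterion. The source does not say which decision tree algorithm or criterion the study used, so this is an assumption; the records, attribute names (`age_group`, `season`) and values are hypothetical and are not taken from the hospital data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` after splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records, invented purely for illustration.
records = [
    {"age_group": "child", "season": "wet", "disease": "dengue"},
    {"age_group": "child", "season": "wet", "disease": "dengue"},
    {"age_group": "adult", "season": "wet", "disease": "pneumonia"},
    {"age_group": "adult", "season": "dry", "disease": "hypertension"},
    {"age_group": "senior", "season": "dry", "disease": "hypertension"},
    {"age_group": "senior", "season": "wet", "disease": "hypertension"},
]

gains = {a: information_gain(records, a, "disease") for a in ("age_group", "season")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data the attribute with the larger gain would be chosen as the root split; a full tree repeats the same ranking on each resulting subset.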


Prediction by data mining, of suicide attempts in Korean adolescents: a national study.

Abstract: This study aimed to develop a prediction model for suicide attempts in Korean adolescents. We conducted a decision tree analysis of 2,754 middle and high school students nationwide. We fixed suicide attempt as the dependent variable and eleven sociodemographic, intrapersonal, and extrapersonal variables as independent variables. The rate of suicide attempts of the total sample was 9.5%, and severity of depression was the strongest variable to predict suicide attempt. The rates of suicide attempts in the depression and potential depression groups were 5.4 and 2.8 times higher than that of the non-depression group. In the depression group, the most powerful factor to predict a suicide attempt was delinquency, and the rate of suicide attempts in those in the depression group with higher delinquency was two times higher than in those in the depression group with lower delinquency. Of special note, the rate of suicide attempts in the depressed females with higher delinquency was the highest. Interestingly, in the potential depression group, the most impactful factor to predict a suicide attempt was intimacy with family, and the rate of suicide attempts of those in the potential depression group with lower intimacy with family was 2.4 times higher than that of those in the potential depression group with higher intimacy with family. And, among the potential depression group, middle school students with lower intimacy with family had a 2.5-times higher rate of suicide attempts than high school students with lower intimacy with family. Finally, in the non-depression group, stress level was the most powerful factor to predict a suicide attempt. Among the non-depression group, students who reported high levels of stress showed an 8.3-times higher rate of suicide attempts than students who reported average levels of stress. Based on the results, we especially need to pay attention to depressed females with higher delinquency and those with potential depression with lower intimacy with family to prevent suicide attempts in teenagers.

Pub.: 24 Sep '15, Pinned: 10 Nov '17

Predicting rotator cuff tears using data mining and Bayesian likelihood ratios.

Abstract: Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into "tear" and "no tear" groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears.
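
The arithmetic behind a Fagan's nomogram can be sketched in a few lines: a likelihood ratio converts pretest odds into post-test odds. The sensitivity, specificity and pretest probability below are hypothetical placeholders, not figures reported by the study, and the function names are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test or classifier."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical model performance and pretest probability (assumptions).
sens, spec = 0.9, 0.8
pretest = 0.5

lr_pos, lr_neg = likelihood_ratios(sens, spec)
p_tear_if_predicted_tear = post_test_probability(pretest, lr_pos)
p_tear_if_predicted_no_tear = post_test_probability(pretest, lr_neg)
print(round(p_tear_if_predicted_tear, 3), round(p_tear_if_predicted_no_tear, 3))
```

With these placeholder numbers, a "tear" prediction raises the probability of a tear well above the pretest value and a "no tear" prediction lowers it, which is the reading a nomogram gives graphically.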

Pub.: 16 Apr '14, Pinned: 10 Nov '17

Seminal quality prediction using data mining methods.

Abstract: Nowadays, some new classes of diseases have come into existence which are known as lifestyle diseases. The main reasons behind these diseases are changes in the lifestyle of people such as alcohol drinking, smoking, food habits etc. After going through the various lifestyle diseases, it has been found that the fertility rate (sperm quantity) in men has decreased considerably in the last two decades. Lifestyle factors as well as environmental factors are mainly responsible for the change in the semen quality. The objective of this paper is to identify the lifestyle and environmental features that affect the seminal quality and also fertility rate in men using data mining methods. Five artificial intelligence techniques, namely Multilayer perceptron (MLP), Decision Tree (DT), Naive Bayes (Kernel), Support vector machine+Particle swarm optimization (SVM+PSO) and Support vector machine (SVM), have been applied on a fertility dataset to evaluate the seminal quality and also to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), support vector machine plus particle swarm optimization (SVM+PSO), principal component analysis (PCA), chi-square test, correlation and T-test methods, have been used to identify more relevant features which affect the seminal quality. These techniques are applied on a fertility dataset which contains 100 instances with nine attributes and two classes. The experimental result shows that SVM+PSO provides higher accuracy and area under curve (AUC) rate (94% & 0.932) among multi-layer perceptron (MLP) (92% & 0.728), Support Vector Machines (91% & 0.758), Naive Bayes (Kernel) (89% & 0.850) and Decision Tree (89% & 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process, i.e. how to select the features which are more important for prediction of fertility rate. In this paper, eight feature selection methods are applied on the fertility dataset to find a set of good features. The investigational results show that the childish diseases (0.079) and high fever (0.057) features have less impact on fertility rate while the age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent sitting (0.4366) and accident (0.5973) features have more impact. It is also observed that feature selection methods increase the accuracy of the above-mentioned techniques (multilayer perceptron 92%, support vector machine 91%, SVM+PSO 94%, Naive Bayes (Kernel) 89% and decision tree 89%) as compared to without feature selection methods (multilayer perceptron 86%, support vector machine 86%, SVM+PSO 85%, Naive Bayes (Kernel) 83% and decision tree 84%), which shows the applicability of feature selection methods in prediction. This paper highlights the application of artificial intelligence techniques in the medical domain. From this paper, it can be concluded that data mining methods can be used to predict whether a person has a disease based on environmental and lifestyle parameters/features rather than undergoing various medical tests. In this paper, five data mining techniques are used to predict the fertility rate, among which SVM+PSO provides more accurate results than support vector machine and decision tree.
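
The abstract reports both accuracy and AUC for each classifier, and the two can diverge because accuracy depends on a single decision threshold while AUC depends only on how the scores rank the two classes. The sketch below computes both from scratch; the labels, scores and the 0.5 threshold are hypothetical and unrelated to the fertility dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = altered, 0 = normal) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Threshold of 0.5 is an assumption; accuracy changes if it moves, AUC does not.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), auc(y_true, scores))
```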

Pub.: 06 Jun '14, Pinned: 10 Nov '17