HEAD, COMPUTER STUDIES DEPARTMENT, UNIVERSITY OF CALOOCAN CITY
Use of data mining decision tree algorithm to predict occurrence of community diseases.
Information Technology is everywhere: in government, in academe, in the military, in our social lives and in health. Technological advancement in the field of health is accelerating, especially with the occurrence of highly contagious diseases that have already killed thousands. I.T. is being employed in the search for cures and in maintenance and monitoring. This research focuses on the prediction of the top 5 diseases in the City of Caloocan, Philippines. It aims to determine the extent of a disease in the community and to use actual historical hospital medical records to create a model that will predict the occurrence of diseases using a data mining decision tree.
C. Objectives of the Study
The general objective of this study is to develop a framework to be used in predicting diseases in Barangays (communities) of Caloocan City, Philippines.
The study focuses on the medical history of the residents of Caloocan City collected from DJNR Memorial Hospital. The data were used to generate a decision tree model that will predict the future occurrence of disease and to describe the rate of occurrence of disease per barangay using a color-coded map that indicates the extent of a specific disease. This model will be useful for the city government of Caloocan City in the timely and accurate delivery of health-related services. It will identify the exact barangay or community where medical services and goods are needed. The rule set from the model will be used to develop the Community Disease Monitoring Information System (CDMIS). For the residents, the generated model will be used to predict the possible occurrence of diseases such as dengue, hypertension, heart-related diseases, TB and pneumonia.
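The color-coded map described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the barangay names, population figures, records, the per-1,000 rate, and the band thresholds are all hypothetical and do not come from the study.

```python
from collections import Counter

def occurrence_rates(records, population, disease):
    """Return cases of `disease` per 1,000 residents for each barangay."""
    cases = Counter(barangay for barangay, d in records if d == disease)
    return {b: 1000 * cases.get(b, 0) / pop for b, pop in population.items()}

def color_band(rate, low=1.0, high=5.0):
    """Map a rate to a map color. The thresholds are hypothetical."""
    if rate < low:
        return "green"
    if rate < high:
        return "yellow"
    return "red"

# Hypothetical (barangay, disease) records and population counts.
records = [
    ("Barangay A", "dengue"),
    ("Barangay A", "dengue"),
    ("Barangay B", "dengue"),
    ("Barangay B", "pneumonia"),
]
population = {"Barangay A": 400, "Barangay B": 500, "Barangay C": 1500}

rates = occurrence_rates(records, population, "dengue")
bands = {barangay: color_band(rate) for barangay, rate in rates.items()}
print(bands)
```

Keeping the rate calculation separate from the color banding means the thresholds can be tuned per disease without touching the counting logic.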
Abstract: In recent decades, data mining has become one of the effective tools for data analysis and knowledge management, and many areas have adopted a data mining approach to solve their problems. Using data mining to enhance the education system is still relatively new. This paper focuses on predicting instructor performance and investigates the factors that affect students’ achievements in order to improve the quality of the education system. The Turkey Student Evaluation records dataset is considered and run on different classifiers such as J48 Decision Tree, Multilayer Perceptron, Naïve Bayes, and Sequential Minimal Optimization. All four classifiers are compared on prediction accuracy to find the best-performing classification algorithm. The conclusions of this study are very promising and provide another point of view for evaluating student performance. It also highlights the importance of employing data mining tools in the field of education. The results show that using the attribute evaluation method on the dataset increases prediction accuracy.
Pub.: 25 Oct '16, Pinned: 10 Nov '17
Abstract: This study aimed to develop a prediction model for suicide attempts in Korean adolescents. We conducted a decision tree analysis of 2,754 middle and high school students nationwide. We fixed suicide attempt as the dependent variable and eleven sociodemographic, intrapersonal, and extrapersonal variables as independent variables. The rate of suicide attempts in the total sample was 9.5%, and severity of depression was the strongest predictor of suicide attempt. The rates of suicide attempts in the depression and potential depression groups were 5.4 and 2.8 times higher than that of the non-depression group. In the depression group, the most powerful predictor of a suicide attempt was delinquency, and the rate of suicide attempts among those with higher delinquency was two times higher than among those with lower delinquency. Of special note, the rate of suicide attempts was highest in depressed females with higher delinquency. Interestingly, in the potential depression group, the most impactful predictor of a suicide attempt was intimacy with family, and the rate of suicide attempts among those with lower intimacy with family was 2.4 times higher than among those with higher intimacy with family. In addition, within the potential depression group, middle school students with lower intimacy with family had a 2.5-times higher rate of suicide attempts than high school students with lower intimacy with family. Finally, in the non-depression group, stress level was the most powerful predictor of a suicide attempt. Among the non-depression group, students who reported high levels of stress showed an 8.3-times higher rate of suicide attempts than students who reported average levels of stress. Based on the results, we especially need to pay attention to depressed females with higher delinquency and to those with potential depression and lower intimacy with family in order to prevent suicide attempts in teenagers.
Pub.: 24 Sep '15, Pinned: 10 Nov '17
Abstract: Recently, the economic depression that swept the world has affected business organizations and the banking sector. Such economic conditions cause severe attrition for banks, and customer retention becomes impossible. Accordingly, marketing managers need to increase marketing campaigns, whereas organizations avoid both expenses and business expansion. To resolve this dilemma, data mining techniques are used as a key tool in data analysis, data summarization, hidden pattern discovery, and data interpretation. In this paper, rough set theory and decision tree mining techniques have been implemented using real marketing data obtained from a Portuguese marketing campaign related to bank deposit subscription [Moro et al., 2011]. The paper aims to improve the efficiency of marketing campaigns and to help decision makers by reducing the number of features that describe the dataset, highlighting the most significant ones, and predicting deposit customer retention criteria based on potential predictive rules.
Pub.: 14 Mar '15, Pinned: 10 Nov '17
Abstract: Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into "tear" and "no tear" groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as to determine the probability of the presence of the disease, enhancing diagnostic decision making for rotator cuff tears.
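The arithmetic behind a Fagan's nomogram can be sketched in a few lines: a likelihood ratio converts pretest odds into post-test odds. The sensitivity, specificity, and pretest probability below are placeholder values chosen for illustration; the abstract does not report the study's figures.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test or prediction model."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical model performance and pretest probability.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
p_if_tear_predicted = post_test_probability(0.5, lr_pos)
p_if_no_tear_predicted = post_test_probability(0.5, lr_neg)
print(round(p_if_tear_predicted, 3), round(p_if_no_tear_predicted, 3))
```

Working in odds rather than probabilities is what makes the update a single multiplication, which is the same relationship the nomogram encodes graphically.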
Pub.: 16 Apr '14, Pinned: 10 Nov '17
Abstract: Nowadays, new classes of diseases known as lifestyle diseases have come into existence. The main reasons behind these diseases are changes in people's lifestyles, such as alcohol drinking, smoking, food habits etc. After going through the various lifestyle diseases, it has been found that fertility rates (sperm quantity) in men have decreased considerably in the last two decades. Lifestyle factors as well as environmental factors are mainly responsible for the change in semen quality. The objective of this paper is to identify the lifestyle and environmental features that affect seminal quality and fertility rate in men using data mining methods. Five artificial intelligence techniques, namely multilayer perceptron (MLP), decision tree (DT), Naive Bayes (Kernel), support vector machine plus particle swarm optimization (SVM+PSO) and support vector machine (SVM), have been applied to a fertility dataset to evaluate seminal quality and to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), support vector machine plus particle swarm optimization (SVM+PSO), principal component analysis (PCA), chi-square test, correlation and T-test, have been used to identify the more relevant features that affect seminal quality. These techniques are applied to a fertility dataset that contains 100 instances with nine attributes and two classes. The experimental results show that SVM+PSO provides higher accuracy and area under curve (AUC) (94% & 0.932) than multilayer perceptron (MLP) (92% & 0.728), support vector machine (91% & 0.758), Naive Bayes (Kernel) (89% & 0.850) and decision tree (89% & 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process, i.e. how to select the features that are more important for predicting fertility rate. In this paper, eight feature selection methods are applied to the fertility dataset to find a set of good features. The investigational results show that the childhood diseases (0.079) and high fever (0.057) features have less impact on fertility rate, while the age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent sitting (0.4366) and accident (0.5973) features have more impact. It is also observed that feature selection methods increase the accuracy of the above-mentioned techniques (multilayer perceptron 92%, support vector machine 91%, SVM+PSO 94%, Naive Bayes (Kernel) 89% and decision tree 89%) compared with no feature selection (multilayer perceptron 86%, support vector machine 86%, SVM+PSO 85%, Naive Bayes (Kernel) 83% and decision tree 84%), which shows the applicability of feature selection methods in prediction. This paper highlights the application of artificial intelligence techniques in the medical domain. From this paper, it can be concluded that data mining methods can be used to predict whether a person has a disease based on environmental and lifestyle parameters/features rather than by undergoing various medical tests. In this paper, five data mining techniques are used to predict the fertility rate, among which SVM+PSO provides more accurate results than support vector machine and decision tree.
Pub.: 06 Jun '14, Pinned: 10 Nov '17
Abstract: The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in the prediction of learning disabilities (LD) in school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Different rules extracted from the decision tree are used for the prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
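J48 is Weka's implementation of C4.5, which picks split attributes by gain ratio. The sketch below shows that criterion on a toy table; the attribute names and rows are hypothetical and are not taken from the paper's checklist data.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: one informative and one uninformative attribute.
rows = [
    {"reading_difficulty": "yes", "left_handed": "yes", "ld": "yes"},
    {"reading_difficulty": "yes", "left_handed": "no", "ld": "yes"},
    {"reading_difficulty": "no", "left_handed": "yes", "ld": "no"},
    {"reading_difficulty": "no", "left_handed": "no", "ld": "no"},
]

scores = {a: gain_ratio(rows, a, "ld")
          for a in ("reading_difficulty", "left_handed")}
best_attribute = max(scores, key=scores.get)
print(scores, best_attribute)
```

A full tree builder would apply this choice recursively to each subset and then prune; only the split criterion is shown here.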
Pub.: 02 Nov '10, Pinned: 10 Nov '17
Abstract: The purpose of this study was to compare the performance of logistic regression, artificial neural networks (ANNs) and decision tree models for predicting diabetes or prediabetes using common risk factors. Participants came from two communities in Guangzhou, China; 735 patients confirmed to have diabetes or prediabetes and 752 normal controls were recruited. A standard questionnaire was administered to obtain information on demographic characteristics, family diabetes history, anthropometric measurements and lifestyle risk factors. Then we developed three predictive models using 12 input variables and one output variable from the questionnaire information; we evaluated the three models in terms of their accuracy, sensitivity and specificity. The logistic regression model achieved a classification accuracy of 76.13% with a sensitivity of 79.59% and a specificity of 72.74%. The ANN model reached a classification accuracy of 73.23% with a sensitivity of 82.18% and a specificity of 64.49%; and the decision tree (C5.0) achieved a classification accuracy of 77.87% with a sensitivity of 80.68% and specificity of 75.13%. The decision tree model (C5.0) had the best classification accuracy, followed by the logistic regression model, and the ANN gave the lowest accuracy.
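Accuracy, sensitivity and specificity are linked: accuracy is the average of sensitivity and specificity weighted by the share of cases and controls. The sketch below checks the reported figures against that identity. It assumes the metrics were computed on a sample with the same case/control mix as the recruited participants, which the abstract does not state.

```python
# Recruited participants, as reported in the abstract.
n_cases, n_controls = 735, 752
case_share = n_cases / (n_cases + n_controls)

# Reported (sensitivity, specificity, accuracy) for each model.
reported = {
    "logistic regression": (0.7959, 0.7274, 0.7613),
    "ANN": (0.8218, 0.6449, 0.7323),
    "decision tree (C5.0)": (0.8068, 0.7513, 0.7787),
}

def implied_accuracy(sensitivity, specificity, share_of_cases):
    """Accuracy as the class-weighted mean of sensitivity and specificity."""
    return share_of_cases * sensitivity + (1 - share_of_cases) * specificity

implied = {
    name: implied_accuracy(sens, spec, case_share)
    for name, (sens, spec, _acc) in reported.items()
}

for name, (_sens, _spec, acc) in reported.items():
    print(f"{name}: reported {acc:.4f}, implied {implied[name]:.4f}")
```

Under that assumption the implied accuracies agree with the reported ones to within rounding, so the three sets of figures are internally consistent.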
Pub.: 26 Jan '13, Pinned: 10 Nov '17
Abstract: The aim of this study was to create a prediction model using a data mining approach to identify individuals at low risk of developing type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a population of 6,647 people without diabetes, aged ≥20 years and followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for the subjects without diabetes, precision and F-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes.
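The F-measure reported for the subjects without diabetes can be reproduced from the other figures. The sketch assumes the F-measure is the usual F1 score (the harmonic mean of precision and recall) and that recall for the non-diabetes class equals the reported specificity; the abstract does not spell out either point.

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported values for the "without diabetes" class.
precision_no_diabetes = 0.92
recall_no_diabetes = 0.979  # assumed equal to the reported specificity

f1_no_diabetes = f_measure(precision_no_diabetes, recall_no_diabetes)
print(round(f1_no_diabetes, 2))
```

The high F-measure for the majority class sits alongside 31.1% sensitivity for the diabetes class, which fits the paper's framing of the model as a tool for identifying low-risk individuals.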
Pub.: 03 Aug '14, Pinned: 10 Nov '17