IJE TRANSACTIONS B: Applications Vol. 30, No. 11 (November 2017) 1548-1557    Article in Press

PDF URL: http://www.ije.ir/Vol30/No11/B/27.pdf  

S. Kumar and G. Sahoo
(Received: April 06, 2017 – Accepted: September 08, 2017)

Abstract    Machine learning based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality results in a slower learning rate and higher computational cost. Feature selection addresses this high dimensionality by reducing the feature set, and it improves classification performance while using fewer features in the decision making process. In this paper, Random Forest (RF) is employed for the diagnosis of cardiovascular disease. The first phase of the proposed system applies several feature selection algorithms, namely Principal Component Analysis (PCA), Relief-F, Sequential Forward Floating Search (SFFS), Sequential Backward Floating Search (SBFS) and Genetic Algorithm (GA), to reduce the dimension of cardiovascular disease datasets. The second phase constructs a model based on the RF algorithm for cardiovascular disease classification. The results show that the combination of GA and RF delivered the highest classification accuracy of 93.2% using six features.
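
The first phase described in the abstract (GA-based feature subset selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ga_select_features`, its parameters, and the toy fitness function are all hypothetical and chosen only for illustration. In the setting of the paper, the fitness of a feature mask would presumably be the classification accuracy of an RF model trained on the selected columns; here a simple stand-in score is used so that the example runs with the Python standard library alone.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Return the best feature mask found by a simple generational GA."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        mask = tuple(rng.randint(0, 1) for _ in range(n_features))
        # Reject the empty subset: no classifier can be trained on it.
        return mask if any(mask) else random_mask()

    def tournament(scored):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_mask() for _ in range(pop_size)]
    best = max((fitness(m), m) for m in population)

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        best = max(best, max(scored))
        children = [best[1]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to every gene.
            child = tuple(
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            )
            if any(child):
                children.append(child)
        population = children

    return max(best, max((fitness(m), m) for m in population))[1]


# Toy stand-in for "RF accuracy on the selected features" (an assumption):
# reward three pretend-informative features, lightly penalise subset size.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & INFORMATIVE) - 0.1 * len(kept)


selected = ga_select_features(n_features=8, fitness=toy_fitness)
print("selected feature mask:", selected)
```

To mirror the second phase, `toy_fitness` would be swapped for a function that trains and cross-validates a Random Forest on the columns where the mask is 1. The wrapper design keeps the GA independent of the classifier, at the cost of one model evaluation per candidate mask.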


Keywords    Random Forest, Genetic Algorithm, Feature Selection, Cardiovascular Disease




International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir