Implementing Vertices Principal Component Analysis under Various Weighting Schemes for Interval Valued Observations with Applications to Data Mining
DOI:
https://doi.org/10.3329/dujs.v72i1.71184
Keywords:
Data Mining, Interval Valued Data, Principal Component Analysis, Vertices Principal Component Analysis, K-Nearest Neighbor, Distance Matrix
Abstract
Data mining is the process of deriving valuable information from a larger collection of raw data. It involves looking for irregularities, trends, and correlations in large data sets in order to forecast results. Although a number of techniques have been developed in recent years to perform data mining on conventional data, there is still considerable scope for work on Interval Valued Data (IVD). Working with IVD has been shown to be of significant importance for identifying the objective entity precisely or for representing incomplete knowledge about real-life situations. Unlike classical data, where each object is represented by a point, in IVD the objects are represented by regions in R^p. In this paper, an extension of Principal Component Analysis (PCA) to interval-valued information, known as the Vertices Principal Components method, is explored. The method additionally incorporates the relative contributions of the vertices under different choices of weighting schemes. A new approach to the classification of supervised IVD is proposed, based on the K-Nearest Neighbor (KNN) technique. The proposed approach is implemented on several benchmark data sets. Numerical results suggest, for each data set, the choice of weighting scheme that leads to a better recognition rate.
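The vertices idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the common formulation in which each interval observation on p variables is expanded into its 2^p hyperrectangle vertices, a (weighted) PCA is run on the stacked vertices, and each observation is then described on every principal axis by the minimum and maximum score of its own vertices. The function names `vertices_matrix` and `vertices_pca` and the `weights` argument are hypothetical; the abstract does not specify the weighting schemes, so uniform vertex weights are used as a placeholder default.

```python
import itertools
import numpy as np

def vertices_matrix(lower, upper):
    """Expand n interval observations (n x p bound arrays) into n * 2**p vertices."""
    n, p = lower.shape
    # Each row of `corners` selects the lower (0) or upper (1) bound per variable.
    corners = np.array(list(itertools.product([0, 1], repeat=p)))  # shape (2**p, p)
    verts = np.where(corners[None, :, :] == 1,
                     upper[:, None, :], lower[:, None, :])         # shape (n, 2**p, p)
    owner = np.repeat(np.arange(n), 2 ** p)  # observation index of each vertex row
    return verts.reshape(n * 2 ** p, p), owner

def vertices_pca(lower, upper, weights=None, n_components=2):
    """Weighted PCA on the vertices; returns interval-valued principal components."""
    verts, owner = vertices_matrix(lower, upper)
    if weights is None:
        # Assumption: uniform vertex weights; the paper compares other schemes.
        weights = np.ones(len(verts))
    weights = weights / weights.sum()
    mean = weights @ verts
    centered = verts - mean
    cov = (centered * weights[:, None]).T @ centered  # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scores = centered @ eigvecs[:, order]
    n = lower.shape[0]
    # Interval on each axis: min and max score over the vertices of one observation.
    pc_lower = np.array([scores[owner == i].min(axis=0) for i in range(n)])
    pc_upper = np.array([scores[owner == i].max(axis=0) for i in range(n)])
    return pc_lower, pc_upper, eigvals[order]
```

Calling `vertices_pca(lower, upper)` with two n x p arrays of interval bounds returns the lower and upper ends of each observation's interval on the leading principal axes. A KNN classifier of the kind the abstract proposes would then need a distance between such intervals; that distance is not given in the abstract and is left out of this sketch. Note that the vertex count grows as 2^p, so this direct expansion is only practical for a modest number of variables.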
Dhaka Univ. J. Sci. 72(1): 46-55, 2024 (January)