Handling Duplicate Data in Data Warehouse for Data Mining
In a data warehouse, data is integrated from multiple sources. Integrating these sources increases the volume of data and also introduces duplicate records. A data warehouse may hold terabytes of data for the mining process. Preprocessing the data is the initial, and often crucial, step of the data mining process.
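As a rough illustration of how integration can introduce duplicates and how a preprocessing step might remove them, consider the minimal sketch below. It is not the method proposed here; the record fields (name, email), the source names (crm, billing), and the helper functions normalize and integrate are all hypothetical, and exact matching on a normalized key is only one simple way to detect duplicates.

```python
from typing import Dict, Iterable, List, Tuple


def normalize(record: Dict[str, str]) -> Tuple[str, str]:
    """Build a comparison key: trim whitespace and ignore letter case.

    The fields "name" and "email" are assumed for illustration only.
    """
    return (record["name"].strip().lower(), record["email"].strip().lower())


def integrate(*sources: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records from several sources, keeping the first copy of each key."""
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            key = normalize(record)
            if key not in seen:  # skip records already loaded from another source
                seen.add(key)
                merged.append(record)
    return merged


# Two hypothetical source systems that describe the same customer differently.
crm = [{"name": "Ann Lee", "email": "ann@example.com"}]
billing = [
    {"name": "ann lee ", "email": "ANN@example.com"},
    {"name": "Bob Roy", "email": "bob@example.com"},
]

warehouse = integrate(crm, billing)
print(len(warehouse))
```

Keeping the first occurrence is an arbitrary choice made for this sketch; a real warehouse load would usually also need a rule for which source to trust and some form of approximate matching for records that differ by more than case or whitespace.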