How data mining works: the data mining engine is an essential part of a data mining system. It consists of several functional modules, such as association, correlation analysis, cluster analysis, knowledge discovery, characterization, evolution analysis, and many more.
Another term related to data mining is the data warehouse, which is constructed by integrating data from multiple heterogeneous sources. It supports functions such as analytical reporting, structured queries, and decision making. The characteristics that allow a warehouse to support the decision-making process are as follows:
a) Subject oriented: it provides information organized around a particular subject rather than around the organization's day-to-day operations, which is why it is called subject oriented. Subjects can be products, suppliers, users, sales, and so on.
b) Integrated: data is collected from various heterogeneous sources, such as relational databases and flat files, and integrated into one consistent store, which makes the data more effective to use.
c) Time variant: data in the warehouse is always identified with the particular time period it belongs to, which is why it is called time variant. This lets the warehouse provide information from a historical point of view.
d) Non volatile: a key characteristic of a data warehouse is that it is non volatile, meaning previous data is retained when new data is added. The warehouse is kept separate from the operational database, so the frequent changes made to the operational database do not affect it.
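The time-variant and non-volatile characteristics can be illustrated with a small in-memory sketch. The names `SalesFact` and `SalesWarehouse` are hypothetical, and a real warehouse would of course use a database rather than a Python list; the point is only that rows are organized around one subject (sales), each row carries its time period, and loads append rows instead of overwriting them.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical fact row for the "sales" subject. frozen=True makes each row
# immutable, mirroring the non-volatile nature of warehouse data.
@dataclass(frozen=True)
class SalesFact:
    period: date    # time period the fact describes (time variant)
    product: str    # subject-oriented dimension
    amount: float


class SalesWarehouse:
    """Hypothetical append-only store: loads add rows, nothing is updated in place."""

    def __init__(self) -> None:
        self._facts: list[SalesFact] = []

    def load(self, facts: list[SalesFact]) -> None:
        # New data is appended; previously loaded rows remain untouched.
        self._facts.extend(facts)

    def history(self, product: str) -> list[SalesFact]:
        # Historical view of one product, ordered by time period.
        return sorted((f for f in self._facts if f.product == product),
                      key=lambda f: f.period)


wh = SalesWarehouse()
wh.load([SalesFact(date(2023, 1, 1), "laptop", 1200.0)])
wh.load([SalesFact(date(2023, 2, 1), "laptop", 900.0)])  # January row is kept
for fact in wh.history("laptop"):
    print(fact.period, fact.amount)
```

Because the February load does not replace the January row, the history query still returns both periods.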
To integrate heterogeneous databases, the following approaches are used, which involve data cleaning, data integration, and data consolidation.
a) Query driven: the traditional approach, called query driven, builds wrappers and integrators (also called mediators) on top of the heterogeneous databases. When a query is issued, the mediator passes it to the individual sources and merges their answers.
b) Update driven: in this approach, data from multiple heterogeneous databases is fetched in advance and stored in the warehouse, where it is available for direct querying and analysis.
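The difference between the two approaches can be sketched in a few lines of Python. The two "sources", the wrapper functions, and the field names below are all hypothetical; a real mediator would translate queries into each source's own query language rather than filter Python lists.

```python
# Hypothetical heterogeneous sources: a relational table (tuples) and a
# flat file (comma-separated lines) describing customers.
relational_source = [("alice", "north", 120), ("bob", "south", 80)]
flat_file_source = ["carol,north,95", "dave,south,60"]


def wrap_relational(rows, region=None):
    # Wrapper: answers a request against this source in a common dict format.
    return [{"customer": c, "region": r, "sales": s}
            for c, r, s in rows if region is None or r == region]


def wrap_flat_file(lines, region=None):
    # Wrapper for the flat file: parse each line, then filter.
    rows = []
    for line in lines:
        c, r, s = line.split(",")
        if region is None or r == region:
            rows.append({"customer": c, "region": r, "sales": int(s)})
    return rows


def mediator_query(region):
    # Query driven: the mediator forwards the query to every wrapper at
    # query time and merges the partial answers.
    return (wrap_relational(relational_source, region)
            + wrap_flat_file(flat_file_source, region))


# Update driven: all data is fetched and integrated into the warehouse in advance...
warehouse = wrap_relational(relational_source) + wrap_flat_file(flat_file_source)


def warehouse_query(region):
    # ...so queries run directly against the integrated copy.
    return [row for row in warehouse if row["region"] == region]


print(mediator_query("north") == warehouse_query("north"))  # True
```

Both approaches return the same answer here. The trade-off is when the integration work happens: the query-driven mediator repeats it for every query but always sees current source data, while the update-driven warehouse does it once ahead of time and answers queries faster, at the cost of possibly stale data between loads.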
Integrating OLAP with OLAM:
OLAM (online analytical mining), also called OLAP mining, integrates OLAP (online analytical processing), which is performed on data warehouses, with data mining in order to mine knowledge from multidimensional databases.
Data mining tools need to work on integrated, consistent, cleaned, and preprocessed data, and the data held in a data warehouse, which serves both OLAP and OLAM, is already of this high quality. The infrastructure built around the warehouse, such as access to multiple heterogeneous databases, web access, reporting facilities, and OLAP tools, can also be reused for mining. Data mining further requires exploratory data analysis, and OLAP supports this by letting users select subsets of data and view them at different levels of abstraction. Finally, combining OLAP with data mining functions gives users the flexibility to select mining functions and switch between mining tasks dynamically.
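Viewing data "at different levels of abstraction" usually means rolling up or drilling down along a dimension hierarchy. The sketch below assumes a hypothetical location hierarchy (city, then country) and a tiny list of sales cells; it uses plain dictionaries rather than any particular OLAP engine.

```python
from collections import defaultdict

# Hypothetical sales cells: (country, city, quarter, amount).
sales = [
    ("India", "Delhi", "Q1", 100),
    ("India", "Mumbai", "Q1", 150),
    ("India", "Delhi", "Q2", 120),
    ("USA", "Boston", "Q1", 90),
    ("USA", "Boston", "Q2", 110),
]


def aggregate(rows, key):
    """Sum amounts grouped by whatever dimensions `key` selects."""
    totals = defaultdict(int)
    for country, city, quarter, amount in rows:
        totals[key(country, city, quarter)] += amount
    return dict(totals)


# Detailed level: totals per city.
by_city = aggregate(sales, lambda country, city, quarter: (country, city))
# Roll up the location hierarchy from city to country.
by_country = aggregate(sales, lambda country, city, quarter: country)
# Slice: keep only Q1, then view it at the country level.
q1 = aggregate([r for r in sales if r[2] == "Q1"],
               lambda country, city, quarter: country)

print(by_country)  # {'India': 370, 'USA': 200}
```

Drilling down is simply the reverse move, from `by_country` back to `by_city`; an exploratory mining session can switch between these views on the same data.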
Knowledge discovery process: data mining is an essential step of the knowledge discovery process, which involves steps such as data integration, data cleaning, data transformation, data mining, and pattern evaluation. The main components and steps are as follows.
a) User interface: this module handles the interaction between users and the data mining system. It lets users browse intermediate mining results as well as database and data warehouse structures.
b) Data integration: a data preprocessing method that combines data from multiple sources into a coherent data store. Because the combined data may be inconsistent, data cleaning needs to be performed on it.
c) Data cleaning: this step removes noisy data and corrects incorrect data, applying transformations where needed, while the data is prepared for the data warehouse.
d) Data selection: the process of retrieving the data relevant to the analysis task from the database. Data transformation and consolidation may also be performed during selection. Common data mining tasks include deviation detection, regression, pattern discovery, association rules, and classification.
e) Clustering: it groups a set of objects into clusters, so that objects in the same cluster are similar to each other and dissimilar to objects in other clusters.
f) Data transformation: data is transformed or consolidated into forms appropriate for mining, for example by performing aggregation operations.
g) Pattern evaluation: the mined patterns are evaluated to identify those that are actually useful. If the evaluated patterns are not useful, the process may restart from one of the previous steps.
h) Knowledge presentation: the mined knowledge is presented to users in a simple, easy-to-understand way.
i) Selection of mining algorithm: the model and its parameters are chosen for the method that will search the data for patterns. Popular methods include decision trees, rules, learning models, and many more.
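The preprocessing steps above (integration, cleaning, selection, transformation) can be chained as plain functions. This is a minimal sketch under stated assumptions: the two sources, their field names, the 0 to 120 age range used as the "noise" rule, and the choice of total spend per city as the aggregation are all hypothetical illustrations, not a prescribed method.

```python
# Two hypothetical sources describing the same customers with different
# key names (a typical integration problem).
crm = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": -5, "city": "Delhi"},    # noisy value
    {"id": 3, "age": None, "city": "Pune"},   # missing value
    {"id": 4, "age": 45, "city": "Delhi"},
    {"id": 5, "age": 29, "city": "Pune"},
]
billing = [
    {"cust_id": 1, "spend": 250.0}, {"cust_id": 2, "spend": 400.0},
    {"cust_id": 3, "spend": 150.0}, {"cust_id": 4, "spend": 300.0},
    {"cust_id": 5, "spend": 100.0},
]


def integrate(crm_rows, billing_rows):
    # Data integration: align the differing key names and join into one store.
    spend = {r["cust_id"]: r["spend"] for r in billing_rows}
    return [{**r, "spend": spend.get(r["id"])} for r in crm_rows]


def clean(rows):
    # Data cleaning: drop rows with missing values or impossible ages.
    return [r for r in rows
            if r["age"] is not None and 0 <= r["age"] <= 120
            and r["spend"] is not None]


def select(rows, attributes):
    # Data selection: keep only the attributes relevant to the task.
    return [{a: r[a] for a in attributes} for r in rows]


def transform(rows):
    # Data transformation: aggregate total spend per city.
    totals = {}
    for r in rows:
        totals[r["city"]] = totals.get(r["city"], 0.0) + r["spend"]
    return totals


result = transform(select(clean(integrate(crm, billing)), ["city", "spend"]))
print(result)  # {'Pune': 350.0, 'Delhi': 300.0}
```

Dropping bad rows is only one cleaning strategy; filling missing values (for example with an attribute mean) is another common choice, and which one is appropriate depends on the data.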
The knowledge discovery process is multidisciplinary. It involves data storage and access, scaling methods to large data sets, and interpreting results. KDD also draws on artificial intelligence, for example on techniques that discover empirical laws from observations. The steps involved in the knowledge discovery process are as follows.
a) Identify the goals of the process from the customer's point of view.
b) Explore the application domain and the relevant prior knowledge.
c) Select the target data, or a subset of samples, on which discovery will be performed.
d) Simplify the data set by removing unwanted variables.
e) Match the KDD goals identified earlier with a suitable mining method.
f) Select mining algorithms to search for hidden patterns.
g) Search for patterns such as classification rules, trees, and clusters.
h) Examine the knowledge gained from the mined patterns.
i) Make appropriate reports.
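Steps d, g, and i can be illustrated together: drop a variable that carries no pattern information, search for clusters, and print a minimal report. The records below are hypothetical, and the clustering method is plain k-means with the first k points used as initial centroids, a simplifying assumption (practical tools use better seeding and usually normalize features first).

```python
import math

# Hypothetical customer records; "customer_id" is an identifier, not a
# useful feature, so it is removed in the simplification step (d).
records = [
    {"customer_id": 101, "visits": 2, "spend": 20.0},
    {"customer_id": 102, "visits": 3, "spend": 25.0},
    {"customer_id": 103, "visits": 20, "spend": 300.0},
    {"customer_id": 104, "visits": 22, "spend": 310.0},
]


def simplify(rows, drop):
    # Step d: remove unwanted variables, keeping numeric feature tuples.
    return [tuple(v for k, v in r.items() if k not in drop) for r in rows]


def kmeans(points, k, iterations=10):
    # Step g: plain k-means; the first k points are the initial centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters


points = simplify(records, drop={"customer_id"})
clusters = kmeans(points, k=2)
for i, cluster in enumerate(clusters):
    # Step i: a minimal report of the discovered groups.
    print(f"cluster {i}: {cluster}")
```

On this data the loop separates low-activity customers from high-activity ones: cluster 0 holds `(2, 20.0)` and `(3, 25.0)`, cluster 1 holds `(20, 300.0)` and `(22, 310.0)`.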
Data mining primitives allow users to communicate interactively with the data mining system when mining a database or data warehouse. The primitives specify the task-relevant data to be mined, the kind of knowledge to be mined, and the representation used to visualize the discovered patterns. The task-relevant data is specified in terms of database attributes or data warehouse dimensions. The kind of knowledge to be mined covers functions such as discrimination, characterization, prediction, classification, clustering, and correlation analysis. The representation of patterns can include rules, charts, tables, graphs, cubes, and more.
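One way to picture these primitives is as a small task specification that a mining system interprets. The `MiningTask` class and the `run` function below are hypothetical and are not the query language of any particular system. Only characterization is implemented, as a simple summary of how often each combination of task-relevant attribute values occurs, rendered either as a table or as rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MiningTask:
    # Hypothetical container for the primitives described above.
    attributes: list[str]    # task-relevant data: attributes or dimensions
    kind_of_knowledge: str   # e.g. "characterization", "clustering"
    representation: str      # e.g. "table", "rules"


sales = [
    {"region": "north", "product": "tv", "price": 500},
    {"region": "north", "product": "tv", "price": 450},
    {"region": "south", "product": "radio", "price": 40},
]


def run(task, rows):
    # Only characterization is sketched here.
    if task.kind_of_knowledge != "characterization":
        raise NotImplementedError(task.kind_of_knowledge)
    # Count each combination of task-relevant values; "price" is ignored
    # because it is not listed in task.attributes.
    summary = Counter(tuple(r[a] for a in task.attributes) for r in rows)
    if task.representation == "table":
        return [(*key, count) for key, count in summary.most_common()]
    if task.representation == "rules":
        total = sum(summary.values())
        return [" AND ".join(f"{a}={v}" for a, v in zip(task.attributes, key))
                + f" [{count / total:.0%}]"
                for key, count in summary.most_common()]
    raise NotImplementedError(task.representation)


task = MiningTask(["region", "product"], "characterization", "table")
print(run(task, sales))  # [('north', 'tv', 2), ('south', 'radio', 1)]
```

Changing only `representation` to `"rules"` presents the same mined knowledge as `region=north AND product=tv [67%]` and `region=south AND product=radio [33%]`, which mirrors how the primitives separate what is mined from how it is shown.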