DATA MINING KDD PROCESS
KDD: Knowledge Discovery in Databases
THE KDD PROCESS
The knowledge discovery process, as illustrated in the provided figure, is both iterative and interactive, encompassing nine distinct steps. The iterative nature of the process means that revisiting previous stages may be necessary. Given its complex and creative aspects, there is no single formula or comprehensive scientific classification for making the correct decisions at each step or for every type of application. Therefore, a thorough understanding of the process, along with the specific requirements and possibilities at each stage, is essential.
1. Building up an understanding of the application domain:
This is the preliminary step. It sets the stage for later decisions such as data transformation, algorithm selection, and representation. Those overseeing a KDD project must understand and define the end-user’s objectives and the context in which the knowledge discovery process will take place, including relevant prior knowledge.
2. Choosing and creating a data set on which discovery will be performed:
Once the objectives are defined, the next step is to determine the data that will be used in the knowledge discovery process. This involves identifying the available data, acquiring relevant data, and integrating it into a cohesive dataset for analysis. This integration is crucial because Data Mining relies on the available data to learn and discover insights. The quality of the models built depends on the completeness of the data; missing significant attributes can undermine the entire study. However, managing and processing extensive data repositories can be costly and complex. The process involves an iterative and interactive approach, starting with the best available datasets and gradually expanding them while assessing their impact on knowledge discovery and modeling.
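As a minimal sketch of the integration described above, the following Python joins two hypothetical record sources on a shared key and reports how complete each attribute is in the combined dataset. The sources, the field names (customer_id, age, total_spent), and the values are assumptions made for illustration only.

```python
def integrate(primary, secondary, key):
    """Left-join two lists of dict records on `key`."""
    lookup = {row[key]: row for row in secondary}
    merged = []
    for row in primary:
        combined = dict(row)
        # Fields from the secondary source are added only when a match exists.
        combined.update(lookup.get(row[key], {}))
        merged.append(combined)
    return merged


def completeness(records, attributes):
    """Fraction of records holding a non-missing value for each attribute."""
    total = len(records)
    return {
        attr: sum(1 for r in records if r.get(attr) is not None) / total
        for attr in attributes
    }


# Hypothetical sources; names and values are illustrative only.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},
    {"customer_id": 3, "age": 51},
]
purchases = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 3, "total_spent": 75.5},
]

dataset = integrate(customers, purchases, "customer_id")
report = completeness(dataset, ["age", "total_spent"])
```

A low completeness score for an attribute that matters to the objectives is a signal to acquire more data before moving on, which is one way the iterative expansion of the dataset can be guided.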
3. Preprocessing and cleansing:
4. Data Transformation:
In this stage, the data is prepared and developed for Data Mining. This involves techniques such as dimension reduction (e.g., feature selection and extraction) and record sampling, as well as attribute transformation (e.g., discretizing numerical attributes and applying functional transformations). This step is critical to the success of the KDD project and is often tailored to the specific needs of the project. For instance, in medical assessments, the combination of attributes may be more important than any single attribute on its own. In business contexts, factors beyond our control, such as the effects of advertising campaigns, may need to be considered. If the initial transformations are not appropriate, they may lead to unexpected results, necessitating adjustments in subsequent iterations. Thus, the KDD process is iterative, with each stage informing the next and refining the understanding of necessary transformations.
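The transformations named above can be sketched in a few lines of Python. This is an illustrative sketch, not a recommended pipeline: equal-width binning stands in for discretization, log(1 + x) for a functional transformation, and dropping constant columns for a very crude form of feature selection. All function names and sample values are hypothetical.

```python
import math


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def log_transform(values):
    """Functional transformation: log(1 + x) compresses a long right tail."""
    return [math.log1p(v) for v in values]


def drop_constant_features(rows):
    """Crude feature selection: keep columns with more than one distinct value."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], keep


ages = [18, 25, 40, 62, 80]  # illustrative values
age_bins = equal_width_bins(ages, 3)

incomes = [0, 9, 99, 999]
log_incomes = log_transform(incomes)

rows = [[1, 7, 0.5], [2, 7, 0.1], [3, 7, 0.9]]
reduced, kept = drop_constant_features(rows)
```

Which of these transformations is appropriate depends on the project; the point of the sketch is only that each one is a small, replaceable step that can be revised in a later iteration.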
5. Prediction and description:
7. Utilizing the Data Mining algorithm:
Finally, the implementation of the Data Mining algorithm takes place. This stage often requires running the algorithm multiple times to achieve satisfactory results. For example, adjustments may be made to the algorithm's control parameters, such as the minimum number of instances allowed in a single leaf of a decision tree, until the desired outcome is achieved.
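The repeated runs described above can be sketched with a toy decision tree written in plain Python. This is a simplified illustration under stated assumptions, not a production algorithm: the tree splits on a single numeric attribute, chooses splits by misclassification count, and takes min_leaf (the minimum number of instances allowed in a leaf) as its only control parameter. The training and validation points are invented, with one deliberately noisy training label at x = 3.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(points, min_leaf):
    """Grow a tree on (x, label) pairs; every leaf keeps at least min_leaf points."""
    labels = [y for _, y in points]
    if len(set(labels)) == 1 or len(points) < 2 * min_leaf:
        return majority(labels)
    points = sorted(points)
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        # Misclassified count if each side predicts its own majority label.
        errors = (len(left) - Counter(left).most_common(1)[0][1]
                  + len(right) - Counter(right).most_common(1)[0][1])
        if best is None or errors < best[0]:
            best = (errors, i)
    if best is None:
        return majority(labels)
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold,
            build_tree(points[:i], min_leaf),
            build_tree(points[i:], min_leaf))


def predict(tree, x):
    # Internal nodes are tuples; leaves are plain labels (labels must not be tuples).
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)


# Illustrative data: "a" below 5.5, "b" above, with one noisy label at x = 3.
train = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "a"),
         (6, "b"), (7, "b"), (8, "b"), (9, "b"), (10, "b")]
validation = [(1.5, "a"), (2.8, "a"), (3.2, "a"), (4.5, "a"), (6.5, "b"), (8.5, "b")]

# Run the algorithm once per candidate setting and keep the best one.
scores = {m: accuracy(build_tree(train, m), validation) for m in (1, 2, 3)}
best_min_leaf = max(scores, key=scores.get)
```

On this toy data, a minimum leaf size of 1 lets the tree isolate the noisy point and misclassify nearby validation points, while larger values smooth it out. Real projects would tune more than one parameter and use a proper validation scheme.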
8. Evaluation:
In this step, we evaluate and interpret the mined patterns and rules to ensure they align with the objectives defined in the initial stage. This involves examining the impact of preprocessing steps on the results of the Data Mining algorithm. For instance, if a feature added in step 4 affects the outcome, we may need to revisit and refine earlier steps. The focus here is on the comprehensibility and utility of the resulting model. Additionally, the discovered knowledge is documented for future reference. The final step involves applying the findings, gathering overall feedback, and assessing the results obtained from the Data Mining process.
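One way to examine how an added feature affects the outcome is to score the same model with and without it. The sketch below does this with a 1-nearest-neighbour classifier and leave-one-out accuracy, both chosen here purely for brevity; the records, the two features, and the labels are hypothetical.

```python
def nearest_label(query, others, features):
    """Label of the closest record, using only the listed feature indices."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in features)
    return min(others, key=lambda rec: dist(rec[0], query))[1]


def leave_one_out_accuracy(records, features):
    """Predict each record from all the others and report the hit rate."""
    hits = 0
    for i, (x, y) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        hits += nearest_label(x, rest, features) == y
    return hits / len(records)


# Illustrative records: feature 0 is the original attribute,
# feature 1 is a hypothetical attribute added during transformation.
records = [
    ((0.1, 0.1), "low"),
    ((0.2, 0.9), "high"),
    ((0.3, 0.2), "low"),
    ((0.4, 1.0), "high"),
    ((0.5, 0.0), "low"),
    ((0.6, 0.8), "high"),
]

baseline = leave_one_out_accuracy(records, features=[0])
with_added = leave_one_out_accuracy(records, features=[0, 1])
improvement = with_added - baseline
```

A large change in either direction is a reason to revisit the transformation step. The data here is contrived so that the added feature carries all of the signal, which will rarely be so clear-cut in practice.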