Discretization in data mining

Samundeeswari

Data discretization is a method used to simplify large sets of data by replacing many distinct values with a small number of manageable groups. It divides the range of a continuous attribute into a limited number of intervals while losing as little information as possible.

There are two types of data discretization:

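  1. Supervised Discretization: Uses class information (the labels attached to the records) to decide where the interval boundaries go.
  2. Unsupervised Discretization: Does not use class information; the boundaries depend only on the attribute values themselves.

Either type can work top-down, by choosing split points and dividing the range recursively, or bottom-up, by starting from many small intervals and merging neighboring ones.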
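
The difference between the two types can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function names, the `ages` and `buys` data, and the choice of class entropy as the supervised criterion are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def unsupervised_cut(values):
    """Equal-width split into two intervals; class labels are never consulted."""
    return (min(values) + max(values)) / 2

def supervised_cut(values, labels):
    """Choose the boundary that minimizes the weighted class entropy of both sides."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can separate two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut

# Hypothetical data: customer ages and whether each customer bought a product.
ages = [22, 25, 27, 30, 52, 58, 61, 90]
buys = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

print(unsupervised_cut(ages))      # 56.0 (midpoint of the range)
print(supervised_cut(ages, buys))  # 41.0 (separates the two classes cleanly)
```

The unsupervised cut at 56.0 places one "yes" customer (age 52) in the lower interval, while the supervised cut at 41.0 uses the labels to keep each interval pure.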

Some Techniques of Data Discretization

  1. Histogram Analysis
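    A histogram is a graph that shows how often values fall into each of a set of consecutive ranges (buckets) of a continuous attribute. It helps to understand the data's distribution, for example by revealing outliers or skewness, or by checking whether the data looks normally distributed. For discretization, the buckets themselves serve as the intervals, so no class information is needed.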

  2. Binning
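    Binning is a technique that groups a large number of continuous values into a smaller number of ranges (bins), for example bins of equal width or bins that hold an equal number of values. Each value can then be replaced by its bin label or by a representative such as the bin mean. It is useful for simplifying data and for building the levels of a concept hierarchy.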

  3. Data Discretization Using Correlation Analysis
    In this method, a correlation measure such as the chi-square (χ²) statistic is used to find the best neighboring intervals, which are then merged recursively to form larger intervals. ChiMerge is a well-known example: it starts with each distinct value in its own interval and repeatedly merges the adjacent pair whose class distributions are most similar. This is a supervised, bottom-up technique, meaning it uses class data to guide the merging.
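
Histogram analysis and binning, described above, can be illustrated with a short Python sketch. The helper names (`equal_width_edges`, `assign_bin`, `equal_frequency_bins`) and the `prices` data are assumptions made for this example, and the code assumes the attribute has at least two distinct values.

```python
from collections import Counter

def equal_width_edges(values, k):
    """Return k + 1 boundaries that split [min, max] into k equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

def assign_bin(value, edges):
    """Index of the bin that contains value; the last bin includes the maximum."""
    k = len(edges) - 1
    for i in range(k):
        if value < edges[i + 1]:
            return i
    return k - 1

def equal_frequency_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

# Hypothetical price data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

edges = equal_width_edges(prices, 3)
print(edges)                            # [4.0, 14.0, 24.0, 34.0]

# Counting how many values land in each bin gives the histogram.
histogram = Counter(assign_bin(p, edges) for p in prices)
print(sorted(histogram.items()))        # [(0, 2), (1, 3), (2, 4)]

print(equal_frequency_bins(prices, 3))  # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
```

Equal-width bins are easy to interpret but can end up unevenly filled, as the counts 2, 3, and 4 show; equal-frequency bins balance the counts at the cost of uneven interval widths.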

Data Discretization and Binarization in Data Mining


Data discretization is a technique that converts continuous data values into a limited number of intervals while losing as little information as possible. Data binarization, on the other hand, transforms continuous or discrete attributes into one or more binary attributes, with each value typically represented as 0 or 1.
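
As a minimal sketch of binarization, the Python below thresholds a continuous attribute and expands a discrete attribute into one binary column per category. The function names, the 37.5 threshold, and the sample data are assumptions chosen for illustration.

```python
def binarize(values, threshold):
    """Map each continuous value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def one_hot(values):
    """Turn a discrete attribute into one binary column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Continuous attribute: hypothetical body temperatures in degrees Celsius.
temperatures = [36.4, 38.2, 37.0, 39.1]
print(binarize(temperatures, 37.5))  # [0, 1, 0, 1]

# Discrete attribute: hypothetical color labels.
colors = ["red", "green", "red", "blue"]
columns, rows = one_hot(colors)
print(columns)  # ['blue', 'green', 'red']
print(rows)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```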

Data Discretization and Concept Hierarchy Generation

A hierarchy is an organizational structure where items are ranked based on their level of importance or generality. In a concept hierarchy, a set of low-level, specific concepts is mapped to higher-level, more general concepts. Numeric ages, for instance, can be mapped to groups such as youth, middle-aged, and senior.

For example, in a computer system, files are organized in folders within a tree structure, representing a hierarchy.

A hierarchy can be traversed or built in two directions:

  1. Top-Down Mapping: Starts with high-level concepts and breaks them down into more specific ones.
  2. Bottom-Up Mapping: Starts with detailed, low-level concepts and groups them into broader categories.
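
Both directions can be sketched with a small lookup table in Python. The location hierarchy below (city, country, continent) and the function names `generalize` and `specialize` are hypothetical and serve only as an illustration.

```python
# Hypothetical concept hierarchy: each concept points to its more general parent.
PARENT = {
    "Chennai": "India",
    "Mumbai": "India",
    "Paris": "France",
    "Lyon": "France",
    "India": "Asia",
    "France": "Europe",
}

def generalize(concept, levels=1):
    """Bottom-up: climb `levels` steps toward more general concepts."""
    for _ in range(levels):
        concept = PARENT.get(concept, concept)  # top-level concepts stay unchanged
    return concept

def specialize(concept):
    """Top-down: list the more specific concepts directly beneath `concept`."""
    return sorted(child for child, parent in PARENT.items() if parent == concept)

print(generalize("Chennai"))     # India
print(generalize("Chennai", 2))  # Asia
print(specialize("India"))       # ['Chennai', 'Mumbai']
print(specialize("Europe"))      # ['France']
```

Replacing each city in a dataset with `generalize(city)` reduces the number of distinct values in that attribute, which is the same simplification that discretization provides for numeric data.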

