Data mining is the process of discovering patterns in large amounts of data, with the overall goal of extracting information (with intelligent methods) and transforming it into a comprehensible structure for further use.
The term "data mining" is a misnomer because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining or scraping) of data itself.
The knowledge discovery in databases (KDD) process is commonly defined with the following stages:
- Selection: determining the appropriate data type and source, and suitable instruments to collect the data.
- Pre-processing: making the data easier to work with or more informative. For example:
  - “Cleaning” the data (for example, completing missing values or filtering out unusable data),
  - “Transformation” of the data (for example, normalizing the values, discretizing, or creating new attributes from the existing data), and
  - “Reduction” of the amount of data by aggregating information or by other methods.
- Data mining: the phase of extracting patterns and information from the data.
- Interpretation/evaluation: as in machine learning, the final step of knowledge discovery is to verify that the patterns produced by the data mining algorithms also occur in the wider data set. This is usually done with a test set, a data set for which the expected output is known and can be compared with the results of the data mining process.
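The stages listed above can be illustrated with a minimal Python sketch. It is not a reference implementation: the toy records, the field names `income` and `bought`, and the helper functions are all hypothetical, and the "mining" step is assumed to be a simple threshold rule chosen only to keep the example short. Reduction is omitted because the toy data set is already tiny.

```python
from statistics import mean

def clean(rows, key):
    """Cleaning: fill missing values of `key` with the mean of the known ones."""
    known = [r[key] for r in rows if r[key] is not None]
    fill = mean(known)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def normalize(rows, key):
    """Transformation: min-max scale `key` into the range [0, 1]."""
    lo = min(r[key] for r in rows)
    hi = max(r[key] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values are equal
    return [{**r, key: (r[key] - lo) / span} for r in rows]

def accuracy(rows, key, label, threshold):
    """Evaluation: share of rows where the rule `key >= threshold` matches `label`."""
    hits = sum((r[key] >= threshold) == bool(r[label]) for r in rows)
    return hits / len(rows)

def fit_threshold(rows, key, label):
    """Data mining (assumed technique): pick the threshold that best separates `label`."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({r[key] for r in rows}):
        acc = accuracy(rows, key, label, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Selection: hypothetical records with one numeric attribute and a known outcome.
records = [
    {"income": 20.0, "bought": 0},
    {"income": 30.0, "bought": 0},
    {"income": None, "bought": 0},   # missing value, handled by clean()
    {"income": 60.0, "bought": 1},
    {"income": 80.0, "bought": 1},
    {"income": 100.0, "bought": 1},
    {"income": 25.0, "bought": 0},
    {"income": 90.0, "bought": 1},
]

# Pre-processing. For brevity the fill value and the scaling are computed on all
# rows; a real pipeline would derive them from the training rows only.
prepared = normalize(clean(records, "income"), "income")

# Hold out the last two rows as the test set with known expected output.
train, test = prepared[:6], prepared[6:]

threshold = fit_threshold(train, "income", "bought")
test_accuracy = accuracy(test, "income", "bought", threshold)
print(f"threshold={threshold:.3f} test_accuracy={test_accuracy:.2f}")
```

The point of the held-out rows is the one made in the interpretation/evaluation stage: the rule is chosen on `train` only, so its accuracy on `test` indicates whether the pattern holds beyond the data it was found in.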
Data mining is widely used to extract valuable information and insights from large and complex data sets, such as customer data, financial data, and healthcare data. Some of the key benefits of data mining include improved decision making, cost savings, and the ability to identify new business opportunities.