Introduction to Data Mining

Data Mining

Data mining is the processing of data to identify patterns and establish relationships. It helps organizations run algorithms over large databases to uncover meaningful patterns and correlations that standard analysis and reporting might not reveal.


Data mining tools can help an organization understand its business better. They can also improve future performance through predictive analytics, making the organization more proactive and enabling knowledge-driven decisions.

To address the issues involved in extracting information from large databases, the data mining field brings together methods from several domains, including machine learning, statistics, pattern recognition, databases and visualization.

Data mining finds application in market analysis and management, for example customer relationship management, cross-selling and market segmentation. It can also be used in risk analysis and management for forecasting, customer retention, quality control, competitive analysis and credit scoring.

Data mining is the process of analysing large amounts of data stored in a data warehouse for useful information. It makes use of artificial intelligence techniques, neural networks and advanced statistical tools (such as cluster analysis) to reveal trends, patterns and relationships that might otherwise go undetected.

Data Mining Task Primitives –

Data mining primitives define a data mining task, which can be specified in the form of a data mining query.


(1) Task relevant data :

  • Specify the data on which the data mining function is to be performed.
  • A set of task-relevant data can be collected using a relational query.
  • Before the data mining analysis, the data can be cleaned and transformed.
  • A minable view is created, i.e. the set of task-relevant data prepared for data mining.
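
The steps above can be sketched in Python with the standard-library sqlite3 module. This is only an illustration: the sales table, its columns and the spend_band rule are hypothetical and not part of any particular data mining system.

```python
import sqlite3

# Hypothetical in-memory table standing in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("alice", "north", 120.0),
        ("bob", "North ", 80.0),   # inconsistent casing and a stray space
        ("carol", "south", None),  # missing amount
        ("dave", "south", 200.0),
    ],
)

# Step 1: a relational query collects the task-relevant rows and columns.
rows = conn.execute(
    "SELECT customer, region, amount FROM sales "
    "WHERE amount IS NOT NULL ORDER BY customer"
).fetchall()


def to_minable(row):
    """Step 2: clean and transform one row (assumed rules, for illustration)."""
    customer, region, amount = row
    return {
        "customer": customer,
        "region": region.strip().lower(),
        "spend_band": "high" if amount >= 100 else "low",
    }


# Step 3: the cleaned result is the minable view passed to the mining step.
minable_view = [to_minable(r) for r in rows]
```

Here the row with a missing amount is dropped by the query, and the inconsistent region label is normalised before any mining function sees the data.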

(2) The kind of knowledge to be mined :

  • Specify the knowledge to be mined.
  • Kinds of knowledge include concept description, association, classification, prediction and clustering.
  • Users can also provide pattern templates, also known as metapatterns, metarules or metaqueries.

(3) Background Knowledge :

  • It is information about the domain to be mined.
  • Concept hierarchies are a form of background knowledge that helps to discover knowledge at multiple levels of abstraction.
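
As a small illustration of a concept hierarchy, the sketch below rolls the same records up from city to country to continent. The hierarchy, the city names and the helper functions are hypothetical examples chosen for this sketch.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country -> continent.
CITY_TO_COUNTRY = {"Pune": "India", "Mumbai": "India", "Lyon": "France"}
COUNTRY_TO_CONTINENT = {"India": "Asia", "France": "Europe"}


def generalize(city, level):
    """Map a city to the requested level of the hierarchy."""
    if level == "city":
        return city
    country = CITY_TO_COUNTRY[city]
    if level == "country":
        return country
    if level == "continent":
        return COUNTRY_TO_CONTINENT[country]
    raise ValueError(f"unknown level: {level}")


def count_at_level(cities, level):
    """Count records after generalizing each one to the given level."""
    return Counter(generalize(c, level) for c in cities)


orders = ["Pune", "Mumbai", "Lyon", "Pune"]
by_country = count_at_level(orders, "country")
by_continent = count_at_level(orders, "continent")
```

The same four orders yield different summaries depending on the level chosen, which is what mining at multiple levels of abstraction means in practice.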

(4) Interestingness measures :

  • They are used to limit the number of uninteresting patterns returned by the process.
  • They are based on the structure of patterns and the statistics underlying them.
  • Each measure is associated with a threshold that can be controlled by the user.
  • Patterns that do not meet the threshold are not presented to the user.
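
The text does not name specific measures, so the sketch below assumes two commonly used ones for association rules, support and confidence, and filters candidate rules against user-set thresholds. The transactions, candidate rules and threshold values are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)


def interesting(rules, min_support, min_confidence):
    """Keep only the rules that meet both user-controlled thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s = support(antecedent | consequent)
        c = confidence(antecedent, consequent)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent, s, c))
    return kept


# Candidate rules written as (antecedent, consequent) pairs.
candidates = [
    ({"bread"}, {"milk"}),
    ({"milk"}, {"bread"}),
    ({"butter"}, {"bread"}),
    ({"bread"}, {"butter"}),
]

kept = interesting(candidates, min_support=0.5, min_confidence=0.7)
```

With these thresholds only the rule butter -> bread survives; raising or lowering either threshold changes how many patterns reach the user.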

Data Mining Techniques –


(1) Statistics –

Statistics is a scientific discipline that uses mathematical analysis to quantify, model and summarize empirical data or real-world observations. Statistical analysis involves applying a collection of methods to large amounts of data to draw conclusions and report trends.
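
A minimal sketch of summarizing observations and reporting a trend, using Python's standard statistics module. The monthly_sales figures are made up for illustration, and the least-squares slope is just one simple way to express a trend.

```python
import statistics

# Hypothetical monthly sales observations.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Summarize the data.
mean_sales = statistics.mean(monthly_sales)
median_sales = statistics.median(monthly_sales)
spread = statistics.stdev(monthly_sales)  # sample standard deviation

# Report the trend as the least-squares slope of sales against month index.
months = list(range(len(monthly_sales)))
mean_month = statistics.mean(months)
slope = sum(
    (x - mean_month) * (y - mean_sales) for x, y in zip(months, monthly_sales)
) / sum((x - mean_month) ** 2 for x in months)
```

For this data the slope is 10 units per month, a one-number summary of the upward trend.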

(2) Machine Learning –

Machine learning is a type of artificial intelligence that gives computers the ability to learn without being explicitly programmed. When exposed to new data, computer programs that use machine learning can teach themselves to grow or change.
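
To make "learning from new data" concrete, here is a minimal sketch of a 1-nearest-neighbour classifier, chosen only because it is short. The features, labels and data points are hypothetical. The prediction rule is never edited; only the stored examples change.

```python
def predict(examples, point):
    """1-nearest-neighbour: return the label of the closest stored example."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: sq_dist(ex[0], point))
    return label


# Hypothetical labelled examples: (monthly visits, monthly spend) -> segment.
examples = [
    ((1, 10), "occasional"),
    ((2, 15), "occasional"),
    ((9, 200), "loyal"),
]

query = (5, 100)
before = predict(examples, query)

# Exposing the program to new data changes its answer without any code change.
examples.append(((5, 90), "regular"))
after = predict(examples, query)
```

The same query is classified as "occasional" before the new example arrives and as "regular" afterwards.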

(3) Database Systems and Data Warehouses –

Database : Databases are used to record data and can also serve as a source for data warehousing. Online Transaction Processing (OLTP) uses databases for day-to-day transactions. To remove redundant data and save storage space, the data is normalized and stored in tables.

Data Warehouse : Data warehouses are used to store historical data, which helps in making strategic business decisions. They are used for Online Analytical Processing (OLAP), which helps to analyze the data. Data-modelling techniques such as the star schema are used in data warehouse design.
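
A minimal sketch of a star schema and an OLAP-style aggregate, using Python's built-in sqlite3 module. All table names, column names and values (fact_sales, dim_product, dim_date) are hypothetical and kept deliberately small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table references two dimension tables (the "star").
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
    INSERT INTO dim_date VALUES (1, 2022), (2, 2023);
    INSERT INTO fact_sales VALUES
        (1, 1, 10.0), (1, 2, 20.0), (2, 2, 5.0), (2, 2, 7.5);
    """
)

# Analytical query: total sales per category and year.
totals = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
```

The fact table holds the measurements, while the dimension tables hold the descriptive attributes used to group and filter them.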

(4) Decision Support System :

A decision support system is a category of information system that supports decision making in businesses and organisations. It is an interactive, software-based system that helps decision makers extract useful information from raw data and documents.