Introduction to Web Mining

Web Mining

Web mining is the application of data mining techniques to web data. It helps site owners understand how users actually use their websites.

The process involves mining server logs, that is, analysing them to extract meaningful data. More generally, it is the process of discovering useful and previously unknown information from web data.


Web data includes:

  • Web Content – text, image, records, etc.
  • Structure – hyperlinks, tags, etc.
  • Web Usage – http logs, app server logs, etc.

How is web mining different from classical data mining?

  • Web mining is similar to data mining in the techniques it applies.
  • The two differ mainly in how the data is collected.
  • Data Mining – The data has usually already been collected and stored in a data warehouse.
  • Web Mining – The data is collected by crawling a number of target web pages.
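
As an illustration of the crawling step, the sketch below extracts the hyperlinks from one page so that a crawler could follow them. It is a minimal example that uses only the Python standard library; the names `LinkExtractor` and `extract_links`, the sample HTML, and the example.com URLs are assumptions made for illustration, and a real crawler would also download pages over HTTP and respect robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <a href="/products">Products</a>
  <a href="about.html">About</a>
  <a href="https://example.org/news">News</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
for link in links:
    print(link)
```

Each extracted link would be added to the queue of pages still to visit, which is how the crawl spreads across the target pages.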

Benefits of Web Data Mining :

  • Match your available resources to visitor interests.
  • Increase the value of each visitor.
  • Improve the visitor’s experience at the website.
  • Perform targeted resource management.
  • Collect information in new ways.
  • Test the relevance of content and web site architecture.

Web Content Mining –

Web content mining is the process of discovering useful information from the content of a web page. That content can be:

  • Text
  • Image
  • Audio
  • Video

Web content mining is also known as web text mining. It uses the following techniques:

  • Natural Language Processing (NLP).
  • Information Retrieval.

Text Mining –


Text mining is the process of deriving high-quality information from text. It is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics.

As most information (over 80%) is currently stored as text, text mining is believed to have high commercial potential. High-quality information is typically derived by means of statistical pattern learning.

-> Text mining involves the following steps:

  • Structuring the input text by parsing.
  • Adding some derived linguistic features and removing others.
  • Inserting the result into a database.
  • Deriving patterns within the structured data.
  • Evaluating and interpreting the output.
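
The steps above can be sketched in a few lines of Python. This is a deliberately small illustration, not a full text mining system: the stop-word list, the function names, and the two sample documents are assumptions, "parsing" is reduced to simple tokenization, and the "database" is just a list of in-memory counters.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "from", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Structure raw text: lowercase it and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(documents):
    """Return one Counter per document: the structured form of the text."""
    return [Counter(remove_stop_words(tokenize(doc))) for doc in documents]


def top_terms(frequencies, n=3):
    """Derive a simple pattern: the most frequent terms across all documents."""
    total = Counter()
    for frequency in frequencies:
        total.update(frequency)
    return total.most_common(n)


# Hypothetical sample documents.
documents = [
    "Web mining is the application of data mining to web data.",
    "Text mining derives information from text.",
]

freqs = term_frequencies(documents)
print(top_terms(freqs, 1))
```

Evaluation and interpretation remain a human task here: someone still has to judge whether the most frequent terms say anything useful about the documents.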

-> Typical text mining tasks include :

  • Text Categorization.
  • Text Clustering.
  • Concept/Entity Extraction.
  • Document Summarization.
  • Entity Relation Modelling (i.e. learning relations between named entities).

Web Usage Mining –

Web usage mining is the type of activity that predicts which pages are likely to be visited in the near future, based on the active user’s behaviour. Such pages can be pre-fetched to reduce access times.

Usage data records the user’s behaviour while browsing or making transactions on a website. Analysing it helps to better understand and serve the needs of users.
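
One simple way to make such a prediction is to count which page most often follows the current one in past sessions. The sketch below assumes this first-order, frequency-based approach; it is one possible technique rather than the method every web usage mining system uses, and the function names and session data are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next_page(transitions, current_page):
    """Return the most frequent successor of current_page, or None if unseen."""
    followers = transitions.get(current_page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical sessions: each list is the ordered pages one visitor viewed.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

transitions = build_transition_counts(sessions)
candidate = predict_next_page(transitions, "/products")
print(candidate)
```

A server could pre-fetch or cache the predicted page while the visitor is still reading the current one.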

The first web analysis tools simply provided mechanisms to report user activity as recorded on the servers. Using such tools, it was possible to determine information such as:

  • Number of accesses to the server.
  • The times or time intervals of visits.
  • The domain names and the URLs of users of the web servers.
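
A report of this kind can be produced directly from a server access log. The sketch below assumes the log uses the Common Log Format; the regular expression, the function names, and the sample lines (with documentation-only IP addresses) are illustrative assumptions, and real logs often carry extra fields that would need a longer pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern for Common Log Format lines (assumed format of the log).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def summarize(lines):
    """Report access counts overall, per hour, per host, and per URL."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    return {
        "total_accesses": len(entries),
        "accesses_per_hour": Counter(e["time"].hour for e in entries),
        "accesses_per_host": Counter(e["host"] for e in entries),
        "accesses_per_url": Counter(e["url"] for e in entries),
    }


# Hypothetical log lines; the last one is malformed and is skipped.
sample_log = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:58:02 +0000] "GET /products HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:14:01:17 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]

summary = summarize(sample_log)
print(summary["total_accesses"])
```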

Web usage mining is useful for finding interesting trends and patterns that can provide important knowledge about the users of a system. Several machine learning and data mining techniques, such as clustering, can be applied to this task in several fields.

Web usage data provides an extremely useful way to learn about users’ interests. Web logs may be used in web usage mining to help identify users who have accessed similar web pages. These patterns may be useful in web searching and filtering.
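
To illustrate how users who accessed similar pages might be identified, the sketch below compares the sets of pages each user visited using Jaccard similarity. The choice of Jaccard similarity, the 0.5 threshold, the function names, and the user data are all assumptions made for this example; a production system would more likely run a clustering algorithm over much larger logs.

```python
from itertools import combinations


def jaccard(pages_a, pages_b):
    """Shared pages divided by all pages that either user visited."""
    union = pages_a | pages_b
    return len(pages_a & pages_b) / len(union) if union else 0.0


def similar_user_pairs(pages_by_user, threshold=0.5):
    """Return (user, user, score) for each pair at or above the threshold."""
    pairs = []
    for user_a, user_b in combinations(sorted(pages_by_user), 2):
        score = jaccard(pages_by_user[user_a], pages_by_user[user_b])
        if score >= threshold:
            pairs.append((user_a, user_b, score))
    return pairs


# Hypothetical data: the set of pages each user requested, taken from web logs.
pages_by_user = {
    "alice": {"/home", "/products", "/cart"},
    "bob": {"/home", "/products", "/reviews"},
    "carol": {"/blog", "/about"},
}

pairs = similar_user_pairs(pages_by_user)
print(pairs)
```

Pairs that score above the threshold could then be used to recommend or filter pages for one user based on what the other has viewed.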