Classification is a data-mining technique that assigns categories to a collection of data to aid in more accurate predictions and analysis. Classification is one of several methods intended to make the analysis of very large datasets effective.

Why Classification?

Very large databases are becoming the norm in today’s world of big data. Imagine a database with terabytes of data (a terabyte is one trillion bytes). Facebook alone was processing 600 terabytes of new data every day as of 2014, the last time it reported these figures. The primary challenge of big data is making sense of it.

And sheer volume is not the only problem: big data also tends to be diverse, unstructured and fast-changing. Consider audio and video data, social media posts, 3D data, or geospatial data. This kind of data is not easily categorized or organized.

To meet this challenge, a range of automatic methods for extracting useful information has been developed, among them classification.

How Classification Works

An analyst’s goal is to create a set of classification rules that answer a question, make a decision, or predict behavior. To start, the analyst assembles a set of training data in which each record contains a certain set of attributes along with the known outcome. The job of the classification algorithm is to discover how those attributes lead to that outcome.

Consider a credit-card company trying to determine which prospects should receive a credit card offer.

The company’s training data might include:

The predictor columns Age, Gender, and Annual Income determine the value of the target attribute, Credit Card Offer (also called the class label). In a training set, the value of this attribute is known. The classification algorithm then tries to determine how that value was reached: what relationships exist between the predictors and the decision? It develops a set of prediction rules, usually in the form of IF/THEN statements.
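The rule-discovery step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method any particular company uses: the records are invented for illustration, and the hypothetical function learn_income_rule learns a single IF/THEN rule (a so-called decision stump) on Annual Income alone, choosing the threshold that misclassifies the fewest training records. Age and Gender are carried in each record but ignored by this simplified learner.

```python
from dataclasses import dataclass


@dataclass
class Record:
    age: int
    gender: str
    annual_income: int       # assumed unit: dollars per year
    credit_card_offer: bool  # the known outcome in the training data


# Hypothetical training records, invented for illustration only.
TRAINING = [
    Record(25, "F", 28_000, False),
    Record(41, "M", 85_000, True),
    Record(35, "F", 62_000, True),
    Record(52, "M", 31_000, False),
    Record(29, "M", 47_000, False),
    Record(46, "F", 95_000, True),
]


def learn_income_rule(records):
    """Find the threshold t for the rule
    IF annual_income >= t THEN offer ELSE no offer
    that misclassifies the fewest training records."""
    best_t, best_errors = None, len(records) + 1
    # Try each observed income as a candidate threshold, lowest first.
    for t in sorted({r.annual_income for r in records}):
        errors = sum((r.annual_income >= t) != r.credit_card_offer for r in records)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


threshold, errors = learn_income_rule(TRAINING)
print(f"IF annual_income >= {threshold} THEN offer ELSE no offer ({errors} training errors)")
```

On these invented records the sketch settles on a 62,000 threshold with no training errors. Real classification algorithms, such as decision-tree learners, search over many attributes and combine several such splits.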

This is a simple example, and the algorithm would need a far larger data sample than the two records shown here. Further, real prediction rules are likely to be far more complex, including sub-rules that capture finer attribute details.
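To illustrate what sub-rules look like, here is a hypothetical rule set written as nested IF/THEN checks. The function name and every threshold are invented for illustration and were not learned from any data. A rule set also need not use every attribute, so this sketch takes only Age and Annual Income.

```python
def credit_card_offer(age: int, annual_income: int) -> bool:
    """Hypothetical rule set with sub-rules; thresholds are invented."""
    if annual_income >= 60_000:
        # Sub-rule: high earners still need to pass a minimum-age check.
        return age >= 21
    # Sub-rule: moderate incomes qualify only with a higher age bar.
    return annual_income >= 40_000 and age >= 30


print(credit_card_offer(45, 70_000))  # True
print(credit_card_offer(25, 45_000))  # False
```

Each nested condition refines the outer rule for a subset of records, which is the same structure a decision tree produces when it splits on one attribute and then on another.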

Next, the algorithm is given a “prediction set” of data to analyze, but in this set the outcome attribute (the decision) is withheld from the algorithm:

Because the developer still knows the actual outcomes for these records, comparing them with the algorithm’s predictions shows how accurate the prediction rules are. The rules are then tweaked until the developer considers the predictions effective and useful.
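The accuracy check can be sketched as follows. It assumes a hypothetical rule (offer a card when annual income is at least 60,000) and a small invented prediction set whose true outcomes the developer kept aside: the rule sees only the predictor, and the outcome is used only for scoring. The names rule, test_set, and accuracy are illustrative.

```python
def rule(annual_income: int) -> bool:
    # Hypothetical learned rule: IF annual_income >= 60,000 THEN offer.
    return annual_income >= 60_000


# Hypothetical prediction set: (annual_income, actual_outcome).
# The actual outcome is hidden from the rule and used only for scoring.
test_set = [
    (55_000, False),
    (72_000, True),
    (64_000, False),
    (38_000, False),
    (90_000, True),
]


def accuracy(predict, labelled):
    """Fraction of records where the prediction matches the actual outcome."""
    correct = sum(predict(income) == outcome for income, outcome in labelled)
    return correct / len(labelled)


print(f"accuracy: {accuracy(rule, test_set):.0%}")  # accuracy: 80%
```

If the score is too low, the developer adjusts the rule (for example, its threshold) and scores it again. Keeping these records separate from the training data matters: a rule scored on the same records it was learned from tends to look more accurate than it really is.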

Day-to-Day Examples of Classification

Classification and other data-mining techniques are behind much of our day-to-day experience as consumers. Weather forecasts use classification techniques to report whether the day will be rainy, sunny, or cloudy. The medical profession analyzes health conditions to predict likely medical outcomes. One classification method, naive Bayes, uses conditional probability to sort emails into spam and legitimate mail.
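A toy naive Bayes spam filter shows how conditional probability drives the classification. This sketch assumes a multinomial naive Bayes model with add-one (Laplace) smoothing, one common variant rather than the one any particular email provider uses; the four training messages and the function names train and classify are hypothetical. For each class it adds the log prior, log P(class), to the sum of log P(word | class) over the message's known words, then picks the class with the higher score. Working in logarithms avoids numeric underflow when many small probabilities are multiplied.

```python
import math
from collections import Counter

# Hypothetical, tiny labelled corpus, invented for illustration.
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(docs):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Return the class with the highest log posterior score."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log P(class)
        denom = sum(counts.values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                # log P(word | class) with add-one smoothing
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("claim your cash prize", *model))  # spam
```

The "naive" part is the assumption that words occur independently given the class, which is rarely true of real text but often works well enough for filtering.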
