# Getting started with Classification

As the name suggests, Classification is the task of “classifying things” into sub-categories, but by a machine! If that doesn’t sound like much, imagine your computer being able to differentiate between you and a stranger, between a potato and a tomato, or between an A grade and an F. Now it sounds interesting. In machine learning and statistics, Classification is the problem of identifying which of a set of categories (subpopulations) a new observation belongs to, on the basis of a training set of data containing observations whose category membership is known.

Classification is a machine learning task that involves assigning a class label to a given input based on a set of training data. The goal of classification is to build a model that can accurately predict the class label for new, unseen data.

### Here are some steps to get started with classification:

- Understanding the problem: Before getting started with classification, it is important to understand the problem you are trying to solve. What are the class labels you are trying to predict? What is the relationship between the input data and the class labels?
- Data preparation: Once you have a good understanding of the problem, the next step is to prepare your data. This includes collecting and preprocessing the data, and splitting it into training, validation, and test sets.
- Selecting a model: There are many different models that can be used for classification, including decision trees, random forests, k-nearest neighbors, and support vector machines. It is important to select a model that is appropriate for your problem, taking into account the size and complexity of your data, and the computational resources you have available.
- Training the model: Once you have selected a model, the next step is to train it on your training data. This involves adjusting the parameters of the model to minimize the error between the predicted class labels and the actual class labels for the training data.
- Evaluating the model: After training the model, it is important to evaluate its performance on a validation set. This will give you a good idea of how well the model is likely to perform on new, unseen data.
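- Fine-tuning the model: If the model’s performance is not satisfactory, you can fine-tune it by adjusting its hyperparameters (settings chosen before training, such as the depth of a decision tree or the number of neighbors in k-nearest neighbors) or by trying a different model.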
- Deploying the model: Finally, once you are satisfied with the performance of the model, you can deploy it to make predictions on new data.

These are the basic steps to get started with classification. As you gain more experience, you may want to explore more advanced techniques, such as ensemble methods, deep learning, and transfer learning.
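The workflow described above can be sketched end to end with scikit-learn. This is a minimal illustration under our own assumptions, not a prescribed recipe: we use the built-in Iris data, k-nearest neighbors as the model, accuracy as the metric, a 60/20/20 train/validation/test split, and `n_neighbors` as the only hyperparameter being tuned.

```python
# Sketch of the classification workflow (assumptions: Iris data, k-NN model,
# accuracy metric, 60/20/20 split, tuning only n_neighbors).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Data preparation: load the data, then split it into train / validation / test
X, y = load_iris(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

# Selecting, training, evaluating and fine-tuning: try several values of the
# n_neighbors hyperparameter and keep the one with the best validation accuracy
best_k, best_val_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_k, best_val_acc = k, val_acc

# Final check on the untouched test set before "deploying" the chosen model
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"best k = {best_k}, validation accuracy = {best_val_acc:.3f}, "
      f"test accuracy = {test_acc:.3f}")
```

The test set is used only once, after the hyperparameter has been chosen on the validation set, so its accuracy is a fairer estimate of performance on new data.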

**Types of Classification**

Classification is of two types:

**Binary Classification**: The given data has to be categorized into 2 distinct classes. Example: on the basis of a person’s health conditions, we have to determine whether the person has a certain disease or not.

**Multiclass Classification**: The number of classes is more than 2. Example: on the basis of data about different species of flowers, we have to determine which species our observation belongs to.

Fig: Binary and Multiclass Classification. Here x1 and x2 are the variables upon which the class is predicted.
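To make the distinction concrete, the snippet below counts the distinct labels in two datasets bundled with scikit-learn. Our choice of datasets is an assumption for illustration: the breast cancer dataset labels each tumour as malignant or benign (binary), and the Iris dataset labels each flower as one of three species (multiclass).

```python
# Compare the number of classes in a binary and a multiclass dataset
# (dataset choice is illustrative, not taken from the article).
import numpy as np
from sklearn.datasets import load_breast_cancer, load_iris

# Binary: each sample is labelled 0 (malignant) or 1 (benign)
X_bin, y_bin = load_breast_cancer(return_X_y=True)
# Multiclass: each sample is one of three iris species (0, 1 or 2)
X_multi, y_multi = load_iris(return_X_y=True)

binary_classes = np.unique(y_bin)
multi_classes = np.unique(y_multi)
print("binary classes:", binary_classes)      # [0 1]
print("multiclass classes:", multi_classes)   # [0 1 2]
```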

**How does classification work?**

Suppose we have to predict whether a given patient has a certain disease or not, on the basis of 3 variables, called features.

This means there are two possible outcomes:

- The patient has the said disease. Basically, a result labeled “Yes” or “True”.
- The patient is disease-free. A result labeled “No” or “False”.

*This is a binary classification problem.*

We have a set of observations called the training data set, which comprises sample data with actual classification results. We train a model, called a classifier, on this data set and use that model to predict whether a given patient has the disease or not.
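A minimal sketch of this patient example, assuming synthetic data: we generate 200 hypothetical patients with 3 numeric features each using `make_classification`, label 1 as “has the disease”, and use logistic regression as the classifier. The feature values of `new_patient` are made up for illustration.

```python
# Hypothetical patient example: 3 features per patient, binary label
# (1 = has the disease, 0 = disease-free). The data is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Training data set: observations with known classification results
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

# Train the classifier on the labelled observations
classifier = LogisticRegression()
classifier.fit(X, y)

# Predict for a new, unseen patient described by the same 3 features
new_patient = [[0.5, -1.2, 0.3]]  # made-up feature values
prediction = classifier.predict(new_patient)[0]
print("Yes (has the disease)" if prediction == 1 else "No (disease-free)")
```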

The outcome thus depends on:

- How well these features are able to “map” to the outcome.
- The quality of our data set. By quality, I mean its statistical and mathematical qualities.
- How well our Classifier generalizes this relationship between the features and the outcome.
- The values of the features themselves (such as x1 and x2 in the figure above).
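One rough way to gauge how well a classifier generalizes is to compare its accuracy on the training data with its accuracy on held-out data. The sketch below assumes synthetic data with 10% of labels flipped (noise) and compares an unrestricted decision tree with one limited to depth 3; the specific numbers depend on the data and are not a general claim about either setting.

```python
# Compare training vs. held-out accuracy as a rough generalization check
# (synthetic, noisy data; max_depth values chosen for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)  # accuracy on data it has seen
    test_acc = tree.score(X_test, y_test)     # accuracy on unseen data
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

An unrestricted tree memorizes the training set (training accuracy of 1.0), so a large gap between its training and test accuracy signals that it has partly learned the noise rather than the true feature-to-outcome relationship.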

Following is the generalized block diagram of the classification task.

Fig: Generalized Classification Block Diagram.

- X: pre-classified data, in the form of an N × M matrix, where N is the number of observations and M is the number of features.
- y: an N-dimensional vector holding the actual (known) class of each of the N observations.
- Feature Extraction: Extracting valuable information from input X using a series of transforms.
- ML Model: The “Classifier” we’ll train.
- y’: Labels predicted by the Classifier.
- Quality Metric: Metric used for measuring the performance of the model.
- ML Algorithm: The algorithm used to update the weights w’ iteratively; each update changes the model, which is how it “learns”.
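The blocks of the diagram can be mapped onto a small from-scratch example. This is a sketch under our own assumptions: two synthetic Gaussian clusters stand in for X and y, feature extraction is plain standardization, the model is binary logistic regression, the ML algorithm is batch gradient descent on the log-loss, and the quality metric is accuracy (measured here on the training data itself).

```python
# Block-diagram sketch: X, y -> feature extraction -> model -> y'
# -> quality metric, with gradient descent updating the weights w.
import numpy as np

rng = np.random.default_rng(0)

# X: N x M matrix of pre-classified observations; y: N actual labels (0/1)
N, M = 200, 2
X = np.vstack([rng.normal(-2.0, 1.0, size=(N // 2, M)),
               rng.normal(2.0, 1.0, size=(N // 2, M))])
y = np.concatenate([np.zeros(N // 2), np.ones(N // 2)])

# Feature extraction: standardize each feature (a deliberately simple transform)
X_feat = (X - X.mean(axis=0)) / X.std(axis=0)

# ML model: logistic regression with weights w and bias b
w = np.zeros(M)
b = 0.0

def predict_proba(features, w, b):
    """Probability of class 1 under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

# ML algorithm: gradient descent on the log-loss, updating w iteratively
learning_rate = 0.1
for _ in range(500):
    error = predict_proba(X_feat, w, b) - y
    w -= learning_rate * (X_feat.T @ error) / N
    b -= learning_rate * error.mean()

# y': labels predicted by the classifier
y_pred = (predict_proba(X_feat, w, b) >= 0.5).astype(int)

# Quality metric: accuracy (fraction of correct predictions)
accuracy = np.mean(y_pred == y)
print(f"weights={w}, bias={b:.3f}, accuracy={accuracy:.3f}")
```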

**Types of Classifiers (algorithms)**

There are various types of classifiers. Some of them are:

- Linear Classifiers: Logistic Regression
- Tree-Based Classifiers: Decision Tree Classifier
- Support Vector Machines
- Artificial Neural Networks
- Bayesian Logistic Regression
- Gaussian Naive Bayes Classifiers
- Stochastic Gradient Descent (SGD) Classifier
- Ensemble Methods: Random Forests, AdaBoost, Bagging Classifier, Voting Classifier, ExtraTrees Classifier

A detailed description of each of these methods is beyond the scope of a single article!
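Many of the classifiers listed above have counterparts in scikit-learn. The sketch below is our own illustration, not a benchmark: it assumes the Iris data, default or near-default settings, feature scaling in front of every model, and 5-fold cross-validation. It uses `MLPClassifier` as a simple artificial neural network and omits the Bayesian and voting/bagging/ExtraTrees variants for brevity.

```python
# Cross-validated comparison of several classifier families
# (assumptions: Iris data, default-ish settings, 5-fold CV, scaled features).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Neural Network (MLP)": MLPClassifier(max_iter=2000, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "SGD Classifier": SGDClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

mean_scores = {}
for name, clf in classifiers.items():
    # Scale features first; several of these models are sensitive to scale
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name:25s} mean 5-fold accuracy: {scores.mean():.3f}")
```

Cross-validation averages over several train/test splits, so the comparison depends less on one particular split than a single hold-out score does.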

**Practical Applications of Classification**

- Google’s self-driving car uses deep learning-enabled classification techniques, which enable it to detect and classify obstacles.
- Spam E-mail filtering is one of the most widespread and well-recognized uses of Classification techniques.
- Detecting health problems, facial recognition, speech recognition, object detection, and sentiment analysis all use Classification at their core.
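As a toy illustration of spam filtering, the sketch below trains a bag-of-words Naive Bayes classifier. The six training messages are made up for illustration and far too few for real use; real spam filters are trained on large labelled corpora and often use other models and features.

```python
# Toy spam filter: word counts + multinomial Naive Bayes.
# The messages below are invented examples, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now",
    "claim your free money",
    "free offer click now",
    "meeting at noon tomorrow",
    "can you review my code",
    "lunch with the team today",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts; MultinomialNB classifies them
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

predictions = spam_filter.predict(["free prize click", "review the meeting notes"])
print(predictions)
```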

**Implementation:**

Let’s get some hands-on experience with how Classification works. We are going to train several classifiers and make a simple comparison of their performance on a well-known, standard data set, the Iris data set.

Requirements for running the given script:

- Python 3.8.10
- SciPy and NumPy
- Pandas for data I/O
- Scikit-learn, which provides all the classifiers

**Example**


```python
# Python program to perform classification on the Iris dataset
# Run this program on your local Python interpreter,
# provided you have installed the required libraries

# Importing the required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# import the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# GAUSSIAN NAIVE BAYES
gnb = GaussianNB()
# train the model
gnb.fit(X_train, y_train)
# make predictions
gnb_pred = gnb.predict(X_test)
# print the accuracy
print("Accuracy of Gaussian Naive Bayes: ",
      accuracy_score(y_test, gnb_pred))

# DECISION TREE CLASSIFIER
dt = DecisionTreeClassifier(random_state=0)
# train the model
dt.fit(X_train, y_train)
# make predictions
dt_pred = dt.predict(X_test)
# print the accuracy
print("Accuracy of Decision Tree Classifier: ",
      accuracy_score(y_test, dt_pred))

# SUPPORT VECTOR MACHINE
svm_clf = svm.SVC(kernel='linear')  # Linear Kernel
# train the model
svm_clf.fit(X_train, y_train)
# make predictions
svm_clf_pred = svm_clf.predict(X_test)
# print the accuracy
print("Accuracy of Support Vector Machine: ",
      accuracy_score(y_test, svm_clf_pred))
```
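Accuracy alone hides which classes get confused with each other. As a possible extension (our suggestion, not part of the original script), the sketch below repeats the same split and linear-kernel SVM and prints a confusion matrix and a per-class report using scikit-learn’s `confusion_matrix` and `classification_report`.

```python
# Per-class evaluation of the linear SVM on the same Iris split
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1)

svm_clf = svm.SVC(kernel='linear').fit(X_train, y_train)
svm_clf_pred = svm_clf.predict(X_test)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, svm_clf_pred)
print(cm)

# Precision, recall and F1-score for each species
print(classification_report(y_test, svm_clf_pred,
                            target_names=iris.target_names))
```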

**Conclusion:** Classification is a vast field of study. Even though it makes up only a small part of Machine Learning as a whole, it is one of the most important parts.

That’s all for now.

This article is contributed by **Sarthak Yadav**.

