# What is the difference between ‘transform’ and ‘fit_transform’ in sklearn-Python?

In this article, we will discuss the difference between ‘transform’ and ‘fit_transform’ in sklearn using Python.

In data science and machine learning, the fit(), transform(), and fit_transform() methods provided by the scikit-learn package are among the vital tools used extensively in data preprocessing and model fitting. The task here is to discuss the difference between **fit()**, **transform()**, and **fit_transform()** and how they are used with the built-in functions that come with this package.

- The **fit(data)** method computes the mean and standard deviation of a given feature, to be used later for scaling.
- The **transform(data)** method performs the scaling using the mean and standard deviation calculated by the fit() method.
- The **fit_transform(data)** method does both: it fits and then transforms.

These three methods are closely related. In any typical machine learning problem, we first split the dataset into training and testing datasets. All the data processing steps performed on the training dataset apply to the testing dataset as well, but in a slightly different way. This difference becomes clear once we understand the three methods.

**Required Packages**

pip install scikit-learn

pip install pandas

Suppose scaling is one of the data processing steps we have to perform. To demonstrate it, let us use the built-in iris dataset.

**Example:**

```python
from sklearn import datasets
import pandas as pd

# Load the built-in iris dataset and wrap its feature matrix in a DataFrame.
# Column names follow the dataset's feature order.
iris = datasets.load_iris()
data = pd.DataFrame(iris.get('data'), columns=[
    'sepal length', 'sepal width', 'petal length', 'petal width'])
data.head()
```

Let us split the data as train and test splits.

```python
from sklearn.model_selection import train_test_split

# Features: every column except the last; target: the last column
X_train, X_test, y_train, y_test = train_test_split(
    data.iloc[:, :-1], data.iloc[:, -1],
    test_size=0.33, random_state=42)
```

Now let us perform standard scaling on the sepal width column. Scaling in general means converting columns to a common numeric scale. Standard scaling in particular transforms the column of interest so that its values have mean = 0 and standard deviation = 1.
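To make the definition concrete, here is a minimal sketch that standardizes a column by hand with z = (x - mean) / std and compares the result with StandardScaler. The toy array `x` is a made-up example, not part of the iris data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy column (one feature, four samples)
x = np.array([[2.0], [4.0], [6.0], [8.0]])

# Manual standardization: subtract the mean, divide by the standard deviation
mean = x.mean(axis=0)
std = x.std(axis=0)  # population std (ddof=0), the same convention StandardScaler uses
z_manual = (x - mean) / std

# The same operation through scikit-learn
z_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_sklearn))  # True
```

After scaling, the column has mean 0 and standard deviation 1, up to floating-point error.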

## The fit() Method

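The fit() method computes the parameters needed for the transformation (for standard scaling, the mean and standard deviation of the column) but does not apply the transformation itself. The computed parameters are stored on the scaler object, and fit() returns that same fitted object, which is why the output below shows StandardScaler().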

**Example:**

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Double brackets keep the selection 2D, which scikit-learn estimators expect
scaler.fit(data[['sepal width']])
```

**Output:**

StandardScaler()
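What fit() learned can be inspected on the scaler afterwards. The sketch below uses a small made-up array (an assumption for illustration) and reads the `mean_` and `scale_` attributes that StandardScaler sets during fitting.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one feature, five samples
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
returned = scaler.fit(X)

# fit() returns the scaler itself, now holding the learned statistics
print(returned is scaler)  # True
print(scaler.mean_)        # per-feature mean, here 3.0
print(scaler.scale_)       # per-feature standard deviation, here sqrt(2)
```

Nothing in `X` has changed at this point; only the scaler's internal state has.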

## The transform() Method

The transform() method uses the parameters learned by fit() and applies the actual transformation to the column. So fit() followed by transform() is a **two-step process** in which the transformation is completed in the second step. Unlike fit(), transform() returns the transformed array.
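Because transform() relies on what fit() has stored, calling it on an unfitted scaler fails. A small sketch of that dependency, using a made-up array and a hypothetical flag `fitted_first` for illustration:

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data
X = np.array([[10.0], [20.0], [30.0]])
scaler = StandardScaler()

try:
    scaler.transform(X)  # no fit() yet, so there are no statistics to apply
    fitted_first = True
except NotFittedError:
    fitted_first = False
    print("transform() was called before fit()")

# Step 1: learn the statistics; step 2: apply them
scaler.fit(X)
scaled = scaler.transform(X)
print(scaled.shape)  # (3, 1)
```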

**Example:**

```python
# Apply the scaling learned by fit(); the input must be 2D here as well
scaler.transform(data[['sepal width']])
```

## The fit_transform() Method

As discussed in the section above, fit() and transform() form a two-step process, which can be reduced to a single call using the fit_transform() method. With fit_transform(), we compute and apply the transformation in a **single step**.
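The equivalence can be checked directly. This sketch uses randomly generated data (an assumption for illustration) and compares the two-step route with the one-step route on the same input.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 20 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Two steps: fit() returns the scaler, so transform() can be chained onto it
two_step = StandardScaler().fit(X).transform(X)

# One step
one_step = StandardScaler().fit_transform(X)

print(np.allclose(two_step, one_step))  # True
```

The results match only because both routes are fitted on the same data; the one-step form is a convenience, not a different transformation.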

**Example:**

```python
# Learn the scaling parameters from the training features and apply them
scaler.fit_transform(X_train)
```

Calling fit() followed by transform() gives the same result as calling fit_transform() on the same data. Now we have to ensure that the same transformation is applied to the test dataset. However, we should not call fit() (or fit_transform()) on the test dataset: that would be the **wrong approach, because the scaling parameters would then be computed from the test data rather than the training data**, so the two sets would be scaled differently and the evaluation could be biased. So, let us use the transform() method directly on the test dataset.

**Example:**

```python
# Reuse the parameters learned from the training data; do not refit
scaler.transform(X_test)
```

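The test data is scaled with the mean and standard deviation learned from the training data, so the result differs from what fitting on the test data would give. This could be one of the reasons that sklearn splits this kind of data processing step into two methods.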
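The difference can be seen in a short, self-contained sketch. It loads iris as NumPy arrays via `load_iris(return_X_y=True)` rather than the DataFrame used above, and the variable names are chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Recommended pattern: fit on the training data only, then reuse on the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training features end up with mean ~0; test features generally do not,
# because they were scaled with the training statistics
print(X_train_scaled.mean(axis=0).round(3))
print(X_test_scaled.mean(axis=0).round(3))

# For contrast only: refitting on the test data produces different values
X_test_refit = StandardScaler().fit_transform(X_test)
print(np.allclose(X_test_scaled, X_test_refit))  # False
```

Keeping fit() and transform() separate is what allows one set of learned parameters to be applied consistently to both splits.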