
FaceMask Detection using TensorFlow in Python

  • Difficulty Level : Expert
  • Last Updated : 11 Jul, 2022

In this article, we’ll discuss our two-phase COVID-19 face mask detector, detailing how our computer vision/deep learning pipeline will be implemented.

We’ll use a Python script to train the face mask detector and review its results. With the trained COVID-19 face mask detector in hand, we’ll then implement two additional Python scripts to:

  • Detect COVID-19 face masks in images
  • Detect face masks in real-time video streams

Figure 1: Flow chart of the face mask detection system


To train a custom face mask detector, we break the project into two distinct phases, each with its own sub-steps (as shown in Figure 1 above):

  • Training: Here we’ll focus on loading our face mask detection dataset from disk, training a model (using Keras/TensorFlow) on this dataset, and then serializing the face mask detector to disk.
  • Deployment: Once the face mask detector is trained, we can then move on to loading the mask detector, performing face detection, and then classifying each face as with_mask or without_mask.


We will use a dataset of face images, split into a ‘yes’ (with mask) and a ‘no’ (without mask) class, to build a CNN model with TensorFlow that detects whether you are wearing a face mask in your PC’s webcam feed. You can also use your phone’s camera for the same purpose.

Stepwise Implementation

Step 1: Data Visualization

First, let us count the images in each of the two categories of our dataset: there are 690 images in the ‘yes’ class and 686 images in the ‘no’ class.

The number of images with facemask labelled ‘yes’: 690 
The number of images with facemask labelled ‘no’: 686
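A minimal sketch of how such per-class counts can be produced. It assumes, hypothetically, that the images sit in one sub-folder per class (for example `dataset/yes` and `dataset/no`); the folder layout and the extension filter are assumptions, not the article’s own code.

```python
import os

# Hypothetical set of file extensions treated as images
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_class(root):
    """Return {class_name: number_of_image_files} for each sub-folder of root."""
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files that sit next to the class folders
        counts[class_name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling `count_images_per_class("dataset")` on such a layout would return a dictionary like `{"no": ..., "yes": ...}`, one entry per folder.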

Step 2: Data Augmentation

Next, we augment the dataset to obtain more training images: each image in the dataset is rotated and flipped. After augmentation, we have a total of 2751 images, 1380 in the ‘yes’ class and 1371 in the ‘no’ class.

Number of examples: 2751 
Percentage of positive examples: 50.163576881134134%, number of pos examples: 1380 
Percentage of negative examples: 49.836423118865866%, number of neg examples: 1371
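The article does not say which rotation angles were used, so this numpy sketch shows only the flip half of the idea: each image gets one horizontally mirrored copy via `np.fliplr`, which roughly doubles each class (690 to 1380). A small rotation, for example with OpenCV’s `cv2.warpAffine`, could be added to the same copy. The function names are hypothetical; the second helper reproduces the class-balance percentages printed above.

```python
import numpy as np


def augment_with_flips(images):
    """Return the originals followed by one horizontally flipped copy of each.

    images: list of H x W x C arrays.
    """
    flipped = [np.fliplr(img) for img in images]  # mirror left <-> right
    return list(images) + flipped


def class_balance(n_pos, n_neg):
    """Percentage of positive and negative examples in the dataset."""
    total = n_pos + n_neg
    return 100 * n_pos / total, 100 * n_neg / total
```

`class_balance(1380, 1371)` gives about 50.16% and 49.84%, matching the output above.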

Step 3: Splitting the data

Next, we split the data into a training set, whose images the CNN model is trained on, and a test set, whose images are used to evaluate it. We use split_size = 0.8, so 80% of the images in each class go to the training set and the remaining 20% go to the test set.

The number of images with facemask in the training set labelled ‘yes’: 1104
The number of images with facemask in the test set labelled ‘yes’: 276
The number of images without facemask in the training set labelled ‘no’: 1096
The number of images without facemask in the test set labelled ‘no’: 275

These counts confirm the 80/20 split in both classes; for example, 1104 of the 1380 ‘yes’ images went to the training set.
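A minimal sketch of a per-class split, assuming the file names of each class are held in a Python list. Splitting each class separately keeps the two classes balanced in both sets, and truncating with `int()` reproduces the counts above (1104/276 and 1096/275). The `seed` argument is a hypothetical addition that makes the shuffle reproducible.

```python
import random


def split_class(files, split_size=0.8, seed=None):
    """Shuffle one class's file list and cut it into (train, test)."""
    files = list(files)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * split_size)  # int() truncates: 1371 * 0.8 -> 1096
    return files[:cut], files[cut:]
```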

Step 4: Building the Model

Next, we build a Sequential CNN model from Conv2D, MaxPooling2D, Flatten, Dropout and Dense layers. The last Dense layer uses the ‘softmax’ activation to output a vector with the probability of each of the two classes.


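import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(100, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(100, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # flatten the feature maps into a vector for the dense layers
    tf.keras.layers.Flatten(),
    # randomly drop activations during training to reduce over-fitting
    # (a rate of 0.5 is a common choice)
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(50, activation='relu'),
    # one probability per class
    tf.keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])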

Here, we use the ‘adam’ optimizer and ‘binary_crossentropy’ as the loss function, since there are only two classes. A pretrained backbone such as MobileNetV2 may give better accuracy.
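A quick back-of-the-envelope check of the layer shapes and parameter counts for the model above. It assumes Keras defaults (padding='valid' and stride 1 for Conv2D; stride equal to the pool size for MaxPooling2D) and the Flatten/Dropout layers described in the text. It shows that almost all of the roughly 6.6 million weights sit in the first Dense layer.

```python
def conv_out(size, kernel):
    """Spatial size after an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling (stride == pool size)."""
    return size // pool


size = 150
size = pool_out(conv_out(size, 3), 2)  # first block: 150 -> 148 -> 74
size = pool_out(conv_out(size, 3), 2)  # second block: 74 -> 72 -> 36
flat = size * size * 100               # Flatten over 100 feature maps

params = {
    "conv1": 3 * 3 * 3 * 100 + 100,    # kernel weights + biases
    "conv2": 3 * 3 * 100 * 100 + 100,
    "dense50": flat * 50 + 50,         # Flatten/Dropout add no weights
    "dense2": 50 * 2 + 2,
}
print(size, flat)              # 36 129600
print(sum(params.values()))    # 6573052
```

Comparing this with `model.summary()` is a useful sanity check; an extra convolution/pooling block would shrink the first Dense layer considerably.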


Step 5: Pre-Training the CNN model

After building the model, we create ‘train_generator’ and ‘validation_generator’, which feed images to the model in the next step. The training set contains 2200 images and the test set 551 images.

Found 2200 images belonging to 2 classes. 
Found 551 images belonging to 2 classes.
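Keras’ `flow_from_directory` treats each sub-folder of the directory it is given as one class, which is why it reports “2 classes”. A minimal stdlib sketch that writes a split into that layout; the `train/` and `test/` destination folders and the helper names are hypothetical.

```python
import os
import shutil


def write_split(split, dst_root):
    """Copy files into dst_root/<class_name>/ for every class in `split`.

    split: {class_name: [source file paths]}, e.g. the training part of
    each class from Step 3.
    """
    for class_name, paths in split.items():
        class_dir = os.path.join(dst_root, class_name)
        os.makedirs(class_dir, exist_ok=True)
        for path in paths:
            shutil.copy(path, class_dir)


def summarize(dst_root):
    """Return (number_of_files, number_of_classes) under dst_root."""
    classes = sorted(d for d in os.listdir(dst_root)
                     if os.path.isdir(os.path.join(dst_root, d)))
    n_files = sum(len(os.listdir(os.path.join(dst_root, c))) for c in classes)
    return n_files, len(classes)
```

With the split from Step 3, `summarize` would report (2200, 2) for the training folder (1104 + 1096) and (551, 2) for the test folder (276 + 275).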

Step 6: Training the CNN model

This is the main step: we fit the Sequential model built with Keras on the training set, using the test set for validation. I trained the model for 30 epochs (full passes over the training data). Training for more epochs may increase accuracy further, but watch out for over-fitting.

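# model.fit() accepts generators directly; fit_generator() is deprecated
history = model.fit(train_generator,
                    epochs=30,
                    validation_data=validation_generator)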
>>Epoch 30/30
220/220 [==============================] – 231s 1s/step – loss: 0.0368 – acc: 0.9886 – val_loss: 0.1072 – val_acc: 0.9619

After the 30th epoch, the model reaches an accuracy of 98.86% on the training set and 96.19% on the test set. The small gap between the two (about 2.7 percentage points) suggests the model is not badly over-fitting.
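Two small calculations help read the log above. The “220/220” counter is the number of batches per epoch, so 2200 training images in 220 steps implies a batch size of 10 (an inference from the log, not a value stated in the article). The generalization gap is simply training minus validation accuracy. The helper names are hypothetical.

```python
import math


def steps_per_epoch(n_images, batch_size):
    """Batches a generator yields per epoch (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)


def generalization_gap(train_acc, val_acc):
    """Training minus validation accuracy, in percentage points."""
    return round((train_acc - val_acc) * 100, 2)


print(steps_per_epoch(2200, 10))           # 220
print(generalization_gap(0.9886, 0.9619))  # 2.67
```

With the same batch size, the 551 test images give `steps_per_epoch(551, 10) == 56` validation steps.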

Step 7: Labeling the Information

After training, we map the two output indices to labels: ‘0’ as ‘without_mask’ and ‘1’ as ‘with_mask’. We also set the colour of the bounding rectangle for each label: red for ‘without_mask’ and green for ‘with_mask’. Note that OpenCV reads colour tuples in BGR order, so red is (0, 0, 255) and green is (0, 255, 0).
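A minimal sketch of this mapping; the dictionary and function names are hypothetical. If the generators were built with `flow_from_directory` from folders named ‘no’ and ‘yes’, Keras assigns class indices in alphanumeric order of the folder names, so ‘no’ gets 0 and ‘yes’ gets 1, which agrees with the mapping above.

```python
import numpy as np

# index -> class name, following the order given above
labels_dict = {0: "without_mask", 1: "with_mask"}
# OpenCV drawing functions read colours as (B, G, R)
color_dict = {0: (0, 0, 255),   # red
              1: (0, 255, 0)}   # green


def label_and_color(probs):
    """Pick the most probable class from a [p_without_mask, p_with_mask] vector."""
    idx = int(np.argmax(probs))
    return labels_dict[idx], color_dict[idx]


print(label_and_color([0.1, 0.9]))  # ('with_mask', (0, 255, 0))
```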


Step 8: Importing the Face detection Program

Next, we want to use the model to detect whether we are wearing a face mask in our PC’s webcam feed. For this, we first need face detection; here we use Haar feature-based cascade classifiers to detect the face.


This cascade classifier, provided by OpenCV, was trained on thousands of images to detect frontal faces. Its .xml file needs to be downloaded and loaded before it can detect faces. We have uploaded the file to the GitHub repository.
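`cv2.CascadeClassifier(path).detectMultiScale(gray_image)` returns each face as an (x, y, w, h) rectangle (an empty tuple when no face is found), whereas the drawing code in Step 9 works with corner coordinates. In the pip `opencv-python` package, the frontal-face cascade is also available at `cv2.data.haarcascades + "haarcascade_frontalface_default.xml"`. A small numpy sketch of the conversion and face crop; the function name is hypothetical.

```python
import numpy as np


def rects_to_faces(frame, rects):
    """Convert (x, y, w, h) rectangles into corner boxes and cropped face ROIs.

    rects: what detectMultiScale returns, an N x 4 array or an empty tuple.
    """
    boxes, faces = [], []
    for (x, y, w, h) in rects:
        start_x, start_y, end_x, end_y = int(x), int(y), int(x + w), int(y + h)
        boxes.append((start_x, start_y, end_x, end_y))
        faces.append(frame[start_y:end_y, start_x:end_x])  # rows are y, columns are x
    return boxes, faces
```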

Step 9: Detecting the Faces with and without Masks

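In the last step, we use OpenCV to run an infinite loop over frames from the webcam and detect the faces in each frame. Note that the script below detects faces with OpenCV’s DNN module and a pretrained SSD face detector (deploy.prototxt and res10_300x300_ssd_iter_140000.caffemodel) rather than the Haar cascade from Step 8, and it reads frames with imutils’ VideoStream(src=0); src=0 selects the default webcam, just as cv2.VideoCapture(0) would.

The mask model predicts a probability for each of the two classes, and the label with the higher probability is displayed above each face. The order of the two probabilities depends on how the model was trained; the unpacking in the script assumes [with_mask, without_mask]. The script also resizes each face to 224x224 and applies MobileNetV2’s preprocess_input, so it assumes mask_detector.model is a MobileNetV2-based classifier. To use the 150x150 CNN from Step 4 instead, change the resize and preprocessing to match what was used during training.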


# import the necessary packages
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from imutils.video import VideoStream
import numpy as np
import imutils
import time
import cv2
import os
def detect_and_predict_mask(frame, faceNet, maskNet):
    # grab the dimensions of the frame and 
    # then construct a blob from it
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (224, 224),
                                 (104.0, 177.0, 123.0))
    # pass the blob through the network 
    # and obtain the face detections
    faceNet.setInput(blob)
    detections = faceNet.forward()
    # initialize our list of faces, their
    # corresponding locations, and the list
    # of predictions from our face mask network
    faces = []
    locs = []
    preds = []
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e.,
        # probability) associated with
        # the detection
        confidence = detections[0, 0, i, 2]
        # filter out weak detections by 
        # ensuring the confidence is
        # greater than the minimum confidence
        if confidence > 0.5:
            # compute the (x, y)-coordinates
            # of the bounding box for
            # the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # ensure the bounding boxes fall 
            # within the dimensions of
            # the frame
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
            # extract the face ROI, convert it
            # from BGR to RGB channel
            # ordering, resize it to 224x224, 
            # and preprocess it
            face = frame[startY:endY, startX:endX]
            # skip degenerate boxes that would give an empty crop
            if face.size == 0:
                continue
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = img_to_array(face)
            face = preprocess_input(face)
            # add the face and bounding boxes 
            # to their respective lists
            faces.append(face)
            locs.append((startX, startY, endX, endY))
    # only make a predictions if at least one
    # face was detected
    if len(faces) > 0:
        # for faster inference we'll make 
        # batch predictions on *all*
        # faces at the same time rather 
        # than one-by-one predictions
        # in the above `for` loop
        faces = np.array(faces, dtype="float32")
        preds = maskNet.predict(faces, batch_size=32)
    # return a 2-tuple of the face locations
    # and their corresponding predictions
    return (locs, preds)
# load our serialized face detector model from disk
prototxtPath = os.path.join("face_detector", "deploy.prototxt")
weightsPath = os.path.join("face_detector",
                           "res10_300x300_ssd_iter_140000.caffemodel")
faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)
# load the face mask detector model from disk
maskNet = load_model("mask_detector.model")
# initialize the video stream
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
# loop over the frames from the video stream
while True:
    # grab the frame from the threaded 
    # video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    # detect faces in the frame and 
    # determine if they are wearing a
    # face mask or not
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)
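    # loop over the detected face 
    # locations and their corresponding
    # predictions
    for (box, pred) in zip(locs, preds):
        # unpack the bounding box and predictions
        (startX, startY, endX, endY) = box
        # assumes the model outputs [with_mask, without_mask]; swap the
        # order for a model trained with [without_mask, with_mask]
        (mask, withoutMask) = pred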
        # determine the class label and 
        # color we'll use to draw
        # the bounding box and text
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        # include the probability in the label
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        # display the label and bounding box 
        # rectangle on the output frame
        cv2.putText(frame, label, (startX, startY - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
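The bounding-box arithmetic inside detect_and_predict_mask can be checked on its own. The SSD face detector reports box corners as fractions of the frame size, which is why they are multiplied by [w, h, w, h] before being clipped to the frame. This numpy sketch isolates that step; the helper name is hypothetical.

```python
import numpy as np


def scale_and_clip_box(det_box, w, h):
    """Map a normalized box [x1, y1, x2, y2] in [0, 1] to pixel corners
    clipped to a w x h frame, mirroring detect_and_predict_mask."""
    box = np.asarray(det_box) * np.array([w, h, w, h])
    start_x, start_y, end_x, end_y = box.astype("int")
    start_x, start_y = max(0, start_x), max(0, start_y)
    end_x, end_y = min(w - 1, end_x), min(h - 1, end_y)
    return int(start_x), int(start_y), int(end_x), int(end_y)


print(scale_and_clip_box([0.25, 0.5, 1.25, 0.75], w=400, h=300))  # (100, 150, 399, 225)
```

Clipping matters because a face partly outside the frame can produce negative or oversized coordinates.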


