Real-time Object Detection with TensorFlow, YOLOv2 – Part I (with Python code)

Data Science Jun 07, 2019

YOU ONLY LOOK ONCE (Real-Time Object Detection, YOLO)

This deep learning technique is used in self-driving cars nowadays

This tutorial covers building a real-time object detection deep learning model (using YOLO) in Google Colab with TensorFlow on a custom dataset. (We will do all our work inside Google Colab, because training YOLO is a resource-intensive task and Colab is usually much faster than a personal machine.)

YOLO, which stands for “You Only Look Once”, is an extremely fast, state-of-the-art real-time object detection system that can detect multiple objects in a given image at the same time. On a Titan X, it processes images at 40-90 FPS (frames per second) and has a mAP of 78.6% on VOC 2007 and 48.1% on COCO test-dev.

How It Works

Prior object detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales, i.e. to locate an object they have to go over the image many times. High-scoring regions of the image are considered detections.

YOLO uses a totally different approach. It applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities (the confidence score).
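To make the weighting idea concrete, here is a minimal sketch of how a score per box is commonly formed in YOLO-style detectors: the box confidence is multiplied by the best class probability, and low-scoring boxes are dropped. The function name, the dictionary keys, and the 0.5 threshold are my own illustrative assumptions; they are not part of darkflow or any other library.

```python
def score_boxes(boxes, threshold=0.5):
    """Return (label, score, coords) for boxes whose score passes the threshold.

    Each box is a dict with hypothetical keys:
      'coords'      -> (x, y, w, h)
      'confidence'  -> how sure the network is that the box holds an object
      'class_probs' -> {label: probability of that class given an object}
    """
    kept = []
    for box in boxes:
        # Pick the most likely class for this box.
        label, prob = max(box["class_probs"].items(), key=lambda kv: kv[1])
        # Weight the class probability by the box confidence.
        score = box["confidence"] * prob
        if score >= threshold:
            kept.append((label, score, box["coords"]))
    # Strongest detections first.
    return sorted(kept, key=lambda det: det[1], reverse=True)


# Made-up predictions, for illustration only.
example_boxes = [
    {"coords": (0.5, 0.5, 0.2, 0.3), "confidence": 0.9,
     "class_probs": {"monkey": 0.8, "dog": 0.2}},
    {"coords": (0.1, 0.2, 0.1, 0.1), "confidence": 0.3,
     "class_probs": {"monkey": 0.9, "dog": 0.1}},
]
detections = score_boxes(example_boxes)
```

With these made-up numbers only the first box survives, because the second one has a low box confidence even though its class probability is high.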

The YOLO model has several advantages over classifier-based object detection systems. It looks at the whole image at test time, so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation, which makes it extremely fast: more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.

If you want to learn more about the architecture and mathematics involved in YOLO, please read the original paper.

Watch this interesting video about real-time object detection; it will motivate you to explore this topic even more.

How computers learn to recognize objects instantly

Original paper (CVPR 2016. OpenCV People’s Choice Award):


Cool things about YOLO:

  • Speed: it detects objects at 45 frames per second, which is better than real time.
  • The network learns a generalized object representation, i.e. we can train it on real-world images and get predictions on artwork or computer-graphics-generated images, and vice versa.
  • A faster version (with a smaller architecture, sometimes known as Tiny YOLO) runs at 155 frames per second but is less accurate than the original YOLO.
  • And above all, YOLO is open source.

Although there are a lot of pre-trained models on the internet for various datasets, in this tutorial we will train our own model and detect the objects that we are interested in. We will train our model in the cloud using Google Colab; you only need a browser and a working internet connection.


  1. Preparing Dataset
  2. Uploading everything to Google Drive
  3. Setting up environment in GOOGLE COLAB
  4. Training model
  5. Doing Prediction on Images and Video
  6. Saving the model's weights and configuration file for future use

In this part of the blog we will cover the first three steps; the other steps are covered in Part 2.

Step 1: Preparing Dataset

This is a crucial step: the performance of your model depends on the quality of the data you collect.

a) Search for images and videos related to your problem domain.

b) Be aware of what problem you are going to solve.

  • For example, in my case monkeys can be anywhere in the frame, and they vary in shape, colour, and orientation.
  • So I collected almost 5000 images from the internet to train my model.
  • You can likewise search and scrape the internet as per your requirements.

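If you have a video and want to take screenshots from it, use ffmpeg. Install it and check that it can read your video (the second command simply converts the file to AVI):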

$ sudo apt install ffmpeg
$ ffmpeg -i input.mp4 output.avi

To read more about what you can do with ffmpeg (changing the frame rate and timestamps), go to this link.

Pro Tip:

$ ffmpeg -i monkey2.mp4 -vf fps=1/3 thumb%04d.jpg -hide_banner

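This command saves one frame every three seconds (fps=1/3) as thumb0001.jpg, thumb0002.jpg, and so on. Change the fps value to extract frames more or less often.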
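If you have many videos, you may prefer to build the same ffmpeg command from Python. The sketch below only assembles the argument list for the fps filter used above; the function name and its parameters are hypothetical, and it assumes ffmpeg is already installed and on your PATH.

```python
def frame_extract_command(video_path, seconds_per_frame, pattern="thumb%04d.jpg"):
    """Build an ffmpeg argument list that saves one frame every N seconds."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return [
        "ffmpeg",
        "-hide_banner",                       # suppress the version banner
        "-i", video_path,                     # input video
        "-vf", f"fps=1/{seconds_per_frame}",  # one output frame per N seconds
        pattern,                              # numbered output images
    ]


cmd = frame_extract_command("monkey2.mp4", 3)
# To actually run it (requires ffmpeg and the input file):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems when file names contain spaces.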

Making annotations by drawing rectangles:

We will use labelImg for this task (make sure you have PyQt installed):

$ git clone

$ cd labelImg/

$ make all

$ ./ 

To start labelImg again in the future:

$ cd labelImg/

$ python3

Make two folders in your directory:

images and annotations

Folders for YOLO dataset

Draw boxes around the objects that you want to detect:

Make sure to use 'Change Save Dir' to point to the annotations folder.
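It is worth checking a few annotation files before uploading. Assuming labelImg was left in its Pascal VOC mode, each image gets one XML file with an object entry per box. The sketch below reads the label and box corners from such a file using only the standard library; the sample XML is made up for illustration and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from Pascal VOC XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; convert them to integers.
        corners = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *corners))
    return boxes


# Made-up annotation in the Pascal VOC layout.
SAMPLE_XML = """<annotation>
  <filename>monkey_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>monkey</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

boxes = read_voc_boxes(SAMPLE_XML)
```

For a real file you would pass the result of open(path).read() instead of the sample string.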

Congrats, your dataset is now ready to be uploaded to Google Drive.

Step 2: Uploading everything to Google Drive

What I mean by everything:

a) Your images and annotation files

b) The YOLO weight file

We are using YOLOv2 because it is much faster.

Go to the link and download the weight file from there.

Download weight file of YOLOV2 544*544

Upload everything to your drive:

(You have to upload three items: the images and annotations folders and the yolov2-voc.weights file.)
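Before moving on, a quick check that all three items are where you expect them can save a failed training run later. This is a minimal sketch with a hypothetical helper name; the folder you pass in is whatever directory you uploaded to.

```python
from pathlib import Path

# The three items named in this step.
REQUIRED_ITEMS = ("images", "annotations", "yolov2-voc.weights")


def missing_uploads(root):
    """Return the required items that do not exist under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_ITEMS if not (root / name).exists()]
```

Calling missing_uploads on your upload folder should return an empty list when everything is in place.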

Congratulations, you are now ready to set up your environment in the cloud.

Step 3: Setting up an environment in GOOGLE COLAB:

(This part will be different for your problem domain so read it carefully)

  1. Fork my GitHub repository so that you can make changes according to your requirements.
  2. The file structure of the repository is:
Yours might be slightly different.

3. Our main configuration files are:

yolov2-voc.cfg and



You have to edit yolov2-voc-1c.cfg and labels.txt if you are training on a different number of classes.

(Do not change anything in yolov2-voc.cfg; here is why.)

A) Changing labels.txt

Replace the values with the labels you want to use.

(Make sure your labels are the same as the ones you used when making the annotation files with labelImg.)
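A mismatch between labels.txt and the annotation labels (even a difference in capitalisation) is easy to miss by eye. The sketch below compares the two, assuming labels.txt holds one label per line; the function name is hypothetical and the inputs are plain Python values, so you can feed it labels collected in any way you like.

```python
def unknown_labels(annotation_labels, labels_txt):
    """Return annotation labels that do not appear in the labels.txt content."""
    # One label per line; ignore blank lines and surrounding whitespace.
    allowed = {line.strip() for line in labels_txt.splitlines() if line.strip()}
    return sorted(set(annotation_labels) - allowed)


# Made-up example: one annotation was saved with a capital letter.
problems = unknown_labels(["monkey", "Monkey", "monkey"], "monkey\n")
```

An empty result means every annotated label is listed in labels.txt.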

B) Changing yolov2-voc-1c.cfg:

I am training on only one class, so I named it yolov2-voc-1c.cfg. If you are training to detect 4 objects, rename it to yolov2-voc-4c.cfg. This is a general convention followed in the official implementation.

Make changes at lines 237 and 244 of the config file.

Rules for changing the config file:

at line 244, set classes to the number of classes you are training on

at line 237, set the number of filters:

filters = (classes+5)*5

i.e. for four classes it will be equal to (4+5)*5 = 45
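The filter rule is easy to get wrong by hand, so here is a small sketch that computes it and applies both changes to a config text. To my understanding the trailing 5 in the formula is the number of anchor boxes (num=5 in the [region] section). The function names are hypothetical, and instead of relying on line numbers the helper looks for the [region] section and the last filters= line above it, which is an assumption about the file layout; the sample config is a shortened, made-up excerpt.

```python
def yolov2_filters(num_classes, num_anchors=5):
    """filters = (classes + 5) * anchors, as in the rule above."""
    return (num_classes + 5) * num_anchors


def patch_cfg(cfg_text, num_classes, num_anchors=5):
    """Set classes= in [region] and filters= in the conv layer just before it."""
    lines = cfg_text.splitlines()
    # Raises ValueError if the text has no [region] section.
    region = max(i for i, line in enumerate(lines) if line.strip() == "[region]")
    # The last filters= line above [region] belongs to the final conv layer.
    conv = max(
        i for i in range(region)
        if lines[i].replace(" ", "").startswith("filters=")
    )
    lines[conv] = f"filters={yolov2_filters(num_classes, num_anchors)}"
    for i in range(region + 1, len(lines)):
        if lines[i].strip().startswith("["):
            break  # left the [region] section without finding classes=
        if lines[i].replace(" ", "").startswith("classes="):
            lines[i] = f"classes={num_classes}"
            break
    return "\n".join(lines)


# Shortened, made-up excerpt in the style of a YOLOv2 config.
SAMPLE_CFG = """[convolutional]
filters=1024
activation=leaky

[convolutional]
size=1
filters=125
activation=linear

[region]
anchors = 1.3221, 1.73145
classes=20
num=5"""

patched = patch_cfg(SAMPLE_CFG, num_classes=4)
```

Only the last convolutional layer and the [region] section are touched; earlier filters= lines stay as they were.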

Now that everything related to configuration is done, it is time to install the packages on Colab.


! git clone

#clone your own repository
%cd ./darkflow/
! python3 setup.py build_ext --inplace
! pip install -e .
! pip install .
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

from darkflow.net.build import TFNet
import cv2

Mount your Drive in Google Colab:

## Start by connecting gdrive into the google colab
from google.colab import drive
your present working directory should look like this

All right, we have done great work so far:

  1. Dataset preparation
  2. Moving everything to Google Drive
  3. Setting up an environment in COLAB

Now comes the most exciting part of this project.

Let's go!

Training and prediction are covered in Part 2 of this blog.



Result that we'll achieve at the end.



sheetala tiwari

I am passionate about Data Science and Machine Learning. I am currently building an AI community on DataDiscuss and we are committed to providing free access to education for everyone.