Real-time Object Detection with TensorFlow, YOLOv2 – Part II (with Python codes)

Data Science Jun 07, 2019

Related: Learn Face Detection Step by Step With Code In tensorflow.

Step 4 : Training the model

This step trains your YOLOv2 model on the dataset and labels you generated earlier.

options = {"model": "./cfg/yolov2-voc-1c.cfg", 
           "load": "/content/gdrive/My Drive/YOLO/yolov2-voc.weights",
           "batch": 16,
           "epoch": 100,
           "gpu": 1.0,
           "train": True,
           "annotation": "/content/gdrive/My Drive/YOLO/yolo_dataset/annotations/",
           "dataset": "/content/gdrive/My Drive/YOLO/yolo_dataset/images/"}

This is the main code for setting up your configuration file, dataset and annotations. Make sure that the path given for each entry exists and is spelled correctly. Here are some tips for tweaking the parameters:
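Since a wrong path is an easy mistake to make at this stage, it can help to check the paths before building the network. The snippet below is a minimal sketch and not part of darkflow: missing_paths is a hypothetical helper, and the keys it checks are simply the path-valued entries of the options dictionary shown above.

```python
import os


def missing_paths(options, keys=("model", "load", "annotation", "dataset")):
    """Return the (key, path) pairs whose path does not exist on disk.

    Hypothetical helper, not part of darkflow. Values that are not
    strings (for example "load": -1) are skipped, since they are not paths.
    """
    missing = []
    for key in keys:
        value = options.get(key)
        if isinstance(value, str) and not os.path.exists(value):
            missing.append((key, value))
    return missing
```

Calling missing_paths(options) before TFNet(options) returns an empty list when every path exists, and otherwise lists the entries that need fixing.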

a) The batch size can be increased, depending on your GPU memory.

b) The batch size should ideally be a divisor of the number of training images; otherwise the leftover images that do not fill a complete batch are not used in that epoch.

c) A lower batch size tends to give better accuracy, but if you have a powerful GPU, use a larger batch size to train faster.

d) A lower batch size consumes fewer resources but takes more time.
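The arithmetic behind tip b) can be sketched in a few lines. This is an illustration only: both function names are made up for this example, and it assumes that only complete batches are used in each epoch.

```python
def images_used_per_epoch(n_images, batch_size):
    """Images consumed in one epoch, assuming only full batches are used."""
    return (n_images // batch_size) * batch_size


def batch_sizes_using_all_images(n_images, max_batch):
    """Batch sizes up to max_batch that divide n_images exactly."""
    return [b for b in range(1, max_batch + 1) if n_images % b == 0]


# Example with a made-up dataset of 410 images
used = images_used_per_epoch(410, 16)               # 25 full batches of 16
candidates = batch_sizes_using_all_images(410, 16)  # divisors of 410 up to 16
```

With 410 images and a batch size of 16, 10 images are left out of each epoch under this assumption; choosing one of the candidate batch sizes avoids that.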

tfnet = TFNet(options)

After running this, the full configuration and the network architecture are printed in the output. If something is not set up correctly, this step will show an error.


Start the training (in darkflow, by calling tfnet.train()), then go and make yourself a cup of coffee, because this will take time. Some metrics to consider during training:

a) ave loss (the moving average of the loss printed during training) indicates how well the model is performing: the lower the ave loss, the better the model.

b) As a rule of thumb, an ave loss below 0.5 is considered a good score.

c) Keep tracking the ave loss. If it has not decreased for a long time, stop the training and keep the checkpoint files.
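The "not decreasing for a long time" rule in tip c) can be made concrete with a small check. This is a rough sketch under stated assumptions: loss_has_plateaued, patience and min_delta are names invented here, and the loss values are assumed to be collected by you (for example, copied from the training log) since the sketch does not read them from darkflow.

```python
def loss_has_plateaued(losses, patience=5, min_delta=0.01):
    """Return True if the last `patience` losses did not improve on the
    best earlier loss by at least `min_delta`.

    Hypothetical helper; `losses` is a list of ave loss values in the
    order they were logged.
    """
    if len(losses) <= patience:
        # Not enough history to judge yet
        return False
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


# Made-up loss values for illustration
still_improving = loss_has_plateaued([5.0, 4.0, 3.0, 2.0, 1.0], patience=2)
stalled = loss_has_plateaued([5.0, 3.0, 2.0, 1.5, 1.499, 1.5, 1.498], patience=3)
```

The values of patience and min_delta are a judgment call; a larger patience makes the check less likely to stop training during a temporary flat stretch.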

Step 5 & 6: Inference and saving the model


# "load": -1 means the model is loaded from the last checkpoint.
# The optional threshold parameter is the confidence cutoff: the model outputs
# every box whose probability is greater than the threshold.
options = {"model": "cfg/yolov2-voc-1c.cfg",
           "load": -1,
           "gpu": 1.0}

# After the model is exported with savepb(), it is saved in the built_graph folder.
# Download the files if you are doing this in Colab; otherwise they will be lost
# and cannot be recovered once the Colab runtime resets or the internet is disconnected.

options = {"model": "cfg/yolov2-voc-1c.cfg",
           #"load": 5500,
           "gpu": 1.0,
           "pbLoad": "/content/gdrive/My Drive/YOLO/yolo_info/yolov2-voc-1c.pb",
           "metaLoad": "/content/gdrive/My Drive/YOLO/yolo_info/yolov2-voc-1c.meta"}

# These options can be used to run inference with the trained model.
# pbLoad is the location of the .pb file saved by tfnet.savepb()
# metaLoad is the location of the .meta file saved by tfnet.savepb()
tfnet2 = TFNet(options)

This prints the configuration and the network in use. If there is no error so far, you are ready to run inference.


# do not use this option if you are using .pb and .meta file for inference
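import cv2
import matplotlib.pyplot as plt
import numpy as np

# OpenCV loads images in BGR order; convert to RGB so matplotlib shows the right colors
original_img = cv2.imread("/content/gdrive/My Drive/monkey.71.jpg")
original_img = cv2.cvtColor(original_img, cv2.COLOR_BGR2RGB)
results = tfnet2.return_predict(original_img)
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(original_img)  # display the original image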

This shows the original image on which you want to run the prediction.

def boxing(original_img , predictions):
    newImage = np.copy(original_img)

    for result in predictions:
        top_x = result['topleft']['x']
        top_y = result['topleft']['y']

        btm_x = result['bottomright']['x']
        btm_y = result['bottomright']['y']

        confidence = result['confidence']
        label = result['label'] + " " + str(round(confidence, 3))
        if confidence > 0.2:
            newImage = cv2.rectangle(newImage, (top_x, top_y), (btm_x, btm_y), (255,0,0), 3)
            newImage = cv2.putText(newImage, label, (top_x, top_y-5), cv2.FONT_HERSHEY_COMPLEX_SMALL , 0.8, (0, 230, 0), 1, cv2.LINE_AA)
    return newImage

This Python function draws a bounding box, together with the label and confidence score, around each detected monkey whose confidence is above 0.2.
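To make the structure of the predictions easier to see, here is a small sketch that filters them by confidence without drawing anything. The dictionary keys are the ones the boxing function reads; filter_predictions is a hypothetical helper and the sample values are made up for illustration.

```python
def filter_predictions(predictions, min_confidence=0.2):
    """Keep only the predictions whose confidence exceeds min_confidence."""
    return [p for p in predictions if p["confidence"] > min_confidence]


# Made-up predictions in the same shape that boxing() expects
sample_predictions = [
    {"label": "monkey", "confidence": 0.85,
     "topleft": {"x": 10, "y": 20}, "bottomright": {"x": 110, "y": 220}},
    {"label": "monkey", "confidence": 0.12,
     "topleft": {"x": 5, "y": 5}, "bottomright": {"x": 40, "y": 60}},
]

kept = filter_predictions(sample_predictions)
labels = [p["label"] + " " + str(round(p["confidence"], 3)) for p in kept]
```

Raising min_confidence removes more weak detections at the cost of possibly missing real ones, so the value is worth tuning on your own images.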

fig, ax = plt.subplots(figsize=(20, 10))
ax.imshow(boxing(original_img, results))

This shows the output image with the bounding boxes drawn on it.

To run inference on multiple images at the same time:

# A 2 x 3 grid has room for up to six images; five are shown here
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(20, 10))

for i in range(5):
    original_img = cv2.imread("/content/gdrive/My Drive/test_image" + str(i+1) + ".jpg")
    original_img = cv2.cvtColor(original_img, cv2.COLOR_BGR2RGB)
    results = tfnet2.return_predict(original_img)
    # Row-major placement: i // 3 is the row, i % 3 is the column
    ax[i // 3, i % 3].imshow(boxing(original_img, results))
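When laying out several images on a grid, a row-major mapping from the image index to a (row, column) pair keeps the images in reading order. The helper below is a small illustrative sketch; grid_position is a name chosen for this example.

```python
def grid_position(index, ncols):
    """Return the row-major (row, col) cell for the index-th image."""
    return divmod(index, ncols)


# Cells used by five images on a grid with three columns
positions = [grid_position(i, 3) for i in range(5)]
```

Here the first three images fill the top row and the remaining two start the second row, leaving the last cell empty.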

To work with video

# Inference on video
# This code can also be used with a webcam

cap = cv2.VideoCapture('/content/gdrive/My Drive/monkey_test.avi')
width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

fourcc = cv2.VideoWriter_fourcc(*'DIVX')
out = cv2.VideoWriter('/content/gdrive/My Drive/output_test.avi', fourcc, 20.0, (int(width), int(height)))

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        # No more frames, or the frame could not be read
        break

    frame = np.asarray(frame)
    results = tfnet2.return_predict(frame)
    new_frame = boxing(frame, results)

    # Write the annotated frame to the output video
    out.write(new_frame)

    # Display the resulting frame
    #cv2.imshow('frame', new_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture and the writer
cap.release()
out.release()


Final result in form of video



My github repo

Further exploration

Please suggest topics in the comment box that you would like to explore and learn about.



Sheetala Tiwari

I am passionate about Data Science and Machine Learning. I am currently building an AI community on DataDiscuss and we are committed to providing free access to education for everyone.