This project detects objects and estimates the distance between each target and the camera using Intel depth-sensing (RealSense) technology for vision intelligence.
Object detection refers to the task of identifying various objects within an image and drawing a bounding box around each of them.
![](https://static.wixstatic.com/media/09dd08_e298f8f6291c48da903f93e4babee4d3~mv2.png/v1/fill/w_980,h_548,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/09dd08_e298f8f6291c48da903f93e4babee4d3~mv2.png)
Depth is a key prerequisite for tasks such as perception, navigation, and planning in industry. Our human eyes can view the world in three dimensions, which enables us to perform a wide range of tasks.
Similarly, giving machines that already have computer vision a perception of depth opens up a boundless range of applications in robotics, industrial automation, and various autonomous systems.
![](https://static.wixstatic.com/media/09dd08_78b7a499cd4a49088203a5349302ad69~mv2.png/v1/fill/w_852,h_477,al_c,q_90,enc_auto/09dd08_78b7a499cd4a49088203a5349302ad69~mv2.png)
The project consists of reading an Intel RealSense camera's output (RGB + depth) and aligning the two streams. An object detection module using ssd_mobilenet weights then runs on the RGB images. The detection coordinates are converted from SSD output format to bounding-box format, and the detections are projected onto the depth stream. Finally, the detected objects are cropped out of the depth image and histograms are computed over each crop for distance analysis.
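As a rough illustration, here is a minimal sketch of that pipeline using the pyrealsense2 library. The stream resolutions, the example bounding box, and the histogram parameters are assumptions for illustration, not values from the project.

```python
import numpy as np
import pyrealsense2 as rs

# Start RGB + depth streams (640x480 at 30 FPS is an assumed configuration).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()

# Align the depth stream to the colour stream so pixel coordinates match.
align = rs.align(rs.stream.color)
try:
    frames = align.process(pipeline.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())
    color = np.asanyarray(frames.get_color_frame().get_data())

    # Project a detection box (xmin, ymin, xmax, ymax) onto the depth image,
    # crop it, and summarise the crop with a histogram and a median distance.
    xmin, ymin, xmax, ymax = 200, 150, 400, 350  # hypothetical detection
    crop = depth[ymin:ymax, xmin:xmax].astype(np.float32) * depth_scale
    valid = crop[crop > 0]  # zero depth means "no reading" on RealSense
    hist, edges = np.histogram(valid, bins=50)
    print('median distance to object (m):', np.median(valid))
finally:
    pipeline.stop()
```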
Building a Real-Time Object Detector:
We will:
Perform object detection on custom images using the TensorFlow Object Detection API.
Use Google Colab's free GPU for training and Google Drive to keep everything synced.
Steps to achieve object detection on custom data:
Collecting Images.
Labelling Images.
Setting up the environment:
In Colab, select Runtime > Change runtime type and set the hardware accelerator to GPU.
Splitting the data into training and testing sets, as sketched below.
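A minimal sketch of one way to do the split; the 80/20 ratio and the folder layout (images/train, images/test, train_labels/, test_labels/, matching the steps later in this article) are assumptions:

```python
import random
import shutil
from pathlib import Path

images = sorted(Path('images').glob('*.jpg'))
random.seed(42)
random.shuffle(images)
split = int(0.8 * len(images))  # 80% train, 20% test

for subset, files in (('train', images[:split]), ('test', images[split:])):
    img_dir = Path('images') / subset
    lbl_dir = Path(f'{subset}_labels')
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        # Move the matching LabelImg .xml annotation alongside the image.
        xml = img.with_suffix('.xml')
        if xml.exists():
            shutil.move(str(xml), lbl_dir / xml.name)
        shutil.move(str(img), img_dir / img.name)
```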
Importing and installing required packages/libraries:
We need TensorFlow version 1.15.0. Check the installed TensorFlow version by running the snippet below.
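In a Colab cell (the %tensorflow_version magic is Colab-specific):

```python
# Pin Colab to the TensorFlow 1.x branch, then verify the version.
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)  # expect 1.15.x
```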
Preprocessing Images and Labels:
We need to create two CSV files from the .xml annotation files in train_labels/ and test_labels/ respectively.
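A small sketch of such a converter, assuming LabelImg-style Pascal VOC .xml files and a data/ output folder; the xml_to_csv helper below is not part of the API:

```python
import glob
import xml.etree.ElementTree as ET

import pandas as pd

def xml_to_csv(path):
    # Collect one row per bounding box across all .xml files in `path`.
    rows = []
    for xml_file in glob.glob(path + '/*.xml'):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append((
                root.find('filename').text,
                int(size.find('width').text),
                int(size.find('height').text),
                obj.find('name').text,
                int(box.find('xmin').text),
                int(box.find('ymin').text),
                int(box.find('xmax').text),
                int(box.find('ymax').text),
            ))
    columns = ['filename', 'width', 'height', 'class',
               'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)

for folder in ('train_labels', 'test_labels'):
    xml_to_csv(folder).to_csv(f'data/{folder}.csv', index=False)
```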
Downloading the Tensorflow model:
The TensorFlow models repository contains the Object Detection API we are interested in. We will get it from the official repo.
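For example, in Colab (compiling the protobuf definitions is a required setup step for this API):

```
!git clone https://github.com/tensorflow/models.git
%cd models/research
# Compile the protobuf definitions used by the Object Detection API.
!protoc object_detection/protos/*.proto --python_out=.
```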
Generating TFRecords.
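The repo does not ship a generic CSV-to-TFRecord converter; a widely used community script named generate_tfrecord.py is typically invoked as below (the script name and flags are assumptions and vary between versions of the script):

```
!python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/train
!python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/test
```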
Choose a Pre-Trained Model and Download it.
Download the configuration file and the model that will be used for training. Make a new directory called training. You should now have the following directories: training, data, and images. Many pre-trained models are available; visit the TensorFlow detection model zoo to download the one you want. Here we will download the ssd_mobilenet_v1_coco_11_06_2017 model and the ssd_mobilenet_v1_pets.config file.
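For example (the archive URL follows the model zoo's naming convention for this release):

```
!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
!tar -xzf ssd_mobilenet_v1_coco_11_06_2017.tar.gz
!mkdir training
```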
We have to make some changes to ssd_mobilenet_v1_pets.config. Open it with any text editor (sublime, nano, vim, gedit) and set num_classes to the number of classes you have labelled.
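For example, with a single labelled class (the count of 1 here is just an illustration):

```
model {
  ssd {
    num_classes: 1  # set this to the number of classes you labelled
    # ... rest of the model config is unchanged ...
  }
}
```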
Then change the fine_tune_checkpoint line to:
```
fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"  # replace PATH_TO_BE_CONFIGURED with the name of the model you downloaded
```
Next, replace the lines of the train_input_reader section with the following:
```
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "training/object-detection.pbtxt"
}
```
Likewise, replace the corresponding lines in the eval_input_reader section with:
```
tf_record_input_reader {
  input_path: "data/test.record"
}
label_map_path: "training/object-detection.pbtxt"
```
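Both input readers point at training/object-detection.pbtxt, which you write yourself. A minimal label map for a single class might look like this (the class name 'sheet' is an assumption, based on the inference-graph name used later):

```
item {
  id: 1
  name: 'sheet'
}
```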
To train the model:
```
!python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config
```
export_inference_graph:
```
!python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix training/model.ckpt-20000 --output_directory sheet_inference_graph
```
Model preparation:
Loading label map:
```python
from object_detection.utils import label_map_util

# List of the strings that are used to add a correct label for each box.
PATH_TO_LABELS = 'training/object-detection.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)
```
Test on images:
```python
import pathlib

# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('images/test')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
TEST_IMAGE_PATHS
```
Image size:
```python
# Size, in inches, of the output images.
IMAGE_SIZE = (24, 24)
```
Detection:
load_image_into_numpy_array:
```python
import numpy as np

def load_image_into_numpy_array(image):
    # Convert a PIL image into an (H, W, 3) uint8 numpy array.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
```
run_inference_for_single_image:
```python
import tensorflow as tf
from object_detection.utils import ops as utils_ops

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors.
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for a single image.
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframing is required to translate the masks from box coordinates
        # to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0],
                                   [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0],
                                   [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension.
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference.
      output_dict = sess.run(
          tensor_dict,
          feed_dict={image_tensor: np.expand_dims(image, axis=0)})

      # All outputs are float32 numpy arrays, so convert types as appropriate.
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict
```
run_detection:
```python
import matplotlib.pyplot as plt
from PIL import Image
from object_detection.utils import visualization_utils as vis_util

def run_detection(path_to_frozen_graph, title):
  detection_graph = tf.Graph()
  with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(path_to_frozen_graph, 'rb') as fid:
      serialized_graph = fid.read()
      od_graph_def.ParseFromString(serialized_graph)
      tf.import_graph_def(od_graph_def, name='')
  plt.figure(figsize=IMAGE_SIZE)
  for i, image_path in enumerate(TEST_IMAGE_PATHS):
    image = Image.open(image_path)
    # The array-based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.subplot(3, 3, i + 1)
    plt.imshow(image_np)
    plt.title(title)
```
Run inference using the exported frozen_inference_graph.pb:
```python
run_detection('/content/drive/MyDrive/SSD/models/research/object_detection/sheet_inference_graph/frozen_inference_graph.pb',
              'sheet_inference_graph')
```
Conclusion:
This article has shown a small end-to-end example of applying deep learning: training a custom object detector with the TensorFlow Object Detection API and pairing it with Intel RealSense depth sensing to estimate object distances.