How can our robot understand what it's looking at? How can it search for, say, a colored block, or the "fish tank"? One possibility is to use a camera with some sort of object recognition. I currently have a Pi Camera Module connected to my test robot, and I was wondering if there's a simple way to use that to identify objects. I searched around a bit on YouTube, and found some examples of object detection, using neural networks to classify objects in the camera's image frame.
Most of the examples I found used the open source library TensorFlow Light to run an object classifier. TensorFlow Light is a less resource-intensive version of the TensorFlow library, and is more suitable for devices like the Pi. This method of object detection supposedly works on any camera with adequate resolution.
But, how do these neural network classifiers work? As I understand it, TensorFlow uses a particular classification algorithm, called a Deep Neural Network object detection algorithm. This is a module that can be trained to search for a particular type of object, and then draw boxes around each instance of that object it sees in the image.
The algorithm is trained by giving it a large collection of training pictures, and another collection of test pictures. For each training picture, you have to manually draw a box around the object in that picture, and feed that information into the algorithm, along with the image. The neural network will then incrementally adjust itself, over and over, to find patterns between the training images. It will then look at the test images, and try to find that same object in those. You also draw identifying boxes in the test images, so the algorithm can see how accurate it was. You keep running the algorithm over the training and test images until its test-image accuracy reaches an adequate level. At that point, the values of the neural network's connections can be saved as a classifier file, and loaded onto the Raspberry Pi for use in the robot.
In our case, if the arena floor and walls are a uniform color that's distinct from the objects, then it should be relatively easy for the classifier to identify the objects. We could place the blocks in various places in the arena, take a bunch of training pictures of them, and then train the classifier.
It looks like there are multiple DNN classifer algorithms that TensorFlow can use. The SSD Mobilenet algorithm seems to run faster than the others, and is designed for low-powered devices, so that's probably a good place to start.
Once the classifier file is loaded onto the Pi, actually using it seems fairly simple. You can load it using the TensorFlow Python library, then take an image from the camera, and pass that to the classifier. The classifier will then return an array of box coordinates, drawn around each instance of the object it found in that frame, along with a confidence percentage.
If we want the robot to turn toward an object, we could turn the robot in a circle a few degrees at a time, and take a camera picture after each turn, then run the classifier on that picture. If the object is found in the circle, we could keep turning toward it until the robot is facing directly toward the object. Then, we could determine the distance to the object by the relative size of its bounding box.
I'm not sure if we could determine the orientation to the object easily using these tools. For example, I don't know if these classification algorithms could tell us whether the robot is facing the corner of the green box, or its side.