Using the ESP-WHO library and a pan and tilt platform to track a moving face.
With the Espressif ESP-FACE library it’s easy to detect a face and find its location in the frame. The library provides a function called draw_face_boxes that is normally used to display a box around a detected face.
The X and Y co-ordinates of this box combined with its height and width can be used to find the centre of the box and therefore the centre of the face.
For example, if X is at 105px, Y is at 90px, and the box has a width of 50px and a height of 70px, then the centre can be found by adding half the width to the X value and half the height to the Y value: x+w/2, y+h/2. For the figures above, 105+50/2 and 90+70/2 give the face centre as x:130 and y:125.
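The centre calculation above can be sketched in a few lines of C++. The FaceBox and Point structs and the faceCentre function are hypothetical names used for illustration, not part of the ESP-FACE library, and the sketch assumes X and Y are the top-left corner of the box.

```cpp
// Hypothetical struct for illustration: x and y are assumed to be the
// top-left corner of the face box, w and h its width and height in pixels.
struct FaceBox {
  int x;
  int y;
  int w;
  int h;
};

// Hypothetical struct holding a pixel position in the frame.
struct Point {
  int x;
  int y;
};

// Centre of the box: corner plus half the width and half the height.
// Integer division is used, so odd sizes round down by half a pixel.
Point faceCentre(const FaceBox &box) {
  return Point{box.x + box.w / 2, box.y + box.h / 2};
}
```

Calling faceCentre(FaceBox{105, 90, 50, 70}) gives x:130 and y:125, matching the figures above.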
One of the tricky parts of using a pan and tilt platform to track a face is converting the distance of the face from the centre in pixels to the degrees the platform needs to move. I’ve chosen the simplest method using a basic conversion from pixels to degrees.
One guide I found recommended using the diagonal measurement of the sensor as below:
For QVGA (320×240)
sqrt(sq(320) + sq(240)) = 400
and then dividing this by the field of view (for my camera 45 degrees) to get the pixels per degree of rotation:
400/45 = 8.89
So for every ~9 pixels of movement in the frame, the servo moves 1 degree in that direction.
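As a rough sketch of that conversion in C++: the function names below are hypothetical, and the 45 degree field of view applies to my camera only, so check the figure for your own lens.

```cpp
#include <cmath>

// Pixels per degree, using the frame diagonal and the camera's field of view.
// Hypothetical helper for illustration; fovDegrees must be greater than zero.
float pixelsPerDegree(int frameWidth, int frameHeight, float fovDegrees) {
  float w = static_cast<float>(frameWidth);
  float h = static_cast<float>(frameHeight);
  float diagonal = std::sqrt(w * w + h * h);
  return diagonal / fovDegrees;
}

// Convert a distance in pixels from the frame centre to degrees of rotation.
// The sign of the result follows the sign of offsetPx.
float pixelsToDegrees(float offsetPx, float pxPerDegree) {
  return offsetPx / pxPerDegree;
}
```

For QVGA with a 45 degree field of view, pixelsPerDegree(320, 240, 45.0f) comes out at roughly 8.89. A face centred at x:130 is 30 pixels left of the frame centre at x:160, which works out to about 3.4 degrees of pan.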
However, with the platforms I’ve used, the degrees of movement of the servos don’t coincide with the change in degrees of the view area because either the pan or tilt is offset from the centre of rotation.
My original plan was to get the reading and move the platform straight to that location, but often it would overshoot (possibly a problem with the off-centred sensor or maybe just my maths) and start oscillating back and forth. So I changed the code to only move half the registered distance each time until it reached the new location. I experimented with looping this movement until it completed and then returning to detection, but I went for continuous detection and calculation in the end.
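The half-distance idea can be sketched like this. The function name is hypothetical, the 0 to 180 degree limits assume a typical hobby servo, and whether a positive error should increase or decrease the angle depends on how the servo is mounted.

```cpp
// Move only half of the measured error on each detection so the platform
// approaches the face gradually instead of overshooting.
// Assumption: the servo accepts whole-degree angles from 0 to 180.
int dampedServoAngle(int currentAngle, float errorDegrees) {
  float target = static_cast<float>(currentAngle) + errorDegrees * 0.5f;
  if (target < 0.0f) target = 0.0f;      // clamp to the assumed lower limit
  if (target > 180.0f) target = 180.0f;  // clamp to the assumed upper limit
  return static_cast<int>(target + 0.5f);  // round to the nearest degree
}
```

With the servo at 90 degrees and a measured error of 10 degrees, the first call returns 95. If the face stays put, the next detection reports roughly half the previous error, so the steps shrink as the platform closes in.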
I’ve seen other tutorials where the servos are moved in the direction of the face until the face is in the centre of the frame, which is another approach. I think this might only work well when the frame rate is higher. The face detection runs at about 3 frames per second.
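For comparison, a minimal sketch of that direction-only approach might look like the following. This is not the method used in this project; the function name and the fixed step size are hypothetical, and the sign convention depends on the servo mounting.

```cpp
// Nudge the servo by a fixed step towards the face, ignoring how far away
// the face actually is. offsetPx is the face centre minus the frame centre.
int stepTowardsFace(int currentAngle, int offsetPx, int stepDegrees) {
  if (offsetPx > 0) return currentAngle + stepDegrees;
  if (offsetPx < 0) return currentAngle - stepDegrees;
  return currentAngle;  // already centred on this axis
}
```

At around 3 detections per second, a 1 degree step would take several seconds to cover even a 20 degree movement, which is why a higher frame rate suits this approach better.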
Another thing I’ve noticed is that variations in the detected face location mean the pan and tilt platform wanders a little when the face is centred. Some code could be added so that the servos are only activated if the face is outside of the centre area.
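One possible way to do this is a simple dead zone check before moving the servos. The function name and the dead zone size are hypothetical, and a suitable size would need to be found by experiment.

```cpp
#include <cstdlib>

// Returns true when the face centre is far enough from the frame centre
// that the servos should move. offsetX and offsetY are the face centre
// minus the frame centre, in pixels.
bool outsideDeadZone(int offsetX, int offsetY, int deadZonePx) {
  return std::abs(offsetX) > deadZonePx || std::abs(offsetY) > deadZonePx;
}
```

With a dead zone of 10 pixels, an offset of (5, -8) is ignored while an offset of (-30, 5) triggers a move.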
Face Tracking Video Demonstration
If you’ve not used the ESP32-CAM before you will need to read through this tutorial first – https://robotzero.one/esp32-cam-arduino-ide/ to get familiar with it.
You also need to install the ArduinoWebsockets library by searching for it in Tools > Manage Libraries.
Copy and paste the Sketch below and save it. Copy these two files: camera_index.h and camera_pins.h to the same directory. You should be able to compile and run the same way as other ESP32-CAM projects. This project works with version 1.0.4 of the ESP32 hardware libraries for the Arduino IDE.
I’ve also created a version with the green box around the face. The frame rate is lower on this version because it takes time to combine the box with the frame and convert it to JPEG. You can download it from pastebin here: https://pastebin.com/ECQPxuec
If anyone has suggestions for improving the maths or how to calculate degrees of movement when the sensor pan or tilt movement is off the axis centre please let me know via the comments or contact form.
If you found something useful above please say thanks by buying me a coffee here...
3D printable pan tilt mount: https://www.thingiverse.com/thing:3579507
Pan and tilt location calculation (complicated): https://stackoverflow.com/questions/44253787/translating-screen-coordinates-x-y-to-camera-pan-and-tilt-angles
Pan and tilt location calculation (simple – the one I used): https://stackoverflow.com/questions/17499409/opencv-calculate-angle-between-camera-and-pixel
The reason simple isn’t accurate: https://www.quora.com/How-can-I-find-the-pixels-per-degree-if-I-know-the-resolution-and-angle-of-view-for-a-pi-cam