Face Tracking Robot with an ESP32-CAM


An auto-balancing robot that can track your face and follow you around a room

Elegoo kindly sent me a Tumbller self-balancing robot for my project. It’s easy to put together and comes with clear instructions. Fully assembled, it looks like this:

Tumbller Robot Assembled

The kit comes with all the parts you need to build a remote control or autonomous self-balancing robot.

The robot has a custom motherboard with a socket for the Arduino Nano and a Bluetooth module soldered on.

[Images: Bluetooth module, Arduino Nano, motherboard]

The robot uses various sensors to keep balance and measure its environment.

[Images: motor controller, gyro / accelerometer, ultrasonic sensor]

I wanted to add face tracking to the robot so it could follow a person around the house. I added an ESP32-CAM camera module, mounted with 3D-printed parts and connected by cables to the serial pins of the Arduino Nano on the robot.

[Images: robot with 3D-printed camera mount; camera mount close-up]

On the ESP32 I use the ESP-FACE libraries to first detect a face and then measure the distance and location of the face in the frame.

This data is then sent to the Arduino Nano, which processes it and controls the robot’s direction. I have previous face detection projects for the ESP32-CAM and used the https://robotzero.one/face-tracking-esp32-cam/ project as the basis.

ESP32-CAM Code

The height of a detected face is measured using the ESP32 face detection library and this is used to calculate the approximate distance to the face using the following formula:

int eq_top = 3.6 * 200 * 240; // f(mm) x real height(mm) x image height(px)
int eq_bottom = smoothed_face_height * 2.7; //object height(px) x sensor height(mm)
int face_distance = eq_top / eq_bottom;

eq_top is the focal length (f) of the OV2640 I’m using (3.6mm) multiplied by the estimated real height of a face (200mm) multiplied by the image height of the camera frame (240px).

eq_bottom is the smoothed face height in pixels (the average of the last 5 face height readings from the camera) multiplied by the physical height of the sensor (2.7mm).

eq_top is then divided by eq_bottom to get the approximate distance, in millimetres, of the person’s face from the camera. In testing over the serial monitor this worked quite well for estimating the distance.
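As a rough illustration of the calculation described above, here is a minimal sketch that combines the five-reading average with the distance formula. The names FaceHeightSmoother and estimateFaceDistanceMm are hypothetical and not taken from the project's repository. Unlike the snippet above, it keeps the intermediate values as floats and guards against a zero face height, which are my own assumptions.

```cpp
#include <cstddef>

// Constants from the formula in the text.
const float FOCAL_LENGTH_MM = 3.6f;       // OV2640 lens focal length
const float REAL_FACE_HEIGHT_MM = 200.0f; // estimated real height of a face
const float IMAGE_HEIGHT_PX = 240.0f;     // camera frame height
const float SENSOR_HEIGHT_MM = 2.7f;      // physical sensor height

// Hypothetical helper: rolling average over the last 5 face heights.
struct FaceHeightSmoother {
  static const size_t kWindow = 5;
  int readings[kWindow] = {0};
  size_t index = 0;  // slot that the next reading overwrites
  size_t count = 0;  // number of valid readings so far (max kWindow)
  long total = 0;    // running sum of the stored readings

  // Stores a new reading and returns the average of the stored readings.
  int add(int face_height_px) {
    total -= readings[index];
    readings[index] = face_height_px;
    total += face_height_px;
    index = (index + 1) % kWindow;
    if (count < kWindow) count++;
    return (int)(total / (long)count);
  }
};

// Returns the estimated distance in mm, or -1 if the height is not usable.
int estimateFaceDistanceMm(int smoothed_face_height_px) {
  if (smoothed_face_height_px <= 0) return -1; // avoid dividing by zero
  float eq_top = FOCAL_LENGTH_MM * REAL_FACE_HEIGHT_MM * IMAGE_HEIGHT_PX;
  float eq_bottom = smoothed_face_height_px * SENSOR_HEIGHT_MM;
  return (int)(eq_top / eq_bottom);
}
```

With these constants a face that is 80px tall in the frame works out to roughly 800mm away, and halving the pixel height doubles the estimate. Note that this smoother averages only the readings received so far, whereas the Arduino smoothing tutorial listed at the end of the post always divides by the full window size.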

The data captured by the ESP32 is sent using a second serial connection on pins 2 and 14 to the Arduino Nano on the robot. The code for this looks like this:

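Serial2.print('<'); // start marker
Serial2.print(face_center_pan); // horizontal position of the face in the frame
Serial2.print(','); // comma separator
Serial2.print(face_center_tilt); // vertical position of the face in the frame
Serial2.print(','); // comma separator
Serial2.print(face_distance); // estimated distance to the face
Serial2.println('>'); // end marker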

face_center_pan is calculated the same way as in the https://robotzero.one/face-tracking-esp32-cam/ tutorial. face_center_tilt isn’t used in the project for now.
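The Nano parses three comma-separated integers between the markers, so each message presumably has the form `<pan,tilt,distance>`. The sketch below builds that frame in a buffer so it can be checked before it is sent. formatFaceFrame is a hypothetical helper, not a function from the project.

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical helper: writes "<pan,tilt,distance>" into buf.
// Returns the number of characters written (excluding the terminator),
// or -1 if the buffer is too small to hold the whole frame.
int formatFaceFrame(char *buf, size_t buf_size, int pan, int tilt, int distance) {
  int n = snprintf(buf, buf_size, "<%d,%d,%d>", pan, tilt, distance);
  if (n < 0 || (size_t)n >= buf_size) return -1;
  return n;
}
```

On the ESP32 the resulting buffer could then be sent in one call, for example with Serial2.println(buf). Building the frame first makes it easy to print the same string to the serial monitor while debugging.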

In the code repository on GitHub there are two sketches for the ESP32. The esp32-wifi-version.ino sketch has extra code that enables viewing the camera feed in a browser, including the green box around the face. This extra code reduces the frame rate and therefore the fluidity of the face capture. The robot is more responsive when the esp32-fast-version.ino sketch is used.

Arduino Nano Code

On the Arduino Nano the serial data is captured as a character array by the recvWithStartEndMarker() function. The received characters are then converted into variables and mapped to values by this code:

int result = sscanf(receivedChars, "%i,%i,%i", &pan, &tilt, &distance);

// face location on camera, mapped to turn
new_setting_turn_speed = map(pan, 0, 320, 40, -40);
// face distance,  mapped to speed
new_setting_car_speed = map(distance, 10, 1200, -20, 20); 

The further the face is from the centre of the frame, the faster the robot turns, and the greater the measured distance to the face, the faster the robot moves.
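To make the receive-and-map step concrete, here is a self-contained sketch of how a start/end marker receiver and the two mappings could fit together. FrameReceiver, DriveCommand, parseFaceFrame and mapRange are hypothetical names, not the project's own code. mapRange repeats the integer arithmetic of Arduino's map() so the example also runs off the board. I use %d rather than %i so that a value with a leading zero is not read as octal, which is an assumption about what is wanted here.

```cpp
#include <cstdio>
#include <cstddef>

// Same integer arithmetic as Arduino's map().
long mapRange(long x, long in_min, long in_max, long out_min, long out_max) {
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

// Hypothetical receiver: collects the characters between '<' and '>'.
struct FrameReceiver {
  static const size_t kMaxChars = 32;
  char chars[kMaxChars];
  size_t length = 0;
  bool in_frame = false;

  // Feed one received byte; returns true when a complete frame is in chars.
  bool feed(char c) {
    if (c == '<') { in_frame = true; length = 0; return false; }
    if (!in_frame) return false;  // ignore bytes outside a frame
    if (c == '>') { chars[length] = '\0'; in_frame = false; return true; }
    if (length < kMaxChars - 1) chars[length++] = c;  // drop overflow
    return false;
  }
};

struct DriveCommand {
  int turn_speed;
  int car_speed;
};

// Parses "pan,tilt,distance" and applies the mappings from the text.
// Returns false if the payload does not contain three integers.
bool parseFaceFrame(const char *payload, DriveCommand *out) {
  int pan, tilt, distance;
  if (sscanf(payload, "%d,%d,%d", &pan, &tilt, &distance) != 3) return false;
  out->turn_speed = (int)mapRange(pan, 0, 320, 40, -40);
  out->car_speed = (int)mapRange(distance, 10, 1200, -20, 20);
  return true;
}
```

With these ranges a face in the centre of the frame (pan 160) gives a turn speed of 0, and a distance of 605 gives a car speed of 0. Smaller distances map to negative speeds, so the robot would back away from a face that is too close.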


One problem with the robot is that the measured distance from the camera to the face also includes a vertical component, because the camera sits lower than the face. Raising the camera to reduce this error makes the robot less stable and therefore less able to capture faces.

[Image: the distance error when the camera and face are at different heights]
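The size of this error can be estimated with Pythagoras if the height difference between camera and face is known. The project does not do this. The sketch below only illustrates the geometry, and both horizontalDistanceMm and the idea of supplying a fixed height difference are my own assumptions.

```cpp
#include <cmath>

// Hypothetical helper: converts the measured line-of-sight distance into
// the horizontal (floor) distance, given an assumed height difference
// between the camera and the face. Returns -1 if the inputs cannot form
// a right-angled triangle.
float horizontalDistanceMm(float line_of_sight_mm, float height_difference_mm) {
  if (line_of_sight_mm <= 0.0f ||
      std::fabs(height_difference_mm) >= line_of_sight_mm) {
    return -1.0f;
  }
  return std::sqrt(line_of_sight_mm * line_of_sight_mm -
                   height_difference_mm * height_difference_mm);
}
```

For example, a measured 1000mm with the face 600mm above the camera corresponds to about 800mm along the floor, so the error grows as the robot gets closer to a standing person.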

The face detection library and camera work really well, but only over a limited distance range. It’s possible to increase the range by reducing the minimum detected face size in the settings, but this makes detection slower and can make the robot less responsive.

This project can also be found on YouTube here: https://www.youtube.com/watch?v=cdpqpqMnBwI


Calculating Distance – https://photo.stackexchange.com/questions/12434/how-do-i-calculate-the-distance-of-an-object-in
Arduino Smoothing – https://www.arduino.cc/en/tutorial/smoothing
Comma Separated Variables Over Serial – https://forum.arduino.cc/index.php?topic=541887.0
Arduino Easing Library (potential future motor control improvement) – https://github.com/luisllamasbinaburo/Arduino-Easing

15 Replies to “Face Tracking Robot with an ESP32-CAM”

  1. Gabriele says:

    Very nice work!
    How hard would it be to add a voice recognition module and a voice speech one?
    Can we replace the bt module with a standard one and recreate an app to control it? Could we use a generic arduino remote controller app in that case?
    I’m going to post my progress on this bot (my first one) here

    1. WordBot says:

      If you want to use the ESP32 there’s this: https://github.com/espressif/esp-skainet but you have to pay for custom words. I’m pretty sure there are Arduino voice modules available. This might be helpful https://www.youtube.com/watch?v=_r0Y7wJrM70
      I’ve not seen a text to speech for the ESP32 but you could play MP3s through a speaker: https://www.esp32.com/viewtopic.php?f=20&t=14717&p=56965#p56965
      I’ve not investigated but I think the BT module is standard.
      The big problem with both the ESP32 and the robot is the lack of pins that are available.

  2. Gabriele says:

    I just started with arduino and co. so I’m a complete newbie but I’m not able to connect with the phone to the bot directly and I read the chipset is tied to the elegoo ble tool app (https://forum.arduino.cc/index.php?topic=529411.0).
    Were you able to connect to it and use one of the standard arduino controller apps the play store is full of?
    Can you also explain why you say the board misses the pins? How many esp32 is possible to connect to it? Which pins are actually usable?

    1. WordBot says:

      The Tumbller is pretty complicated for a newbie to learn with. I’ve not tried with the Bluetooth. I had assumed you could use it like a normal bluetooth module but I don’t see a library for it in the software so maybe it is hardcoded somehow. I’m not sure why Elegoo would bother to do that though.
      The Tumbller pins are nearly all connected to sensors etc., so finding a spare one you can use might be hard. The ESP32-CAM has a few spare pins, depending on what you want to do with them.

  3. Gabriele says:

    I found this tutorial for speech and voice recognition https://create.arduino.cc/projecthub/msb4180/speech-recognition-and-synthesis-with-arduino-2f0363
    You think it can fit it?

    1. WordBot says:

      You can use the results of the speech recognition to control the robot. For example if you say ‘left’ and the Duo recognises it (well the online service recognises it) you can then send the command via serial and then have the nano control the wheels to make it go left.

  4. Facundo says:

    Hi. Good project. I tried to add this recognition too: https://robotzero.one/esp-who-recognition-with-names

    but it was impossible for me. Have you tried?
    It would be nice to add both projects.

    1. WordBot says:

      Hi, The problem is the face detection runs with just about a fast enough frame rate to work for the robot tracking but face recognition is too slow. I have something else in the pipeline for this… subscribe to the newsletter (on the right) and you’ll get an email when it goes on the site.

  5. Facundo says:

    okay. Generally I use this library: to be able to integrate it with the Web Viewer of the app inventor. But adapting face detection was impossible for me.

  6. Facundo says:

    lib: WebServer.h

  7. Gabriele says:

    All images are gone in this page, can you repost them?
    I wanted to see the pins you used for the new module

    1. WordBot says:

      Looks like there was a problem with the cache. Can you see them now?

      1. Gabriele says:

        All ok now, but already found what I was looking for in your video

  8. Christina says:


    I am trying to play around with the tumbller and I wonder how you find the free pins which you could use for the esp32-cam in the first place? Can you please tell?


    1. WordBot says:

      Hi, It talks over serial. I don’t remember which pins those are on the two boards but should be easy to find out.
