ESP-WHO Face Recognition with WebSocket Communication


Using the ESP-WHO library to record faces with names and then display the name when a face is recognised.

This project uses the ArduinoWebsockets library for two-way communication between the ESP32 and the browser. All face detection, capture and recognition is done on the ESP32. The browser sends instructions and receives notifications over the WebSocket connection to update the interface. The same WebSocket connection also carries the camera frames to the browser as binary blobs.
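The text commands the browser sends are simple strings, so they can be modelled on a desktop machine. The C++ below is a minimal sketch that only mirrors the command strings handled by handle_message() in the sketch further down (stream, detect, capture:<name>, recognise, remove:<name>, delete_all). The CommandKind, Command, starts_with and parse_command names are hypothetical and are not part of ArduinoWebsockets or ESP-WHO.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the commands the browser sends to the ESP32.
enum class CommandKind { Stream, Detect, Capture, Recognise, Remove, DeleteAll, Unknown };

struct Command {
  CommandKind kind;
  std::string name;  // person's name for "capture:" and "remove:", otherwise empty
};

// True if text begins with prefix (safe when text is shorter than prefix).
static bool starts_with(const std::string& text, const std::string& prefix) {
  return text.compare(0, prefix.size(), prefix) == 0;
}

// Maps one WebSocket text message to a command, using the same strings as the sketch.
Command parse_command(const std::string& text) {
  if (text == "stream") return {CommandKind::Stream, ""};
  if (text == "detect") return {CommandKind::Detect, ""};
  if (text == "recognise") return {CommandKind::Recognise, ""};
  if (text == "delete_all") return {CommandKind::DeleteAll, ""};
  if (starts_with(text, "capture:")) return {CommandKind::Capture, text.substr(8)};
  if (starts_with(text, "remove:")) return {CommandKind::Remove, text.substr(7)};
  return {CommandKind::Unknown, ""};
}
```

On the ESP32 the sketch makes the same checks directly against the Arduino String returned by msg.data(), so there is no separate parse step.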

Video showing the interface in action

The Interface

The interface consists of a camera feed plus the following elements:

A status area, showing the current status of the ESP32:

Interface Status

A form field to enter the name of the person:

Interface Name

Four buttons control the ESP32: STREAM just streams frames from the camera, DETECT detects faces in the stream, CAPTURE captures the current face, and RECOGNISE matches a face from the camera against previously captured faces:

Interface Buttons
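The buttons effectively switch the ESP32 between states. The following host-side C++ is an illustration only, assuming the rules in loop() below: face detection runs on a frame only in the detect, capture and recognise states, and in detect mode the status falls back to DETECTING once no face has been seen for more than 500 ms. The State enum and the runs_detection and detect_status names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative mirror of the sketch's en_fsm_state values.
enum class State { Stream, Detect, Enroll, Recognise };

// In the sketch, STREAM only forwards the JPEG; the other states run detection.
bool runs_detection(State s) {
  return s == State::Detect || s == State::Enroll || s == State::Recognise;
}

// Status text for one DETECT-mode frame: "FACE DETECTED" when a face is found,
// "DETECTING" once no face has been seen for more than 500 ms, otherwise ""
// (meaning no new status message is sent for this frame).
std::string detect_status(bool face_found, uint32_t now_ms, uint32_t last_detected_ms) {
  if (face_found) return "FACE DETECTED";
  uint32_t elapsed = static_cast<uint32_t>(now_ms - last_detected_ms);  // wrap-safe
  if (elapsed > 500) return "DETECTING";
  return "";
}
```

Doing the subtraction on unsigned values keeps the 500 ms check correct even when the millisecond counter wraps around.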

Once a face has been captured, it appears in the list under Captured Faces. A face can be deleted by clicking the X next to it:

Interface Captured Face

DELETE ALL will delete all faces stored on the ESP32:

Interface Delete
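Both the X and DELETE ALL keep the browser in step by resending the whole list rather than patching it: the sketch's send_face_list() sends delete_faces and then one listface:<name> message per stored face. A minimal host-side model of that message sequence, where face_list_messages is a hypothetical name:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Messages the ESP32 sends to rebuild the browser's Captured Faces list.
std::vector<std::string> face_list_messages(const std::vector<std::string>& names) {
  std::vector<std::string> out;
  out.push_back("delete_faces");        // browser clears its list first
  for (const auto& name : names) {
    out.push_back("listface:" + name);  // then re-adds each stored face
  }
  return out;
}
```

DELETE ALL is the empty case: after wiping the stored faces, the ESP32 sends only delete_faces.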

Setting Up

If you haven’t set up or tested your ESP32 camera in the Arduino IDE yet, please follow this tutorial first:

You will also need to set up persistent storage on your board. Follow the steps under Persistent Storage Partition Scheme in this tutorial:

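This application needs a recent version of the ESP32 package in the Arduino IDE. Update the ESP32 board package to 1.0.3-rc1 or higher (the version used for this tutorial; older versions lack the mtmn_config fields type and pyramid_times that the sketch sets) via Tools > Board > Boards Manager: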

Arduino Board Manager

You also need to install the ArduinoWebsockets library: open Tools > Manage Libraries, search for arduinowebsockets and install it. Version 0.4.5 works for me.

Library Manager showing ArduinoWebsockets

Copy the sketch below and save it as a new sketch. Then add two files to the sketch folder: camera_index.h, which contains the HTML for the interface, and camera_pins.h, which contains the pin definitions for each camera module.

In the pasted Sketch, edit the ssid and password to match your WiFi and uncomment the define for the camera you are using.
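camera_index.h holds the interface page as a gzip-compressed byte array, which index_handler() in the sketch serves with a Content-Encoding: gzip header. If the page fails to load, one quick sanity check is that the array really starts with the gzip header bytes from RFC 1952 (0x1f, 0x8b, then 0x08 for deflate). The helper below is hypothetical and not part of the sketch; on the ESP32 it could be called as looks_like_gzip(index_ov2640_html_gz, index_ov2640_html_gz_len) in setup(), assuming the array holds unsigned char data as in the CameraWebServer example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if data starts with a gzip member header: ID1 = 0x1f, ID2 = 0x8b,
// CM = 0x08 (deflate), as defined in RFC 1952.
bool looks_like_gzip(const uint8_t* data, size_t len) {
  return len >= 3 && data[0] == 0x1f && data[1] == 0x8b && data[2] == 0x08;
}
```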

The Code

#include <ArduinoWebsockets.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "camera_index.h"
#include "Arduino.h"
#include "fd_forward.h"
#include "fr_forward.h"
#include "fr_flash.h"

const char* ssid = "NSA";
const char* password = "orange";


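// Enrollment settings, same values as the CameraWebServer example
#define ENROLL_CONFIRM_TIMES 5 // samples taken for each captured face
#define FACE_ID_SAVE_NUMBER 7  // maximum number of faces stored in flash

// Select camera model (uncomment exactly one)
//#define CAMERA_MODEL_WROVER_KIT
//#define CAMERA_MODEL_ESP_EYE
//#define CAMERA_MODEL_M5STACK_PSRAM
//#define CAMERA_MODEL_M5STACK_WIDE
#define CAMERA_MODEL_AI_THINKER
#include "camera_pins.h"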

using namespace websockets;
WebsocketsServer socket_server;

camera_fb_t * fb = NULL;

unsigned long last_detected_millis = 0; // millis() when a face was last detected

void app_facenet_main();
void app_httpserver_init();

typedef struct
{
  uint8_t *image;         // RGB888 copy of the current frame
  box_array_t *net_boxes; // detected face boxes
  dl_matrix3d_t *face_id; // embedding of the aligned face
} http_img_process_result;

static inline mtmn_config_t app_mtmn_config()
{
  mtmn_config_t mtmn_config = {0};
  mtmn_config.type = FAST;
  mtmn_config.min_face = 80;
  mtmn_config.pyramid = 0.707;
  mtmn_config.pyramid_times = 4;
  // P-Net (proposal) stage
  mtmn_config.p_threshold.score = 0.6;
  mtmn_config.p_threshold.nms = 0.7;
  mtmn_config.p_threshold.candidate_number = 20;
  // R-Net (refine) stage
  mtmn_config.r_threshold.score = 0.7;
  mtmn_config.r_threshold.nms = 0.7;
  mtmn_config.r_threshold.candidate_number = 10;
  // O-Net (output) stage
  mtmn_config.o_threshold.score = 0.7;
  mtmn_config.o_threshold.nms = 0.7;
  mtmn_config.o_threshold.candidate_number = 1;
  return mtmn_config;
}
mtmn_config_t mtmn_config = app_mtmn_config();

face_id_name_list st_face_list;
static dl_matrix3du_t *aligned_face = NULL;

httpd_handle_t camera_httpd = NULL;

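// States selected by the buttons in the browser
typedef enum
{
  START_STREAM,
  START_DETECT,
  START_RECOGNITION,
  START_ENROLL
} en_fsm_state;
en_fsm_state g_state = START_STREAM;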

typedef struct
{
  char enroll_name[ENROLL_NAME_LEN];
} httpd_resp_value;

httpd_resp_value st_name;

void setup() {
  Serial.begin(115200);
  Serial.setDebugOutput(true);
  Serial.println();

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  //init with high specs to pre-allocate larger buffers
  if (psramFound()) {
    config.frame_size = FRAMESIZE_UXGA;
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_SVGA;
    config.jpeg_quality = 12;
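    config.fb_count = 1;
  }

#if defined(CAMERA_MODEL_ESP_EYE)
  pinMode(13, INPUT_PULLUP);
  pinMode(14, INPUT_PULLUP);
#endif

  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    return;
  }

  sensor_t * s = esp_camera_sensor_get();
  // stream at QVGA (320x240), the size of the detection buffer used in loop()
  s->set_framesize(s, FRAMESIZE_QVGA);

#if defined(CAMERA_MODEL_M5STACK_WIDE)
  s->set_vflip(s, 1);
  s->set_hmirror(s, 1);
#endif

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.println("WiFi connected");

  app_httpserver_init();    // serve the interface page on port 80
  app_facenet_main();       // load faces already saved in flash
  socket_server.listen(82); // WebSocket port; must match the port used in camera_index.h

  Serial.print("Camera Ready! Use 'http://");
  Serial.print(WiFi.localIP());
  Serial.println("' to connect");
}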

// Serves the gzip-compressed interface page from camera_index.h
static esp_err_t index_handler(httpd_req_t *req) {
  httpd_resp_set_type(req, "text/html");
  httpd_resp_set_hdr(req, "Content-Encoding", "gzip");
  return httpd_resp_send(req, (const char *)index_ov2640_html_gz, index_ov2640_html_gz_len);
}

httpd_uri_t index_uri = {
  .uri       = "/",
  .method    = HTTP_GET,
  .handler   = index_handler,
  .user_ctx  = NULL
};
void app_httpserver_init ()
{
  httpd_config_t config = HTTPD_DEFAULT_CONFIG();
  if (httpd_start(&camera_httpd, &config) == ESP_OK)
  {
    Serial.println("httpd_start");
    httpd_register_uri_handler(camera_httpd, &index_uri);
  }
}

void app_facenet_main()
{
  face_id_name_init(&st_face_list, FACE_ID_SAVE_NUMBER, ENROLL_CONFIRM_TIMES);
  aligned_face = dl_matrix3du_alloc(1, FACE_WIDTH, FACE_HEIGHT, 3);
  read_face_id_from_flash_with_name(&st_face_list); // load previously captured faces
}

// Adds one sample of the current face; returns how many samples are still needed
static inline int do_enrollment(face_id_name_list *face_list, dl_matrix3d_t *new_id)
{
  int left_sample_face = enroll_face_id_to_flash_with_name(face_list, new_id, st_name.enroll_name);
  ESP_LOGD(TAG, "Face ID %s Enrollment: Sample %d",
           st_name.enroll_name,
           ENROLL_CONFIRM_TIMES - left_sample_face);
  return left_sample_face;
}

static esp_err_t send_face_list(WebsocketsClient &client)
{
  client.send("delete_faces"); // tell browser to delete all faces
  face_id_node *head = st_face_list.head;
  char add_face[64];
  for (int i = 0; i < st_face_list.count; i++) // loop current faces
  {
    snprintf(add_face, sizeof(add_face), "listface:%s", head->id_name);
    client.send(add_face); // send face to browser
    head = head->next;
  }
  return ESP_OK;
}

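static esp_err_t delete_all_faces(WebsocketsClient &client)
{
  delete_face_all_in_flash_with_name(&st_face_list); // wipe all faces from flash
  client.send("delete_faces");                       // and clear the browser list
  return ESP_OK;
}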

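// Called by client.poll() for every text message the browser sends
void handle_message(WebsocketsClient &client, WebsocketsMessage msg)
{
  auto data = msg.data();

  if (data == "stream") {
    g_state = START_STREAM;
    client.send("STREAMING");
  }
  else if (data == "detect") {
    g_state = START_DETECT;
    client.send("DETECTING");
  }
  else if (data.startsWith("capture:")) { // "capture:<name>"
    g_state = START_ENROLL;
    // copy the name, truncated to fit enroll_name including the terminator
    data.substring(8).toCharArray(st_name.enroll_name, sizeof(st_name.enroll_name));
    client.send("CAPTURING");
  }
  else if (data == "recognise") {
    g_state = START_RECOGNITION;
    client.send("RECOGNISING");
  }
  else if (data.startsWith("remove:")) { // "remove:<name>"
    char person[ENROLL_NAME_LEN];
    data.substring(7).toCharArray(person, sizeof(person));
    delete_face_id_in_flash_with_name(&st_face_list, person);
    send_face_list(client); // reset faces in the browser
  }
  else if (data == "delete_all") {
    delete_all_faces(client);
  }
}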

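void loop() {
  // Wait for a browser to open the WebSocket connection (this call blocks)
  auto client = socket_server.accept();
  client.onMessage(handle_message);

  // RGB888 buffer for one QVGA (320x240) frame, used for face detection
  dl_matrix3du_t *image_matrix = dl_matrix3du_alloc(1, 320, 240, 3);
  http_img_process_result out_res = {0};
  out_res.image = image_matrix->item;

  send_face_list(client); // show the faces already stored in flash
  client.send("STREAMING");

  while (client.available()) {
    client.poll(); // dispatch incoming commands to handle_message()

    fb = esp_camera_fb_get();
    if (!fb) {
      continue; // frame capture failed, try again
    }

    if (g_state == START_DETECT || g_state == START_ENROLL || g_state == START_RECOGNITION)
    {
      out_res.net_boxes = NULL;
      out_res.face_id = NULL;

      // convert the JPEG frame to RGB888 for the detector
      fmt2rgb888(fb->buf, fb->len, fb->format, out_res.image);

      out_res.net_boxes = face_detect(image_matrix, &mtmn_config);

      if (out_res.net_boxes)
      {
        if (align_face(out_res.net_boxes, image_matrix, aligned_face) == ESP_OK)
        {
          out_res.face_id = get_face_id(aligned_face);
          last_detected_millis = millis();
          if (g_state == START_DETECT) {
            client.send("FACE DETECTED");
          }

          if (g_state == START_ENROLL)
          {
            int left_sample_face = do_enrollment(&st_face_list, out_res.face_id);
            char enrolling_message[64];
            snprintf(enrolling_message, sizeof(enrolling_message), "SAMPLE NUMBER %d FOR %s", ENROLL_CONFIRM_TIMES - left_sample_face, st_name.enroll_name);
            client.send(enrolling_message);
            if (left_sample_face == 0)
            {
              ESP_LOGI(TAG, "Enrolled Face ID: %s", st_face_list.tail->id_name);
              g_state = START_STREAM;
              char captured_message[64];
              snprintf(captured_message, sizeof(captured_message), "FACE CAPTURED FOR %s", st_face_list.tail->id_name);
              client.send(captured_message);
              send_face_list(client); // add the new face to the browser list
            }
          }

          if (g_state == START_RECOGNITION && (st_face_list.count > 0))
          {
            face_id_node *f = recognize_face_with_name(&st_face_list, out_res.face_id);
            if (f)
            {
              char recognised_message[64];
              snprintf(recognised_message, sizeof(recognised_message), "RECOGNISED %s", f->id_name);
              client.send(recognised_message);
            }
            else
            {
              client.send("FACE NOT RECOGNISED");
            }
          }

          dl_matrix3d_free(out_res.face_id); // free this frame's face embedding
        }

        // free this frame's detection results
        free(out_res.net_boxes->score);
        free(out_res.net_boxes->box);
        free(out_res.net_boxes->landmark);
        free(out_res.net_boxes);
      }
      else if (g_state != START_DETECT)
      {
        client.send("NO FACE DETECTED");
      }

      if (g_state == START_DETECT && millis() - last_detected_millis > 500) { // Detecting but no face detected
        client.send("DETECTING");
      }
    }

    // send the JPEG frame to the browser as a binary message
    client.sendBinary((const char *)fb->buf, fb->len);

    esp_camera_fb_return(fb); // hand the frame buffer back to the driver
    fb = NULL;
  }

  dl_matrix3du_free(image_matrix); // browser disconnected, free the buffer
}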
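CAPTURE takes ENROLL_CONFIRM_TIMES samples of a face before it is stored: enroll_face_id_to_flash_with_name() returns how many samples are still needed, and the sketch reports progress as SAMPLE NUMBER n FOR name. The C++ below is a host-side model of that counting. It assumes the samples are combined by averaging the face embeddings, which is an assumption about ESP-WHO's internals rather than something shown in this post; Enroller and sample_message are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical model of multi-sample enrollment (averaging is an assumption).
class Enroller {
 public:
  Enroller(int confirm_times, size_t dims)
      : confirm_times_(confirm_times), sum_(dims, 0.0f) {}

  // Adds one face embedding and returns how many samples are still needed,
  // like left_sample_face in the sketch (negative if called too often).
  int add_sample(const std::vector<float>& id) {
    assert(id.size() == sum_.size());
    for (size_t i = 0; i < id.size(); ++i) sum_[i] += id[i];
    ++taken_;
    return confirm_times_ - taken_;
  }

  // Mean of the samples taken so far; meaningful once add_sample() returned 0.
  std::vector<float> average() const {
    std::vector<float> avg(sum_);
    for (float& v : avg) v /= static_cast<float>(taken_);
    return avg;
  }

 private:
  int confirm_times_;
  int taken_ = 0;
  std::vector<float> sum_;
};

// Builds the same progress text the sketch sends to the browser.
std::string sample_message(int confirm_times, int left, const std::string& name) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "SAMPLE NUMBER %d FOR %s", confirm_times - left, name.c_str());
  return buf;
}
```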


At the moment, the recognition code always finds a matching face, even when that face hasn’t been captured. This has been fixed, but the fix isn’t in the Arduino release code yet.
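A recogniser that never reports an unknown face behaves as though it has no rejection threshold. The sketch below shows the general idea of threshold-based matching on a desktop machine: the closest enrolled face is returned only if its similarity clears a threshold, otherwise the result corresponds to FACE NOT RECOGNISED. Cosine similarity, the 0.8 threshold and the EnrolledFace and recognise names are assumptions for illustration; ESP-WHO's own metric and threshold are not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct EnrolledFace {
  std::string name;
  std::vector<float> id;  // face embedding
};

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
  float dot = 0, na = 0, nb = 0;
  for (size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na == 0 || nb == 0) return 0;
  return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Best match at or above threshold, or nullptr meaning "not recognised".
const EnrolledFace* recognise(const std::vector<EnrolledFace>& faces,
                              const std::vector<float>& probe, float threshold) {
  const EnrolledFace* best = nullptr;
  float best_score = threshold;
  for (const auto& f : faces) {
    float s = cosine_similarity(f.id, probe);
    if (s >= best_score) {
      best = &f;
      best_score = s;
    }
  }
  return best;
}
```

Without the threshold, the loop would always return the nearest enrolled face as long as at least one exists, which matches the symptom described above.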


Arduino WebSocket library I used:
ESP-WHO WeChat example, for the face recognition with names code:
Hidden face photo by Honey Yanibel Minaya Cruz on Unsplash

19 Replies to “ESP-WHO Face Recognition with WebSocket Communication”

  1. Peter says:

    I have the same problem with the same code on my TTGO PIR board: it recognizes every face as an enrolled face.
    After I downgraded to esp-face 0.34, everything was good. My intention is a face recognition background task, woken up by the PIR, that sends an image and a message via MQTT. For watching I use a simple web server.


    1. WordBot says:

      Hi, there are a couple of issues and a possible solution on the ESP-WHO GitHub page:

  2. Cobiam says:

    Good day robotzero, first I thank you for your great work in publishing and explaining this free-to-use project in such detail. I am a manager of technology projects at a Colombian university and we want to know whether we can implement or embed this project in Moodle in any way.
    We also need a good manual for emulating this project on a Debian 9 virtual machine, so that we can later try to include it in Moodle for facial recognition during virtual exams.
    Thank you very much for your collaboration and support.
    Happy day.

    1. WordBot says:

      Hi there. I don’t think you will be able to use the ESP32 like this easily, or at all. You would be better off looking for something in OpenCV or similar libraries.

  3. Felipe Messias says:

    Hello, congratulations on the code. I added an MQTT client to it that publishes messages to the broker whenever it detects new faces, but I noticed two problems. First, if the camera’s web interface isn’t open in a browser, there is no face detection. Second, if I switch to recognise mode, the camera image in the browser freezes within a few minutes. How could I fix these two problems?

    1. WordBot says:

      Hi, I’ll take a look tomorrow, but it might be that the WebSockets part only works when the page is open in a browser. This sketch is really for loading up the faces; you could then use another sketch for normal operation (like this one). I had a problem with it crashing after it had been detecting for a while, but I was hoping it was a bug in the Arduino ESP32 library, so I was waiting for the new version to be released. It might also be overheating: I have an ESP-EYE that is on the way out because of the heat generated when streaming the camera over WiFi.

  4. WordBot says:

    Do you see an error in the serial monitor? I’ve been testing and I get crashes but there doesn’t seem to be a pattern. This is the error I see:
    CORRUPT HEAP: multi_heap.c:308 detected at 0x3ffe7264
    I put a new camera in my ESP-EYE today because the heat had damaged the one it comes with.

  5. Felipe Messias says:

    Thanks for the answer, I’ll try this solution of joining the two sketches. In fact I saw some random errors: sometimes the camera rebooted, sometimes it stopped communicating, and sometimes only the image in recognise mode froze. So I updated all the ESP32 files directly from the GitHub repository and started using the AI Thinker ESP32-CAM board. I made the changes to use the partition you created, and most of the problems are gone. I still notice the camera restarting at times (every start sends an MQTT message to my broker) and the image freezing in recognise mode, but apart from that all the random errors were fixed.

  6. Felipe Messias says:

    With the WebSocket library, when the line “auto client = socket_server.accept();” is reached, the main loop is “locked”; it is only “freed” after a client accesses the camera server from a browser. It still behaves strangely: the loop re-runs the function for each action taken in the camera web interface. I tried to implement reading faces offline with the camera, but this prevented it. The solution you suggested seems promising; I will try to implement it and let you know the results.

  7. David says:

    Hi Robotzero,

    Many thanks for sharing, the project is promising. Unfortunately I can’t make it work. When Arduino compiles it, it complains with this:
    esp32CAM:48:15: error: ‘struct mtmn_config_t’ has no member named ‘type’
    mtmn_config.type = FAST;
    esp32CAM:51:15: error: ‘struct mtmn_config_t’ has no member named ‘pyramid_times’
    mtmn_config.pyramid_times = 4;

    I tried with the header files from the ESP-WHO Git repository, still the same issue. Very strange, since I can see type and pyramid_times in the *.h file. So why does it complain?

    #include "c:\Users\tuan\Documents\Arduino\esp-who\components\esp-face\face_detection\include\fd_forward.h"
    #include "c:\Users\tuan\Documents\Arduino\esp-who\components\esp-face\face_recognition\include\fr_forward.h"
    #include "c:\Users\tuan\Documents\Arduino\esp-who\components\esp-face\face_recognition\include\fr_flash.h"

    My board is “esp32 dev module”, camera CAMERA_MODEL_AI_THINKER. My ESP32-CAM works fine with the CameraWebServer.ino example. Can you see why?

    1. WordBot says:

      Which version of the ESP32 boards do you have in the Arduino IDE? I used 1.0.3rc1 in the tutorial. You shouldn’t need to link like that to include files. They might not be the correct versions. The guys at Espressif are making lots of changes to the code for ESP-WHO.

  8. David says:

    Thanks, it works with 1.0.3rc1! No more errors.
    I found the following line at that site:
    add this in the Arduino preferences, then install board version 1.0.3rc1. Done.

    It works fine, except I need to look at my SD card (32GB but only one primary partition of 4GB).

    17:52:45.303 -> ………….
    17:52:51.306 -> WiFi connected
    17:52:51.306 -> httpd_start
    17:52:51.306 -> E (11077) fr_flash: Not found
    17:52:51.306 -> Camera Ready! Use ‘’ to connect


  9. David says:

    A cup of java coffee is on the way, I wish I could buy 2 or 3 for you at once 🙂

  10. david says:

    Hi robotzero,
    is it the SD card?
    I have inserted an SD card with a 4GB partition (the card is 32GB; I created one FAT32 partition with Windows).
    I get this when the ESP32 starts:
    17:52:51.306 -> httpd_start
    17:52:51.306 -> E (11077) fr_flash: Not found

    Why fr_flash? There is a 4GB SD card in the slot.

    And when I click capture (after entering a name), nothing is saved the way it is in your clip.

    1. WordBot says:

      fr_flash isn’t the SD card. It’s a partition on the flash memory on the board. Maybe try this tutorial first to set up a new partition type.

  11. David says:

    Thanks, I am now satisfied with your C code on , quite stable and fast.

  12. Alan says:

    Hi robotzero,
    I have been using your code to capture some faces and it has been working successfully.
    I am developing a different application that uses the recognised faces. To add more faces, I recompiled your code and flashed it to the ESP32, and now I get an error as soon as a browser connects to it. The error shows that the socket has been disconnected. Confusingly, with Safari on an iMac, the Web Inspector reports the error as “WebSocket connection to ‘ws://’ failed: Could not decode a text frame as UTF-8.”, whereas with Firefox the equivalent development tool (Web Console) shows it as “The connection to ws:// was interrupted while the page was loading.”
    The serial monitor output is shown below:

    21:05:51.758 -> [D][esp32-hal-psram.c:47] psramInit(): PSRAM enabled
    21:05:51.827 ->
    21:05:52.586 -> [D][WiFiGeneric.cpp:336] _eventCallback(): Event: 0 – WIFI_READY
    21:05:52.586 -> [D][WiFiGeneric.cpp:336] _eventCallback(): Event: 2 – STA_START
    21:05:52.796 -> [D][WiFiGeneric.cpp:336] _eventCallback(): Event: 4 – STA_CONNECTED
    21:05:52.831 -> [D][WiFiGeneric.cpp:336] _eventCallback(): Event: 7 – STA_GOT_IP
    21:05:52.831 -> [D][WiFiGeneric.cpp:379] _eventCallback(): STA IP:, MASK:, GW:
    21:05:53.071 -> .
    21:05:53.071 -> WiFi connected
    21:05:53.108 -> httpd_start
    21:05:53.108 -> Camera Ready! Use ‘’ to connect
    21:05:53.108 -> Code file: ESP-who-recognition-with-names
    21:05:58.803 -> [D][WiFiClient.cpp:482] connected(): Disconnected: RES: 0, ERR: 128

    I am using core 1.0.3-rc1, and the hardware works with the ESP32 > Camera > CameraWebServer example, so I believe the hardware is OK. To make sure that the file “camera_index.h” had not become corrupt, I downloaded it again and repeated the compilation, but I still get the same error(s).
    I am using the most recent Arduino web sockets library (version 0.4.9) and since the error appears to be with the websockets library I downgraded to the original version you had used in your tutorial – version 0.4.0 but still get the same error(s).
    Can you advise how I might track down the error please?

    1. WordBot says:


      I’ve seen errors like this during various project testing, but usually after running for a while. I’m not sure if it’s a bug in the WiFiClient library, a bug in my code, the code trying to process too much data, or just the module overheating and the WiFi quitting. I’ve been waiting for them to release 1.0.3 final to see if it gets better.

      This tool gives more information about what happened during a crash if the module crashes with an exception:

  13. Alan says:

    Hi there, many thanks for your speedy reply. The error occurs when connecting a browser to the ESP and is consistent. The code doesn’t crash so the exception decoder doesn’t help. I have been inserting Serial.print’s to trace exactly where the error occurs. I have narrowed the search down and the error occurs in the function “index_handler” when sending the file “index_ov2640_html_gz”. I have done quite a lot of searching for errors similar to ones reported in the browser and they seem to point to a problem in the WiFiClient library: reference:
    I do not believe there is a bug in your code. I plan to continue to track the problem down and will update you if I find anything meaningful. I too await formal release of 1.0.3.
    Many thanks for your response so far.
