ESP-WHO Face Recognition with WebSocket Communication


Using the ESP-WHO library to record faces with names and then display the name when a face is recognised.

This project uses the ArduinoWebsockets library for two way communication between the ESP32 and the browser. All the face detection, capturing and recognising are done on the ESP32. The browser sends instructions and receives notifications via WebSockets for updating the interface. The same WebSocket library is used to send the camera data to the browser as binary blobs.

Video showing the interface in action

The Interface

The interface consists a camera feed plus the following elements:

A status area, showing the current status of the ESP32:

Interface Status

A form field to enter the name of the person:

Interface Name

Four buttons that control the ESP32. They are STREAM to just stream the frames from the camera, DETECT for detecting faces in the stream, CAPTURE for capturing the current face and RECOGNISE for matching a face from the camera to a previous captured face:

Interface Buttons

If a face has been captured it can be seen in the list under Captured Faces. A face can be deleted by clicking the X next to it:

Interface Captured Face

DELETE ALL will delete all faces stored on the ESP32:

Interface Delete

Setting Up

If you’ve not set up or tested your ESP32 Camera in the Arduino IDE yet then please follow this tutorial first: You will also need to set up persistent storage on your board. Follow the steps under Persistent Storage Partition Scheme in this tutorial:

This application needs the latest version of the ESP32 package in the Arduino IDE. Update the ESP32 board library to 1.0.2 or higher. Tools > Board > Board Manager:

Arduino Board Manager

You also need to install the WebSockets library in the IDE by navigating Tools > Manage Libraries and searching for arduinowebsockets and installing it. Version 0.4.5 works for me.

Library Manager showing ArduinoWebsockets

Copy and paste the Sketch below and save it as a new Sketch. Add to the folder where the Sketch has been saved these two files: camera_index.h and camera_pins.h . camera_index.h is the HTML for the interface and camera_pins.h is the camera definitions.

In the pasted Sketch, edit the ssid and password to match your WiFi and uncomment the define for the camera you are using.

The Code

#include <ArduinoWebsockets.h>
#include "esp_http_server.h"
#include "esp_timer.h"
#include "esp_camera.h"
#include "camera_index.h"
#include "Arduino.h"
#include "fd_forward.h"
#include "fr_forward.h"
#include "fr_flash.h"

const char* ssid = "NSA";
const char* password = "orange";


// Select camera model
#include "camera_pins.h"

using namespace websockets;
WebsocketsServer socket_server;

camera_fb_t * fb = NULL;

long current_millis;
long last_detected_millis = 0;

void app_facenet_main();
void app_httpserver_init();

typedef struct
  uint8_t *image;
  box_array_t *net_boxes;
  dl_matrix3d_t *face_id;
} http_img_process_result;

static inline mtmn_config_t app_mtmn_config()
  mtmn_config_t mtmn_config = {0};
  mtmn_config.type = FAST;
  mtmn_config.min_face = 80;
  mtmn_config.pyramid = 0.707;
  mtmn_config.pyramid_times = 4;
  mtmn_config.p_threshold.score = 0.6;
  mtmn_config.p_threshold.nms = 0.7;
  mtmn_config.p_threshold.candidate_number = 20;
  mtmn_config.r_threshold.score = 0.7;
  mtmn_config.r_threshold.nms = 0.7;
  mtmn_config.r_threshold.candidate_number = 10;
  mtmn_config.o_threshold.score = 0.7;
  mtmn_config.o_threshold.nms = 0.7;
  mtmn_config.o_threshold.candidate_number = 1;
  return mtmn_config;
mtmn_config_t mtmn_config = app_mtmn_config();

face_id_name_list st_face_list;
static dl_matrix3du_t *aligned_face = NULL;

httpd_handle_t camera_httpd = NULL;

typedef enum
} en_fsm_state;
en_fsm_state g_state;

typedef struct
  char enroll_name[ENROLL_NAME_LEN];
} httpd_resp_value;

httpd_resp_value st_name;

void setup() {

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  //init with high specs to pre-allocate larger buffers
  if (psramFound()) {
    config.frame_size = FRAMESIZE_UXGA;
    config.jpeg_quality = 10;
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_SVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;

  pinMode(13, INPUT_PULLUP);
  pinMode(14, INPUT_PULLUP);

  // camera init
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);

  sensor_t * s = esp_camera_sensor_get();
  s->set_framesize(s, FRAMESIZE_QVGA);

  s->set_vflip(s, 1);
  s->set_hmirror(s, 1);

  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
  Serial.println("WiFi connected");


  Serial.print("Camera Ready! Use 'http://");
  Serial.println("' to connect");

static esp_err_t index_handler(httpd_req_t *req) {
  httpd_resp_set_type(req, "text/html");
  httpd_resp_set_hdr(req, "Content-Encoding", "gzip");
  return httpd_resp_send(req, (const char *)index_ov2640_html_gz, index_ov2640_html_gz_len);

httpd_uri_t index_uri = {
  .uri       = "/",
  .method    = HTTP_GET,
  .handler   = index_handler,
  .user_ctx  = NULL

void app_httpserver_init ()
  httpd_config_t config = HTTPD_DEFAULT_CONFIG();
  if (httpd_start(&camera_httpd, &config) == ESP_OK)
    httpd_register_uri_handler(camera_httpd, &index_uri);

void app_facenet_main()
  face_id_name_init(&st_face_list, FACE_ID_SAVE_NUMBER, ENROLL_CONFIRM_TIMES);
  aligned_face = dl_matrix3du_alloc(1, FACE_WIDTH, FACE_HEIGHT, 3);

static inline int do_enrollment(face_id_name_list *face_list, dl_matrix3d_t *new_id)
  int left_sample_face = enroll_face_id_to_flash_with_name(face_list, new_id, st_name.enroll_name);
  ESP_LOGD(TAG, "Face ID %s Enrollment: Sample %d",
           ENROLL_CONFIRM_TIMES - left_sample_face);
  return left_sample_face;

static esp_err_t send_face_list(WebsocketsClient &client)
  client.send("delete_faces"); // tell browser to delete all faces
  face_id_node *head = st_face_list.head;
  char add_face[64];
  for (int i = 0; i < st_face_list.count; i++) // loop current faces
    sprintf(add_face, "listface:%s", head->id_name);
    client.send(add_face); //send face to browser
    head = head->next;

static esp_err_t delete_all_faces(WebsocketsClient &client)

void handle_message(WebsocketsClient &client, WebsocketsMessage msg)
  if ( == "stream") {
    g_state = START_STREAM;
  if ( == "detect") {
    g_state = START_DETECT;
  if (, 8) == "capture:") {
    g_state = START_ENROLL;
    char person[FACE_ID_SAVE_NUMBER * ENROLL_NAME_LEN] = {0,};, sizeof(person));
    memcpy(st_name.enroll_name, person, strlen(person) + 1);
  if ( == "recognise") {
    g_state = START_RECOGNITION;
  if (, 7) == "remove:") {
    char person[ENROLL_NAME_LEN * FACE_ID_SAVE_NUMBER];, sizeof(person));
    delete_face_id_in_flash_with_name(&st_face_list, person);
    send_face_list(client); // reset faces in the browser
  if ( == "delete_all") {

void loop() {
  auto client = socket_server.accept();
  dl_matrix3du_t *image_matrix = dl_matrix3du_alloc(1, 320, 240, 3);
  http_img_process_result out_res = {0};
  out_res.image = image_matrix->item;


  while (client.available()) {

    fb = esp_camera_fb_get();

    if (g_state == START_DETECT || g_state == START_ENROLL || g_state == START_RECOGNITION)
      out_res.net_boxes = NULL;
      out_res.face_id = NULL;

      fmt2rgb888(fb->buf, fb->len, fb->format, out_res.image);

      out_res.net_boxes = face_detect(image_matrix, &mtmn_config);

      if (out_res.net_boxes)
        if (align_face(out_res.net_boxes, image_matrix, aligned_face) == ESP_OK)

          out_res.face_id = get_face_id(aligned_face);
          last_detected_millis = millis();
          if (g_state == START_DETECT) {
            client.send("FACE DETECTED");

          if (g_state == START_ENROLL)
            int left_sample_face = do_enrollment(&st_face_list, out_res.face_id);
            char enrolling_message[64];
            sprintf(enrolling_message, "SAMPLE NUMBER %d FOR %s", ENROLL_CONFIRM_TIMES - left_sample_face, st_name.enroll_name);
            if (left_sample_face == 0)
              ESP_LOGI(TAG, "Enrolled Face ID: %s", st_face_list.tail->id_name);
              g_state = START_STREAM;
              char captured_message[64];
              sprintf(captured_message, "FACE CAPTURED FOR %s", st_face_list.tail->id_name);


          if (g_state == START_RECOGNITION  && (st_face_list.count > 0))
            face_id_node *f = recognize_face_with_name(&st_face_list, out_res.face_id);
            if (f)
              char recognised_message[64];
              sprintf(recognised_message, "RECOGNISED %s", f->id_name);
              client.send("FACE NOT RECOGNISED");

        if (g_state != START_DETECT) {
          client.send("NO FACE DETECTED");

      if (g_state == START_DETECT && millis() - last_detected_millis > 500) { // Detecting but no face detected


    client.sendBinary((const char *)fb->buf, fb->len);

    fb = NULL;


At the moment the recognition code always finds a matching face even when a face hasn’t been captured. This problem has been fixed but isn’t in the Arduino release code yet.


Arduino WebSocket library I used:
ESP-WHO WeChat Example for the face recognition with names code –
Hidden face photo by Honey Yanibel Minaya Cruz on Unsplash

4 Replies to “ESP-WHO Face Recognition with WebSocket Communication”

  1. Peter says:

    I have the same Problem with the the same Code for my ttgo pir board. It recognizes every face as enrolled face.
    After I downgraded to esp-face 0.34, everything is good. My Intension is a face recognation Background Task, wake up by pir and sending Image and message per MQTT. For watching i use a simple webserver.


    1. WordBot says:

      Hi, There’s a couple of issues and a possible solution on the ESP-WHO Github page:

  2. Cobiam says:

    Good day robotzero, first I thank you for your great work to publish and explain to a large extent this project of libre use. I am a manager of technological projects in a Colombian university and we want to know if we can in any way implement or embed this project in Moodle.
    We also need a good manual to emulate the use of this project on a Debian 9 virtual machine and later try to include it in Moodle for the use of facial recognition for the presentation of virtual exams.
    Thank you very much for your collaboration and support.
    Happy day.

    1. WordBot says:

      Hi there. I don’t think you will be able to use the ESP32 like this easily or at all. You are better to look for something in OpenCV or similar libraries.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

scroll to top