Analyze a Scene with an ESP32-CAM

ESP32-CAM with LCD and Speaker

Use the Microsoft Computer Vision API with an ESP32-CAM to describe a scene on screen and through a speaker.

This project uses the ‘describe’ API of Microsoft Azure Cognitive Services to describe an entire scene rather than individual objects. The response also lists objects found in the scene.

The project connects an ESP32-CAM to an LCD and, optionally, a DAC amplifier and speaker. The text description returned from Microsoft’s server is shown on the display and, if the audio parts are fitted, read out loud.
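
As a rough illustration of the display step, the returned description has to be split into lines that fit the LCD. Below is a minimal sketch in plain C++ so it can be tried on a desktop. The helper name wrapForLcd is hypothetical and not taken from the project's sketch, and the 16-column width is an assumption because the component list does not state the display size.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the project sketch): greedy word wrap for a
// character LCD. `width` is the number of columns; 16 is an assumption.
std::vector<std::string> wrapForLcd(const std::string& text, std::size_t width = 16) {
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        // A single word longer than the display is split across lines.
        while (word.size() > width) {
            if (!line.empty()) { lines.push_back(line); line.clear(); }
            lines.push_back(word.substr(0, width));
            word.erase(0, width);
        }
        if (line.empty()) {
            line = word;                       // first word on a fresh line
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ' + word;                // word still fits after a space
        } else {
            lines.push_back(line);             // line is full, start a new one
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

On a two-row display the resulting lines would be shown two at a time, for example with a short pause between pages.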

Scene Analyzer Assembled Front
Scene Analyzer Assembled Rear

Video Demonstration

Components

You need the following items:

ESP32-CAM: https://s.click.aliexpress.com/e/_AfPjT7
LCD Screen with 3.3V backpack: https://es.aliexpress.com/item/32774955921.html
Capacitive Touch Button: https://es.aliexpress.com/item/32964219843.html
Max98357 DAC amplifier (optional): https://es.aliexpress.com/item/33043664469.html
Small speaker (optional): https://s.click.aliexpress.com/e/_AenhyU

Project Components

The ESP32-CAM needs a 5 V power supply. I used a USB power bank as shown above.

The 3D printed parts and their assembly look like this:

3D Prints Front
3D Prints Back
3D Prints Assembled

Fully assembled, the project looks like this:

Assembled Project Front
Assembled Project Rear

Microsoft Azure Cognitive Services

First, sign up for 12 months of free Azure Cognitive Services. You need a Microsoft account (Hotmail, Outlook, Live, etc.) and a credit card, but the card won’t be charged. Sign up here: https://azure.microsoft.com/en-gb/free/cognitive-services/

A possible alternative is the free Basic account for the Microsoft Computer Vision API on RapidAPI (listed as microsoft-azure-org-microsoft-cognitive-services), where no card is needed. I’ll try to test this option and update the tutorial to work with it.

Once the Azure account is set up, click here: https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision to create the instance on the server. Fill in the details as in the screenshot below:

Resource Sign Up Screen

When completed, click ‘Go to resource’. In the resource, click Overview at the top of the left menu, then click ‘Click here to manage keys’ in the panel that opens on the right. The key and endpoint shown there are the values the sketch needs later.

Arduino IDE Setup

In the Arduino IDE Library Manager (Sketch > Include Library > Manage Libraries…), install the following libraries:

  • ArduinoJson (v6 or greater) by Benoit Blanchon
  • Extensible hd44780 LCD Library by Bill Perry

Optionally, for the audio output version:

  • ESP8266Audio by Earle F. Philhower
  • ESP8266SAM from https://github.com/earlephilhower/ESP8266SAM, installed either by unzipping the download into the IDE library folder or by importing the zip file with ‘Sketch > Include Library > Add ZIP Library…’

Copy or download the sketch from GitHub here: https://github.com/robotzero1/esp32cam-cognitive-scene. You need to make the following changes to the code:

  • Change the Wi-Fi SSID and password to match your router
  • Change host to your server at Microsoft
  • Change the POST URL (around line 152) to the correct URL for your server
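
The host, key and POST URL edited above all end up in the HTTPS request sent to Microsoft. The following is a minimal sketch in plain C++ of how such a request head could be assembled. The helper buildDescribeRequestHead is hypothetical, the path /vision/v3.2/describe is an assumption (use the URL shown for your own resource), and the key is a placeholder. The project's sketch appears to send the image as multipart form data, since it defines a boundary string; this example uses a raw application/octet-stream body instead to keep it short.

```cpp
#include <string>

// Hypothetical helper: builds the head of a POST to the Computer Vision
// 'describe' endpoint. The JPEG bytes would follow the returned string.
std::string buildDescribeRequestHead(const std::string& host,
                                     const std::string& path,
                                     const std::string& subscriptionKey,
                                     std::size_t jpegLength) {
    std::string head;
    head += "POST " + path + " HTTP/1.1\r\n";
    head += "Host: " + host + "\r\n";
    // The key comes from the 'manage keys' page of the Azure resource.
    head += "Ocp-Apim-Subscription-Key: " + subscriptionKey + "\r\n";
    head += "Content-Type: application/octet-stream\r\n";
    head += "Content-Length: " + std::to_string(jpegLength) + "\r\n";
    head += "Connection: close\r\n\r\n";   // blank line ends the headers
    return head;
}
```

A call such as buildDescribeRequestHead("uksouth.api.cognitive.microsoft.com", "/vision/v3.2/describe", "YOUR_KEY", imageSize) gives the text to write to the secure client before the image data. A 404 from the server usually points at the path or host rather than the key.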

Upload the sketch to the ESP32-CAM.
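
Once the sketch is running, each captured image comes back from the server as a JSON reply containing the scene description. The project relies on the ArduinoJson library for parsing; the code below is only a desktop-testable stand-in that uses a naive string search to pull out the first caption. The helper name firstCaption is hypothetical, and it assumes the reply holds a "captions" array whose entries have a "text" field.

```cpp
#include <string>

// Hypothetical helper: returns the value of the first "text" field that
// follows "captions" in the reply, or an empty string if none is found.
// Naive on purpose: it does not handle escaped quotes inside the caption.
std::string firstCaption(const std::string& json) {
    const std::string key = "\"text\":";
    std::size_t pos = json.find("\"captions\"");
    if (pos == std::string::npos) return "";
    pos = json.find(key, pos);
    if (pos == std::string::npos) return "";
    std::size_t start = json.find('"', pos + key.size());  // opening quote
    if (start == std::string::npos) return "";
    std::size_t end = json.find('"', start + 1);           // closing quote
    if (end == std::string::npos) return "";
    return json.substr(start + 1, end - start - 1);
}
```

On the device itself, a proper JSON parser is the safer choice because it copes with escaping and with changes in field order.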

Wiring Diagram

Connect the pins as shown below for the LCD-only version. The wiring diagram for the audio version is in the next section.

ESP32-CAM LCD Scene Analyser Wiring

Audio Version

ESP32-CAM LCD DAC Scene Analyser Wiring

3D printer files: https://robotzero.one/wp-content/uploads/2021/02/Scene-Analyzer-STL-Files.zip
Arduino Sketch with audio: https://pastebin.com/GCLqEFp0

Improvements

Investigate using a cloud service to convert the text to speech as a sound file, and play that file to improve the sound quality.

References:

More Voices for SAM: https://github.com/earlephilhower/ESP8266SAM/issues/13
More info about SAM Text to Speech: https://simulationcorner.net/index.php?page=sam

9 Replies to “Analyze a Scene with an ESP32-CAM”

  1. Visnu Bharat says:

    Hi, thanks for the code.

    I’m getting a Guru Meditation Error (LoadProhibited) after running the code. EXCVADDR points to NULL. Please advise.

    1. WordBot says:

      Hi, Which version of the Arduino IDE and ESP32 Hardware libraries are you using?

  2. Sitsopé SEKPONA says:

    Hello. I’m a member. How can I get the wiring diagram for the audio version? I have the code for the audio part but not the diagram. Please help.

    1. WordBot says:

      Hi, sorry, not sure why that was missing. I’ve just added it to the tutorial.

  3. Alfons says:

    Very good article!
    I already have an ESP32-CAM board and I’m going to buy the rest of the components.
    If you give me permission, I’ll publish your article on my Facebook page, https://www.facebook.com/xamarin.bcn.3
    I await your confirmation and will keep you informed.
    Congratulations on the work you’ve done!

    1. WordBot says:

      Thanks! Yes, you can publish the article on your page.

  4. Steve says:

    Great tutorials!

    Having trouble with a 404.

    const char* host = "uksouth.api.cognitive.microsoft.com"; //edit for your chosen server
    const char* Ocp_Apim_Subscription_Key = "SUB ID or API Key ID?";
    const int Port = 443;
    const char* boundry = "dgbfhfh";

    I tried all kinds of combinations, but it always results in ‘Connecting to uksouth.api.cognitive.microsoft.com:443… Failure in connection with the server.’ I also tried the endpoint URL from ‘manage keys’ in Azure.

    Any help is appreciated!

  5. Steve says:

    Hi,

    If WiFiClientSecure can’t connect and this results in Guru errors because of an invalid host, just add:

    client.setInsecure(); //Add this to connect!
    Serial.printf("Connecting to %s:%d… ", host, Port);

    Great tutorial, and yes adding a Google TTS output would be awesome!

    1. WordBot says:

      Cool, glad you got it working. The main ESP32 Arduino library keeps being updated, and with the other libraries changing as well, the tutorials can go out of date fast.
