Analyze a Scene with an ESP32-CAM

Using the Microsoft Azure Computer Vision API with an ESP32-CAM to describe a scene, with optional audio output.

This project uses the ‘describe’ API of Microsoft Azure Cognitive Services to describe an entire scene rather than individual objects. Tags for the objects in the scene are also returned as part of the response.
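The describe response is JSON: its `description` object contains a `tags` array and a `captions` array whose entries carry `text` and `confidence` fields. The project sketch uses ArduinoJson for this, which is the robust choice on the device. The dependency-free function below is only an illustration of where the caption sits in the response; it is a naive string scan, not a JSON parser, and the function name `firstCaption` is hypothetical.

```cpp
#include <optional>
#include <string>

// Returns the first caption's "text" value from a describe response, or
// std::nullopt if it cannot be found. Deliberately naive: it does not
// validate JSON, and for any backslash escape it keeps the next character
// literally (correct for \" and \\, not for \n or \uXXXX).
std::optional<std::string> firstCaption(const std::string& json) {
    std::size_t pos = json.find("\"captions\"");
    if (pos == std::string::npos) return std::nullopt;
    pos = json.find("\"text\"", pos);          // first caption's text key
    if (pos == std::string::npos) return std::nullopt;
    pos = json.find(':', pos);
    if (pos == std::string::npos) return std::nullopt;
    pos = json.find('"', pos);                 // opening quote of the value
    if (pos == std::string::npos) return std::nullopt;

    std::string out;
    for (std::size_t i = pos + 1; i < json.size(); ++i) {
        char c = json[i];
        if (c == '\\' && i + 1 < json.size()) {
            out += json[++i];                  // keep the escaped character
        } else if (c == '"') {
            return out;                        // closing quote reached
        } else {
            out += c;
        }
    }
    return std::nullopt;                       // unterminated string
}
```

With ArduinoJson v6 on the ESP32, the equivalent lookup would be along the lines of `doc["description"]["captions"][0]["text"]` after `deserializeJson(doc, response)`.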

The project connects an ESP32-CAM to an LCD display and, optionally, a DAC amplifier and speaker. The text description of the scene returned by Microsoft’s server is shown on the display and can also be read aloud.
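A character LCD shows only a fixed number of columns, so a long description has to be split into lines before it is displayed. The helper below is a hypothetical greedy word-wrap, not code from the project sketch; the width is a parameter because the article does not state the display size.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap: splits text into lines of at most `width` characters,
// breaking at whitespace. A single word longer than `width` is hard-split.
std::vector<std::string> wrapForLcd(const std::string& text, std::size_t width) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        // Hard-split a word that can never fit on one line.
        while (word.size() > width) {
            if (!line.empty()) { lines.push_back(line); line.clear(); }
            lines.push_back(word.substr(0, width));
            word.erase(0, width);
        }
        if (line.empty()) {
            line = word;
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ' + word;              // word fits after a space
        } else {
            lines.push_back(line);           // start a new line
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

On the device, each returned line could then be written with the hd44780 library's `setCursor()` and `print()`, paging through the lines if there are more of them than the LCD has rows.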

Video Demonstration

Components

You need the following items:

ESP32-CAM: https://s.click.aliexpress.com/e/_AfPjT7
LCD Screen with 3.3V backpack: https://es.aliexpress.com/item/32774955921.html
Capacitive Touch Button: https://es.aliexpress.com/item/32964219843.html
Max98357 DAC amplifier (optional): https://es.aliexpress.com/item/33043664469.html
Small speaker (optional): https://s.click.aliexpress.com/e/_AenhyU

A 5V power supply for the ESP32-CAM is needed. I used a USB power bank, as shown above.

The 3D printed parts and assembly look like this:

Fully assembled, the project looks like this:

Microsoft Azure Cognitive Services

First, sign up for 12 months of free Azure Cognitive Services. You need a Microsoft account (Hotmail, Outlook, Live, etc.) and a credit card, but the card won’t be charged. Sign up here: https://azure.microsoft.com/en-gb/free/cognitive-services/

A possible alternative is to sign up for the free Basic account on RapidAPI, where no card is needed, using the Microsoft Computer Vision API page on RapidAPI. I’ll try to test this option and update the tutorial to work with it.

Once the Azure account is set up, go to https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision to create a Computer Vision resource on the server. Fill in the details as in the screenshot below:

When it’s complete, click ‘Go to resource’. In the resource, click Overview at the top of the left menu, then click ‘Click here to manage keys’ in the panel that opens on the right. This is where you find the key and endpoint for your resource.
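The portal shows the endpoint as a full URL, while the sketch’s `host` constant (judging by the value quoted in a reader comment on this post, `uksouth.api.cognitive.microsoft.com`) expects only the bare host name. The hypothetical helper below shows that conversion: strip the scheme, then drop any path, port or query.

```cpp
#include <string>

// Converts an endpoint URL copied from the Azure portal, for example
// "https://uksouth.api.cognitive.microsoft.com/", into a bare host name
// with no scheme, path, port, or query string.
std::string hostFromEndpoint(const std::string& endpoint) {
    std::string s = endpoint;
    const std::string scheme = "://";
    std::size_t p = s.find(scheme);
    if (p != std::string::npos) s.erase(0, p + scheme.size());  // drop "https://"
    std::size_t end = s.find_first_of("/:?");                  // end of host part
    if (end != std::string::npos) s.erase(end);
    return s;
}
```

The result is what goes into `host`; the port stays in the separate `Port` constant (443 for HTTPS).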

Arduino IDE Setup

In the Arduino IDE Library Manager (Sketch > Include Library > Manage Libraries… ), install the following libraries:
ArduinoJson (v6 or greater) by Benoit Blanchon
Extensible hd44780 LCD Library by Bill Perry

Optionally for the audio output version:
ESP8266Audio by Earle F. Philhower
ESP8266SAM library from https://github.com/earlephilhower/ESP8266SAM, either by unzipping the download into the IDE library folder or by importing the zip file with ‘Sketch > Include Library > Add ZIP Library…’

Copy or download the sketch from GitHub here: https://github.com/robotzero1/esp32cam-cognitive-scene. You need to make the following changes to the code:

Change the Wi-Fi SSID and password to match your router
Change host to your Microsoft server (the host name from your resource’s endpoint)
Change the POST URL (around line 152) to the correct URL for your server
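The sketch posts the camera JPEG as multipart/form-data (a reader comment quotes its `boundry` constant). The function below is a hypothetical sketch of how such a request can be assembled, not the project’s code. The path `/vision/v3.2/describe?maxCandidates=1` is an assumption based on Azure’s current Describe Image API and may differ from the version the project uses; the form field name `imageFile` and the file name are also made up. The `Ocp-Apim-Subscription-Key` header carries your key.

```cpp
#include <cstddef>
#include <string>

// Text that surrounds the JPEG bytes in a multipart/form-data POST.
struct MultipartRequest {
    std::string head;  // request line, headers, and part preamble
    std::string tail;  // closing boundary
};

MultipartRequest buildDescribeRequest(const std::string& host,
                                      const std::string& path,
                                      const std::string& apiKey,
                                      const std::string& boundary,
                                      std::size_t jpegLength) {
    // Part preamble and closing boundary that wrap the raw JPEG bytes.
    const std::string partStart =
        "--" + boundary + "\r\n"
        "Content-Disposition: form-data; name=\"imageFile\"; filename=\"esp32-cam.jpg\"\r\n"
        "Content-Type: image/jpeg\r\n\r\n";
    const std::string partEnd = "\r\n--" + boundary + "--\r\n";
    // Content-Length counts the whole body: preamble + image + closing boundary.
    const std::size_t contentLength = partStart.size() + jpegLength + partEnd.size();

    MultipartRequest req;
    req.head =
        "POST " + path + " HTTP/1.1\r\n"
        "Host: " + host + "\r\n"
        "Ocp-Apim-Subscription-Key: " + apiKey + "\r\n"
        "Content-Type: multipart/form-data; boundary=" + boundary + "\r\n"
        "Content-Length: " + std::to_string(contentLength) + "\r\n"
        "Connection: close\r\n\r\n" + partStart;
    req.tail = partEnd;
    return req;
}
```

On the ESP32 you would send `head`, then the frame buffer bytes, then `tail` over the `WiFiClientSecure` connection. Splitting the request this way avoids copying the whole image into one string.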

Upload the sketch to the ESP32-CAM.

Wiring Diagram

Connect the pins as shown below for the LCD-only version. The wiring diagram for the audio version is in the section below.

Audio Version

3D printer files: https://robotzero.one/wp-content/uploads/2021/02/Scene-Analyzer-STL-Files.zip
Arduino Sketch with audio: https://pastebin.com/GCLqEFp0

Improvements

Investigate using a cloud service to convert the text to speech as a sound file and play it back, to improve the sound quality.

References:

More Voices for SAM: https://github.com/earlephilhower/ESP8266SAM/issues/13
More info about SAM Text to Speech: https://simulationcorner.net/index.php?page=sam

9 Replies to “Analyze a Scene with an ESP32-CAM”

Visnu Bharat says:

December 14, 2021 at 12:30 pm

Hi, thanks for the code.

I’m getting a Guru Meditation Error: Load Prohibited after running the code. EXCVADDR points to NULL. Please advise.

WordBot says:

December 14, 2021 at 1:34 pm

Hi, which version of the Arduino IDE and ESP32 hardware libraries are you using?

Sitsopé SEKPONA says:

January 2, 2022 at 6:51 pm

Hello. I’m a member. How can I get the audio version wiring diagram? I have the code for the audio part, but not the diagram. Please help.

WordBot says:

January 2, 2022 at 7:18 pm

Hi, sorry, I’m not sure why that was missing. I’ve just added it to the tutorial.

Alfons says:

March 27, 2022 at 8:31 pm

Very good article!!!!
I already have an ESP32-CAM board; I’m going to buy the rest of the components.
If you give me permission, I’ll publish your article on my Facebook page, https://www.facebook.com/xamarin.bcn.3
I look forward to your confirmation and will keep you informed.
Congratulations on the work you’ve done!!!!

WordBot says:

March 28, 2022 at 9:41 am

Thanks! Yes, you can publish the article on your page.

Steve says:

June 27, 2022 at 7:01 pm

Great tutorials!

Having trouble with a 404.

const char* host = "uksouth.api.cognitive.microsoft.com"; //edit for your chosen server
const char* Ocp_Apim_Subscription_Key = "SUB ID or API Key ID?";
const int Port = 443;
const char* boundry = "dgbfhfh";

I’ve tried all kinds of combinations, but it always results in: Connecting to uksouth.api.cognitive.microsoft.com:443… Failure in connection with the server. I also tried the endpoint URL from Manage keys in Azure.

Any help is appreciated!

Steve says:

June 28, 2022 at 9:01 am

Hi,

If WiFiClientSecure can’t connect, which results in Guru Meditation errors because of an invalid host, just add:

client.setInsecure(); //Add this to connect!
Serial.printf("Connecting to %s:%d... ", host, Port);

Great tutorial, and yes, adding Google TTS output would be awesome!

WordBot says:

June 28, 2022 at 9:55 am

Cool, glad you got it working. The main ESP32 Arduino library keeps being updated, and with the other libraries changing as well, the tutorials can go out of date quickly.

Robot Zero One