Using the Microsoft Cloud Vision API with an ESP32-CAM to describe a scene with audio.
This project uses the ‘describe’ API of the Microsoft Cognitive AI system to describe an entire scene rather than individual objects. Objects in the scene are also provided as part of the response.
The project connects an ESP32-CAM to an LCD display and optionally a DAC amplifier and speaker to read out and display a text description of a scene returned from Microsoft’s server.
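To give a feel for what comes back from the server, here is a minimal sketch in plain C++ that pulls the first caption out of a describe response. It assumes the response nests the sentence under description > captions > text, and it uses a naive string scan rather than the ArduinoJson library that the actual sketch relies on. `firstCaption` is a hypothetical helper name, not something from the project code.

```cpp
#include <string>

// Returns the first caption text found in a describe response, or "" if none.
// Assumed response shape (illustrative only):
//   {"description":{"tags":[...],"captions":[{"text":"...","confidence":0.9}]}}
// Naive scan: assumes the caption text contains no escaped quotes (\").
std::string firstCaption(const std::string& json) {
    const std::string captionsKey = "\"captions\"";
    const std::string textKey = "\"text\"";

    // Locate the captions array, then the first "text" key inside it.
    std::string::size_type pos = json.find(captionsKey);
    if (pos == std::string::npos) return "";
    pos = json.find(textKey, pos);
    if (pos == std::string::npos) return "";

    // Skip past the colon to the opening quote of the value.
    pos = json.find(':', pos + textKey.size());
    if (pos == std::string::npos) return "";
    const std::string::size_type open = json.find('"', pos);
    if (open == std::string::npos) return "";
    const std::string::size_type close = json.find('"', open + 1);
    if (close == std::string::npos) return "";

    return json.substr(open + 1, close - open - 1);
}
```

On the device, the returned string is what would be sent to the LCD and, in the audio version, to the text-to-speech library. A real JSON parser is the safer choice for anything beyond a quick experiment.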
Video Demonstration
Components
You need the following items:
ESP32-CAM: https://s.click.aliexpress.com/e/_AfPjT7
LCD Screen with 3.3V backpack: https://es.aliexpress.com/item/32774955921.html
Capacitive Touch Button: https://es.aliexpress.com/item/32964219843.html
Max98357 DAC amplifier (optional): https://es.aliexpress.com/item/33043664469.html
Small speaker (optional): https://s.click.aliexpress.com/e/_AenhyU
A 5V power supply for the ESP32-CAM is needed. I used a USB power bank as shown above.
The 3D printed parts and assembly look like this:
Fully assembled, the project looks like this:
Microsoft Azure Cognitive Services
First, sign up for 12 free months of Azure Cognitive Services here: https://azure.microsoft.com/en-gb/free/cognitive-services/. You need a Microsoft account (Hotmail, Outlook, Live, etc.) and a credit card, but it won’t be charged.
A possible alternative is signing up for the free Basic plan of the Microsoft Computer Vision API on RapidAPI (microsoft-azure-org-microsoft-cognitive-services), where no card is needed. I’ll try to test and update the tutorial to work with this option.
When you have the Azure account set up click here: https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision to set up the instance on the server. Fill in the details similar to the screenshot below:
When completed, click ‘Go to resource’. In the resource, click Overview at the top of the left menu and then in the panel on the right that opens, ‘Click here to manage keys’.
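The keys page shows the API keys together with the endpoint URL for the resource. The sketch's `host` setting expects only the hostname (something like uksouth.api.cognitive.microsoft.com), not the full URL. As a rough illustration in plain C++, the hostname can be derived from the endpoint like this. `hostFromEndpoint` is a hypothetical helper, not part of the project sketch.

```cpp
#include <string>

// Strips the scheme (e.g. "https://") and any path or trailing slash from an
// endpoint URL, leaving only the hostname expected by the `host` setting.
std::string hostFromEndpoint(const std::string& endpoint) {
    std::string rest = endpoint;

    // Drop everything up to and including "://", if a scheme is present.
    const std::string::size_type scheme = rest.find("://");
    if (scheme != std::string::npos) rest = rest.substr(scheme + 3);

    // Drop the first "/" and everything after it.
    const std::string::size_type slash = rest.find('/');
    if (slash != std::string::npos) rest = rest.substr(0, slash);

    return rest;
}
```

In practice it is just as easy to copy the hostname by hand; the point is that the scheme and trailing slash should not end up in `host`.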
Arduino IDE Setup
In the Arduino IDE Library Manager (Sketch > Include Library > Manage Libraries… ), install the following libraries:
ArduinoJson (v6 or greater) by Benoit Blanchon
Extensible hd44780 LCD Library by Bill Perry
Optionally for the audio output version:
ESP8266Audio by Earle F. Philhower
ESP8266SAM library from https://github.com/earlephilhower/ESP8266SAM, either by unzipping the download into the IDE library folder or by importing the zip file with ‘Sketch > Include Library > Add ZIP Library…’
Copy or download the Sketch from Github here: https://github.com/robotzero1/esp32cam-cognitive-scene. You need to make the following changes to the code:
- Change the Wi-Fi ssid and password to match your router
- Change host to be your server at Microsoft
- Change the POST URL (around line 152) to the correct URL for your server
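To show how these settings fit together, here is a minimal sketch in plain C++ of how the HTTP request for an image upload could be assembled. It assumes the image is sent as multipart/form-data with the key in an `Ocp-Apim-Subscription-Key` header. The `buildDescribeRequest` function, the `DescribeRequest` struct and the `capture.jpg` filename are hypothetical names for illustration; `path` stands for whatever POST URL your server uses. The code in the actual sketch may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pieces of a multipart/form-data POST, in the order they would be written
// to the socket: headers, partHead, <JPEG bytes>, partTail.
struct DescribeRequest {
    std::string headers;   // request line plus HTTP headers
    std::string partHead;  // multipart preamble sent before the JPEG bytes
    std::string partTail;  // closing boundary sent after the JPEG bytes
};

DescribeRequest buildDescribeRequest(const std::string& host,
                                     const std::string& path,
                                     const std::string& apiKey,
                                     const std::string& boundary,
                                     std::size_t jpegLength) {
    assert(!boundary.empty());  // multipart needs a non-empty delimiter

    DescribeRequest req;
    req.partHead = "--" + boundary + "\r\n";
    req.partHead += "Content-Disposition: form-data; name=\"image\"; filename=\"capture.jpg\"\r\n";
    req.partHead += "Content-Type: image/jpeg\r\n\r\n";
    req.partTail = "\r\n--" + boundary + "--\r\n";

    // Content-Length covers everything after the blank line ending the headers.
    const std::size_t contentLength =
        req.partHead.size() + jpegLength + req.partTail.size();

    req.headers = "POST " + path + " HTTP/1.1\r\n";
    req.headers += "Host: " + host + "\r\n";
    req.headers += "Ocp-Apim-Subscription-Key: " + apiKey + "\r\n";
    req.headers += "Content-Type: multipart/form-data; boundary=" + boundary + "\r\n";
    req.headers += "Content-Length: " + std::to_string(contentLength) + "\r\n";
    req.headers += "Connection: close\r\n\r\n";
    return req;
}
```

If `host` or `path` does not match your Azure resource, the server typically answers with an error status instead of a description, so these are the first values to check when a request fails.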
Upload the sketch to the ESP32-CAM.
Wiring Diagram
Connect the pins as shown below for the LCD-only version. The wiring diagram for the audio version is in the section below.
Audio Version
3D printer files: https://robotzero.one/wp-content/uploads/2021/02/Scene-Analyzer-STL-Files.zip
Arduino Sketch with audio: https://pastebin.com/GCLqEFp0
Improvements
Investigate using a cloud service to convert the text to speech as a sound file, and play this file to improve the quality of the sound.
References:
More Voices for SAM: https://github.com/earlephilhower/ESP8266SAM/issues/13
More info about SAM Text to Speech: https://simulationcorner.net/index.php?page=sam
Hi, thanks for the code.
I’m getting a Guru Meditation Error (LoadProhibited) after running the code. EXCVADDR points to NULL. Please advise.
Hi, Which version of the Arduino IDE and ESP32 Hardware libraries are you using?
Hello. I’m a member. How can I get the wiring diagram for the audio version? I have the code for the audio part but not the diagram. Please help.
Hi, sorry not sure why that was missing. I’ve just added into the tutorial.
Very good article!!!!
I already have an ESP32-CAM board and I’m going to buy the rest of the components.
If you give me permission, I’ll publish your article on my Facebook page, https://www.facebook.com/xamarin.bcn.3
I’ll wait for your confirmation and keep you informed.
Congratulations on the work you’ve done!!!!
Thanks! Yes, you can publish the article on your page.
Great tutorials!
Having trouble with a 404.
const char* host = "uksouth.api.cognitive.microsoft.com"; //edit for your chosen server
const char* Ocp_Apim_Subscription_Key = "SUB ID or API Key ID?";
const int Port = 443;
const char* boundry = "dgbfhfh";
Tried all kinds of combinations etc, but always results in Connecting to uksouth.api.cognitive.microsoft.com:443… Failure in connection with the server. Also tried my endpoint URL in manage keys in Azure.
Any help is appreciated!
Hi,
If WiFiClientSecure can’t connect, and this results in guru errors that look like an invalid host, just add:
client.setInsecure(); //Add this to connect!
Serial.printf("Connecting to %s:%d… ", host, Port);
Great tutorial, and yes adding a Google TTS output would be awesome!
Cool, glad you got it working. The main ESP32 Arduino library keeps being updated and with the libraries changing as well the tutorials can go out of date fast.