SoundHound combines voice recognition with real-time image analysis

Published 13 Aug 2025

SoundHound AI became the first company to blend voice and vision technology for business use when it launched Vision AI on Thursday.

The Santa Clara firm’s new system processes speech and visual information at the same time. This creates more natural conversations between people and machines.

“At SoundHound, we believe the future of AI isn’t just multimodal – it’s deeply integrated, responsive, and built for real-world impact,” said CEO Keyvan Mohajer. “With Vision AI, we’re extending our leadership in voice and conversational AI to redefine how humans interact with products and services offered and used by businesses.”

The technology connects live camera feeds with voice commands. Devices can interpret what users say and what they see at the same time, without delay.
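
To make the idea of pairing speech with a live camera feed concrete, here is a minimal sketch that matches a spoken request to the camera frame captured closest to it in time. It is purely illustrative and is not SoundHound's implementation or API: the `Frame` and `Utterance` types, the `nearest_frame` helper, and the frame labels are all hypothetical, and a real system would likely do far more than timestamp matching.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    timestamp: float  # seconds since the session started
    label: str        # hypothetical output of a vision model


@dataclass
class Utterance:
    timestamp: float  # seconds since the session started
    text: str


def nearest_frame(frames: List[Frame], utterance: Utterance) -> Optional[Frame]:
    """Return the frame closest in time to the utterance.

    Assumes `frames` is sorted by timestamp (hypothetical helper).
    """
    if not frames:
        return None
    times = [f.timestamp for f in frames]
    # Index of the first frame at or after the utterance.
    i = bisect_left(times, utterance.timestamp)
    # Only the frame just before and the frame at/after can be nearest.
    candidates = frames[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(f.timestamp - utterance.timestamp))


frames = [
    Frame(0.0, "engine bay"),
    Frame(0.5, "alternator"),
    Frame(1.0, "battery"),
]
question = Utterance(0.6, "What is this part?")
match = nearest_frame(frames, question)
print(match.label if match else "no frame available")
```

In the mechanic scenario, a lookup like this would let a question such as "What is this part?" be answered against whatever the camera saw at the moment the question was asked.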

Vision AI targets practical business problems. A mechanic wearing smart glasses could look at engine parts and ask for repair help without putting down tools. Store workers could check inventory instantly just by looking at the shelves.

Drive-thru restaurants could confirm orders visually while customers speak. Car passengers could point at buildings and ask questions to get immediate answers.

“With Vision AI, we are fusing visual recognition and conversational intelligence into a single, synchronized flow,” said Pranav Singh, VP of Engineering. “Every frame, every utterance, every intent is interpreted within the same ecosystem – ensuring faster, more natural user experiences that scale across surfaces from kiosks to embedded devices.”

The launch follows strong earnings reported on August 7. Revenue jumped 217 percent to $43 million, beating Wall Street’s $33 million estimate. SoundHound also narrowed its loss to 3 cents per share, better than the expected 6-cent loss.
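
As a quick check on those figures, the short sketch below works out what the rounded numbers imply. It assumes the 217 percent figure is growth over the comparable prior period, which the article does not state explicitly. The function names are illustrative, and the results are only as precise as the rounded inputs.

```python
def implied_prior_revenue(current: float, growth_pct: float) -> float:
    """Revenue in the comparison period implied by a percentage increase."""
    return current / (1 + growth_pct / 100)


def beat_pct(actual: float, estimate: float) -> float:
    """How far the actual figure exceeded the estimate, in percent."""
    return (actual - estimate) / estimate * 100


revenue_m = 43.0    # reported revenue, $ millions (rounded, from the article)
growth = 217.0      # reported growth, percent (from the article)
estimate_m = 33.0   # Wall Street estimate, $ millions (from the article)

prior_m = implied_prior_revenue(revenue_m, growth)
beat = beat_pct(revenue_m, estimate_m)

print(f"Implied comparison-period revenue: about ${prior_m:.1f} million")
print(f"Revenue beat the estimate by about {beat:.0f} percent")
```

On these rounded inputs, a 217 percent jump to $43 million implies a base of roughly $13.6 million, and the result came in about 30 percent above the estimate.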

SoundHound’s stock soared 17.3 percent on Monday after the Vision AI announcement. Investors see potential in voice commerce, which SoundHound estimates could reach $35 billion annually for automakers.

The company works with Mercedes-Benz, Honda, and Hyundai. Its voice technology operates in more than 14,000 restaurant locations worldwide.

SoundHound also released Amelia 7.1 this month, an update that speeds up AI agents and gives businesses more control. It also includes better accuracy and clearer data tracking.

Management raised its full-year revenue guidance to a range of $160 million to $178 million, up from the previous $157 million to $177 million. The company expects to reach profitability by the end of 2025.

Vision AI integrates with SoundHound’s Polaris platform across mobile devices, cars, kiosks, and embedded systems. By processing voice and visual data together, the technology removes the need for typing or scanning.

This puts SoundHound ahead of Google, Amazon, and Baidu in creating integrated multimodal AI for businesses. Other companies offer separate voice and vision tools, but SoundHound combines both into one system.