OpenAI launches advanced speech tools with improved accuracy
OpenAI has announced new AI voice models that improve on its existing speech tools. The models are part of a broader trend in generative AI toward agentic models, which carry out tasks such as booking flights or changing orders through conversation. The release includes gpt-4o-transcribe and gpt-4o-mini-transcribe for speech-to-text, along with gpt-4o-mini-tts for text-to-speech. Developers can access all three through the OpenAI API and use them to build a range of AI applications. OpenAI says its aim is to make interactions with AI more intuitive.

The arrival of more advanced voice capabilities also raises concerns. Synthetic voices could be used in scams, making it easier for fraudsters to mislead people.

The new tools are designed for high accuracy and reliability, even in challenging conditions such as noisy environments and conversations involving different accents. That makes them well suited to customer service centers and meeting transcription. Users can also adjust the voices to convey different tones, such as cheerful or dramatic.

In the future, OpenAI plans to let developers create "custom voices" for personalized experiences. The company is also exploring ways to incorporate video into its agentic AI, suggesting that more releases may be on the horizon.