OpenAI has announced voice and image capabilities for its widely used AI chatbot, ChatGPT. The new capabilities make conversations with ChatGPT more natural by letting you speak to it and show it images.
This opens up several ways to use ChatGPT in everyday activities. For instance, while traveling, you can send ChatGPT a picture of a landmark and have a live conversation about it. Similarly, at home, you can take photos of the contents of your refrigerator and explore meal ideas or request step-by-step recipes.
OpenAI will gradually introduce these features to Plus and Enterprise users in the coming weeks. The voice feature will be accessible through mobile apps, while the image feature will be available across all platforms.
The new voice functionality lets you hold spoken conversations with ChatGPT, which can respond audibly using one of five synthesized voices. To activate the feature, opt in through the settings of the iOS or Android mobile app.
OpenAI has revealed that this voice feature relies on an advanced text-to-speech model, which has been trained on recordings from voice actors. For speech recognition, it uses Whisper, OpenAI’s open-source speech system.
To start, follow these simple instructions:
You can now show one or multiple images to ChatGPT, which provides visual context and focuses the conversation.
For instance, sharing a picture of a malfunctioning appliance can help ChatGPT identify problems and propose solutions. On mobile, a drawing tool lets you circle or highlight specific areas within an image.
These image-related features leverage a multimodal version of the GPT-3.5 and GPT-4 models, specially refined to analyze visual inputs. OpenAI conducted rigorous safety assessments to evaluate potential risks before introducing these image capabilities.
To begin using image prompts, follow these simple instructions:
OpenAI is taking a cautious approach to rolling out these features.
The new voice technology has the potential for innovative applications, but it also raises concerns such as the impersonation of public figures. To reduce these risks, the voice feature is currently restricted to conversational chat.
For images, OpenAI has implemented limitations that prevent ChatGPT from directly analyzing people in photos. It also advises against relying on ChatGPT for high-risk use cases without verification.
OpenAI’s recent announcement marks a significant milestone in the evolution of ChatGPT, introducing voice and image capabilities that promise a more natural and immersive conversational experience. These enhancements open the door to many practical applications, from on-the-go travel help to creative meal planning at home. However, the new technology also carries risks, including safety concerns and the impersonation of public figures.
Overall, this advancement showcases the potential of AI-driven chatbots like ChatGPT to provide more versatile and engaging interactions while prioritizing user safety and security. It’s a significant step forward in the ongoing development of conversational AI.