Tel Aviv-based D-ID today released the first multimodal generative AI video platform to combine text, image and animation in one interface. The self-service video platform integrates D-ID’s proprietary generative AI technology with GPT-3 from OpenAI and Stable Diffusion from Stability AI, allowing users to generate digital composite faces and speech in 119 languages from their text prompts.
“This is a game changer for creators,” says Gil Perry, D-ID co-founder and CEO. “It’s the bleeding edge of generative AI,” he asserts, touting the startup’s expertise in deep learning and computer vision. When I talked to Perry last year, he said that the company’s long-term vision is “to lead the next disruption in the video entertainment space by creating AI-generated synthetic media in a responsible way.”
In the rapidly evolving generative AI space, “long-term” means “next year,” so Perry now talks about providing “digital humans” to enterprises, “transforming the way we communicate with machines and elevating our capabilities as humans.” He hopes that sometime next year we will be able to chat with the digital humans we create with D-ID’s help.