Stability AI releases "Stable Video Diffusion," an AI that generates video from text or images

두우우부 2023. 11. 23. 12:48

Stability AI, the developer of the image generation AI "Stable Diffusion," has released "Stable Video Diffusion," a latent video diffusion model that can generate high-resolution video from text or images.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets — Stability AI

We present Stable Video Diffusion — a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.

stability.ai

Stable Video Diffusion is published as a research preview, and its source code is available in a GitHub repository.

GitHub - Stability-AI/generative-models: Generative Models by Stability AI
https://github.com/Stability-AI/generative-models

The weights needed to run the model locally are available on Hugging Face.

stabilityai/stable-video-diffusion-img2vid-xt · Hugging Face
https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt

Stable Video Diffusion is released as two image-to-video models, one generating 14 frames and the other 25 frames, and it can produce video at a customizable frame rate between 3 fps and 30 fps.
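To make those numbers concrete, the short sketch below works out how long a clip lasts for each frame count at the two ends of the frame-rate range. It assumes the clip is simply played back at the chosen rate, so duration is frames divided by fps; the function name `clip_duration_seconds` is a hypothetical helper for illustration, not part of any Stability AI code.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length of a clip, assuming duration = frames / fps.

    The 3-30 fps bounds come from the range quoted in the article;
    enforcing them here is an illustrative choice, not library behavior.
    """
    if not 3 <= fps <= 30:
        raise ValueError("fps must be between 3 and 30")
    return num_frames / fps


# Frame counts quoted in the article for the two released models.
models = (("Stable Video Diffusion", 14), ("Stable Video Diffusion XT", 25))

for name, frames in models:
    for fps in (3, 30):
        seconds = clip_duration_seconds(frames, fps)
        print(f"{name}: {frames} frames at {fps} fps = {seconds:.2f} s")
```

Under this assumption the 14-frame model yields clips of roughly 0.47 to 4.67 seconds and the 25-frame model roughly 0.83 to 8.33 seconds, so both produce short clips rather than long-form video.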

Entering "Ice dragon in the mountains" produces an animation that matches the prompt.

"Astronaut walking on the moon"

"Two blue jays on the top of building"

Stability AI compared Stable Video Diffusion with Runway Research's GEN-2 and pika.art's PikaLabs, using user ratings of video quality (vertical axis). The comparison for Stable Video Diffusion generating 14 frames (purple) is as follows.

The result for Stable Video Diffusion XT (purple), which can generate 25 frames, is as follows.

License: Attribution, NonCommercial, NoDerivatives