Overview

How it works

Host your assets at a publicly accessible URL

Upload your video, photo, and audio files so our servers can retrieve them.

Send an API request with the appropriate parameters

Reference your hosted assets and specify your desired mode (Standard or Precision).
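As a rough illustration of this step, the sketch below assembles and validates a request body before it is sent. The field names (source_url, audio_url, mode) and the lowercase mode strings are assumptions made for the example; this overview does not specify the actual parameter names.

```python
from urllib.parse import urlparse

# Hypothetical mode strings; the overview only names "Standard" and "Precision".
VALID_MODES = {"standard", "precision"}


def build_request(source_url: str, audio_url: str, mode: str = "standard") -> dict:
    """Assemble a request body. All field names here are assumptions."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    for url in (source_url, audio_url):
        parsed = urlparse(url)
        # Assets must be reachable by the servers, so require an http(s) URL with a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"asset URL must be a public http(s) URL: {url!r}")
    return {"source_url": source_url, "audio_url": audio_url, "mode": mode}
```

Validating on the client side catches malformed URLs and mode typos before a request is spent on them.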

Wait for the callback or poll for status

Use our webhook callback or poll the API with your job ID until processing is complete.
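The polling option could look roughly like the sketch below. It takes the status lookup as a caller-supplied function, because the status endpoint and its response shape are not described here; the status values "done" and "failed" and the error_code key are placeholders, not documented names. The default timeout of 140 minutes is the longest listed queue time plus Precision Mode processing.

```python
import time
from typing import Callable

# Hypothetical status strings; the real values are not given in this overview.
DONE, FAILED = "done", "failed"


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval_s: float = 30.0,
    timeout_s: float = 140 * 60,  # 120 min queue + ~20 min Precision processing
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(job_id) until the job finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") == DONE:
            return job
        if job.get("status") == FAILED:
            raise RuntimeError(f"job {job_id} failed with code {job.get('error_code')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} s")
        sleep(interval_s)
```

Passing fetch_status and sleep as parameters keeps the loop independent of any HTTP client and easy to exercise with fake responses.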

Download video output

Retrieve the finished talking photo or lip‑synced video from the provided URL.
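One way to fetch the result, sketched with the Python standard library only. The function name and the destination path are illustrative; the output URL is whatever the completed job returns.

```python
import shutil
import urllib.request
from pathlib import Path


def download_output(output_url: str, dest: Path) -> Path:
    """Stream the finished video to dest without loading it all into memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(output_url, timeout=60) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```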

Usage Limitations:

You may have up to 10 concurrent jobs (including queued requests).

Only one face per video or photo is lip‑synced.

Estimated queue time: 1–120 minutes, depending on system load.

Standard Mode processing time: ~10 minutes.

Precision Mode processing time: ~20 minutes.

If a video or photo contains multiple faces, only the largest detected face will be lip‑synced.
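The figures above can be combined into a simple client-side estimate, for example to choose a polling timeout or to decide whether another job can be submitted. The sketch below uses only the numbers listed in this section; the processing times are approximate, so the results are estimates rather than guarantees, and the function names are illustrative.

```python
# Figures taken from the limits listed above (minutes).
QUEUE_MINUTES = (1, 120)
PROCESSING_MINUTES = {"standard": 10, "precision": 20}
MAX_CONCURRENT_JOBS = 10


def estimated_wait_minutes(mode: str) -> tuple[int, int]:
    """Return (best case, worst case) minutes from submission to completion."""
    processing = PROCESSING_MINUTES[mode.lower()]
    return QUEUE_MINUTES[0] + processing, QUEUE_MINUTES[1] + processing


def can_submit(active_jobs: int) -> bool:
    """True if one more job fits under the concurrency limit (queued jobs count)."""
    return active_jobs < MAX_CONCURRENT_JOBS
```

Under these figures a Standard Mode job takes roughly 11 to 130 minutes end to end, and a Precision Mode job roughly 21 to 140 minutes.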

API Error Codes

Code	Description
5	Invalid request parameters.
7	No permission to make this request.
104	Insufficient credits.
814	Your account is not a member and is not allowed to call the API.
1000	Internal Server Error.
1301	Callback Challenge failed.
1302	API key has been revoked.
1304	API key has reached the maximum number of concurrent requests.
1305	Only Business plan accounts are allowed.
1502	Your audio driver is either invalid or cannot be downloaded.
1503	Your account is not authorized to call the API.
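A client may want to treat some of these codes as transient and retry them with a delay. The sketch below is one possible policy, not documented behavior: the choice of 1000 and 1304 as retryable, and the delay values, are assumptions made for the example.

```python
# Assumption: which codes are worth retrying is a client-side guess,
# not something this documentation states.
RETRYABLE_API_CODES = {
    1000,  # Internal Server Error
    1304,  # maximum number of concurrent requests reached
}


def should_retry(code: int) -> bool:
    """True if the API error code is treated as transient by this client."""
    return code in RETRYABLE_API_CODES


def backoff_delays(attempts: int, base_s: float = 60.0, cap_s: float = 900.0) -> list[float]:
    """Exponential backoff delays: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(base_s * 2**i, cap_s) for i in range(attempts)]
```

Codes such as 104 (insufficient credits) or 1302 (revoked API key) will not resolve on their own, so retrying them only wastes requests.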

Job Error Codes

Code	Description
999	Failed to download the file.
20403	Not enough faces.
20407	Too many face tracks were detected.
20408	The image did not pass face detection for image-to-video.
20601	No faces were found in the image.
20602	Unknown image format.
20611	Video generation rate limit reached.
20613	The input image for video generation contains sensitive content.
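On the client side, a failed job's code can be turned into a readable exception. The sketch below is a minimal example: the JobError class is hypothetical, the descriptions paraphrase the table above, and the fallback text for unlisted codes is an addition of this example.

```python
# Descriptions paraphrase the Job Error Codes table.
JOB_ERRORS = {
    999: "Failed to download the file.",
    20403: "Not enough faces.",
    20407: "Too many face tracks were detected.",
    20408: "The image did not pass face detection for image-to-video.",
    20601: "No faces were found in the image.",
    20602: "Unknown image format.",
    20611: "Video generation rate limit reached.",
    20613: "The input image contains sensitive content.",
}


class JobError(Exception):
    """Hypothetical exception carrying a job error code and its description."""

    def __init__(self, code: int):
        self.code = code
        # Codes missing from the table get a generic fallback message.
        self.description = JOB_ERRORS.get(code, "Unknown job error.")
        super().__init__(f"[{code}] {self.description}")
```

For example, `raise JobError(20602)` produces the message "[20602] Unknown image format."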
