Analyzing DiffuEraser's Challenges for VFX Cleanup in Production

29 May 2026 by Suraj Barman

Initial Impressions of DiffuEraser

DiffuEraser presents itself as a compelling tool for video inpainting, claiming state-of-the-art texture quality and temporal consistency. Its demonstration materials show results that appear superior to those of competitors such as ProPainter. For VFX professionals, these features could make it an attractive option for cleaning up live-action footage. However, real-world performance often diverges from the idealized scenarios in promotional material, as a week-long evaluation revealed.

Testing Setup and Objective

The evaluation ran on an Apple Silicon M4 Max with 128 GB of unified memory, using ComfyUI Desktop with the ComfyUIDiffuEraser custom node. The test footage was ProRes encoded in RED's Log3G10 at 16 bits, a common format in professional workflows. The task was to remove a text sign from a 45-frame shot of an office window, with the cleanup confined to the masked region of interest. This controlled scenario was designed to test the tool's precision and its compatibility with production-grade footage.

Key Issues Identified

On the first run with default settings, the tool introduced artifacts across the entire frame, not only in the masked region. It also caused global color shifts and added noise to untouched areas. These results contradict the core purpose of an inpainting tool, which should change the masked region with surgical precision and leave unmasked pixels intact.
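This kind of leakage can be measured rather than judged by eye: compare each output frame with its source only outside the mask, where an ideal inpainter would show zero difference. The sketch below assumes the frames are already decoded to float arrays in [0, 1] and the mask is a boolean array; `unmasked_error` is a hypothetical helper, not part of DiffuEraser or ComfyUI, and the clip is synthetic.

```python
import numpy as np


def unmasked_error(source, result, mask):
    """Measure how much `result` differs from `source` outside the inpainting mask.

    source, result: float arrays shaped (frames, height, width, channels) in [0, 1].
    mask: bool array shaped (frames, height, width); True marks pixels to inpaint.
    """
    keep = ~mask  # pixels the tool is supposed to leave untouched
    diff = np.abs(result.astype(np.float64) - source.astype(np.float64))[keep]
    mse = float(np.mean(diff ** 2))
    psnr_db = float("inf") if mse == 0.0 else float(10.0 * np.log10(1.0 / mse))
    return {"max_abs": float(diff.max()), "mse": mse, "psnr_db": psnr_db}


# Synthetic example: a 2-frame clip with a square mask in each frame.
rng = np.random.default_rng(0)
source = rng.random((2, 16, 16, 3))
mask = np.zeros((2, 16, 16), dtype=bool)
mask[:, 4:8, 4:8] = True

ideal = source.copy()
ideal[mask] = 0.5     # an ideal inpainter changes only the masked pixels
leaky = ideal + 0.01  # a leaky one also shifts every other pixel slightly

print(unmasked_error(source, ideal, mask))  # max_abs 0.0, psnr_db inf
print(unmasked_error(source, leaky, mask))  # max_abs ~0.01, psnr_db ~40
```

On real footage, a non-zero `max_abs` outside the mask is direct evidence of the whole-frame degradation described above.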

The root cause is DiffuEraser's reliance on the VAE (Variational Autoencoder) roundtrip of its Stable Diffusion 1.5 backbone. The VAE encodes the entire frame into a compressed latent space and decodes it back, and because this encode-decode cycle is lossy, it degrades pixels the mask never touched. This is an architectural limitation, not a configuration issue, and parameter adjustments alone cannot fix it.
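Why an edit confined to the mask still alters the rest of the frame can be illustrated with a toy codec. The Stable Diffusion 1.5 VAE maps each frame to a latent that is 8x smaller in each spatial dimension and then decodes it. The sketch below replaces that learned network with plain 8x8 block averaging, a deliberately crude and hypothetical stand-in that shares only the relevant property: every pixel passes through a lossy bottleneck. Only masked pixels are edited, yet pixels outside the mask change.

```python
import numpy as np


def toy_roundtrip(frame, factor=8):
    """Hypothetical stand-in for a VAE encode/decode (NOT the SD 1.5 VAE).

    'Encode' by averaging each factor x factor block, 'decode' by repeating it back.
    frame: float array (height, width, channels); height and width divisible by factor.
    """
    h, w, c = frame.shape
    latent = frame.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)


def inpaint_through_codec(frame, mask, fill=0.5, factor=8):
    """Edit only the masked pixels, then push the WHOLE frame through the codec."""
    edited = frame.copy()
    edited[mask] = fill
    return toy_roundtrip(edited, factor)


rng = np.random.default_rng(1)
frame = rng.random((64, 64, 3))
mask = np.zeros((64, 64), dtype=bool)
mask[8:16, 8:16] = True  # a small region to clean up

out = inpaint_through_codec(frame, mask)
outside = np.abs(out - frame)[~mask]
print("max change outside the mask:", outside.max())  # clearly non-zero
```

A flat, constant frame survives this toy roundtrip unchanged, while detailed texture does not, which mirrors how latent compression tends to soften fine detail and grain across the whole image.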

Challenges with Log3G10 Input

A second major problem appeared with the Log3G10 input. The custom node pushed tensors through PIL's byte conversion, collapsing the original 16-bit log data into 8-bit values treated as sRGB before they reached the VAE. The result was a severe loss of color fidelity, compounding the tool's difficulty with professional-grade footage.
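The scale of this loss is easy to estimate. The sketch below quantizes a smooth, low-contrast ramp of log code values (the kind of gradient a plain wall produces in flat log footage) at 16 and 8 bits and counts how many distinct levels survive. The 8-bit step mirrors a pattern common in ComfyUI nodes before `Image.fromarray`, namely `np.clip(255.0 * x, 0, 255).astype(np.uint8)`; that the ComfyUIDiffuEraser node uses exactly this line is an assumption, and the ramp values are illustrative.

```python
import numpy as np


def to_uint16(x):
    """Quantize [0, 1] floats to 16-bit codes (round to nearest)."""
    return np.round(np.clip(x, 0.0, 1.0) * 65535.0).astype(np.uint16)


def to_uint8_like_pil(x):
    """Scale to 0..255, clip, and cast to bytes; astype truncates rather than rounds."""
    return np.clip(255.0 * x, 0, 255).astype(np.uint8)


# A smooth, low-contrast run of log code values (illustrative, not measured data).
ramp = np.linspace(0.30, 0.45, 4096)

levels16 = np.unique(to_uint16(ramp)).size
levels8 = np.unique(to_uint8_like_pil(ramp)).size
worst8 = np.abs(to_uint8_like_pil(ramp) / 255.0 - ramp).max()

print("distinct 16-bit levels:", levels16)
print("distinct 8-bit levels:", levels8)
print("worst 8-bit error:", worst8)
```

Thousands of distinct 16-bit values collapse to a few dozen 8-bit steps, which shows up as banding once the log image is graded back to a display look.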

To rule out the conversion step, PIL was bypassed entirely and float32 log tensors were passed directly to the VAE. This failed catastrophically: the output contained NaN and infinity values along with severe fractal-like artifacts. These findings indicate that the problem lies not in the input/output layers but in the VAE's training distribution, which does not accommodate high-dynamic-range data.
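When experimenting with a float path like this, two checks at the VAE boundary make failures easier to diagnose: whether the input lies in the range the VAE expects (Stable Diffusion VAEs are trained on pixels scaled to [-1, 1]), and whether the decoded frames contain NaN or infinity values. The NumPy sketch below uses hypothetical helper names and a synthetic "decoded" buffer; in a PyTorch pipeline the equivalent checks would use `torch.isnan` and `torch.isinf`.

```python
import numpy as np


def to_vae_range(x):
    """Map [0, 1] image values to [-1, 1], the range Stable Diffusion VAEs are trained on."""
    return 2.0 * np.asarray(x, dtype=np.float32) - 1.0


def describe(x):
    """Count NaN/inf entries and report the range of the finite values."""
    x = np.asarray(x)
    finite = x[np.isfinite(x)]
    return {
        "nan": int(np.isnan(x).sum()),
        "inf": int(np.isinf(x).sum()),
        "min": float(finite.min()) if finite.size else float("nan"),
        "max": float(finite.max()) if finite.size else float("nan"),
    }


# Input side: confirm the tensor spans [-1, 1] before encoding.
vae_input = to_vae_range(np.array([0.0, 0.5, 1.0]))
print(describe(vae_input))

# Output side: a synthetic decoded buffer containing the failure values.
decoded = np.array([0.1, np.nan, np.inf, -np.inf, 0.9], dtype=np.float32)
print(describe(decoded))
```

Logging these summaries per frame shows where in the clip the numbers first go non-finite.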

Training Distribution Limitations

The underlying problem is that DiffuEraser's VAE was trained on 8-bit sRGB images. Log3G10 compresses a wide scene dynamic range into a narrow, low-contrast band of code values, a distribution the model never encountered during training. As a result, the latent representations fall out of distribution, leading to numerical instabilities and unusable results. Even with a corrected input/output pipeline, the tool is inherently unsuitable for high-fidelity VFX cleanup.
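The compression follows from the Log3G10 curve itself. The sketch below encodes mid grey across a range of exposures. The constants are RED's IPP2 Log3G10 definition as commonly cited; treat them as an assumption to verify against RED's documentation. With them, mid grey (0.18) encodes to about 1/3 and a value ten stops above it to 1.0, so each stop occupies only a thin slice of code values. This is why log frames look flat next to the display-referred sRGB images the VAE was trained on.

```python
import numpy as np

# RED Log3G10 (IPP2) constants; an assumption to verify against RED's documentation.
A, B, C, G = 0.224282, 155.975327, 0.01, 15.1927


def log3g10_encode(linear):
    """Scene-linear values -> Log3G10 code values (nominally 0..1)."""
    x = np.asarray(linear, dtype=np.float64) + C
    log_part = A * np.log10(np.maximum(x, 0.0) * B + 1.0)  # logarithmic segment
    return np.where(x >= 0.0, log_part, x * G)              # linear toe below -0.01


# Mid grey (0.18 reflectance) from 6 stops under to 10 stops over.
stops = np.arange(-6, 11)
codes = log3g10_encode(0.18 * 2.0 ** stops)
for s, v in zip(stops, codes):
    print(f"{int(s):+3d} stops -> {float(v):.3f}")
```

Printing the table makes the spacing visible: sixteen stops of scene range fit inside a fraction of the 0..1 code range, with roughly equal, small steps per stop.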

Conclusion and Comparative Observations

Compared with tools such as ProPainter and Netflix's VOID, DiffuEraser falls short of what VFX professionals working in complex production pipelines need. Its VAE architecture, trained only on 8-bit sRGB imagery, fundamentally limits its use in high-end, color-critical workflows. Reliable results would require future versions to address these architectural shortcomings and broaden the training distribution to include high-dynamic-range formats such as Log3G10.

Posted in Visual Effects Breakdown