Optimizing Netflix VOID Workflow on Apple Silicon
The Core Problem in VFX Pipeline Integration
The question examined here is whether the Netflix VOID model can be integrated into a real cinema-grade VFX pipeline on Apple Silicon while working from log-encoded sources under ACES color management. Two routes were tested: ComfyUI with a community node, and the standalone Netflix VOID setup. The results exposed serious performance and quality problems on the ComfyUI route and identified the adjustments the standalone pipeline needs to deliver usable results.
Examining the ComfyUI Workflow
ComfyUI initially appeared promising because of its visual workflow, easy parameter swapping, and built-in SAM2 integration for mask generation. Deeper testing revealed key limitations. The community node is a reimplementation rather than the original Netflix code, and it takes architectural shortcuts. For instance, the CFG parameter had no effect: different values (e.g., 10 vs. 60) produced identical outputs.
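One way to confirm that a parameter such as CFG is being ignored is to render the same shot twice with the same seed and compare the results numerically. The following is a minimal sketch, assuming both renders are already loaded as NumPy arrays of equal shape; the function names and the zero tolerance are illustrative and not part of ComfyUI or VOID.

```python
import numpy as np


def max_abs_difference(frames_a: np.ndarray, frames_b: np.ndarray) -> float:
    """Return the largest per-sample difference between two renders."""
    if frames_a.shape != frames_b.shape:
        raise ValueError(f"shape mismatch: {frames_a.shape} vs {frames_b.shape}")
    # Promote to float64 so unsigned integer frames do not wrap on subtraction.
    diff = frames_a.astype(np.float64) - frames_b.astype(np.float64)
    return float(np.abs(diff).max())


def parameter_has_effect(frames_a: np.ndarray, frames_b: np.ndarray,
                         tolerance: float = 0.0) -> bool:
    """True if the two renders differ by more than the tolerance anywhere."""
    return max_abs_difference(frames_a, frames_b) > tolerance
```

If two renders made with different CFG values return False here, the parameter is not reaching the sampler.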
Moreover, critical features such as Temporal MultiDiffusion were absent. The VOIDSampler used here was a custom DDIM implementation that lacked the precision of Netflix's original DDIMOrigin. Additionally, Pass 2 took far longer on the same hardware (12 hours compared to 30 minutes for the standalone). These limitations put a visible ceiling on output quality and make this route unviable for professional-grade VFX work.
Performance Insights on the Standalone Workflow
The standalone Netflix VOID workflow on the M4 Max with 128GB of memory yielded far better results. The model ran on MPS in float32, because fp16 produced NaN outputs, and the code needed several patches. The key modifications were correcting device autodetection, writing 16-bit PNG output via OpenCV instead of H.264 MP4, and implementing proper unpadding for the temporal VAE. With these adjustments, the pipeline produced clean, high-quality output in 18 minutes per 75-frame shot.
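The device and precision choice can be sketched as a small pure function. This is a hedged illustration, not the actual patch: the function name, the string return values, and the CPU fallback are assumptions, and the only behavior taken from the tests is that MPS must run in float32.

```python
def pick_device_and_dtype(cuda_available: bool, mps_available: bool,
                          requested_dtype: str = "float16") -> tuple:
    """Choose a backend name and dtype name, forcing float32 on MPS."""
    if cuda_available:
        # Assumption: CUDA keeps whatever precision the caller asked for.
        return "cuda", requested_dtype
    if mps_available:
        # fp16 produced NaN outputs on MPS in these tests, so override the request.
        return "mps", "float32"
    # Assumption: fall back to CPU in full precision.
    return "cpu", "float32"
```

With PyTorch, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the returned names can be turned into objects with `torch.device(name)` and `getattr(torch, dtype_name)`.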
The standalone method matched the generation quality of a CUDA workstation, albeit at a slower processing speed. It preserved grain, avoided compression artifacts, and kept color stable across frames, making it a viable choice for Apple Silicon users. Pass 2, however, proved inefficient on this hardware and was deemed unnecessary for most use cases.
Addressing Pass 2 Inefficiencies
Pass 2 of the standalone workflow is designed to improve temporal consistency through warped noise refinement. On the M4 Max this process was severely inefficient. The noise generation algorithm, Go-with-the-Flow, includes a hardcoded 10-minute timeout tuned for A100 GPUs, but the step needed 20 minutes on the M4 Max, twice the allowed time. This makes Pass 2 hard to use in a practical workflow.
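One possible workaround is to make the limit configurable instead of hardcoded. The sketch below shows the idea only; the environment variable name `WARPED_NOISE_TIMEOUT_S` and the function are hypothetical and do not exist in the Go-with-the-Flow or VOID code.

```python
import os

DEFAULT_TIMEOUT_S = 600.0  # the 10-minute limit described above


def resolve_timeout(env=None, var_name: str = "WARPED_NOISE_TIMEOUT_S") -> float:
    """Read a timeout in seconds from the environment, else use the default."""
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_S
    value = float(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{var_name} must be positive, got {raw!r}")
    return value
```

On the M4 Max, a value of 1800 seconds would leave headroom over the 20 minutes the step needed.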
Moreover, the rp library, a required runtime dependency, triggered interactive prompts during non-interactive sessions and interrupted the workflow. Together, these factors made Pass 2 impractical on Apple Silicon without significant modifications to the underlying codebase.
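A general way to keep a batch job from hanging on a prompt is to run the step as a subprocess with standard input closed, so any read fails immediately and the failure shows up in the exit code. This is a generic sketch under that assumption; how rp issues its prompts was not verified here, and the helper name is illustrative.

```python
import subprocess
import sys


def run_non_interactive(command, timeout_s: float):
    """Run a command with stdin closed so a prompt fails fast instead of hanging."""
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,  # reads see end-of-file immediately
        capture_output=True,
        text=True,
        timeout=timeout_s,         # raises subprocess.TimeoutExpired if exceeded
        check=False,
    )


if __name__ == "__main__":
    # Stand-in for a script that prompts: input() fails when stdin is closed.
    result = run_non_interactive(
        [sys.executable, "-c", "input('Install? ')"], timeout_s=30
    )
    print("exit code:", result.returncode)
```

A non-zero exit code then points at the prompt directly, rather than leaving the session stalled.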
Comparative Analysis of the Two Routes
Compared side by side, the standalone Netflix VOID setup outperformed ComfyUI in nearly every respect. ComfyUI's reliance on a community-implemented node introduced performance bottlenecks and quality limitations. The standalone setup, once patched, delivered results comparable to a CUDA workstation, albeit with longer processing times.
For professionals seeking to integrate Netflix VOID into a cinema VFX pipeline on Apple Silicon, the standalone method is the superior choice. While it requires a series of technical adjustments, the resulting output quality and performance justify the effort. ComfyUI, despite its user-friendly interface, remains unsuitable due to its inherent limitations and lower quality ceiling.
Key Modifications for Optimal Performance
To achieve the best results with Netflix VOID on Apple Silicon, several modifications are necessary. First, run the model on MPS in float32 to avoid the NaN outputs seen with fp16. Second, replace H.264 MP4 output with 16-bit PNG written through OpenCV to eliminate compression artifacts. Third, implement proper unpadding for the temporal VAE to maintain frame consistency.
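The second and third modifications can be sketched as follows. This is a minimal illustration, not the patch itself. It assumes frames arrive as float RGB arrays in the range [0, 1], and it assumes the temporal VAE wants a frame count of the form 4k + 1, which is a common constraint for 4x temporal compression but is not confirmed for VOID. All function names are hypothetical.

```python
import numpy as np


def to_uint16_bgr(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert a float RGB frame in [0, 1] to a contiguous 16-bit BGR array."""
    scaled = np.rint(np.clip(frame_rgb, 0.0, 1.0) * 65535.0).astype(np.uint16)
    # OpenCV expects BGR channel order, so reverse the last axis.
    return np.ascontiguousarray(scaled[..., ::-1])


def write_png16(path: str, frame_rgb: np.ndarray) -> None:
    """Write one frame as a 16-bit PNG; cv2.imwrite keeps uint16 depth for PNG."""
    import cv2  # imported lazily so the other helpers work without OpenCV
    if not cv2.imwrite(path, to_uint16_bgr(frame_rgb)):
        raise IOError(f"could not write {path}")


def pad_frames(frames: np.ndarray, stride: int = 4) -> np.ndarray:
    """Repeat the last frame until the count has the form stride * k + 1."""
    extra = (-(frames.shape[0] - 1)) % stride
    if extra == 0:
        return frames
    tail = np.repeat(frames[-1:], extra, axis=0)
    return np.concatenate([frames, tail], axis=0)


def unpad_frames(frames: np.ndarray, original_count: int) -> np.ndarray:
    """Drop the repeated tail frames so the output matches the source length."""
    return frames[:original_count]
```

Under these assumptions, a 75-frame shot would be padded to 77 frames before encoding and trimmed back to 75 after decoding, so the written sequence lines up frame for frame with the plate.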
These adjustments significantly enhance the output quality, making the standalone Netflix VOID workflow a viable option for high-end VFX production. However, users should note that Pass 2 remains a bottleneck and may not be worth the additional processing time on Apple Silicon hardware.