Edited By
Lisa Fernandez

A recent comparison of 62 samplers and 16 schedulers for WAN 2.1 image generation has sparked discussion on user forums. The findings show notable variation in image quality and raise questions about consistency and workflow practices.
The analysis included a color-coded table that ranks results by quality on a gradient from red to green. However, the methods used to arrive at these rankings have drawn both praise and criticism. One user stated, "Without a reproducible workflow and outputs to judge ourselves, this is not very useful."
Additionally, a few commenters highlighted the need to run multiple tests per sampler, advocating for varied seeds and resolutions. Without that variation, the rankings may not provide a reliable baseline.
Testing Methods: Users pointed out that conducting one test per sampler could be misleading. Comments emphasized the importance of doing at least five tests with different prompts for conclusive results.
Open Issues with Specific Samplers: Recommendations surfaced regarding which samplers are effective with flow models. A user mentioned, "Euler_cfg_pp is the only built-in CFG++ sampler that works with flow models." The message here is clear: some combinations simply do not yield satisfactory results.
Step Count Impact: Notably, there are questions about the step count consistency across ratings. One comment stated, "A lot of the sampler differences collapse as you push step count up." This points toward a potential hidden variable influencing the findings significantly.
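The commenters' points about repeated tests and step count can be sketched in code. The following is a minimal illustration, not the original tester's method: it assumes each generated image has already been given a numeric quality score, and the record keys (sampler, scheduler, steps, score) are hypothetical names chosen for this example. It groups scores by sampler, scheduler, and step count, so that runs at different step counts are never averaged together, and reports the spread alongside the mean.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(results):
    """Summarize scores per (sampler, scheduler, steps) group.

    `results` is an iterable of dicts with the hypothetical keys
    "sampler", "scheduler", "steps", and "score". Each dict is one
    generated image, e.g. one seed and prompt for that combination.
    """
    groups = defaultdict(list)
    for r in results:
        # Step count is part of the key so it cannot act as a hidden variable.
        groups[(r["sampler"], r["scheduler"], r["steps"])].append(r["score"])

    summary = {}
    for key, scores in groups.items():
        # stdev needs at least two data points; report 0.0 for a single run.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[key] = {"n": len(scores), "mean": mean(scores), "stdev": spread}
    return summary


# Made-up scores for placeholder names, purely to show the call shape.
example = [
    {"sampler": "sampler_a", "scheduler": "scheduler_x", "steps": 20, "score": 6.0},
    {"sampler": "sampler_a", "scheduler": "scheduler_x", "steps": 20, "score": 8.0},
    {"sampler": "sampler_a", "scheduler": "scheduler_x", "steps": 40, "score": 7.0},
]
for key, stats in summarize(example).items():
    print(key, stats)
```

A group with n equal to 1 has no measurable spread, which is the situation commenters described when they called a single test per sampler misleading.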
Overall, the sentiment is mixed. While some celebrate the effort put into the extensive testing, others express skepticism regarding the utility of the outcomes without a proper workflow and multiple test iterations. A user emphasized the complexity of the issue by saying, "Doing enough tests seems impractical."
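A rough count shows why thorough testing can look impractical. The sketch below assumes every sampler is paired with every scheduler, which the source does not confirm; the function names and the per-image render time are hypothetical inputs to be replaced with measured values.

```python
def total_generations(samplers, schedulers, prompts, seeds=1, step_counts=1):
    """Images needed to cover every combination once per prompt, seed, and step setting."""
    return samplers * schedulers * prompts * seeds * step_counts


def estimated_hours(generations, seconds_per_image):
    """Wall-clock hours, given an assumed (measured by the user) render time per image."""
    return generations * seconds_per_image / 3600


one_pass = total_generations(62, 16, prompts=1)      # 62 * 16 = 992
five_prompts = total_generations(62, 16, prompts=5)  # 992 * 5 = 4960
print(one_pass, five_prompts)
```

Under that assumption, a single pass already takes 992 images, and the five prompts suggested by commenters raise it to 4,960 before any extra seeds, resolutions, or step counts are added.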
62 samplers and 16 schedulers ranked, showing significant variability in image quality.
Testing methods questioned: Multiple seeds and resolutions are crucial for accurate ratings.
Specific combinations may fail: Not every sampler works well with flow models, limiting options for optimal performance.
As the discussion unfolds, it raises the question: What are the best practices for evaluating image generation tools in this space?
There's a strong chance that the discussions around samplers and schedulers will lead to an industry-wide push for standardized testing methods. As people highlight the necessity for reproducible workflows, developers may introduce guidelines to ensure consistent results across various tests. Experts estimate around a 60% probability that enhanced collaboration between manufacturers and the user community will emerge. This could result in an effective feedback loop, strengthening the quality of AI-generated images. Furthermore, the spotlight on step count variations hints at a growing trend to investigate the underlying factors affecting performance, which could reshape design philosophies in this space.
A comparison can be drawn between the current situation and the early days of digital photography, where varying camera models produced inconsistent results. Just as users had to navigate the complexities of lenses and settings, today's users face a landscape of samplers and schedulers with mixed performance metrics. The innovations that followed, such as the development of standardized imaging protocols, transformed photography into a reliable medium. Similarly, as the AI image generation community grapples with these quality assessments, it may well spark a revolution in how image generation tools are evaluated, leading to clearer insights and higher standards.