A recent AI coding competition has ignited debate about model effectiveness, pitting GPT-5 Pro against Claude Sonnet 4 and Gemini 2.5 Pro. Conducted on August 15, 2025, the challenge tasked the three systems with creating a responsive image gallery.
The models were required to create an image gallery that:
Dynamically loads images from numerous URLs
Utilizes CSS Grid or Flexbox for full responsiveness
Provides smooth image transitions
Incorporates lazy loading and user interactions
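To make the brief concrete, here is a minimal sketch of how those requirements map to code. This is not any model's actual submission; it assumes a browser environment, and the IMAGE_URLS list with its picsum.photos placeholder source is purely illustrative.

```typescript
// Minimal sketch of the challenge requirements, assuming a browser environment.
// IMAGE_URLS and the picsum.photos placeholder source are illustrative assumptions.
const IMAGE_URLS: string[] = Array.from(
  { length: 50 },
  (_, i) => `https://picsum.photos/seed/${i}/600/400`
);

// Responsive layout via CSS Grid, plus a fade-in transition and a hover interaction.
const style = document.createElement("style");
style.textContent = `
  .gallery {
    display: grid;
    grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
    gap: 12px;
  }
  .gallery img {
    width: 100%;
    height: 100%;
    object-fit: cover;
    opacity: 0;
    transition: opacity 0.4s ease, transform 0.3s ease;
  }
  .gallery img.loaded { opacity: 1; }
  .gallery img:hover { transform: scale(1.03); } /* simple user interaction */
`;
document.head.appendChild(style);

// Lazy loading: defer setting `src` until the image scrolls near the viewport.
const observer = new IntersectionObserver(
  (entries, obs) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      const img = entry.target as HTMLImageElement;
      img.src = img.dataset.src ?? "";
      img.addEventListener("load", () => img.classList.add("loaded"));
      obs.unobserve(img);
    }
  },
  { rootMargin: "200px" } // start loading shortly before the image becomes visible
);

function buildGallery(urls: string[]): void {
  const gallery = document.createElement("div");
  gallery.className = "gallery";
  for (const url of urls) {
    const img = document.createElement("img");
    img.dataset.src = url; // real URL stays in data-src until observed
    img.loading = "lazy";  // native hint; the observer handles the actual deferral
    img.alt = "Gallery image";
    observer.observe(img);
    gallery.appendChild(img);
  }
  document.body.appendChild(gallery);
}

buildGallery(IMAGE_URLS);
```

The IntersectionObserver approach is one of several reasonable choices here; relying solely on the native loading="lazy" attribute would also satisfy the lazy-loading requirement with less code.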
GPT-5 Pro delivered an aesthetically pleasing and functional user interface. A forum member remarked, "Not exactly groundbreaking, but the theme and UI are nice."
Claude Sonnet 4, run through the Bind AI platform, offered a simple yet effective UI. Many appreciated its user-friendliness, although some felt it lacked depth, placing it as the second-best performer.
Gemini 2.5 Pro, however, fell short in this test: it struggled with image loading and did not deliver the expected infinite scrolling. Not everyone accepted that verdict; one participant pushed back: "Images NOT loading? What are you on about, mate? Fully functional and displaying ALL images. Bias much?"
Observers raised several points in response:
Many criticized the test setup for not mirroring real-world tasks, pointing out that most projects involve modifying existing, complex code rather than building simple apps from scratch.
Users noted GPT-5's quick problem-solving capabilities, with feedback suggesting it outperformed Claude in practical scenarios.
Some described switching to GPT-5; one user said they moved because of Claude's limitations and lower overall efficiency.
Design preferences also came up, with one user stating, "I don't like dark mode, so I think Claude one is better. Maybe people who like dark mode would prefer GPT-5 more."
Others suggested future tests focus on more intricate designs, with one user proposing a challenge involving a web calculator.
"A more interesting test would involve giving the AI a flawed codebase and asking it to fix bugs," a contributor suggested, pointing to flaws in current evaluation metrics.
✅ Real-World Relevance: There's a widespread belief that AI evaluations should reflect true development activities.
⚠️ Performance Disappointment: Many voices expressed dissatisfaction with Gemini 2.5 Pro's inability to fulfill basic functions.
💡 Innovative Testing Suggestions: Users are pushing for more complex challenges to yield meaningful insights.
As AI tools progress, usability and features relevant to day-to-day development will remain the focus. Future comparisons are likely to emphasize interactivity and the growing collaboration between AI and human developers. Like earlier shifts in tooling, this coding contest leaves observers anticipating advancements that could meaningfully change development workflows.