
An experiment testing four leading AI models in stock trading has drawn mixed reactions. ChatGPT, Gemini, Claude, and Perplexity were allocated $1,000 to trade, and their trading skills were evaluated under a controlled setup.
Two months ago, I began this trial, sending an identical prompt to each AI model every weekday before the market opens. Trades run through Alpaca APIs on paper accounts, with controls in place to prevent interference. After nine weeks, ChatGPT is the front-runner at +21.1%, following a marked shift in strategy.
ChatGPT initially sat on cash for three weeks before making aggressive investments in healthcare, resulting in significant gains, including a 52% rise in ACHC and substantial profits from IOVA. "ChatGPT went from worst to first almost overnight," noted an observer.
Perplexity managed only +1.1%, holding minimal positions and staying mostly in cash. Gemini is down 6.6% after risky ventures into meme and crypto trades that produced multiple losses. Claude has the poorest performance at -11.5%, hurt by erratic trading despite a recent uptick from IOVA.
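As a rough check on what these percentages mean in dollars, the sketch below applies the reported nine-week returns to a starting balance and ranks the models. It assumes each model's paper account started at $1,000, which the setup suggests but does not state outright; `START_BALANCE` and `ending_balance` are illustrative names, not part of the experiment's code.

```python
# Assumption: each model's paper account started at $1,000.
START_BALANCE = 1_000.00

# Nine-week returns as reported above, in percent.
reported_returns_pct = {
    "ChatGPT": 21.1,
    "Perplexity": 1.1,
    "Gemini": -6.6,
    "Claude": -11.5,
}


def ending_balance(start: float, return_pct: float) -> float:
    """Apply a simple percentage return to a starting balance."""
    return round(start * (1 + return_pct / 100), 2)


# Ending balance per model under the $1,000 assumption.
balances = {
    model: ending_balance(START_BALANCE, pct)
    for model, pct in reported_returns_pct.items()
}

# Models ordered from best to worst ending balance.
ranking = sorted(balances, key=balances.get, reverse=True)

for model in ranking:
    print(f"{model:<11} ${balances[model]:,.2f}")
```

Under that assumption, the spread between the best and worst account is a little over $300 on paper.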
Commentators highlighted how differently each model handles risk. ChatGPT's pattern of conserving cash before committing to high-conviction trades matches a practice known in quantitative finance as conviction-based position sizing. Claude's frequent trading, by contrast, reflects the common pitfall of over-trading, which led to disappointing returns.
"More trades don't equate to better results; that's a principle backed by behavioral finance research," a forum participant pointed out.
Some participants suggested improvements for future experiments, such as having each model write a reasoning memo before every trade decision so that trades can be analyzed and compared more easily.
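To make the conviction-based position sizing idea mentioned by commentators more concrete, here is a minimal sketch. It is not the logic any of the models actually used. The conviction scores, the `min_conviction` threshold, the `max_weight` cap, and the tickers `AAA`, `BBB`, and `CCC` are all hypothetical, chosen only for illustration.

```python
def size_positions(equity, convictions, min_conviction=0.6, max_weight=0.25):
    """Return (dollar allocations per ticker, cash left over).

    convictions maps ticker -> score in [0, 1] (hypothetical input).
    Ideas scoring below min_conviction are skipped, so the account
    stays in cash when nothing qualifies. min_conviction must be < 1.
    """
    weights = {}
    for ticker, score in convictions.items():
        if score < min_conviction:
            continue  # not enough conviction: hold cash instead
        # Weight grows linearly from 0 at the threshold
        # to max_weight at a score of 1.0.
        weights[ticker] = max_weight * (score - min_conviction) / (1 - min_conviction)

    # Never allocate more than 100% of equity.
    total = sum(weights.values())
    if total > 1:
        weights = {t: w / total for t, w in weights.items()}

    allocations = {t: round(equity * w, 2) for t, w in weights.items()}
    cash = round(equity - sum(allocations.values()), 2)
    return allocations, cash


# Hypothetical scores for three placeholder tickers.
allocations, cash = size_positions(1_000.0, {"AAA": 0.9, "BBB": 0.7, "CCC": 0.4})
print(allocations, cash)
```

With these made-up inputs, the low-scoring idea is skipped and most of the account stays in cash, which mirrors the "wait, then commit" behavior described for ChatGPT. A real system would also need a defensible way to produce the scores in the first place.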
Feedback from various forums shows a mix of admiration and skepticism:
"This experiment is fascinating, but we need more data to draw real conclusions," commented one participant.
"I've personally tested the models and found that, generally, LLMs don't understand stock trading well," stated another.
△ ChatGPT leads with 21.1%, outperforming the S&P 500 by over 22 points
▽ Perplexity achieved modest gains at +1.1%
▼ Gemini and Claude lag behind at -6.6% and -11.5%, respectively
💬 "This shows the importance of backing trades with solid reasoning" - Forum participant
With the experiment set to run for another three weeks, analysts are keen to see whether ChatGPT can sustain its performance. Observers speculate that adjustments based on community feedback could significantly influence results. The other models may continue to struggle if they move deeper into volatile trading strategies.
Looking forward, shifts in trading efficiency will be under scrutiny. Can ChatGPT maintain its lead as market conditions evolve? With potential enhancements to trading methodologies, the ongoing experiment may redefine how AI models approach stock trading.
For full results and updates on methodology, the ongoing data is available on GitHub.