An example of why I think current LLMs are enough to change lots of work even if they don’t get better, once we start integrating them with other systems: GPT-4 (now obsolete) went from 30% accuracy to 87% accuracy in clinical oncology decisions when acting like an agent & given access to medical tools and reference material. Paper: https://lnkd.in/gwQAcAfX
I agree. How it’s implemented always matters more than the tech itself.
Agree, sadly the first paragraph went right over my head, so I'll take your word for it 😀
I've been saying for a while that GenAI is useful when it is baked into the software and systems people already use vs. startups' obsession with disruption. Great to see Ethan Mollick expanding on how that works for AI agents.
The era of Self-Driving.company will soon be here.
Ethan Mollick, thank you for sharing this insightful example of LLMs' potential impact! 🌟
Ethan Mollick, this is the part the headlines will miss. Everyone will point to the 87% accuracy. But the real unlock wasn’t the model. It was the scaffolding around the model. The accuracy didn’t improve because GPT-4 got smarter. It improved because it got governed.
• The agent was routed through protocol-based steps.
• Its decisions were gated by tool checks, not guesswork.
• And every response was traceable to a logic layer beyond the LLM.
This is Thinking OS™ territory. Because intelligence without interpretive control is just high-confidence entropy. In high-stakes domains like clinical oncology, performance gains don’t come from better output. They come from decision constraint, escalation friction, and upstream adjudication, all before the model gets to answer anything at all. The agent didn’t become brilliant. It just stopped hallucinating.
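For anyone who wants to see the shape of that idea, here is a minimal sketch of protocol steps gated by tool checks, with a trace of every decision. It is not the paper's implementation: `Step`, `Result`, `run_protocol`, the step names, and the toy tools are all hypothetical, and the "tools" are plain Python functions standing in for real lookups.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    tool: Callable[[dict], dict]   # hypothetical tool call, returns findings
    check: Callable[[dict], bool]  # gate that must pass before continuing


@dataclass
class Result:
    answer: Optional[str]          # None means the protocol stopped early
    trace: list = field(default_factory=list)


def run_protocol(case, steps, answer_fn):
    """Run each step in order; stop and escalate on the first failed check."""
    findings, trace = {}, []
    for step in steps:
        out = step.tool(case)
        if not step.check(out):
            trace.append(f"{step.name}: check failed, escalate to a human")
            return Result(None, trace)
        findings.update(out)
        trace.append(f"{step.name}: ok")
    # The answer function (e.g. an LLM call) only runs after every gate passes.
    return Result(answer_fn(findings), trace)


# Toy stand-ins (assumptions for illustration, not real medical tools).
steps = [
    Step("read_record",
         lambda c: {"stage": c.get("stage")},
         lambda o: o["stage"] is not None),
    Step("lookup_reference",
         lambda c: {"reference": "placeholder guideline text"},
         lambda o: bool(o["reference"])),
]
draft = lambda f: f"Draft based on stage {f['stage']} and {f['reference']}"

ok = run_protocol({"stage": "II"}, steps, draft)
blocked = run_protocol({}, steps, draft)  # missing data: no answer is produced
```

In this sketch `blocked.answer` is `None` because the first check fails, and `blocked.trace` records which step stopped the run, so the answer step never gets a chance to guess.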
On the contrary, I think their use will decline as people realize that deterministic approaches work much better, and at lower cost, than AI.
I wonder when it will become official malpractice for an oncologist not to be using AI for co-intelligence!
That oncology accuracy jump from 30% to 87% shows how LLMs get game-changing results through smart system integration, not just bigger models. What other fields do you think are about to see similar accuracy wins through better tool integration?
The frontier for agents is no longer tools but memory - statelessness only takes you so far. Tools will expand the ability of models but will also need adaptation for the model world - more context around usage, long tail APIs, verification simulators.