12.06.2025 | 00:28
I don't know whether this is info from WWDC, but it's interesting:
Apple’s research shows we’re far from AGI and the metrics we use today are misleading
Here’s everything you need to know:
→ Apple built new logic puzzles with controllable difficulty to avoid training-data contamination (a sketch of the idea follows this list).
→ They tested top reasoning models, including Claude 3.7 Sonnet (Thinking), DeepSeek-R1, and o3-mini.
→ These models completely failed on unseen, complex problems.
→ Accuracy collapsed to 0% as puzzle difficulty increased.
→ Even when given the exact step-by-step algorithm, models failed.
→ The results point to pattern matching, not genuine reasoning.
→ Three regimes emerged: easy = models cope, medium = reasoning models gain some edge, hard = total collapse.
→ More compute didn’t help. Better prompts didn’t help.
→ Apple says we’re nowhere near true AGI, and the metrics we use today are misleading.
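To give a feel for what "controllable difficulty" means here, below is a minimal Python sketch of the idea, using Tower of Hanoi (one of the puzzle types reportedly used in the paper) as a stand-in. The function names and checking logic are my own illustration, not Apple's benchmark code; the point is that difficulty scales with a single parameter (the number of disks) and that an exact solving algorithm is known, so a model's answer can be verified move by move.

```python
# Illustrative sketch, NOT Apple's benchmark code: a puzzle whose difficulty
# is one parameter (n disks) and whose exact solution is known, so any
# proposed answer can be checked mechanically, move by move.

def hanoi_moves(n, src=0, aux=1, dst=2):
    """Yield the optimal (2^n - 1)-move Tower of Hanoi solution for n disks."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)  # park n-1 disks on the spare peg
    yield (src, dst)                              # move the largest disk
    yield from hanoi_moves(n - 1, aux, src, dst)  # bring the n-1 disks on top

def is_valid_solution(n, moves):
    """Replay a proposed move list and check that every rule is respected."""
    pegs = [list(range(n, 0, -1)), [], []]  # peg 0 holds disks n..1, top at the end
    for src, dst in moves:
        if not pegs[src]:
            return False                    # illegal: moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                    # illegal: larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))  # solved iff all disks reach the target peg

if __name__ == "__main__":
    for n in (3, 8, 12):  # difficulty sweeps up: the solution needs 2^n - 1 moves
        moves = list(hanoi_moves(n))
        print(n, len(moves), is_valid_solution(n, moves))
```

Note the difficulty curve: the optimal solution has 2^n - 1 moves, so going from 3 to 12 disks takes the required answer from 7 moves to 4,095 without changing the task's rules at all, which is how a benchmark can sweep from trivial to collapse-inducing on the same puzzle family.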
This could mean today’s “thinking” AIs aren’t intelligent, just really good at memorizing training data.
(source: The Rundown AI):