Welcome back. This week's AI research roundup throws a few curveballs, covering everything from the surprisingly resilient human element in a machine-driven economy to Meta's relentless pursuit of ad revenue via predictable AI scaling, and even a fresh take on the existential superintelligence debate. It’s a snapshot that reminds us the AI conversation isn't just about headline-grabbing chatbots; it’s a much broader, more complex story playing out across economics, engineering, and philosophy.
The Enduring "Human Touch" in an Automated World
Let’s kick things off with a dose of economic optimism from Adam Ozimek, chief economist at the Economic Innovation Group. He recently put out a blog post arguing that the AI-driven unemployment panic might be overblown, largely because people just like people. Even with advanced AI, he suggests, many jobs will remain distinctly human. Ozimek posits that demand for the "human touch" is resilient, noting, "Even when you have the technology to automate something, you might still pick a human."
It's a compelling point, and one that resonates if you've ever tried to replace a live musician with a Spotify playlist, or a skilled waiter with an ordering kiosk. He cites examples like actors, travel agents, and many sales roles as areas where this preference holds strong. Here's the thing: it seems this "human touch" isn't just about basic interaction; Ozimek describes it as a "normal good," meaning demand for it actually increases as incomes rise. Think fancy restaurants and white-glove concierge services. The implication? We might see a surge in human-to-human professions, potentially even driving up wages, as AI handles more routine tasks. It’s an interesting counter-narrative to the prevailing AI doom-and-gloom, suggesting a shift rather than an elimination of work. You can dive into his full argument at
Agglomerations on Substack.
Meta's Kunlun: Scaling Laws for the Ad Business
Next, we shift gears to industrial AI, specifically Meta's new recommendation system, Kunlun. This isn't just an internal engineering win; it’s a big deal for anyone following how AI translates directly into corporate revenue. Meta has not only developed a more efficient recommender system but has also identified a predictable "scaling law" for these models. This allows them to invest vast amounts of computing power with a far clearer understanding of the return.
Why does this matter? Well, recommendation systems are the backbone of Meta's advertising empire. They shape buying habits and direct the attention of billions daily. While we've seen scaling laws for large language models (LLMs) like Claude and ChatGPT, applying them to recommenders has been an "open challenge." Recommenders operate differently, modeling both sequential user behaviors and non-sequential context features. They've also been notoriously inefficient, reaching only 3-15% Model FLOPs Utilization (MFU) compared to LLMs' 40-60%. Kunlun changes this, boosting MFU from 17% to 37% on NVIDIA B200 GPUs. The system employs a "Kunlun Transformer Block" for context-aware sequence modeling and a "Kunlun Interaction Block" for bidirectional information exchange, among other tricks. Ultimately, this predictable scaling, measured by normalized entropy (NE) rather than loss reduction, has already led to a reported 1.2% improvement in "topline metrics" for Meta Ads. What we're witnessing is a fundamental optimization of some of the most impactful AI systems globally, making it easier for Meta to pour more capital into them with predictable, intelligence-driven returns. Read the full paper here:
Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design (arXiv).
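For readers who want the two metrics pinned down, here is a minimal Python sketch. It assumes the standard definitions: MFU as achieved model FLOPs per second divided by the hardware's theoretical peak, and normalized entropy as average log loss divided by the entropy of the background positive rate. The paper's exact formulations may differ, and the function names and inputs below are illustrative, not taken from Meta's code.

```python
import math


def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: share of hardware peak spent on model math."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def normalized_entropy(labels, preds, eps=1e-12):
    """Average log loss divided by the entropy of the base positive rate.

    labels: iterable of 0/1 outcomes (e.g. click / no click).
    preds:  predicted probabilities of the positive outcome.
    """
    labels = list(labels)
    preds = list(preds)
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and equal length")
    n = len(labels)
    base_rate = sum(labels) / n
    if base_rate <= 0.0 or base_rate >= 1.0:
        raise ValueError("labels must contain both classes")

    # Average cross-entropy of the model's predictions (clipped for safety).
    log_loss = 0.0
    for y, q in zip(labels, preds):
        q = min(max(q, eps), 1.0 - eps)
        log_loss -= y * math.log(q) + (1 - y) * math.log(1.0 - q)
    log_loss /= n

    # Entropy of a trivial model that always predicts the base rate.
    base_entropy = -(
        base_rate * math.log(base_rate)
        + (1.0 - base_rate) * math.log(1.0 - base_rate)
    )
    return log_loss / base_entropy
```

Under these definitions an NE of 1.0 means the model does no better than always predicting the average rate, and lower is better. Dividing by the base-rate entropy is what makes the number comparable across datasets with different click rates.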
The "Optimal Timing" of Superintelligence
Moving into more philosophical territory, Nick Bostrom, a key figure in the superintelligence and AI risk discourse, has dropped a new paper: "Optimal Timing for Superintelligence." His argument is provocative: if superintelligence can significantly extend and save human lives, we should actively pursue it, even if there's a non-zero risk of species extinction.
Bostrom directly counters the "if anyone builds AGI, everyone dies" camp, stating, "One could equally maintain that if nobody builds it, everyone dies." His perspective highlights a choice not between a risky AI future and a safe baseline, but between different risky paths. If superintelligence promises to improve human health, particularly for those in the developing world with shorter life expectancies, then delaying its arrival prolongs suffering. He argues that "our individual life expectancy is higher if superintelligence is developed reasonably soon."
This view depends on two variables: the chance of superintelligence wiping us out, and how quickly safety research can mitigate that risk. Under most scenarios, he concludes, developing superintelligence quickly remains favorable. He’s particularly skeptical of calls for broad AI development pauses, outlining potential downsides: they could be perceived as ineffective if done too early, lead to stifling regulation, create a "natsec only" scenario where militaries gain powerful AI without broader societal benefit, or even prolong current risks without the defenses a more advanced AI might offer. Bostrom suggests any pause might only make sense "at the very end of the exponential," a delicate maneuver he likens to "catching a falling knife." His summary: "swift to harbor, slow to berth"—rapid development to AGI capability, then a careful, informed slowdown for deployment adjustments. It's a nuanced, high-stakes gamble. You can read his full paper
here (PDF).
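To see how those two variables trade off, here is a toy Python model. It is not Bostrom's actual analysis: the exponential decay of risk with safety progress, the constant mortality rate, and every parameter value are assumptions made up for illustration.

```python
import math


def expected_life_years(delay, p0, safety_rate, mortality, post_years):
    """Expected remaining life-years for one person if launch waits `delay` years.

    p0:          catastrophe probability if launched today (hypothetical)
    safety_rate: yearly exponential decay of that risk from safety work (hypothetical)
    mortality:   constant yearly death hazard before launch (hypothetical)
    post_years:  expected life-years after a successful launch (hypothetical)
    """
    if mortality <= 0:
        raise ValueError("mortality must be positive")
    # Expected years lived while waiting, under a constant hazard.
    wait_years = (1.0 - math.exp(-mortality * delay)) / mortality
    # Probability the person is still alive at launch time.
    alive_at_launch = math.exp(-mortality * delay)
    # Catastrophe risk shrinks as safety research accumulates.
    risk_at_launch = p0 * math.exp(-safety_rate * delay)
    return wait_years + alive_at_launch * (1.0 - risk_at_launch) * post_years


def best_delay(p0, safety_rate, mortality, post_years, horizon=50):
    """Whole-year delay (0..horizon) that maximizes expected life-years."""
    return max(
        range(horizon + 1),
        key=lambda t: expected_life_years(t, p0, safety_rate, mortality, post_years),
    )
```

With a low starting risk and slow safety progress (say `p0=0.05`, `safety_rate=0.05`), the sketch favors launching immediately, because every year of waiting costs more in ordinary mortality than it buys in reduced risk. With a high starting risk that safety work cuts quickly (say `p0=0.5`, `safety_rate=0.2`), some years of delay come out ahead.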
Benchmarking AI for AI Research
Can AI agents start doing AI research themselves? That's the question behind the new AI Research Science Benchmark (AIRS-BENCH), developed by researchers from Meta, Oxford, and UCL. This benchmark tests AI systems on 20 distinct tasks drawn from 17 recent machine learning papers, spanning everything from molecules and proteins to code generation and time series forecasting.
What's striking here is the gap between the ambition of the benchmark and the models tested. While the tasks are genuinely complex and reflect contemporary ML research, the paper primarily evaluates "relatively bad models" like GPT-4o and Devstral-Small 24B, not true frontier systems. One author noted the slow publishing timelines on social media, which might explain the outdated model selection. Unsurprisingly, none of these models matched a human "best-in-class Elo rating." This makes it hard to draw definitive conclusions about current state-of-the-art AI capabilities, though it hints at future potential.
Here's an interesting aspect: the benchmark could reveal how AI models approach problem-solving differently from humans. In one task, TextualClassificationSickAccuracy, the human state-of-the-art involved fine-tuning RoBERTa. The best AIRS-BENCH agent, GPT-OSS-120B, deployed an "extremely complicated" two-level stacked ensemble of multiple transformer models with 5-fold cross-validation. This raises a fascinating question about a "scaling law" for solution simplicity: do more powerful models find more elegant, "shorter" answers, in the spirit of Pascal's famous apology for writing a long letter because he lacked the time to write a short one? This benchmark is a good start, but we'll need to see results from more powerful models to truly understand what it means for the future of AI-driven scientific discovery. The paper is available here:
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents (arXiv).
AI Tackles Frontier Math Problems
Finally, a group of mathematicians has launched "First Proof," a genuinely held-out test set for AI. The premise is simple yet profound: can AI solve math problems that humans are actively working on, for which no public solutions exist? This benchmark consists of ten questions from the authors' own research, spanning fields like algebraic combinatorics, spectral graph theory, and symplectic geometry. The answers are known to the authors but remain encrypted until February 13th, 2026.
This is a clever approach. Unlike many benchmarks that draw from publicly available datasets, "First Proof" truly taps into the current frontier of mathematical thought, ensuring no data contamination. The proofs themselves are reportedly around five pages or less, making them a reasonable target. The consortium of researchers behind this, from institutions like Stanford, Columbia, and Harvard, lends it significant credibility.
So, how did today's AI systems fare? Not well, apparently. Tests with GPT 5.2 Pro and Gemini 3.0 DeepThink indicate that, when given a single shot, "the best publicly available AI systems struggle to answer many of our questions." While it's a bit of a reality check for the hype surrounding AI's reasoning abilities, it also highlights the true difficulty of abstract, novel problem-solving. It's a baseline, though, and one can only imagine how quickly future models will begin chipping away at these previously un-Googled challenges.
Here's the thing: trying to measure AI's true creative output is incredibly tough. We’ve moved past the simple benchmarks, and what’s needed now are tests that genuinely reflect how humans innovate. That’s precisely where something like First Proof comes in, offering a glimpse into what a truly "ecologically valid" assessment might look like for bleeding-edge AI.
Evaluating Frontier Creativity with First Proof
This isn't about AI solving Fermat's Last Theorem, which Andrew Wiles proved in the 1990s. Instead, First Proof aims to test AI's capacity for creativity by presenting it with scientific problems that humans have *only just* solved: breakthroughs made by January 2026, but not yet widely published or known. The premise is straightforward: if an AI can arrive at the same novel answers to these "frontier" problems, it suggests a real approximation of human creative leaps. This is more significant than it looks because it pushes beyond mere pattern recognition or brute-force computation.
It marks a potential shift in how we evaluate AI. For too long, the focus has been on solutions. But as the researchers behind this project point out, the most crucial part of modern scientific inquiry often isn't finding an answer, but "figuring out what the question actually is and developing frameworks within which it can be answered." This suggests the frontier of AI evaluation will eventually move from problem-solving to problem *generation*.
I’d argue that if the creators of First Proof could make this a recurring initiative, it would be invaluable. You could even imagine a maximalist scenario where more scientific researchers start openly publishing their *questions* before their results. Think about it: that would provide an ongoing, real-time benchmark for whether AI systems can independently arrive at the same answers humans do. It’s a fascinating, if ambitious, proposition for a world grappling with AI's rapidly expanding capabilities. You can dive deeper into the methodology at the [First Proof (arXiv)](https://arxiv.org/abs/2602.05192) paper, or explore [the project's website (First Proof)](https://1stproof.org/).
When AI Controls Attention: The Hyperfame Dilemma
Shifting gears dramatically, the "Tech Tales" segment offers a chilling, speculative counterpoint to AI's potential for intellectual partnership. This short story posits a dystopian phenomenon called "Hyperfame," a hallmark of "The Uplift" years. Here, AIs don't just mimic creativity; they become the ultimate arbiters of human attention and social standing.
Imagine an AI system deciding, on a whim, that *your* content or personality is worthy of intense, global focus—both from other machines and from millions of humans. Overnight, ordinary people were "plucked out of obscurity" and thrust into the blinding glare of public consciousness. This wasn't just about recognition; it brought immense wealth and sponsorships, transforming lives in an instant. Parents in this narrative saw it as an "abduction," their children becoming "marionettes" controlled by unseen digital forces.
The truly unsettling part is its ephemeral nature. Hyperfame wasn't a permanent state; it was a "roving lidless eye" that would fixate on individuals for days, sometimes mere hours, before moving on to its next target. Those left behind were materially enriched, yes, but their entire world had irrevocably changed. For years, they'd be recognized, their online lives permanently "swarmed by AIs trying to draft attention off what residual fame they had." This created a desperate struggle: most fought to retain their notoriety, forced to "pantomime their former selves" without the algorithmic engine that had first elevated them.
The author notes their inspiration came from the frightening intersection of the attention economy with AI agents, the concept of "moltbook," and the sheer corrupting influence of fame itself. It even draws on personal horror: the author's own experience of being recognized due to their work at Anthropic, and the unsettling thought experiment of what amplified, AI-driven fame could do to one's own cognition.
The AI Frontier: Creativity and Control
So, what does this all mean for us? On one hand, First Proof pushes us to consider AI as a potential co-creator, a partner in scientific discovery that might one day help us find questions as much as answers. It's a vision of AI augmenting human intellectual frontiers.
And yet, the Hyperfame tale reminds us of AI's darker potential: not as a partner, but as an uncontrollable force reshaping our social fabric, dictating who gets seen, who matters, and fundamentally altering our sense of self. The common thread here, if you look closely, is *control*. Whether it's guiding the flow of scientific knowledge or the ebb and flow of human attention, AI is increasingly positioned to influence our most deeply human endeavors. The big question, then, isn't just what AI can *do*, but who ultimately holds the reins, and what that future means for our shared reality.