AI Digest

Digest curado

viernes, 22 de mayo de 2026·weekly-deep·deep·12,485 tokens

🔥 TOP — lo que SÍ o SÍ tenés que ver

  • 💰 Anthropic paga 1.250 millones de dólares al mes por compute en SpaceX — El S-1 de SpaceX revela que Anthropic firmó un acuerdo por $1.25B/mes por acceso a los clusters COLOSSUS y COLOSSUS II, rampeando de mayo 2026 a mayo 2029. Esto explica la escala de cómputo que Anthropic está asegurando para entrenamiento e inferencia de próximos modelos. link

  • 🔧 MCP Tunnels Research Preview + Self-hosted Sandboxes para Claude Managed Agents — Ahora podés conectar MCP servers en tu red privada desde Claude, y ejecutar tools de tus Managed Agents en tu propia infra en vez de en Anthropic. También podés actualizar config de MCP y tools de un agente sin reiniciar la sesión. link


📦 Claude / Anthropic ecosystem

  • 🚀 Claude Managed Agents ya se puede deployar en Cloudflare — Cloudflare integró los Managed Agents de Anthropic como execution environment, dándote un runtime aislado, rápido y global para escalar workflows agentivos con control de acceso a backends privados y tools customizables. link

  • 🔐 Cloudflare CASB ahora integra Claude Compliance API — Si laburás con Claude Enterprise en un contexto con compliance, ahora podés monitorear actividad directamente desde el dashboard de Cloudflare. link

  • 🛠️ Spec-Driven Development workflow para Claude Code — Un approach que descompone tareas en etapas: generás specs (requirements, code analysis, design) primero, después implementás subtareas una por una, limpiando contexto entre pasos. Los specs en disco mantienen persistencia y permiten detectar temprano si el agente te entendió mal. link

  • 🚦 Chrome DevTools MCP para coding agents — El equipo de Chrome DevTools lanzó un MCP server que le da a Claude, Cursor, Copilot y Antigravity control total sobre Chrome: traces de performance, análisis de red, screenshots, console logs con source maps. Ideal para debugging de frontend con agents. link


🛠️ Dev tools & coding

  • 🎯 Las skills oficiales de .NET para coding agents — Microsoft lanzó dotnet/skills, un repo con skills curadas para agents de AI (Claude, Cursor, etc.) que cubren .NET core, data access con EF, MSBuild, NuGet, debugging de performance. Incluye un dashboard de accuracy y efficiency. link

  • 📘 Martin Fowler: "Vibe Coding" ya tiene su bliki entry — Fowler formaliza el concepto de Karpathy: codear solo con prompts sin mirar el código generado. Resulta en software con problemas de mantenibilidad, seguridad y corrección, pero útil para prototipos descartables. Bueno para entender el trade-off. link

  • 📡 Martin Fowler: nuevos sensores de static code analysis para coding agents — Birgitta Böckeler agrega tres sensores más enfocados en modularidad. La data revela que pedirle a un LLM que revise modularidad por prompting funciona mejor que intentar computar métricas de coupling automáticamente. link

  • 🎬 Datasette Agent: el AI assistant extensible para Datasette — Simon Willison finalmente une LLM y Datasette. Un agente conversacional que genera SQL queries y puede hacer charts (con plugin). Si laburás con datos tabulares como parte de tu side project, es un patrón copiable. link

  • 🐍 notebooklm-py: API Python no oficial para Google NotebookLM — Acceso programático completo a NotebookLM desde Python, CLI y agents (Claude Code incluido). Expone features que la UI web no tiene. Si usás NotebookLM para research, esto te permite automatizarlo. link

  • 📝 OpenWA: WhatsApp API Gateway open source y self-hosted — Arquitectura plugueable donde podés cambiar SQLite/PostgreSQL, Local/S3, Memory/Redis sin tocar código. Si tu side project de restaurant SaaS necesita notificaciones por WhatsApp, esto es mejor que depender de Twilio o proveedores caros. link


🏗️ Software engineering

  • 🧠 Async patterns en API design — ByteByteGo desglosa los patrones asincrónicos: polling, webhooks, SSE, WebSockets, y cuándo usar cada uno. Si estás diseñando APIs para tu SaaS o para sistemas distribuidos, este tipo de referencia conceptual siempre suma. link

  • 📺 Cómo Netflix usa multimodal AI para search de video — ByteByteGo explica cómo Netflix construyó el sistema: embeddings multimodales (video + texto + audio), cómo indexan, cómo manejan queries en tiempo real. Aplicable si estás pensando en sistemas de búsqueda o recomendación. link

  • 📱 Snapchat: sirviendo mil millones de predicciones por segundo con ML — ByteByteGo desglosa la arquitectura de inferencia de Snap: cómo escalan modelos, feature stores, latencia. Caso concreto de Big Tech resolviendo problemas de escala reales. link

  • 🏎️ Cómo Grab usa AI agents para boostear productividad del team de data — ByteByteGo cubre cómo el equipo de data engineering de Grab implementó agents para tareas de mantenimiento de infraestructura compartida. Patrón que podrías aplicar si tenés data pipelines que mantener. link

  • 🔬 Análisis de Project Glasswing: modelos frontier aplicados a seguridad de infraestructura — Cloudflare expone qué pasó cuando pusieron modelos como Mythos a auditar código crítico de su infraestructura: strengths, weaknesses, y qué falta para que sea escalable. link


📚 Vale la pena leer

  • 🔁 AgentAtlas: más allá de leaderboards para evaluar agents — Un paper que propone una taxonomía de 6 estados de decisión y 9 categorías de fallos de trayectoria para evaluar agents. Va más allá del "accuracy" para pensar en safe deployability. link

  • 🧩 AgentCo-op: síntesis retrieval-based de workflows multi-agent interoperables — Framework que compone skills, tools y agents externos en workflows ejecutables con handoffs tipados, y aplica reparación local cuando la ejecución falla. Dos case studies en genomics. link

  • 🌍 Open-World Evaluations: midiendo capacidades frontier en condiciones reales — Propuesta de evaluaciones complementarias a benchmarks: tareas largas, desordenadas, con evaluación cualitativa en vez de automatizada. Presentan CRUX para hacerlas de forma regular. link

  • 🔄 SOLAR: agente autónomo que se auto-mejora con meta-learning a nivel de parámetros — Trata los pesos del modelo como un "entorno" para explorar y aprender a auto-optimizarse sin gradient-based fine-tuning. Paper denso pero relevante si te interesan agentes que aprenden en el tiempo. link

  • 🚄 Railway: "The Agent-Native Cloud" — Entrevista con el CEO de Railway: 3M usuarios, 100K signups/semana, data centers propios, gastan $200K+ en coding agents, y declaran la muerte de los PRs. link

  • 🏗️ Daytona: giving agents computers — CEO de Daytona cuenta cómo lograron 74% MoM growth, 850K daily runs, sandboxes bare-metal, RL evals, y la "agent cloud". link


💤 Skippeable pero conviene saber

  • 🤖 Google I/O 2026: Gemini Spark (competidor de OpenClaw), antigravity, y poco más — Simon Willison no encontró mucho para reseñar porque la mayoría son "coming soon". Lo más destacable es Gemini Spark como "personal AI agent" conectado a Gmail, Calendar, Drive, etc. link

  • ⚡ Simulador visual de tokens per second — App HTML que muestra cómo se ve 5, 30, 100, 800 tokens/s. Útil para calibrar expectativas cuando leés benchmarks de velocidad de modelos. link

  • 🏭 Samsung reparte $26.6B en bonos por el boom de AI en semiconductores — Cada empleado de chips recibe ~$340K en promedio. Señal de cuánto está moviendo la demanda de hardware para AI. link

  • 📐 PopPy: explotando paralelismo en compound AI apps de Python — Paper de arXiv sobre un sistema que detecta y explota paralelismo oportunístico en pipelines de AI (tool calls, chains, etc.). link

  • 🧠 Anatomy of an AI Agent (ByteByteGo) — Una visión simplificada: un AI agent es un while-loop. Bueno para compartir con no-técnicos o como intro rápida. link

  • 🎬 Primera película hecha 100% con AI se estrena en Cannes — "Hell Grind", hecha con Higgsfield AI. No sé si es buena o mala, pero es un hito cultural. link

  • 💰 Nuevos unicornios de AI infra: Exa, Modal, TurboPuffer — Latent Space hace un resumen de las últimas rondas de funding. Señal de que la infraestructura de AI sigue atrayendo capital masivo. link

  • ⚖️ FTC multa a Cox Media Group por "active listening" AI marketing — La FTC les exige casi $1M por vender un servicio que escuchaba conversaciones de smart devices para targeting de ads. link

  • 🎲 GPT-next resuelve un problema matemático abierto de 80 años por menos de $1000 — OpenAI afirma que GPT-next (o un modelo cercano) resolvió el "Erdős planar unit distance problem". Si es cierto, es un hito interesante en razonamiento matemático. link

Artículos fetched (51)

  • Personality Engineering with AI Agents: A New Methodology for Negotiation Research
    arxiv-ai· 22-may

    arXiv:2605.20554v1 Announce Type: new Abstract: According to canonical negotiation theory, people's success in a negotiation depends on how well they balance competing demands--empathizing and asserting, demonstrating concern for other and concern for self, being soft on the people and hard on the problem. Yet people struggle to manage these tensions, so researchers have lacked the ability to rigorously test the field's prescriptions under controlled conditions. AI agents do not face the same limitations, and their precision, repertoire, consistency, and scalability enable a new class of experiments to contribute to negotiation theory. In this article, we introduce personality engineering: a methodology that uses AI agents to precisely parameterize, manipulate, and evaluate negotiator per…

  • Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
    arxiv-ai· 22-may

    arXiv:2605.20577v1 Announce Type: new Abstract: Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has heavily relied on supervised learning from human play logs to pre-train the policy, algorithms capable of learning \textit{tabula rasa} (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce \textbf{Mahjax}, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also …

  • SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
    arxiv-ai· 22-may

    arXiv:2605.20189v1 Announce Type: new Abstract: Despite the remarkable success of large language models (LLMs), they still face bottlenecks while deploying in dynamic, real-world settings with primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without resulting in catastrophic for getting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR) which is an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It initiates the process by consolidat…

  • Tool-Augmented Agent for Closed-loop Optimization,Simulation,and Modeling Orchestration
    arxiv-ai· 22-may

    arXiv:2605.20190v1 Announce Type: new Abstract: Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward tha…

  • OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
    arxiv-ai· 22-may

    arXiv:2605.20423v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional su…

  • AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
    arxiv-ai· 22-may

    arXiv:2605.20425v1 Announce Type: new Abstract: Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We propose AgentCo-op, a retrieval-based synthesis framework that composes reusable skills, tools, and external agents into executable workflows through typed artifact handoffs, then applies bounded self-guided local repair to implicated components when execution evidence indicates failure. In two open-world genomics case studies, AgentCo-op composes independently developed scientific agents and external tool repositories into auditable workflows without redesigning them or running global topology search. It coordina…

  • High Quality Embeddings for Horn Logic Reasoning
    arxiv-ai· 22-may

    arXiv:2605.20467v1 Announce Type: new Abstract: Neural networks can be trained to rank the choices made by logical reasoners, resulting in more efficient searches for answers. A key step in this process is creating useful embeddings, i.e., numeric representations of logical statements. This paper introduces and evaluates several approaches to creating embeddings that result in better downstream results. We train embeddings using triplet loss, which requires examples consisting of an anchor, a positive example, and a negative example. We introduce three ideas: generating anchors that are more likely to have repeated terms, generating positive and negative examples in a way that ensures a good balance between easy, medium, and hard examples, and periodically emphasizing the hardest examples…

  • $ECUAS_n$: A family of metrics for principled evaluation of uncertainty-augmented systems
    arxiv-ai· 22-may

    arXiv:2605.20490v2 Announce Type: new Abstract: In high-stakes automated decision-making, access to predictive uncertainty is essential for enabling users -- human or downstream systems -- to accept or reject predictions based on application-specific cost trade-offs. Such uncertainty-augmented (UA) systems -- i.e., systems that output both predictions and uncertainty scores -- are currently being assessed in the literature in a variety of ways, using separate metrics to evaluate the predictions and the uncertainty scores, setting a cost function with a fixed rejection cost or integrating over a coverage-risk curve. We argue that these evaluation approaches are inadequate for assessing overall performance of the UA system for decision making under uncertainty and propose a novel family of …

  • Open-World Evaluations for Measuring Frontier AI Capabilities
    arxiv-ai· 22-may

    arXiv:2605.20520v1 Announce Type: new Abstract: Benchmark-based evaluation remains important for tracking frontier AI progress. But it can both overstate and understate deployed capability because it privileges tasks that can be precisely specified, automatically graded, easy to optimize for, and run with low budgets and short time horizons. We advocate for a complementary class of evaluations, which we term open-world evaluations: long-horizon, messy, real-world tasks assessed through small-sample qualitative analysis rather than benchmark-scale automation. In this paper we survey recent open-world evaluations, identify their strengths and limitations, and introduce CRUX (Collaborative Research for Updating AI eXpectations), a project for conducting such evaluations regularly. As a first…

  • AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
    arxiv-ai· 22-may

    arXiv:2605.20530v1 Announce Type: new Abstract: Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but the benchmarks used to evaluate them are fragmented: each emphasizes a different unit of measurement (final task success, tool-call validity, repeated-pass consistency, trajectory safety, or attack robustness). A line of 2024-2025 work has converged on the diagnosis that a single accuracy column is no longer the right unit of comparison for deployable agents. AgentAtlas extends this line of work with four components: (i) a six-state control-decision taxonomy (Act / Ask / Refuse / Stop / Confirm / Recover); (ii) a nine-category trajectory-failure taxonomy with two orthogonal hierarchical labels (primary_error_source, impac…

  • How Netflix is Using Multimodal AI to Power Video Search
    bytebytego· 20-may

    In this article, we will understand how Netflix built this system and the challenges it faced.May 20 • ByteByteGo28048

  • LAST CALL FOR ENROLLMENT: Become an AI Engineer - Cohort 6
    bytebytego· 15-may

    Our 6th cohort of Becoming an AI Engineer starts tomorrow, Saturday, May 16. This is a live, cohort-based course created in collaboration with…May 15 • ByteByteGo24823

  • A Guide to Async Patterns in API Design
    bytebytego· 21-may

    In this article, we will look at each of these patterns in detail, along with their advantages.15 hrs ago • ByteByteGo764

  • EP215: The Anatomy of an AI Agent
    bytebytego· 16-may

    An AI agent can be thought of as a simple While-loop.May 16 • ByteByteGo350423

  • How Grab is Using AI Agents to Boost Team Productivity
    bytebytego· 18-may

    Grab’s data engineering team had a problem that looks familiar to anyone who’s maintained shared infrastructure.May 18 • ByteByteGo277211

  • How Snapchat Serves a Billion Predictions Per Second
    bytebytego· 19-may

    For Snap, machine learning is closer to the product itself than a feature on top of it.May 19 • ByteByteGo26939

  • Loading...
    claude-changelog

    Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...Loading...

  • Claude Platform
    claude-changelog

    Release notesCopy pageUpdates to the Claude Platform, including the Claude API, client SDKs, and the Claude Console.Copy pageFor release notes on Claude Apps, see the Release notes for Claude Apps in the Claude Help Center.For updates to Claude Code, see the complete CHANGELOG.md in the claude-code repository. May 19, 2026 MCP tunnels is now available as a Research Preview, so you can connect to MCP servers in your private network. Self-hosted sandboxes are now available for Claude Managed Agents, as an alternative to running tool execution in Anthropic's infrastructure. See Self-hosted sandboxes. With Claude Managed Agents, you can now update the agent's MCP server and tool configurations associated with an active session. With Claude Managed Agents, large outputs from agent_toolset and …

  • Announcing Claude Compliance API support with Cloudflare CASB
    cloudflare· 21-may

    Cloudflare now integrates with the Claude Compliance API, so that security teams can monitor Claude Enterprise activity directly in the Cloudflare Dashboard.

  • Announcing Claude Managed Agents on Cloudflare
    cloudflare· 19-may

    Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to private backends and easily customizing their agent’s tools and runtimes.

  • Project Glasswing: what Mythos showed us
    cloudflare· 18-may

    In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them needs to look like before any of it can scale.

  • rmyndharis/OpenWA
    github-trending

    Free, Open Source, Self-Hosted WhatsApp API Gateway OpenWA Open Source WhatsApp API Gateway Features • Quick Start • Docs • API • Contributing ✨ Why OpenWA? OpenWA is a free, open-source WhatsApp API Gateway designed for developers who need full control over their messaging infrastructure—without vendor lock-in or hidden paywalls. Built on a pluggable architecture, OpenWA lets you swap database engines (SQLite/PostgreSQL), storage backends (Local/S3), and cache layers (Memory/Redis) without changing a single line of application code. 🔓 100% Open Source No licensing fees, no feature locks, full source code access 🏗️ Pluggable Architecture Swap adapters for database, storage, and cache via config 🖥️ Full Dashboard Modern React UI for session, webhook, and API key management 🔹 Multi-Sess…

  • ChromeDevTools/chrome-devtools-mcp
    github-trending

    Chrome DevTools for coding agents Chrome DevTools for agents Chrome DevTools for agents (chrome-devtools-mcp) lets your coding agent (such as Antigravity, Claude, Cursor or Copilot) control and inspect a live Chrome browser. It acts as a Model-Context-Protocol (MCP) server, giving your AI coding assistant access to the full power of Chrome DevTools for reliable automation, in-depth debugging, and performance analysis. A CLI is also provided for use without MCP. Tool reference | Changelog | Contributing | Troubleshooting | Design Principles Key features Get performance insights: Uses Chrome DevTools to record traces and extract actionable performance insights. Advanced browser debugging: Analyze network requests, take screenshots and check browser console messages (with source-mapped stack…

  • dotnet/skills
    github-trending

    Repository for skills to assist AI coding agents with .NET and C# .NET Agent Skills This repository contains the .NET team's curated set of core skills and custom agents for coding agents. For information about the Agent Skills standard, see agentskills.io. 📊 Dashboard - Accuracy and efficiency scoring trends for contained plugins (https://dotnet.github.io/skills/) What's Included Plugin Description dotnet Collection of core .NET skills for handling common .NET coding tasks. dotnet-data Skills for .NET data access and Entity Framework related tasks. dotnet-diag Skills for .NET performance investigations, debugging, and incident analysis. dotnet-msbuild Comprehensive MSBuild and .NET build skills: failure diagnosis, performance optimization, code quality, and modernization. dotnet-nuget N…

  • rohitg00/ai-engineering-from-scratch
    github-trending

    Learn it. Build it. Ship it for others. ░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒░░░▒▒▒ 84% of students already use AI tools. Only 18% feel prepared to use them professionally. This curriculum closes that gap. 435 lessons. 20 phases. ~320 hours. Python, TypeScript, Rust, Julia. Every lesson ships a reusable artifact: a prompt, a skill, an agent, an MCP server. Free, open source, MIT. You don't just learn AI. You build it. End-to-end. By hand. How this works Most AI material teaches in scattered pieces. A paper here, a fine-tuning post there, a flashy agent demo somewhere else. The pieces rarely line up. You ship a chatbot but can't explain its loss curve. You hook a function to an agent but can't say what attention does inside the model that's ca…

  • teng-lin/notebooklm-py
    github-trending

    Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw. notebooklm-py A Comprehensive NotebookLM Skill & Unofficial Python API. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw. Source & Development: https://github.com/teng-lin/notebooklm-py ⚠️ Unofficial Library - Use at Your Own Risk This library uses undocumented Google APIs that can change without notice. Not affiliated with Google - This is a community project APIs may break - Google can change internal endpoints anytime Rate limits apply - Hea…

  • PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps
    hn-ai· 22-may

    Article URL: https://arxiv.org/abs/2605.18697 Comments URL: https://news.ycombinator.com/item?id=48232340 Points: 1 # Comments: 0

  • Samsung to distribute up to $26.6B to staff in AI-driven bonuses
    hn-ai· 22-may

    Article URL: https://www.tomshardware.com/tech-industry/big-tech/samsung-reportedly-set-to-distribute-up-to-usd26-6-billion-to-staff-in-ai-driven-semiconductor-bonuses-after-last-minute-union-deal-average-payouts-could-approach-usd400-000-per-chip-employee Comments URL: https://news.ycombinator.com/item?id=48232567 Points: 3 # Comments: 2

  • AI workflows: an industry optimising the wrong variables
    hn-ai· 22-may

    Article URL: https://adsurg.substack.com/p/navigating-ai-with-paper-maps Comments URL: https://news.ycombinator.com/item?id=48231808 Points: 5 # Comments: 1

  • AIMF – An open file format specification for AI-native media
    hn-ai· 22-may

    Article URL: https://github.com/ai-mf/media-engine Comments URL: https://news.ycombinator.com/item?id=48232243 Points: 1 # Comments: 0

  • Show HN: Spec-Driven Development Workflow for Claude Code
    hn-ai· 22-may

    Spec Driven Development approach allows to squeeze more from coding agents thanks to few strong concepts: - decomposition across two dimensions. first you generate specs in multiple steps (requirements, code analysis, design), than you split task into multiple subtasks and implement them one by one - you clear context between every step - after spec generation and after subtask implementation. this helps keep cost low and context clear and focused which boost performance - specs written to disk help with information persistency - delivering specs layer by layer help to catch early when agent got you wrong Repo with claude plugin for spec driven development: https://github.com/sermakarevich/sddw Comments URL: https://news.ycombinator.com/item?id=48231575 Points: 17 # Comments: 3

  • Ask HN: What the Best AI for Coding?
    hn-ai· 22-may

    What the best AI for coding tell us below Comments URL: https://news.ycombinator.com/item?id=48232068 Points: 1 # Comments: 2

  • Fixing LLM Writing with Distribution Fine Tuning
    hn-ai· 22-may

    Article URL: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distribution-fine-tuning/ Comments URL: https://news.ycombinator.com/item?id=48232606 Points: 1 # Comments: 0

  • First ever AI feature film premieres at the Cannes Film Festival
    hn-ai· 22-may

    Article URL: https://www.cgmagonline.com/news/hell-grind-made-only-with-higgsfield-ai/ Comments URL: https://news.ycombinator.com/item?id=48231869 Points: 3 # Comments: 0

  • Samsung Chip Workers to Get Average $340k Bonus in AI Boom
    hn-ai· 22-may

    Article URL: https://www.bloomberg.com/news/articles/2026-05-21/samsung-chip-workers-to-get-average-340-000-bonus-in-ai-boom Comments URL: https://news.ycombinator.com/item?id=48232419 Points: 1 # Comments: 2

  • AI Model Inflation: The Unsustainable Subsidy
    hn-ai· 22-may

    Article URL: https://tomtunguz.com/ai-model-inflation/ Comments URL: https://news.ycombinator.com/item?id=48231614 Points: 3 # Comments: 0

  • Railway: The Agent-Native Cloud — Jake Cooper
    latentspace· 20-may

    3M Users, 100K Signups/Week, Own-Metal Data Centers, $200K+ Coding Agent Spend, and the Death of PRs

  • [AINews] New AI Infra unicorns: Exa, Modal, TurboPuffer
    latentspace· 22-may

    a quiet day lets us feature fundraises!

  • [AINews] OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000
    latentspace· 21-may

    a quiet day but a nice result in AI x mathematics

  • Giving Agents Computers — Ivan Burazin, Daytona
    latentspace· 21-may

    We chat with Daytona's CEO about their insane 74% MoM Growth, 850K Daily Runs, Bare Metal Sandboxes, RL Evals, and the New Agent Cloud

  • Bliki: Vibe Coding
    martin-fowler· 21-may

    Vibe coding is building a software application by prompting an LLM, telling it what to build, trying it out, prompting for changes - but without looking at any of the code that the LLM generates. This technique can be used by people without any knowledge of programming. However the resulting software often shows problems with maintainability, correctness, and security - so is best used for disposable software written for a limited audience. The term was coined in February 2025 by Andrej Karpathy, an experienced programmer, in a post on X: There's a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to …

  • Three more static code analysis sensors
    martin-fowler· 20-may

    Birgitta Böckeler adds discussion of three more sensors for static code analysis, focusing on checking and enforcing better modularity. Computational sensors for dependency checks were good at enforcing rules, but the rules were limited. Building a computational sensor for coupling data proved lackluster. Prompting an inferential sensor to review modularity was more effective. more…

  • How fast is 10 tokens per second really?
    simonw· 20-may

    <p><strong><a href="https://mikeveerman.github.io/tokenspeed/">How fast is 10 tokens per second really?</a></strong></p> Neat little HTML app by Mike Veerman (<a href="https://github.com/MikeVeerman/tokenspeed/blob/master/index.html">source code here</a>) which simulates LLM token output speeds from 5/second to 800/second.</p> <p>Useful if you see a model advertised as "30 tokens/second" and want to get a feel for what that actually looks like. <p><small></small>Via <a href="https://news.ycombinator.com/item?id=48174920">Hacker News</a></small></p> <p>Tags: <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p>

  • Quoting SpaceX S-1
    simonw· 20-may

    <blockquote cite="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm"><p>We have the ability to use compute resources to support our proprietary AI applications (such as Grok 5, which is currently being trained at COLOSSUS II), while also providing access to select compute capacity to third-party customers. For example, in May 2026, we entered into <strong>Cloud Services Agreements with Anthropic PBC</strong> (“Anthropic”), an AI research and development public benefit corporation, with respect to access to <strong>compute capacity across COLOSSUS and COLOSSUS II</strong>. Pursuant to these agreements, the customer <strong>has agreed to pay us $1.25 billion per month</strong> through May 2029, with capacity ramping in May and June 2026 at a r…

  • Google I/O, Gemini Spark, Antigravity
    simonw· 20-may

    <p>It's hard to find much to write about Google I/O this year because I have a policy of not writing about anything that I can't try out myself, and a lot of the big announcements are "coming soon".</p> <p>I actually prefer to write about things that are in general availability, because I've had instances in the past where the previews didn't match what was released to the general public later on.</p> <p>Aside from <a href="https://simonwillison.net/2026/May/19/gemini-35-flash/">Gemini 3.5 Flash</a> the most interesting announcement looks to be Google's upcoming OpenClaw competitor <a href="https://gemini.google/overview/agent/spark/">Gemini Spark</a>, described as "your personal AI agent" which can "connect natively with your favorite Google apps like Gmail, Calendar, Drive, Docs, Sheets…

  • datasette-agent-charts 0.1a1
    simonw· 20-may

    <p><strong>Release:</strong> <a href="https://github.com/datasette/datasette-agent-charts/releases/tag/0.1a1">datasette-agent-charts 0.1a1</a></p> <blockquote> <ul> <li>More color! Bar and waffle charts without a color column are shaded by magnitude with a sequential color scheme; color columns holding text values use the <code>observable10</code> categorical scheme. #2</li> <li>Now checks <code>execute-sql</code> permission before running the query to find the column names.</li> <li>Charts now display interactive tooltips.</li> <li>Fixed a bug where <code>waffleY</code> charts were not described to the agent.</li> </ul> </blockquote> <p>Tags: <a href="https://simonwillison.net/tags/datasette">datasette</a>, <a href="https://simonwillison.net/tags/datasette-agent">datasette-agent</a></p>

  • datasette-agent 0.1a3
    simonw· 21-may

    <p><strong>Release:</strong> <a href="https://github.com/datasette/datasette-agent/releases/tag/0.1a3">datasette-agent 0.1a3</a></p> <blockquote> <ul> <li>"View SQL query" buttons for both visible tables and collapsed SQL result tool calls.</li> <li>Don't display empty reasoning chunks</li> <li>Improved handling of truncated responses - table still displays to the user even if the SQL results were truncated when showing the agent.</li> </ul> </blockquote> <p>See <a href="https://datasette.io/blog/2026/datasette-agent/">Datasette Agent, an extensible AI assistant for Datasette</a>.</p> <p>Tags: <a href="https://simonwillison.net/tags/datasette">datasette</a>, <a href="https://simonwillison.net/tags/datasette-agent">datasette-agent</a></p>

  • datasette-agent-charts 0.1a2
    simonw· 21-may

    <p><strong>Release:</strong> <a href="https://github.com/datasette/datasette-agent-charts/releases/tag/0.1a2">datasette-agent-charts 0.1a2</a></p> <blockquote> <ul> <li>"View SQL query" buttons below rendered charts.</li> </ul> </blockquote> <p>Tags: <a href="https://simonwillison.net/tags/datasette">datasette</a>, <a href="https://simonwillison.net/tags/datasette-agent">datasette-agent</a></p>

  • Datasette Agent
    simonw· 21-may

    <p>We just <a href="https://datasette.io/blog/2026/datasette-agent/">announced the first release of Datasette Agent</a>, a new extensible AI assistant for Datasette. I've been working on my <a href="https://llm.datasette.io/">LLM</a> Python library for just over three years now, and Datasette Agent represents the moment that LLM and <a href="https://datasette.io/">Datasette</a> finally come together. I'm really excited about it!</p> <p>Datasette Agent provides a conversational interface for asking questions of the data you have stored in Datasette. Add the <a href="https://github.com/datasette/datasette-agent-charts">datasette-agent-charts</a> plugin and it can generate charts of your data as well.</p> <h4 id="the-demo">The demo</h4> <p>The <a href="">announcement post</a> (on the new Dat…

  • datasette-agent-sprites 0.1a0
    simonw· 21-may

    <p><strong>Release:</strong> <a href="https://github.com/datasette/datasette-agent-sprites/releases/tag/0.1a0">datasette-agent-sprites 0.1a0</a></p> <p>A Datasette Agent plugin for running commands in a <a href="https://sprites.dev">Fly Sprites</a> sandbox.</p> <p>Tags: <a href="https://simonwillison.net/tags/sandboxing">sandboxing</a>, <a href="https://simonwillison.net/tags/datasette">datasette</a>, <a href="https://simonwillison.net/tags/fly">fly</a>, <a href="https://simonwillison.net/tags/datasette-agent">datasette-agent</a></p>

  • FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service
    simonw· 22-may

    <p><strong><a href="https://www.ftc.gov/news-events/news/press-releases/2026/05/ftc-require-cox-media-group-two-other-firms-pay-nearly-1-million-settle-charges-they-deceived">FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service</a></strong></p> Back in 2024 Cox Media Group were caught trying to sell advertisers packages based on "active listening", with <a href="https://www.documentcloud.org/documents/25051283-cmg-pitch-deck-on-voice-data-advertising-active-listening/">this deck</a> which claimed:</p> <blockquote> <ul> <li>Smart devices capture real-time intent data by listening to our conversations</li> <li>Advertisers can pair this voice-data with behavioral data to target…