AI Digest

Curated digest

Friday, May 8, 2026 · weekly-deep · deep · 14,444 tokens

🔥 TOP: what you absolutely have to see

  • Code w/ Claude 2026: the event of the year - Live blog covering everything from the keynote: the SpaceX/xAI deal to use Colossus (280MW / ~$5B per year), multi-agent sessions, Outcomes in public beta, webhooks for Managed Agents, session filtering by status, and more. Everything Anthropic announced is here. link
  • Claude Platform Release Notes (May 6) - The concrete pieces you can use TODAY: multi-agent sessions in public beta under the managed-agents-2026-04-01 beta header, Outcomes (sessions with a final state), vault credential refresh for MCP OAuth, and webhooks for Managed Agents with session and vault events. link (if the link does not load, the content is covered in detail in the live blog)
  • Claude Code CVE-2026-39861: sandbox escape via symlink - Security vulnerability published as a GitHub Advisory. If you run Claude Code in multi-tenant environments or without strict sandboxing, this affects you directly. link
  • Mozilla used Claude Mythos Preview to harden Firefox - Fascinating post from Mozilla Hacks on how they used the Mythos preview to find and fix HUNDREDS of vulnerabilities in Firefox. Fine-grained technical detail on how it changed the security game in open source. link
  • LCM: Lossless Context Management, the paper that beats Claude Code on long context - Deterministic architecture for LLM memory that, using Opus 4.6, outscores Claude Code on the OOLONG benchmark at EVERY context length (32K to 1M tokens). If you care about multi-agent systems and context optimization, this paper is required reading. link
  • Anthropic-SpaceX: the Colossus I deal (300MW/$5B/yr) - Latent Space coverage with the real numbers and the geopolitical context. Anthropic's biggest infrastructure play. link
  • Anthropic launches agents for financial services - Official repository with reference agents, skills, and data connectors for investment banking, equity research, and private equity. Available as a Claude Cowork plugin or via the Managed Agents API. If you build B2B SaaS, this sets a new standard for how vertical agents are packaged. link
  • Open Agents from Vercel Labs: open source template for cloud coding agents - Three-layer architecture (Web → Agent workflow → Sandbox VM) for running coding agents in the background without your laptop. The key decision: the agent does NOT run inside the sandbox. Forkable and adaptable. link
  • "When Context Hurts": the crossover effect in multi-agent design exploration - Paper showing that more context is NOT always better: on some tasks it improves tradeoff coverage by up to 20x, on others it degrades it by up to 46%. It identifies a measurable variable, baseline exploration without context, that predicts the direction (Pearson r = -0.82). Essential if you design multi-agent systems. link

📦 Claude / Anthropic ecosystem

  • Enterprise AI services company: Anthropic + Blackstone + Hellman & Friedman + Goldman Sachs - New enterprise AI services company. A sign that Anthropic is moving hard into professional services, not just the API. link
  • Higher usage limits for Claude + compute deal with SpaceX - Anthropic's official announcement of higher usage limits for Claude and the agreement with SpaceX/xAI to use its Colossus datacenter. link
  • Silicon Valley gets serious about Services - Latent Space analyzes Anthropic's series of announcements on services as the next big opportunity. link
  • Mythos threw the White House AI strategy into chaos (WSJ) - The political impact of Mythos, as covered by the WSJ. Important context for understanding the regulatory landscape. link
  • Notes on the xAI/Anthropic data center deal - Simon Willison's analysis of the deal, including the Colossus datacenter's controversial environmental record (gas turbines without Clean Air Act permits). link

🛠️ Dev tools & coding

  • addyosmani/agent-skills: production-grade engineering skills for AI coding agents - Packaged skills that encode the workflows, quality gates, and best practices of senior engineers. Seven slash commands covering the full development lifecycle (including /spec, /plan, /build, /test, /review, /ship). link
  • 9Router: FREE AI Router & Token Saver - Connect Claude Code, Cursor, Copilot, and others to 40+ providers and 100+ models, with auto-fallback and RTK, which the project says saves 20-40% of tokens. The pitch: rate limits never stop you again. link
  • dflash: Block Diffusion for Flash Speculative Decoding - Lightweight block diffusion model for parallel drafting in speculative decoding. Supports Gemma 4, Qwen 3.5/3.6, MiniMax, and Kimi K2.5. If you run your own inference, this could give you large speedups. link
  • InsForge: Postgres-based backend for coding agents - Backend platform with a semantic layer that agents can understand and operate end to end. DB, auth, storage, compute, AI gateway. link
  • PageIndex: vectorless, reasoning-based RAG - RAG with no vector DB and no chunking, with reasoning-based retrieval. Supports MCP and API. An interesting alternative if you are tired of chunking problems. link
  • Lattice framework (Rahul Garg / Martin Fowler) - Open source framework for operationalizing AI-assisted programming patterns. Three tiers of skills (atoms, molecules, refiners) with Clean Architecture, DDD, and secure coding. link
  • Local Deep Research: local, encrypted research assistant - ~95% on SimpleQA with Qwen3.6-27B on an RTX 3090. Supports local and cloud LLMs and 10+ search engines. An open source alternative to deep research. link

🏗️ Software engineering

  • Container Design Patterns for Distributed Systems (ByteByteGo) - The patterns that have crystallized over the past decade, organized by scope of coordination. Solid reading for keeping your fundamentals sharp. link
  • How Cloudflare responded to the "Copy Fail" Linux vulnerability - Details of how they detected, investigated, and mitigated a critical Linux kernel privilege escalation across their global fleet with zero customer impact. link
  • Cloudflare: Code Orange "Fail Small" complete - Massive engineering effort to make the infrastructure more resilient. New tools: Snapstone and the Engineering Codex, for safer configuration changes. link
  • Cloudflare Dynamic Workflows: durable execution that follows the tenant - Library for routing durable execution to tenant-provided code. It lets platforms serve millions of unique workflows at near-zero idle cost. A relevant pattern for multi-tenant SaaS. link
  • When DNSSEC goes wrong: .de TLD outage - Cloudflare's post-mortem on how they responded when DENIC published broken DNSSEC signatures, leaving millions of domains unreachable, and how serve stale cushioned the impact. link
  • Netflix: Model Lifecycle Graph for democratizing ML - How Netflix built a graph of the model lifecycle so that non-ML teams can train and deploy models. link
  • Netflix: State of Routing in Model Serving - How Netflix handles request routing in model serving. Real production decisions at scale. link

📚 Worth reading

  • Agent Island: dynamic multi-agent benchmark that resists saturation and contamination - Multiplayer simulation where agents compete through cooperation and conflict. Bayesian Plackett-Luce ranking over 999 games and 49 models. GPT-5.5 dominates. link
  • "More context is better" is FALSE: the crossover effect in multi-agent design - Must-read paper: 10 tasks, 7 context conditions, 2,700+ runs. Context improves some tasks by up to 20x and degrades others by up to 46%. A single measurable variable predicts the direction. link
  • Webdevbench: evaluating AI as web development agencies - Benchmark that measures AI as a full development agency, not just as a coding assistant. link
  • Doing Vibe Physics: GPT-5.x derived new results in theoretical physics - Latent Space interview with Alex Lupsasca (OpenAI) on how GPT-5.x produced new results in quantum gravity. link
  • Vibe coding and agentic engineering are converging, and that scares Simon - Simon Willison's reflection on how vibe coding and agentic engineering are converging in his own workflow. link
  • Martin Fowler revisits The Mythical Man-Month in 2026 - Brooks' Law, conceptual integrity, and what still holds up from the 1975 classic. link
  • ByteByteGo: MCP vs Skills, clearly explained - The key difference: MCP and Skills solve different problems, and picking the wrong one costs you money or complexity. link
  • ByteByteGo: Connecting LLMs to the Real World (Tool Use, Function Calling, MCP) - The evolution from basic tool use to MCP. link

💤 Skippable, but good to know

  • "Our AI started a cafe in Stockholm" - Andon Labs replicates its San Francisco AI-run store experiment, now as a cafe. Amusing anecdotes (120 eggs with no oven, 22.5kg of canned tomatoes). link
  • Ask HN: How are you sandboxing AI agents and developer CLIs? - Active HN discussion on sandboxing for coding agents. link
  • AI at Discount (Tom Tunguz) - Market analysis of AI commoditization and price pressure. link
  • Three Mile Island restart moves ahead with Microsoft AI deal - The nuclear plant is being revived to power AI datacenters. link
  • PARSE: Parallel Prefix Verification for Speculative Generation - Speculative decoding framework that parallelizes prefix verification at the semantic level rather than the token level. link
  • DocuSeal: open source DocuSign alternative - If your restaurant SaaS needs digital signatures, this open source alternative may work for you. link

Fetched articles (60)

  • Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
    anthropic-news· 04-may

    May 4, 2026 · Announcements

  • Higher usage limits for Claude and a compute deal with SpaceX
    anthropic-news· 06-may

    May 6, 2026 · Announcements

  • Agents for financial services
    anthropic-news· 05-may

    May 5, 2026 · Announcements

  • Parallel Prefix Verification for Speculative Generation
    arxiv-ai· 08-may

    arXiv:2605.04263v1 Announce Type: new Abstract: We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification on a semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, leading to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches rely on sequential verification, introducing significant overhead and limiting practical gains. PARSE introduces parallel prefix verification, enabling semantic-level verification without sequential checks. Given a full draft from a dra…
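
The token-level limitation the abstract describes can be seen in a toy sketch. This is not PARSE itself, only a simplified stand-in for exact-match verification; the function name `accepted_prefix_len` and the token lists are hypothetical, and a real verifier would recompute the target model's next token from the accepted prefix at each step.

```python
from typing import Sequence


def accepted_prefix_len(draft: Sequence[str], target: Sequence[str]) -> int:
    """Count the draft tokens accepted under exact token-level matching.

    Toy stand-in for greedy speculative decoding: `target` holds the tokens
    the target model would have produced at each position (made-up data).
    Verification stops at the first position where the two disagree.
    """
    accepted = 0
    for draft_tok, target_tok in zip(draft, target):
        if draft_tok != target_tok:
            break
        accepted += 1
    return accepted


# Same meaning, different wording: only the first two tokens survive.
draft = ["The", "cat", "sat", "on", "the", "mat"]
target = ["The", "cat", "was", "sitting", "on", "the", "mat"]
print(accepted_prefix_len(draft, target))  # 2
```

Even though the draft and the target say the same thing, exact matching throws away everything after the second token, which is the short-acceptance problem that semantic-level verification aims to avoid.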

  • Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
    arxiv-ai· 08-may

    arXiv:2605.04312v1 Announce Type: new Abstract: Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We introduce Agent Island, a multiplayer simulation environment in which language-model agents compete in a game of interagent cooperation, conflict, and persuasion. The environment yields a dynamic benchmark designed to mitigate both saturation and contamination; new models can always outperform the current leading player in this winner-take-all game, and agents compete against other adaptive agents rather than face a fixed task set. We rank players with a Bayesian Plackett-Luce model, allowing us to quantify uncertainty in player skill. In 999 games involving 49 unique models, openai/gpt-5.5 dominates its p…
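
For readers unfamiliar with the ranking model, here is a minimal sketch of the plain Plackett-Luce likelihood for one observed ranking. It assumes each player has a single log-skill value. The player names and skill numbers are made up, and the Bayesian part of the paper's approach (priors and posterior uncertainty over skills) is not shown.

```python
import math
from typing import Dict, Sequence


def plackett_luce_log_likelihood(ranking: Sequence[str], log_skill: Dict[str, float]) -> float:
    """Log-probability of one observed ranking (best first) under Plackett-Luce.

    Each place is filled by a softmax over the players not yet placed.
    """
    total = 0.0
    for k in range(len(ranking)):
        remaining = ranking[k:]
        log_norm = math.log(sum(math.exp(log_skill[p]) for p in remaining))
        total += log_skill[ranking[k]] - log_norm
    return total


# Hypothetical log-skills for three made-up players.
log_skill = {"model_a": 1.0, "model_b": 0.0, "model_c": 0.0}
p = math.exp(plackett_luce_log_likelihood(["model_a", "model_b", "model_c"], log_skill))
print(round(p, 4))
```

In a winner-take-all game only the first factor matters: the chance that a player wins is its exponentiated skill divided by the sum over all players in the game.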

  • The Scaling Properties of Implicit Deductive Reasoning in Transformers
    arxiv-ai· 08-may

    arXiv:2605.04330v1 Announce Type: new Abstract: We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
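
As background for the task, deduction over propositional Horn clauses can be written out explicitly with forward chaining. The sketch below shows only that explicit procedure, not the paper's Transformer setup; the rule set and atom names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (body atoms, head atom), read as body -> head


def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return every atom derivable from `facts` using propositional Horn rules."""
    known = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for body, head in rule_list:
            # Fire a rule once all of its body atoms are already known.
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


rules = [
    (frozenset({"a"}), "b"),
    (frozenset({"b", "c"}), "d"),
    (frozenset({"d"}), "e"),
    (frozenset({"x"}), "y"),
]
derived = forward_chain({"a", "c"}, rules)
print(sorted(derived))  # ['a', 'b', 'c', 'd', 'e']
```

Here `e` needs a chain of three rule applications while `y` is unprovable. A model that reasons implicitly has to reach the same verdicts without writing out the intermediate atoms.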

  • When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration
    arxiv-ai· 08-may

    arXiv:2605.04361v1 Announce Type: new Abstract: The prevailing assumption in agent orchestration is that more context is better. We test this on multi-agent software design across 10 tasks, 7 context-injection conditions, and over 2,700 runs, and find a crossover effect: the same artifact type improves design exploration on some tasks (up to 20$\times$ tradeoff coverage) and actively degrades it on others (up to 46% reduction). On several tasks, an irrelevant document performs as well as or better than every relevant artifact. The direction is predicted by a single measurable variable--baseline exploration without context--with Pearson $r = -0.82$ ($p < 0.001$). Probing the mechanism by manipulating convergence pressure through prompt design reveals two distinct regimes: convergence drive…
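
To make the reported statistic concrete, here is a small sketch of how a Pearson correlation of this kind is computed. The numbers are invented and are NOT the paper's data. The pairing of per-task baseline exploration with a per-task context effect is an assumption, since the truncated abstract does not spell out the second variable.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, one point per task (hypothetical, for illustration only).
baseline_exploration = [0.1, 0.2, 0.4, 0.6, 0.8]
context_effect = [2.0, 1.5, 0.5, -0.2, -0.4]
print(round(pearson_r(baseline_exploration, context_effect), 2))
```

A strongly negative r, as in the abstract, would mean that tasks where agents already explore widely without context tend to be the ones where injected context hurts.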

  • ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor
    arxiv-ai· 08-may

    arXiv:2605.04193v1 Announce Type: new Abstract: Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-bas…

  • Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
    arxiv-ai· 08-may

    arXiv:2605.04227v1 Announce Type: new Abstract: Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-orient…

  • Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
    arxiv-ai· 08-may

    arXiv:2605.04243v1 Announce Type: new Abstract: Despite significant advances, large language models (LLMs) continue to exhibit brittle performance on complex temporal reasoning tasks. This failure mode is widely attributed to inherent deficits in autoregressive logical deduction. In this paper, we challenge this prevailing narrative, demonstrating that temporal reasoning is not the fundamental bottleneck; rather, the locus of failure lies in unstructured text-to-event representation. We introduce a novel neuro-symbolic question-answering framework governed by a Probabilistic Inconsistency Signal (PIS) that explicitly isolates perceptual errors from reasoning failures. By lifting unstructured text into explicit event graphs and interval constraints, our architecture strictly decouples sema…

  • Regularized Centered Emphatic Temporal Difference Learning
    arxiv-ai· 08-may

    arXiv:2605.04100v1 Announce Type: new Abstract: Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on emphasis, but the follow-on trace can have high variance. We revisit this tradeoff through Bellman-error centering. Although centering naturally removes a common drift term from TD errors, we show that a naive centered emphatic extension introduces an auxiliary coupling that can destroy the positive-definiteness of the ETD key matrix. We propose Regularized Emphatic Temporal-Difference Learning (RETD), which preserves the follow-on trace and regularizes only the auxiliary centering recursion, corre…

  • Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs
    arxiv-ai· 08-may

    arXiv:2605.04169v1 Announce Type: new Abstract: Surgical team performance arises from complex interactions between technical execution and non-technical skills, including communication and coordination dynamics. However, current surgical AI systems predominantly model visual workflow signals, lacking structured representations of intraoperative team interactions over time. We propose a real-time actionable approach for modeling surgical team dynamics using time-expanded interaction graphs, where team members are modeled as time-indexed nodes and communication exchanges define directed edges. This spatio-temporal expansion enables dynamic interaction modeling, while allowing efficient inference with a static graph neural network. The model predicts procedural efficiency as the deviation fr…
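
The graph construction described in the abstract can be sketched roughly as follows. This is an illustrative guess at the data structure, not the paper's implementation: the function name, the role names, placing an exchange edge within a single time step, and the carry-over edge from each member to their own next-step copy are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (team member, time step)


def build_time_expanded_graph(
    members: List[str],
    exchanges: List[Tuple[int, str, str]],
    num_steps: int,
) -> Dict[Node, List[Node]]:
    """Unroll a team over time into one static directed graph.

    `exchanges` holds (time step, speaker, listener) triples. An exchange at
    step t adds the edge (speaker, t) -> (listener, t).
    """
    graph: Dict[Node, List[Node]] = defaultdict(list)
    for t in range(num_steps):
        for member in members:
            successors = graph[(member, t)]  # creates the node even if it stays isolated
            if t + 1 < num_steps:
                successors.append((member, t + 1))  # carry-over edge (assumption)
    for t, speaker, listener in exchanges:
        if not 0 <= t < num_steps:
            raise ValueError(f"time step {t} outside [0, {num_steps})")
        graph[(speaker, t)].append((listener, t))
    return dict(graph)


# Hypothetical two-step exchange in a three-person team.
graph = build_time_expanded_graph(
    members=["surgeon", "nurse", "anesthetist"],
    exchanges=[(0, "surgeon", "nurse"), (1, "nurse", "anesthetist")],
    num_steps=2,
)
print(len(graph))             # 6 nodes: 3 members x 2 steps
print(graph[("surgeon", 0)])  # [('surgeon', 1), ('nurse', 0)]
```

Because time is folded into the node identity, the result is an ordinary static graph, which is what lets a static graph neural network be applied to a dynamic interaction.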

  • LCM: Lossless Context Management
    arxiv-ai· 08-may

    arXiv:2605.04050v1 Announce Type: new Abstract: We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens. LCM may be considered both a vindication and extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access. LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recurs…

  • How Instacart Built a Search for Billions of Products
    bytebytego· 05-may

    In this article, we will learn how Instacart’s search infrastructure evolved over the years and the challenges its engineering team faced.

  • Connecting LLMs to the Real World: Tool Use, Function Calling, and MCP
    bytebytego· 04-may

    In this article, we will look at this progression that has happened from basic tool use to function calling to the Model Context Protocol, allowing the…

  • Container Design Patterns for Distributed Systems
    bytebytego· 07-may

    In this article, we’ll walk through the patterns that have crystallized over the past decade, organized by the scope of their coordination.

  • EP213: MCP vs Skills, Clearly Explained
    bytebytego· 02-may

    Both MCP and Skills extend what an agent can do. But they solve different problems, and picking the wrong one adds cost or complexity you don't need.

  • Claude Platform
    claude-changelog

    Release notes: updates to the Claude Platform, including the Claude API, client SDKs, and the Claude Console. For release notes on Claude Apps, see the Release notes for Claude Apps in the Claude Help Center. For updates to Claude Code, see the complete CHANGELOG.md in the claude-code repository.

    May 6, 2026: Multiagent sessions and Outcomes are now in public beta under the standard managed-agents-2026-04-01 beta header. Vault credential background refresh is now supported for mcp_oauth credentials. See Authenticate with vaults. Webhooks for Claude Managed Agents are now supported. Webhook event types include session and vault lifecycle events. See Subscribe to webhooks. Additional filtering and sorting options are now supported. Sessions can be filtered by status, and events…

  • How Cloudflare responded to the “Copy Fail” Linux vulnerability
    cloudflare· 07-may

    When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no malicious exploitation.

  • When DNSSEC goes wrong: how we responded to the .de TLD outage
    cloudflare· 06-may

    On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here's what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.
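
The general serve-stale idea (described in RFC 8767) can be sketched as a cache that falls back to an expired answer when the upstream lookup fails. This is a toy illustration, not how 1.1.1.1 is implemented; the class name, the `max_stale_seconds` parameter, the fake resolver, and the address are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleServingCache:
    """Toy resolver cache that falls back to expired answers on upstream failure."""

    def __init__(
        self,
        resolve: Callable[[str], Tuple[str, float]],
        max_stale_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # returns (answer, ttl_seconds) or raises
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expires_at)

    def lookup(self, name: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]  # fresh hit
        try:
            answer, ttl = self._resolve(name)
        except Exception:
            # Upstream failed (e.g. validation error): serve stale if still allowed.
            if cached and now < cached[1] + self._max_stale:
                return cached[0]
            return None
        self._entries[name] = (answer, now + ttl)
        return answer


# Demo with a fake upstream and a hand-driven clock (all values hypothetical).
now = [0.0]
upstream_ok = [True]


def fake_resolve(name: str) -> Tuple[str, float]:
    if not upstream_ok[0]:
        raise RuntimeError("validation failed")
    return "192.0.2.1", 60.0


cache = StaleServingCache(fake_resolve, max_stale_seconds=300.0, clock=lambda: now[0])
fresh = cache.lookup("example.de")     # upstream healthy
now[0], upstream_ok[0] = 120.0, False  # TTL expired, upstream now broken
stale = cache.lookup("example.de")     # answered from the expired entry
now[0] = 1000.0                        # past the stale window
gone = cache.lookup("example.de")
print(fresh, stale, gone)  # 192.0.2.1 192.0.2.1 None
```

The stale window bounds how long an old answer can be reused, which trades some freshness for availability while the authoritative side is broken.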

  • Building for the future
    cloudflare· 07-may

    This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloudflare.

  • Code Orange: Fail Small is complete. The result is a stronger Cloudflare network
    cloudflare· 01-may

    We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best practices to prevent future incidents.

  • Introducing Dynamic Workflows: durable execution that follows the tenant
    cloudflare· 01-may

    Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.

  • decolua/9router
    github-trending

    🆓 Unlimited FREE AI coding. Connect Claude Code, Codex, Cursor, Cline, Copilot, Antigravity to FREE Claude/GPT/Gemini via 40+ providers. Auto-fallback, RTK -40% tokens, never hit limits.

    9Router - FREE AI Router & Token Saver. Never stop coding. Save 20-40% tokens with RTK + auto-fallback to FREE & cheap AI models. Connect All AI Code Tools (Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, OpenClaw...) to 40+ AI Providers & 100+ Models.

    🤔 Why 9Router? Stop wasting money, tokens and hitting limits:
    ❌ Subscription quota expires unused every month
    ❌ Rate limits stop you mid-coding
    ❌ Tool outputs (git diff, grep, ls...) burn tokens fast
    ❌ Expensive APIs ($20-50/month per provid…

  • addyosmani/agent-skills
    github-trending

    Production-grade engineering skills for AI coding agents. Skills encode the workflows, quality gates, and best practices that senior engineers use when building software. These ones are packaged so AI agents follow them consistently across every phase of development.

      DEFINE          PLAN           BUILD          VERIFY         REVIEW          SHIP
     ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐
     │ Idea │ ───▶ │ Spec │ ───▶ │ Code │ ───▶ │ Test │ ───▶ │  QA  │ ───▶ │  Go  │
     │Refine│      │ PRD  │      │ Impl │      │Debug │      │ Gate │      │ Live │
     └──────┘      └──────┘      └──────┘      └──────┘      └──────┘      └──────┘
      /spec          /plan          /build         /test         /review         /ship

    Commands: 7 slash commands that map to the development lifecycle. Each one activates the right skills automatically. What you're doing Command Key pr…

  • anthropics/financial-services
    github-trending

    Claude for Financial Services. Reference agents, skills, and data connectors for the financial-services workflows we see most: investment banking, equity research, private equity, and wealth management. Everything here is available two ways from one source: install it as a Claude Cowork plugin, or deploy it through the Claude Managed Agents API behind your own workflow engine. Same system prompt, same skills; you choose where it runs.

    Important: Nothing in this repository constitutes investment, legal, tax, or accounting advice. These agents draft analyst work product (models, memos, research notes, reconciliations) for review by a qualified professional. They do not make investment recommendations, execute transactions, bind risk, post to a ledger, or approve onboarding; every output i…

  • docusealco/docuseal
    github-trending

    Open source DocuSign alternative. Create, fill, and sign digital documents ✍️

    DocuSeal is an open source platform that provides secure and efficient digital document signing and processing. Create PDF forms to have them filled and signed online on any device with an easy-to-use, mobile-optimized web tool.

    Features
    • PDF form fields builder (WYSIWYG)
    • 12 field types available (Signature, Date, File, Checkbox etc.)
    • Multiple submitters per document
    • Automated emails via SMTP
    • Files storage on disk or AWS S3, Google Storage, Azure Cloud
    • Automatic PDF eSignature
    • PDF signature verification
    • Users management
    • Mobile-optimized
    • 7 UI languages with signing available in 14 languages
    • API and Webhooks for integrations
    • Easy to dep…

  • InsForge/InsForge
    github-trending

    InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents. The backend platform for AI-native developers.

    InsForge is a backend development platform built for AI coding agents and AI code editors. It exposes backend primitives like databases, auth, storage, and functions through a semantic layer that agents can understand, reason about, and operate end to end.

    How it works: InsForge acts as a semantic layer between AI coding agents and backend primitives. It performs backend context engineering so agents can understand, operate, and inspect backend systems. Fetch backend context: Agents can fetch documentation and available operations for the ba…

  • LearningCircuit/local-deep-research
    github-trending

    ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted.

    Local Deep Research: AI-powered research assistant for deep, agentic research. Performs deep, agentic research using multiple LLMs and search engines with proper citations.

    🚀 What is Local Deep Research? AI research assistant you control. Run locally for privacy, use any LLM and build your own searchable knowledge base. You own your data and see exactly how it works.

    ⚡ Quick Start, Option 1: Docker Run (Linux)

      # Step 1: Pull and run Ollama
      docker run -d -p 11434:11434 --name ollama ollama/ollama
      docker exec ollama ollama pull gpt-oss:20b
      # Step 2: …

  • VectifyAI/PageIndex
    github-trending

    📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

    Reasoning-based RAG ◦ No Vector DB or Chunking ◦ Context-Aware ◦ Human-like Retrieval

    📢 Updates
    • 🔥 Agentic Vectorless RAG: A simple agentic, vectorless RAG example with self-hosted PageIndex, using OpenAI Agents SDK.
    • Scale PageIndex to Millions of Documents: PageIndex File System is a file-level tree layer that lets PageIndex reason over an entire corpus, not just a single document, enabling massive-scale document search.
    • PageIndex Chat: Human-like document analysis agent platform for professional long documents. Also available via MCP or API.
    • PageIndex Framework: Deep dive into PageIndex: an age…

  • vercel-labs/open-agents
    github-trending

    An open source template for building cloud agents.

    Open Agents is an open-source reference app for building and running background coding agents on Vercel. It includes the web UI, the agent runtime, sandbox orchestration, and the GitHub integration needed to go from prompt to code changes without keeping your laptop involved. The repo is meant to be forked and adapted, not treated as a black box.

    What it is: Open Agents is a three-layer system:

      Web -> Agent workflow -> Sandbox VM

    The web app handles auth, sessions, chat, and streaming UI. The agent runs as a durable workflow on Vercel. The sandbox is the execution environment: filesystem, shell, git, dev servers, and preview ports.

    The key architectural decision: the agent is not the sandbox. The agent does not run inside the VM…

  • z-lab/dflash
    github-trending

    DFlash: Block Diffusion for Flash Speculative Decoding

    DFlash is a lightweight block diffusion model designed for speculative decoding. It enables efficient and high-quality parallel drafting.

    https://github.com/user-attachments/assets/5b29cabb-eb95-44c9-8ffe-367c0758de8c

    Supported Models

      Model                     DFlash Draft
      gemma-4-26B-A4B-it        z-lab/gemma-4-26B-A4B-it-DFlash
      gemma-4-31B-it            z-lab/gemma-4-31B-it-DFlash
      Qwen3.6-27B               z-lab/Qwen3.6-27B-DFlash
      Qwen3.6-35B-A3B           z-lab/Qwen3.6-35B-A3B-DFlash
      MiniMax-M2.5 (Preview)    z-lab/MiniMax-M2.5-DFlash
      Kimi-K2.5                 z-lab/Kimi-K2.5-DFlash
      Qwen3.5-4B                z-lab/Qwen3.5-4B-DFlash
      Qwen3.5-9B                z-lab/Qwen3.5-9B-DFlash
      Qwen3.5-27B               z-lab/Qwen3.5-27B-DFlash
      Qwen3.5-35B-A3B           z-lab/Qwen3.5-35B-A3B-DFlash
      Qwe…

  • AI at Discount
    hn-ai· 08-may

    Article URL: https://tomtunguz.com/ai-at-discount/ Comments URL: https://news.ycombinator.com/item?id=48057930 Points: 1 # Comments: 0

  • Anthropic's Mythos Threw the White House AI Strategy into Chaos
    hn-ai· 08-may

    Article URL: https://www.wsj.com/tech/ai/trump-ai-anthropic-mythos-regulation-2378971f Comments URL: https://news.ycombinator.com/item?id=48057717 Points: 2 # Comments: 0

  • Show HN: Loxai.tech and Neutboom – Gen AI's frontier of individuality
    hn-ai· 08-may

    Hi- I hope you’re all having a good day so far! I'm not sure where to post this but I do have two things for you guys today, related to the AI space: individuality and a new era to AI! So: - LLM wrappers are crumbling- these businesses will not survive as foundational models begin to offer the functionalities consumers have been seeking. This maybe represents a failure in interpretation of individuality as horizontal growth (these startups didn't last too long...), which brings up the next point. - Individuality and ego are human needs. In the past year, there's been a boom in using AI to personalise products to consumers. A take on this: are companies really doing "individuality as horizontal growth" as consumers would like them to? Do consumers really need businesses to tell them that t…

  • Ask HN: How are you sandboxing AI agents and developer CLIs?
    hn-ai· 08-may

    Comments URL: https://news.ycombinator.com/item?id=48058747 Points: 1 # Comments: 0

  • Claude Code CVE-2026-39861: sandbox escape via symlink
    hn-ai· 08-may

    Article URL: https://github.com/advisories/GHSA-vp62-r36r-9xqp Comments URL: https://news.ycombinator.com/item?id=48057842 Points: 2 # Comments: 1

  • Sley is live: the first native AI programming language
    hn-ai· 08-may

    Article URL: https://github.com/GreyforgeLabs/sley Comments URL: https://news.ycombinator.com/item?id=48058152 Points: 2 # Comments: 2

  • Show HN: AnamDB – An AI-native, differentiable Datalog engine written in Rust
    hn-ai· 08-may

    Article URL: https://github.com/jam5991/anam Comments URL: https://news.ycombinator.com/item?id=48057731 Points: 1 # Comments: 0

  • Webdevbench: Evaluating AI as software development agencies
    hn-ai· 08-may

    Article URL: https://webdevbench-ai-benchmarks.qwikbuild.site/ Comments URL: https://news.ycombinator.com/item?id=48058718 Points: 1 # Comments: 0

  • The AI Revival of the Three Mile Island Nuclear Plant
    hn-ai· 08-may

    Article URL: https://www.bloomberg.com/news/features/2026-05-07/three-mile-island-restart-moves-ahead-with-microsoft-ai-deal Comments URL: https://news.ycombinator.com/item?id=48058663 Points: 1 # Comments: 0

  • 🔬Doing Vibe Physics — Alex Lupsasca, OpenAI
    latentspace· 05-may

    The full story of how GPT‑5.x derived new results in theoretical physics and quantum gravity.

  • [AINews] The Other vs The Utility
    latentspace· 04-may

    a quiet day lets us reflect on the nature of AI "character" in the Clippy vs Anton debate

  • [AINews] Silicon Valley gets Serious about Services
    latentspace· 06-may

    A series of announcements line up to a big theme: Services are the next big opportunity.

  • [AINews] Anthropic-SpaceXai's 300MW/$5B/yr deal for Colossus I, ARR growth is 8000% annualized
    latentspace· 07-may

    And the kingmaker picks a side.

  • Fragments: May 5
    martin-fowler· 05-may

    Over the last couple of months Rahul Garg published a series of posts here on how to reduce the friction in AI-assisted programming. To make it easier to put these ideas into practice he’s now built an open-source framework to operationalize these patterns. AI coding assistants jump straight to code, silently make design decisions, forget constraints mid-conversation, and produce output nobody reviewed against real engineering standards. Lattice fixes this with composable skills in three tiers – atoms, molecules, refiners – that embed battle-tested engineering disciplines (Clean Architecture, DDD, design-first methodology, secure coding, and more), plus a living context layer (the .lattice/ folder) that accumulates your project’s standards, decisions, and review insights. The system gets …

  • Bliki: Mythical Man Month
    martin-fowler· 05-may

    In the early 1960s, Fred Brooks managed the development of IBM's System/360 computer systems. After it was done he penned his thoughts in the book The Mythical Man-Month, which became one of the most influential books on software development after its publication in 1975. Reading it in 2026, we'll find some of it outdated, but it also retains many lessons that are still relevant today. The book contains Brooks's law: “Adding manpower to a late software project makes it later.” The issue here is communication: as the number of people grows, the number of communication paths between those people grows exponentially. Unless these paths are skillfully designed, work quickly falls apart. Perhaps my most enduring lesson from this book is the importance of conceptual integrity: I will contend…

  • State of Routing in Model Serving
    netflix-tech· 01-may

  • Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph
    netflix-tech· 04-may

  • GitHub Repo Stats
    simonw· 07-may

    Tool: GitHub Repo Stats (https://tools.simonwillison.net/github-repo-stats). One of the things I always look for when evaluating a new GitHub repository is the number of commits it has... but that number isn't visible on GitHub's mobile site layout. I built this tool to fix that, using this prompt: "Given a GitHub repo URL or foo/bar repo ID show information about that repo absorbed via wither REST or graphql CORS fetch() including the number of commits in the repo and other useful stats". Example output for simonw/datasette (https://tools.simonwillison.net/github-repo-stats?repo=simonw%2Fdatasette) and simonw/ll… (https://tools.simonwillison.net/github-repo-stats?repo=simonw%2Fllm)

  • Behind the Scenes Hardening Firefox with Claude Mythos Preview
    simonw· 07-may

    Behind the Scenes Hardening Firefox with Claude Mythos Preview (https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/). Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox: "Suddenly, the bugs are very good. Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it. It is difficult to overstate how much this …

  • Big Words
    simonw· 07-may

    Tool: Big Words (https://tools.simonwillison.net/big-words). I'm using my vibe coded macOS presentations tool (https://simonwillison.net/2026/Feb/25/present/) to put together a talk, and I wanted to add a slide with some text on it. The tool only accepts URLs, so I put together (https://github.com/simonw/tools/pull/279) a quick page that accepts query string arguments and turns them into a simple slide. Here's an example: https://tools.simonwillison.net/big-words?text=simonwillison.net&gradient=1&size=9.5 Double click or double tap the page to access a form for modifying the different options. …

  • Vibe coding and agentic engineering are getting closer than I'd like
    simonw· 06-may

    I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison (https://www.heavybit.com/library/podcasts/high-leverage/ep-9-the-ai-coding-paradigm-shift-with-simon-willison). Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work. One thing I really enjoy about podcasts is that they sometimes push me to think out loud in a way that exposes an idea I've not previously been able to put into words. Vibe coding and agentic engineering are starting to overlap: A few weeks after vibe coding was first coined I pu…

  • Notes on the xAI/Anthropic data center deal
    simonw· 07-may

    There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center". As I mentioned in my live blog of the keynote (https://simonwillison.net/2026/May/6/code-w-claude-2026/), that's the one with the particularly bad environmental record (https://www.politico.com/news/2025/05/06/elon-musk-xai-memphis-gas-turbines-air-pollution-permits-00317582). The gas turbines installed to power the facility initially ran without Clean Air Act permits or pollution control devices, which they got away with by classifying them as "temporary". Credible reports link it to increases in hospital admissions relating to low ai…

  • Our AI started a cafe in Stockholm
    simonw· 05-may

    Our AI started a cafe in Stockholm (https://andonlabs.com/blog/ai-cafe-stockholm). Andon Labs previously started an AI-run retail store (https://andonlabs.com/blog/andon-market-launch) in San Francisco. Now they're running a similar experiment in Stockholm, Sweden, only this time it's a cafe. These experiments are interesting, and often throw out amusing anecdotes: "During the first week of inventory, Mona ordered 120 eggs even though the café has no stove. When the staff told her they couldn’t cook them, she suggested using the high-speed oven, until they pointed out the eggs would likely explode. She also tried to solve the problem of fresh tomatoes being spoiled too fast by ordering 22.5 kg of canned tomatoes for …

  • datasette-referrer-policy 0.1
    simonw· 05-may

    Release: datasette-referrer-policy 0.1 (https://github.com/datasette/datasette-referrer-policy/releases/tag/0.1). The OpenStreetMap tiles on the Datasette global-power-plants demo (https://datasette.io/global-power-plants/global-power-plants) weren't displaying correctly. This turned out to be caused by two bugs. The first is that the CAPTCHA I added (https://github.com/simonw/datasette-turnstile) to that site a few weeks ago was triggering for the .json fetch requests used by the map plugin, and since those weren't HTML the user was not being asked to solve them. Here's the fix (https://github.com/simonw/datasette.io/commit/23a1c8596b75b2094db46035a3b4280109fb3df3). The second was that Op…

  • datasette-llm 0.1a7
    simonw· 05-may

    Release: datasette-llm 0.1a7 (https://github.com/datasette/datasette-llm/releases/tag/0.1a7). "Mechanism for configuring default options (https://github.com/datasette/datasette-llm/blob/main/README.md#configuration) for specific models." Part of Datasette's evolving support mechanism for plugins that use LLMs. It's now possible to configure a model with default options, e.g. to say all enrichment (https://github.com/datasette/datasette-enrichments-llm) operations should use a specific model with temperature set to 0.5. Tags: llm, datasette

  • Live blog: Code w/ Claude 2026
    simonw· 06-may

    I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions. Tags: ai, generative-ai, llms, anthropic, claude, claude-code, live-blog

  • llm-gemini 0.31
    simonw· 07-may

    Release: llm-gemini 0.31 (https://github.com/simonw/llm-gemini/releases/tag/0.31). "gemini-3.1-flash-lite is no longer a preview (https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available)." Here's my write-up of the Gemini 3.1 Flash-Lite Preview model (https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/) back in March. I don't believe this new non-preview model has changed since then. Tags: llm-release, gemini, llm, …