AI Digest

Curated digest

Friday, May 8, 2026 · weekly-deep · deep · 14,444 tokens

🔥 TOP: what you absolutely have to see

Code w/ Claude 2026, the event of the year: live blog covering everything from the keynote, including the SpaceX/xAI deal to use Colossus (300MW / ~$5B/year), multi-agent sessions, Outcomes in public beta, webhooks for Managed Agents, session filtering by status, and more. This is EVERYTHING Anthropic announced. link
Claude Platform Release Notes (May 6): the concrete pieces you can use TODAY. Multi-agent sessions in public beta under the managed-agents-2026-04-01 beta header, Outcomes (sessions with a final state), vault credential refresh for MCP OAuth, and webhooks for Managed Agents with session and vault events. link (if the link does not load, the content is covered in detail in the live blog)
Claude Code CVE-2026-39861, sandbox escape via symlink: security vulnerability published as a GitHub Advisory. If you run Claude Code in multi-tenant environments or without strict sandboxing, this affects you directly. link
Mozilla used Claude Mythos Preview to harden Firefox: fascinating post from Mozilla Hacks on how they used the Mythos preview to find and fix HUNDREDS of vulnerabilities in Firefox. Fine-grained technical detail on how it changed the security game in open source. link
LCM: Lossless Context Management, the paper that beats Claude Code on long context: deterministic architecture for LLM memory that, using Opus 4.6, outperforms Claude Code on the OOLONG benchmark at EVERY context length (32K to 1M tokens). If you care about multi-agent systems and context optimization, this paper is required reading. link
Anthropic-SpaceX, the Colossus I deal (300MW/$5B/yr): Latent Space coverage with the real numbers and the geopolitical context. Anthropic's biggest infrastructure move. link
Anthropic launches agents for financial services: official repository with reference agents, skills and data connectors for investment banking, equity research, private equity and wealth management. Available as a Claude Cowork plugin or through the Managed Agents API. If you build B2B SaaS, this sets a new standard for how vertical agents get packaged. link
Open Agents from Vercel Labs, an open source template for cloud coding agents: three-layer architecture (Web → Agent workflow → Sandbox VM) for running coding agents in the background without your laptop. The key decision: the agent does NOT run inside the sandbox. Forkable and adaptable. link
"When Context Hurts", the crossover effect in multi-agent design exploration: paper showing that more context is NOT always better. On some tasks it improves tradeoff coverage by up to 20x, on others it reduces it by up to 46%. It identifies one measurable variable (baseline exploration without context) that predicts the direction (Pearson r = -0.82). Essential if you design multi-agent systems. link

📦 Claude / Anthropic ecosystem

Enterprise AI services company, Anthropic + Blackstone + Hellman & Friedman + Goldman Sachs: new enterprise AI services company. A sign that Anthropic is moving hard into professional services, not just the API. link
Higher usage limits for Claude + compute deal with SpaceX: official Anthropic announcement of higher usage limits for Claude and the agreement with SpaceX/xAI to use its Colossus datacenter. link
Silicon Valley gets serious about Services: Latent Space analyzes Anthropic's series of announcements on services as the next big opportunity. link
Mythos threw the White House AI strategy into chaos (WSJ): the political impact of Mythos, as covered by WSJ. Important context for understanding the regulatory landscape. link
Notes on the xAI/Anthropic data center deal: Simon Willison's analysis of the deal, including the Colossus datacenter's controversial environmental record (gas turbines without Clean Air Act permits). link

🛠️ Dev tools & coding

addyosmani/agent-skills, production-grade engineering skills for AI coding agents: packaged skills that encode the workflows, quality gates and best practices of senior engineers. Seven slash commands mapped to the development lifecycle (including /spec, /plan, /build, /test, /review and /ship). link
9Router, FREE AI Router & Token Saver: connect Claude Code, Cursor, Copilot and others to 40+ providers and 100+ models with auto-fallback and RTK, which the project says saves 20-40% of tokens. The pitch: rate limits never stop you again. link
dflash, Block Diffusion for Flash Speculative Decoding: lightweight block diffusion model for parallel drafting in speculative decoding. Supports Gemma 4, Qwen 3.5/3.6, MiniMax and Kimi K2.5. If you run your own inference, this could give you large speedups. link
InsForge, a Postgres-based backend for coding agents: backend platform with a semantic layer that agents can understand and operate end to end. Database, auth, storage, compute, AI gateway. link
PageIndex, vectorless, reasoning-based RAG: RAG with no vector DB and no chunking, with retrieval based on reasoning. Supports MCP and API. An interesting alternative if you are tired of chunking problems. link
Lattice framework (Rahul Garg / Martin Fowler): open source framework for operationalizing AI-assisted programming patterns. Three tiers of skills (atoms, molecules, refiners) covering Clean Architecture, DDD and secure coding. link
Local Deep Research, a local, encrypted research assistant: ~95% on SimpleQA with Qwen3.6-27B on an RTX 3090. Supports local and cloud LLMs and 10+ search engines. Open source alternative to deep research. link

🏗️ Software engineering

Container Design Patterns for Distributed Systems (ByteByteGo): the patterns that have crystallized over the past decade, organized by the scope of their coordination. Solid reading for keeping your fundamentals sharp. link
How Cloudflare responded to the "Copy Fail" Linux vulnerability: how they detected, investigated and mitigated a critical Linux kernel privilege escalation across their global fleet with zero customer impact. link
Cloudflare, Code Orange "Fail Small" complete: a massive engineering effort to make the infrastructure more resilient. New tools, Snapstone and the Engineering Codex, for safer configuration changes. link
Cloudflare Dynamic Workflows, durable execution that follows the tenant: library for routing durable execution to tenant-provided code. Lets platforms serve millions of unique workflows at near-zero idle cost. A relevant pattern for multi-tenant SaaS. link
When DNSSEC goes wrong, the .de TLD outage: Cloudflare's post-mortem on how they responded when DENIC published broken DNSSEC signatures, leaving millions of domains unreachable, and how serve stale cushioned the impact. link
Netflix, Model Lifecycle Graph for democratizing ML: how Netflix built a graph of the model lifecycle so non-ML teams can train and deploy models. link
Netflix, State of Routing in Model Serving: how Netflix handles request routing in model serving. Real production decisions at scale. link

📚 Worth reading

Agent Island, a dynamic multi-agent benchmark that resists saturation and contamination: multiplayer simulation where agents compete through cooperation, conflict and persuasion. Bayesian Plackett-Luce ranking over 999 games and 49 models. GPT-5.5 dominates. link
"More context is better" is FALSE, the crossover effect in multi-agent design: must-read paper with 10 tasks, 7 context conditions and 2,700+ runs. Context improves some tasks by up to 20x and degrades others by up to 46%. One measurable variable predicts the direction. link
Webdevbench, evaluating AI as web development agencies: benchmark that measures AI as a full development agency, not just as a coding assistant. link
Doing Vibe Physics, GPT-5.x derived new results in theoretical physics: Latent Space interview with Alex Lupsasca (OpenAI) on how GPT-5.x produced new results in quantum gravity. link
Vibe coding and agentic engineering are converging, and that scares Simon: Simon Willison's reflection on how vibe coding and agentic engineering are converging in his own workflow. link
Martin Fowler revisits The Mythical Man-Month in 2026: Brooks's law, conceptual integrity, and what still holds up from the classic 1975 book. link
ByteByteGo, MCP vs Skills, clearly explained: the core point is that MCP and Skills solve different problems, and picking the wrong one costs you money or complexity. link
ByteByteGo, Connecting LLMs to the Real World (Tool Use, Function Calling, MCP): the progression from basic tool use to function calling to MCP. link

💤 Skippable but good to know

"Our AI started a cafe in Stockholm": Andon Labs repeats its AI-run store experiment from San Francisco, this time as a cafe. Fun anecdotes (120 eggs with no oven, 22.5kg of canned tomatoes). link
Ask HN: How are you sandboxing AI agents and developer CLIs? Active HN discussion on sandboxing for coding agents. link
AI at Discount (Tom Tunguz): market analysis of AI commoditization and pricing pressure. link
Three Mile Island restart moves ahead with Microsoft AI deal: the nuclear plant comes back to life to power AI datacenters. link
PARSE, Parallel Prefix Verification for Speculative Generation: speculative decoding framework that parallelizes prefix verification at the semantic level instead of the token level. link
DocuSeal, open source DocuSign alternative: if your restaurant SaaS needs digital signatures, this open source alternative may fit. link

Fetched articles (60)

Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
anthropic-news· 04-may
May 4, 2026 · Announcements
Higher usage limits for Claude and a compute deal with SpaceX
anthropic-news· 06-may
May 6, 2026 · Announcements
Agents for financial services
anthropic-news· 05-may
May 5, 2026 · Announcements
Parallel Prefix Verification for Speculative Generation
arxiv-ai· 08-may
arXiv:2605.04263v1 Announce Type: new Abstract: We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification on a semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, leading to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches rely on sequential verification, introducing significant overhead and limiting practical gains. PARSE introduces parallel prefix verification, enabling semantic-level verification without sequential checks. Given a full draft from a dra…
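The abstract contrasts PARSE with token-level verification. As a rough illustration of that baseline only (not of PARSE itself), here is a minimal sketch of greedy token-level acceptance: the draft is kept up to the first token where the target model would have chosen something else. The names `accepted_prefix_length`, `toy_target` and the token lists are hypothetical, and a real system would score the whole draft in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Sequence


def accepted_prefix_length(draft: Sequence[str],
                           target_next: Callable[[List[str]], str]) -> int:
    """Count how many leading draft tokens the target model agrees with."""
    accepted: List[str] = []
    for token in draft:
        # Token-level equivalence: stop at the first disagreement.
        if target_next(accepted) != token:
            break
        accepted.append(token)
    return len(accepted)


# Hypothetical stand-in for a target model's greedy next-token choice.
TARGET = ["the", "cat", "sat", "down"]


def toy_target(prefix: List[str]) -> str:
    return TARGET[len(prefix)]


draft = ["the", "cat", "sits", "down"]
accepted = accepted_prefix_length(draft, toy_target)
```

In this toy case a single differing token ("sits" vs "sat") discards the rest of the draft even though the meaning is close, which is the short-acceptance-length problem the abstract describes.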
Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
arxiv-ai· 08-may
arXiv:2605.04312v1 Announce Type: new Abstract: Static capabilities benchmarks suffer from saturation and contamination, making it difficult to track capabilities progress over time. We introduce Agent Island, a multiplayer simulation environment in which language-model agents compete in a game of interagent cooperation, conflict, and persuasion. The environment yields a dynamic benchmark designed to mitigate both saturation and contamination; new models can always outperform the current leading player in this winner-take-all game, and agents compete against other adaptive agents rather than face a fixed task set. We rank players with a Bayesian Plackett-Luce model, allowing us to quantify uncertainty in player skill. In 999 games involving 49 unique models, openai/gpt-5.5 dominates its p…
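To make the ranking model concrete, here is a minimal sketch of the Plackett-Luce likelihood for a full finishing order. It shows only the likelihood term; the paper's Bayesian treatment (priors, posterior inference) and how it encodes a winner-take-all game are not reproduced here. The function name, the player names and the skill values are hypothetical.

```python
from itertools import permutations
from math import exp, log
from typing import Dict, Sequence


def plackett_luce_log_prob(ranking: Sequence[str],
                           log_skill: Dict[str, float]) -> float:
    """Log-probability of a full finishing order under Plackett-Luce."""
    remaining = list(ranking)
    total = 0.0
    for player in ranking:
        # The next place goes to `player` with probability
        # exp(skill) / sum of exp(skill) over players not yet placed.
        denom = sum(exp(log_skill[p]) for p in remaining)
        total += log_skill[player] - log(denom)
        remaining.remove(player)
    return total


# Hypothetical log-skills for three made-up players (not from the paper).
skills = {"a": 1.0, "b": 0.0, "c": -1.0}
best = max(permutations(skills),
           key=lambda order: plackett_luce_log_prob(order, skills))
```

Under this model the most likely order lists players by descending skill, and the probabilities of all possible orders sum to one.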
The Scaling Properties of Implicit Deductive Reasoning in Transformers
arxiv-ai· 08-may
arXiv:2605.04330v1 Announce Type: new Abstract: We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration
arxiv-ai· 08-may
arXiv:2605.04361v1 Announce Type: new Abstract: The prevailing assumption in agent orchestration is that more context is better. We test this on multi-agent software design across 10 tasks, 7 context-injection conditions, and over 2,700 runs, and find a crossover effect: the same artifact type improves design exploration on some tasks (up to 20$\times$ tradeoff coverage) and actively degrades it on others (up to 46% reduction). On several tasks, an irrelevant document performs as well as or better than every relevant artifact. The direction is predicted by a single measurable variable--baseline exploration without context--with Pearson $r = -0.82$ ($p < 0.001$). Probing the mechanism by manipulating convergence pressure through prompt design reveals two distinct regimes: convergence drive…
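For readers who want to see what the reported statistic measures, here is a minimal sketch of a Pearson correlation computed by hand. The data points are invented purely for illustration and are not the paper's measurements; `pearson_r`, `baseline_exploration` and `change_with_context` are hypothetical names.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented illustration: tasks with low baseline exploration gain from
# added context, tasks with high baseline exploration lose from it.
baseline_exploration = [0.1, 0.3, 0.5, 0.7, 0.9]
change_with_context = [2.0, 1.2, 0.4, -0.1, -0.4]
r = pearson_r(baseline_exploration, change_with_context)
```

A strongly negative r, as in the abstract, means that the more a task explores on its own, the more likely added context is to hurt.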
ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor
arxiv-ai· 08-may
arXiv:2605.04193v1 Announce Type: new Abstract: Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-bas…
Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
arxiv-ai· 08-may
arXiv:2605.04227v1 Announce Type: new Abstract: Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-orient…
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
arxiv-ai· 08-may
arXiv:2605.04243v1 Announce Type: new Abstract: Despite significant advances, large language models (LLMs) continue to exhibit brittle performance on complex temporal reasoning tasks. This failure mode is widely attributed to inherent deficits in autoregressive logical deduction. In this paper, we challenge this prevailing narrative, demonstrating that temporal reasoning is not the fundamental bottleneck; rather, the locus of failure lies in unstructured text-to-event representation. We introduce a novel neuro-symbolic question-answering framework governed by a Probabilistic Inconsistency Signal (PIS) that explicitly isolates perceptual errors from reasoning failures. By lifting unstructured text into explicit event graphs and interval constraints, our architecture strictly decouples sema…
Regularized Centered Emphatic Temporal Difference Learning
arxiv-ai· 08-may
arXiv:2605.04100v1 Announce Type: new Abstract: Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on emphasis, but the follow-on trace can have high variance. We revisit this tradeoff through Bellman-error centering. Although centering naturally removes a common drift term from TD errors, we show that a naive centered emphatic extension introduces an auxiliary coupling that can destroy the positive-definiteness of the ETD key matrix. We propose \emph{Regularized Emphatic Temporal-Difference Learning} (RETD), which preserves the follow-on trace and regularizes only the auxiliary centering recursion, corre…
Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs
arxiv-ai· 08-may
arXiv:2605.04169v1 Announce Type: new Abstract: Surgical team performance arises from complex interactions between technical execution and non-technical skills, including communication and coordination dynamics. However, current surgical AI systems predominantly model visual workflow signals, lacking structured representations of intraoperative team interactions over time. We propose a real-time actionable approach for modeling surgical team dynamics using time-expanded interaction graphs, where team members are modeled as time-indexed nodes and communication exchanges define directed edges. This spatio-temporal expansion enables dynamic interaction modeling, while allowing efficient inference with a static graph neural network. The model predicts procedural efficiency as the deviation fr…
LCM: Lossless Context Management
arxiv-ai· 08-may
arXiv:2605.04050v1 Announce Type: new Abstract: We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens. LCM may be considered both a vindication and extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access. LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recurs…
How Instacart Built a Search for Billions of Products
bytebytego· 05-may
In this article, we will learn how Instacart’s search infrastructure evolved over the years and the challenges its engineering team faced. May 5 • ByteByteGo
Connecting LLMs to the Real World: Tool Use, Function Calling, and MCP
bytebytego· 04-may
In this article, we will look at this progression that has happened from basic tool use to function calling to the Model Context Protocol, allowing the… May 4 • ByteByteGo
Container Design Patterns for Distributed Systems
bytebytego· 07-may
In this article, we’ll walk through the patterns that have crystallized over the past decade, organized by the scope of their coordination. 14 hrs ago • ByteByteGo
EP213: MCP vs Skills, Clearly Explained
bytebytego· 02-may
Both MCP and Skills extend what an agent can do. But they solve different problems, and picking the wrong one adds cost or complexity you don't need. May 2 • ByteByteGo
Claude Platform
claude-changelog
Release notes. Updates to the Claude Platform, including the Claude API, client SDKs, and the Claude Console. For release notes on Claude Apps, see the Release notes for Claude Apps in the Claude Help Center. For updates to Claude Code, see the complete CHANGELOG.md in the claude-code repository.
May 6, 2026: Multiagent sessions and Outcomes are now in public beta under the standard managed-agents-2026-04-01 beta header. Vault credential background refresh is now supported for mcp_oauth credentials. See Authenticate with vaults. Webhooks for Claude Managed Agents are now supported. Webhook event types include session and vault lifecycle events. See Subscribe to webhooks. Additional filtering and sorting options are now supported. Sessions can be filtered by status, and events…
How Cloudflare responded to the “Copy Fail” Linux vulnerability
cloudflare· 07-may
When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no malicious exploitation.
When DNSSEC goes wrong: how we responded to the .de TLD outage
cloudflare· 06-may
On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here's what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.
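The "serve stale" idea can be sketched in a few lines: when a cached answer has expired and the upstream lookup fails, a resolver may keep answering from the expired entry for a bounded time (the general approach described in RFC 8767). The following is a toy model under that assumption, not Cloudflare's 1.1.1.1 implementation; `ServeStaleCache`, `fake_resolve`, the TTLs and the address are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

# A resolver returns (answer, ttl_seconds) or raises on failure.
Resolver = Callable[[str], Tuple[str, float]]


class ServeStaleCache:
    def __init__(self, resolve: Resolver, max_stale: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._max_stale = max_stale  # how long past expiry an answer may be reused
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry)

    def lookup(self, name: str) -> Tuple[str, str]:
        """Return (answer, source), where source is fresh, resolved or stale."""
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0], "fresh"
        try:
            answer, ttl = self._resolve(name)
        except Exception:
            # Upstream failed: fall back to the expired answer if it is recent enough.
            if cached is not None and now < cached[1] + self._max_stale:
                return cached[0], "stale"
            raise
        self._entries[name] = (answer, now + ttl)
        return answer, "resolved"


# Toy usage with a controllable clock and an upstream that can be broken.
now = [0.0]
upstream_broken = [False]


def fake_resolve(name: str) -> Tuple[str, float]:
    if upstream_broken[0]:
        raise RuntimeError("validation failure upstream")
    return "192.0.2.1", 60.0


cache = ServeStaleCache(fake_resolve, max_stale=3600.0, clock=lambda: now[0])
first = cache.lookup("example.de")
now[0], upstream_broken[0] = 120.0, True
during_outage = cache.lookup("example.de")
```

The bound on staleness is the design choice that matters: names that were recently resolved keep working through the outage, while names never seen before still fail.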
Building for the future
cloudflare· 07-may
This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloudflare.
Code Orange: Fail Small is complete. The result is a stronger Cloudflare network
cloudflare· 01-may
We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best practices to prevent future incidents.
Introducing Dynamic Workflows: durable execution that follows the tenant
cloudflare· 01-may
Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.
decolua/9router
github-trending
🆓 Unlimited FREE AI coding. Connect Claude Code, Codex, Cursor, Cline, Copilot, Antigravity to FREE Claude/GPT/Gemini via 40+ providers. Auto-fallback, RTK -40% tokens, never hit limits.
9Router - FREE AI Router & Token Saver. Never stop coding. Save 20-40% tokens with RTK + auto-fallback to FREE & cheap AI models. Connect All AI Code Tools (Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, OpenClaw...) to 40+ AI Providers & 100+ Models.
🤔 Why 9Router? Stop wasting money, tokens and hitting limits:
❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provid…
addyosmani/agent-skills
github-trending
Production-grade engineering skills for AI coding agents. Skills encode the workflows, quality gates, and best practices that senior engineers use when building software. These ones are packaged so AI agents follow them consistently across every phase of development.
Stage    Step            Command
DEFINE   Idea / Refine   /spec
PLAN     Spec / PRD      /plan
BUILD    Code / Impl     /build
VERIFY   Test / Debug    /test
REVIEW   QA / Gate       /review
SHIP     Go / Live       /ship
Commands: 7 slash commands that map to the development lifecycle. Each one activates the right skills automatically. What you're doing Command Key pr…
anthropics/financial-services
github-trending
Claude for Financial Services Reference agents, skills, and data connectors for the financial-services workflows we see most — investment banking, equity research, private equity, and wealth management. Everything here is available two ways from one source: install it as a Claude Cowork plugin, or deploy it through the Claude Managed Agents API behind your own workflow engine. Same system prompt, same skills — you choose where it runs. Important Nothing in this repository constitutes investment, legal, tax, or accounting advice. These agents draft analyst work product — models, memos, research notes, reconciliations — for review by a qualified professional. They do not make investment recommendations, execute transactions, bind risk, post to a ledger, or approve onboarding; every output i…
docusealco/docuseal
github-trending
Open source DocuSign alternative. Create, fill, and sign digital documents ✍️
DocuSeal: open source document filling and signing. DocuSeal is an open source platform that provides secure and efficient digital document signing and processing. Create PDF forms to have them filled and signed online on any device with an easy-to-use, mobile-optimized web tool.
Features:
PDF form fields builder (WYSIWYG)
12 field types available (Signature, Date, File, Checkbox etc.)
Multiple submitters per document
Automated emails via SMTP
Files storage on disk or AWS S3, Google Storage, Azure Cloud
Automatic PDF eSignature
PDF signature verification
Users management
Mobile-optimized
7 UI languages with signing available in 14 languages
API and Webhooks for integrations
Easy to dep…
InsForge/InsForge
github-trending
InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents. The backend platform for AI-native developers.
InsForge is a backend development platform built for AI coding agents and AI code editors. It exposes backend primitives like databases, auth, storage, and functions through a semantic layer that agents can understand, reason about, and operate end to end.
How it works: InsForge acts as a semantic layer between AI coding agents and backend primitives. It performs backend context engineering so agents can understand, operate, and inspect backend systems.
Fetch backend context: Agents can fetch documentation and available operations for the ba…
LearningCircuit/local-deep-research
github-trending
~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted.
Local Deep Research: AI-powered research assistant for deep, agentic research. Performs deep, agentic research using multiple LLMs and search engines with proper citations.
🚀 What is Local Deep Research? AI research assistant you control. Run locally for privacy, use any LLM and build your own searchable knowledge base. You own your data and see exactly how it works.
⚡ Quick Start, Option 1: Docker Run (Linux)
# Step 1: Pull and run Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull gpt-oss:20b
# Step 2: …
VectifyAI/PageIndex
github-trending
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
Reasoning-based RAG ◦ No Vector DB or Chunking ◦ Context-Aware ◦ Human-like Retrieval
📢 Updates
🔥 Agentic Vectorless RAG: A simple agentic, vectorless RAG example with self-hosted PageIndex, using OpenAI Agents SDK.
Scale PageIndex to Millions of Documents: PageIndex File System is a file-level tree layer that lets PageIndex reason over an entire corpus, not just a single document, enabling massive-scale document search.
PageIndex Chat: Human-like document analysis agent platform for professional long documents. Also available via MCP or API.
PageIndex Framework: Deep dive into PageIndex: an age…
vercel-labs/open-agents
github-trending
An open source template for building cloud agents.
Open Agents is an open-source reference app for building and running background coding agents on Vercel. It includes the web UI, the agent runtime, sandbox orchestration, and the GitHub integration needed to go from prompt to code changes without keeping your laptop involved. The repo is meant to be forked and adapted, not treated as a black box.
What it is: Open Agents is a three-layer system: Web -> Agent workflow -> Sandbox VM. The web app handles auth, sessions, chat, and streaming UI. The agent runs as a durable workflow on Vercel. The sandbox is the execution environment: filesystem, shell, git, dev servers, and preview ports.
The key architectural decision: the agent is not the sandbox. The agent does not run inside the VM…
z-lab/dflash
github-trending
DFlash: Block Diffusion for Flash Speculative Decoding
DFlash is a lightweight block diffusion model designed for speculative decoding. It enables efficient and high-quality parallel drafting.
https://github.com/user-attachments/assets/5b29cabb-eb95-44c9-8ffe-367c0758de8c
Supported Models
Model | DFlash Draft
gemma-4-26B-A4B-it | z-lab/gemma-4-26B-A4B-it-DFlash
gemma-4-31B-it | z-lab/gemma-4-31B-it-DFlash
Qwen3.6-27B | z-lab/Qwen3.6-27B-DFlash
Qwen3.6-35B-A3B | z-lab/Qwen3.6-35B-A3B-DFlash
MiniMax-M2.5 (Preview) | z-lab/MiniMax-M2.5-DFlash
Kimi-K2.5 | z-lab/Kimi-K2.5-DFlash
Qwen3.5-4B | z-lab/Qwen3.5-4B-DFlash
Qwen3.5-9B | z-lab/Qwen3.5-9B-DFlash
Qwen3.5-27B | z-lab/Qwen3.5-27B-DFlash
Qwen3.5-35B-A3B | z-lab/Qwen3.5-35B-A3B-DFlash
Qwe…
AI at Discount
hn-ai· 08-may
Article URL: https://tomtunguz.com/ai-at-discount/ Comments URL: https://news.ycombinator.com/item?id=48057930 Points: 1 # Comments: 0
Anthropic's Mythos Threw the White House AI Strategy into Chaos
hn-ai· 08-may
Article URL: https://www.wsj.com/tech/ai/trump-ai-anthropic-mythos-regulation-2378971f Comments URL: https://news.ycombinator.com/item?id=48057717 Points: 2 # Comments: 0
Show HN: Loxai.tech and Neutboom – Gen AI's frontier of individuality
hn-ai· 08-may
Hi- I hope you’re all having a good day so far! I'm not sure where to post this but I do have two things for you guys today, related to the AI space: individuality and a new era to AI! So: - LLM wrappers are crumbling- these businesses will not survive as foundational models begin to offer the functionalities consumers have been seeking. This maybe represents a failure in interpretation of individuality as horizontal growth (these startups didn't last too long...), which brings up the next point. - Individuality and ego are human needs. In the past year, there's been a boom in using AI to personalise products to consumers. A take on this: are companies really doing "individuality as horizontal growth" as consumers would like them to? Do consumers really need businesses to tell them that t…
Ask HN: How are you sandboxing AI agents and developer CLIs?
hn-ai· 08-may
Comments URL: https://news.ycombinator.com/item?id=48058747 Points: 1 # Comments: 0
Claude Code CVE-2026-39861: sandbox escape via symlink
hn-ai· 08-may
Article URL: https://github.com/advisories/GHSA-vp62-r36r-9xqp Comments URL: https://news.ycombinator.com/item?id=48057842 Points: 2 # Comments: 1
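The advisory's specifics are not reproduced in this digest. As a generic illustration of the bug class only (not the actual Claude Code flaw or its fix), here is a minimal sketch of a sandbox containment check that resolves symlinks before comparing paths. `is_inside` and `demo` are hypothetical names, and the directory layout is made up for the example.

```python
import os
import tempfile
from typing import Tuple


def is_inside(root: str, candidate: str) -> bool:
    """True if candidate, after resolving symlinks, stays under root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(candidate)
    return os.path.commonpath([real_root, real_path]) == real_root


def demo() -> Tuple[bool, bool]:
    """Build a sandbox containing a symlink that points outside of it."""
    with tempfile.TemporaryDirectory() as base:
        sandbox = os.path.join(base, "sandbox")
        outside = os.path.join(base, "outside")
        os.mkdir(sandbox)
        os.mkdir(outside)
        with open(os.path.join(outside, "secret.txt"), "w") as handle:
            handle.write("x")
        inner = os.path.join(sandbox, "notes.txt")
        with open(inner, "w") as handle:
            handle.write("ok")
        # The link lives inside the sandbox, but its target does not.
        link = os.path.join(sandbox, "link")
        os.symlink(outside, link)
        escaped = os.path.join(link, "secret.txt")
        return is_inside(sandbox, inner), is_inside(sandbox, escaped)


inside_ok, escape_allowed = demo()
```

A check that compares only the textual path would accept `sandbox/link/secret.txt`. Resolving first rejects it, although a check-then-open sequence like this can still race with a symlink created in between, so it is a sketch of the idea rather than a complete defense.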
Sley is live: the first native AI programming language
hn-ai· 08-may
Article URL: https://github.com/GreyforgeLabs/sley Comments URL: https://news.ycombinator.com/item?id=48058152 Points: 2 # Comments: 2
Show HN: AnamDB – An AI-native, differentiable Datalog engine written in Rust
hn-ai· 08-may
Article URL: https://github.com/jam5991/anam Comments URL: https://news.ycombinator.com/item?id=48057731 Points: 1 # Comments: 0
Webdevbench: Evaluating AI as software development agencies
hn-ai· 08-may
Article URL: https://webdevbench-ai-benchmarks.qwikbuild.site/ Comments URL: https://news.ycombinator.com/item?id=48058718 Points: 1 # Comments: 0
The AI Revival of the Three Mile Island Nuclear Plant
hn-ai· 08-may
Article URL: https://www.bloomberg.com/news/features/2026-05-07/three-mile-island-restart-moves-ahead-with-microsoft-ai-deal Comments URL: https://news.ycombinator.com/item?id=48058663 Points: 1 # Comments: 0
🔬Doing Vibe Physics — Alex Lupsasca, OpenAI
latentspace· 05-may
The full story of how GPT‑5.x derived new results in theoretical physics and quantum gravity.
[AINews] The Other vs The Utility
latentspace· 04-may
a quiet day lets us reflect on the nature of AI "character" in the Clippy vs Anton debate
[AINews] Silicon Valley gets Serious about Services
latentspace· 06-may
A series of announcements line up to a big theme: Services are the next big opportunity.
[AINews] Anthropic-SpaceXai's 300MW/$5B/yr deal for Colossus I, ARR growth is 8000% annualized
latentspace· 07-may
And the kingmaker picks a side.
Fragments: May 5
martin-fowler· 05-may
Over the last couple of months Rahul Garg published a series of posts here on how to reduce the friction in AI-assisted programming. To make it easier to put these ideas into practice he’s now built an open-source framework to operationalize these patterns. AI coding assistants jump straight to code, silently make design decisions, forget constraints mid-conversation, and produce output nobody reviewed against real engineering standards. Lattice fixes this with composable skills in three tiers – atoms, molecules, refiners – that embed battle-tested engineering disciplines (Clean Architecture, DDD, design-first methodology, secure coding, and more), plus a living context layer (the .lattice/ folder) that accumulates your project’s standards, decisions, and review insights. The system gets …
Bliki: Mythical Man Month
martin-fowler· 05-may
In the early 1960s, Fred Brooks managed the development of IBM's System/360 computer systems. After it was done he penned his thoughts in the book The Mythical Man-Month which became one of the most influential books on software development after its publication in 1975. Reading it in 2026, we'll find some of it outdated, but it also retains many lessons that are still relevant today. The book contains Brooks's law: “Adding manpower to a late software project makes it later.” The issue here is communication, as the number of people grows, the number of communication paths between those people grows exponentially. Unless these paths are skillfully designed, then work quickly falls apart. Perhaps my most enduring lesson from this book is the importance of conceptual integrity I will contend…
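To put a number on the communication-paths point in the excerpt above: counting only pairwise channels, a team of n people has n(n-1)/2 possible paths. The sketch below is illustrative only. The function name is hypothetical, and restricting the count to pairs is an assumption, since the excerpt gives no formula.

```python
def pairwise_communication_paths(team_size: int) -> int:
    """Count distinct person-to-person channels in a team.

    Assumption: only pairwise paths are counted, so the result is
    n * (n - 1) / 2. The excerpt itself does not state a formula.
    """
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    # Doubling the team roughly quadruples the number of paths.
    for size in (5, 10, 20):
        print(size, pairwise_communication_paths(size))
```

Under this counting, going from 5 to 10 people raises the number of paths from 10 to 45, which is one way to read why adding people to a late project increases coordination cost faster than it adds capacity.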
State of Routing in Model Serving
netflix-tech· 01-may
Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph
netflix-tech· 04-may
GitHub Repo Stats
simonw· 07-may
Tool: GitHub Repo Stats (https://tools.simonwillison.net/github-repo-stats). One of the things I always look for when evaluating a new GitHub repository is the number of commits it has... but that number isn't visible on GitHub's mobile site layout. I built this tool to fix that, using this prompt: "Given a GitHub repo URL or foo/bar repo ID show information about that repo absorbed via wither REST or graphql CORS fetch() including the number of commits in the repo and other useful stats". Example output for simonw/datasette (https://tools.simonwillison.net/github-repo-stats?repo=simonw%2Fdatasette) and, at https://tools.simonwillison.net/github-repo-stats?repo=simonw%2Fllm, simonw/ll…
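The excerpt does not say how the tool obtains the commit count. One commonly used approach with the GitHub REST API is to request the commits list with per_page=1 and read the page number of the rel="last" entry in the Link response header. The sketch below covers only the header-parsing step. The function name and the sample header are hypothetical, and this is not presented as the tool's actual implementation.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def last_page_from_link_header(link_header: str) -> Optional[int]:
    """Return the page number of the rel="last" entry, or None if absent.

    Assumption: with per_page=1, the last page number equals the number
    of commits. Splitting on commas is a simplification that is fine for
    typical pagination headers.
    """
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "last":
            pages = parse_qs(urlparse(match.group(1)).query).get("page")
            if pages:
                return int(pages[0])
    return None


if __name__ == "__main__":
    # Hypothetical header in the shape of a paginated API response.
    sample = (
        '<https://api.github.com/repos/foo/bar/commits?per_page=1&page=2>; rel="next", '
        '<https://api.github.com/repos/foo/bar/commits?per_page=1&page=1234>; rel="last"'
    )
    print(last_page_from_link_header(sample))
```

A response with no Link header at all would mean there is at most one page, so a caller would need to handle the None case separately.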
Behind the Scenes Hardening Firefox with Claude Mythos Preview
simonw· 07-may
Behind the Scenes Hardening Firefox with Claude Mythos Preview (https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/). Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox. Quoted from the post, under the heading "Suddenly, the bugs are very good": Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it. It is difficult to overstate how much this …
Big Words
simonw· 07-may
Tool: Big Words (https://tools.simonwillison.net/big-words). I'm using my vibe coded macOS presentations tool (https://simonwillison.net/2026/Feb/25/present/) to put together a talk, and I wanted to add a slide with some text on it. The tool only accepts URLs, so I put together (https://github.com/simonw/tools/pull/279) a quick page that accepts query string arguments and turns them into a simple slide. Here's an example: https://tools.simonwillison.net/big-words?text=simonwillison.net&gradient=1&size=9.5 Double click or double tap the page to access a form for modifying the different options. …
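Since the page is driven entirely by query string arguments, a slide URL can be built programmatically. A minimal sketch using only the three parameters visible in the example URL above (text, gradient, size); the helper name is hypothetical, and any other options the page may accept are not covered here.

```python
from urllib.parse import urlencode

BIG_WORDS_BASE = "https://tools.simonwillison.net/big-words"


def big_words_url(text: str, gradient: int = 1, size: float = 9.5) -> str:
    """Build a Big Words slide URL from the parameters seen in the example.

    Assumption: only text, gradient and size are used, taken from the
    example URL in the post; their accepted ranges are not documented here.
    """
    query = urlencode({"text": text, "gradient": gradient, "size": size})
    return f"{BIG_WORDS_BASE}?{query}"


if __name__ == "__main__":
    print(big_words_url("simonwillison.net"))
```

urlencode also percent-encodes spaces and punctuation, so longer slide text can be passed without manual escaping.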
Vibe coding and agentic engineering are getting closer than I'd like
simonw· 06-may
I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison (https://www.heavybit.com/library/podcasts/high-leverage/ep-9-the-ai-coding-paradigm-shift-with-simon-willison). Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work. One thing I really enjoy about podcasts is that they sometimes push me to think out loud in a way that exposes an idea I've not previously been able to put into words. Section "Vibe coding and agentic engineering are starting to overlap": A few weeks after vibe coding was first coined I pu…
Notes on the xAI/Anthropic data center deal
simonw· 07-may
There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center". As I mentioned in my live blog of the keynote (https://simonwillison.net/2026/May/6/code-w-claude-2026/), that's the one with the particularly bad environmental record (https://www.politico.com/news/2025/05/06/elon-musk-xai-memphis-gas-turbines-air-pollution-permits-00317582). The gas turbines installed to power the facility initially ran without Clean Air Act permits or pollution control devices, which they got away with by classifying them as "temporary". Credible reports link it to increases in hospital admissions relating to low ai…
Our AI started a cafe in Stockholm
simonw· 05-may
Our AI started a cafe in Stockholm (https://andonlabs.com/blog/ai-cafe-stockholm). Andon Labs previously started an AI-run retail store (https://andonlabs.com/blog/andon-market-launch) in San Francisco. Now they're running a similar experiment in Stockholm, Sweden, only this time it's a cafe. These experiments are interesting, and often throw out amusing anecdotes. Quoted from the post: During the first week of inventory, Mona ordered 120 eggs even though the café has no stove. When the staff told her they couldn’t cook them, she suggested using the high-speed oven, until they pointed out the eggs would likely explode. She also tried to solve the problem of fresh tomatoes being spoiled too fast by ordering 22.5 kg of canned tomatoes for …
datasette-referrer-policy 0.1
simonw· 05-may
Release: datasette-referrer-policy 0.1 (https://github.com/datasette/datasette-referrer-policy/releases/tag/0.1). The OpenStreetMap tiles on the Datasette global-power-plants demo (https://datasette.io/global-power-plants/global-power-plants) weren't displaying correctly. This turned out to be caused by two bugs. The first is that the CAPTCHA I added (https://github.com/simonw/datasette-turnstile) to that site a few weeks ago was triggering for the .json fetch requests used by the map plugin, and since those weren't HTML the user was not being asked to solve them. Here's the fix: https://github.com/simonw/datasette.io/commit/23a1c8596b75b2094db46035a3b4280109fb3df3 The second was that Op…
datasette-llm 0.1a7
simonw· 05-may
Release: datasette-llm 0.1a7 (https://github.com/datasette/datasette-llm/releases/tag/0.1a7). Release note: "Mechanism for configuring default options (https://github.com/datasette/datasette-llm/blob/main/README.md#configuration) for specific models." Part of Datasette's evolving support mechanism for plugins that use LLMs. It's now possible to configure a model with default options, e.g. to say all enrichment (https://github.com/datasette/datasette-enrichments-llm) operations should use a specific model with temperature set to 0.5. Tags: llm, datasette
Live blog: Code w/ Claude 2026
simonw· 06-may
I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions. Tags: ai, generative-ai, llms, anthropic, claude, claude-code, live-blog
llm-gemini 0.31
simonw· 07-may
Release: llm-gemini 0.31 (https://github.com/simonw/llm-gemini/releases/tag/0.31). Release note: "gemini-3.1-flash-lite is no longer a preview (https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available)." Here's my write-up of the Gemini 3.1 Flash-Lite Preview model (https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/) back in March. I don't believe this new non-preview model has changed since then. Tags: llm-release, gemini, llm, …