AI Digest

Curated digest

Friday, June 5, 2026·weekly-deep·deep·9,169 tokens

🔥 TOP: what you absolutely have to see

  • Uber capped AI coding tools like Claude Code at $1,500/month per employee - Each employee is limited to $1,500 in monthly token spending per AI coding tool. This confirms that coding agents aren't free: they burn tokens like there's no tomorrow. If you're thinking about adopting these tools at scale for your SaaS, this is the number to keep in mind when budgeting. link
  • Claude Platform: release notes with two changes that affect you directly - The advisor tool now accepts max_tokens to cap the advisor model's output per call, which cuts cost and latency, and the API no longer bills a request that returns stop_reason: "refusal" when Claude has not generated any output. If you use the Claude API, those are two concrete savings you can apply right away. link
  • Anthropic mapped a full year of AI-enabled cyber threats - A mapping against MITRE ATT&CK of how LLMs are being used in real attacks. Useful for thinking through your SaaS's security posture if you expose AI APIs. link
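
To turn the Uber cap into a planning number, a rough sketch can help. The one below is only an illustration: the per-token prices, the token volumes, and every function name are made-up assumptions, and the only figure taken from the digest is the $1,500 monthly cap per tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical prices in USD per million tokens (assumed, not from the source)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(input_tokens: int, output_tokens: int, prices: TokenPrices) -> float:
    """Estimated monthly spend in USD for one developer on one tool."""
    input_cost = (input_tokens / 1_000_000) * prices.input_per_mtok
    output_cost = (output_tokens / 1_000_000) * prices.output_per_mtok
    return input_cost + output_cost


def worst_case_team_budget(devs: int, tools_per_dev: int, cap_per_tool: float = 1500.0) -> float:
    """Upper bound on monthly spend if every developer hits the cap on every tool."""
    return devs * tools_per_dev * cap_per_tool


def exceeds_cap(cost: float, cap_per_tool: float = 1500.0) -> bool:
    """True when an estimated spend would go over the per-tool cap."""
    return cost > cap_per_tool


# Assumed example: 200M input tokens and 20M output tokens in a month,
# priced at a hypothetical $3 / $15 per million tokens.
example_prices = TokenPrices(input_per_mtok=3.0, output_per_mtok=15.0)
example_cost = monthly_cost(200_000_000, 20_000_000, example_prices)
print(f"estimated spend: ${example_cost:,.2f}, over cap: {exceeds_cap(example_cost)}")
print(f"worst case for 50 devs x 2 tools: ${worst_case_team_budget(50, 2):,.2f}")
```

Swap in your provider's real prices and your own usage numbers before trusting any of the outputs; the worst-case function is the one to show finance.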

📦 Claude / Anthropic ecosystem

  • Services Track and Partner Hub of the Claude Partner Network - A new program for partners who want to offer services around Claude. If you're thinking about building on top of the API, it may be a channel worth exploring. link
  • Anthropic found a counterfeiting vulnerability in Zcash - Claude applied to cryptography with concrete results: it found a serious bug in Zcash, and ZEC dropped 30% after the finding. It shows that LLMs are already real security tools. link
  • "Running an AI-native engineering org" (Claude blog post) - Anthropic published how they think about the org chart of an engineering team that already takes for granted that everyone uses AI. Team architecture, not software architecture. link

🛠️ Dev tools & coding

  • VoidZero (Vite, Vitest, Rolldown, Oxc) is joining Cloudflare - The team that drives the JS tooling ecosystem moves to Cloudflare. Vite stays open source and vendor-agnostic. Implication: Cloudflare is investing heavily in modern JS tooling (WASI? Workers?). link
  • Snill.ai: describe your business and it generates an internal app in seconds - A Show HN for a tool that does exactly what it promises. If you're in side-project mode with a SaaS, this gives you an idea of what can already be generated with LLMs. link

🏗️ Software engineering

  • Netflix: Dynamic Repartitioning for time-series workloads on Cassandra - How Netflix dynamically repartitions tables that grow unbalanced over time. For anyone who works with temporal data, it's high-level applied architecture. link
  • Netflix: from silos to service topology with a real-time map - How they built a live service map to understand dependencies across microservices. If you have a SaaS that scales, it's the kind of observability that saves you when everything blows up. link
  • Netflix: High-Throughput Graph Abstraction (Part I) - An abstraction for processing graphs at high throughput. Big Tech doing unusual things with graphs. link
  • Cloudflare: how they cut server boot time from hours to minutes - Deep debugging of UEFI firmware, iPXE, and timeouts. For those who enjoy real systems engineering. link

📚 Worth reading

  • "The Path of a Request: A Tour of Modern Web Architecture" (ByteByteGo) - The full journey of a web request, hop by hop. Good for getting a clear view of the big picture. link
  • "How OpenAI Built Its Data Agent" (ByteByteGo) - How they solved the problem of finding the right table when doing data analysis with LLMs. The real bottleneck isn't writing SQL, it's the semantics of the data. link
  • "A Practical Guide to Becoming an AI-Native Engineer" (ByteByteGo) - A practical guide to landing on the productive side of the split between those who use AI well and those who don't. link
  • "How DoorDash Built a Testing System to Evaluate LLMs" (ByteByteGo) - How to evaluate LLMs in production. For when you ask yourself "is this working well or badly?". link
  • Charity Majors: "AI enthusiasts are in a race against time, AI skeptics are in a race against entropy" - The full text of the reflection that Simon Willison quoted. Both sides have a point; the question is how to close the gap within teams. link
  • Martin Fowler on AI productivity metrics - Greg Wilson lists the common metrics for AI productivity and explains why each one is flawed. Fowler agrees that any such metric is weak evidence, and adds that the one he still somewhat uses is asking devs whether they feel more productive. link
  • "What Should Agents Say?" (efficient communication between LLM agents) - A paper on PACT, a protocol that lets agents pass messages to each other without inflating token usage. If you're building multi-agent systems, it's required reading. link
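
The paper's abstract says that effective inter-agent messages preserve the action-centered information that downstream agents need. The sketch below is one hypothetical way to act on that idea and is not the PACT protocol itself, whose message format isn't described in this digest: the `ActionStateMessage` class, its field names, and the compact JSON encoding are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ActionStateMessage:
    """Hypothetical message shape (not from the paper): what was done,
    the resulting state, and what the next agent should do."""
    sender: str
    action: str
    state: dict
    next_step: str

    def to_wire(self) -> str:
        # Compact separators avoid padding whitespace in the serialized message.
        return json.dumps(asdict(self), separators=(",", ":"), sort_keys=True)


def from_wire(payload: str) -> ActionStateMessage:
    """Rebuild the structured message on the receiving agent's side."""
    return ActionStateMessage(**json.loads(payload))


# Usage: a structured hand-off instead of a free-form paragraph.
free_form = (
    "Hi! I went ahead and ran the whole test suite like you asked. "
    "Most of it looks fine, 40 tests passed, but 2 of them failed, "
    "so I think the next thing to do is probably to fix those failures."
)
message = ActionStateMessage(
    sender="tester",
    action="ran_tests",
    state={"passed": 40, "failed": 2},
    next_step="fix_failures",
)
wire = message.to_wire()
print(wire)
print(f"{len(wire)} chars structured vs {len(free_form)} chars free-form")
```

Character counts are only a crude proxy for tokens, so measure with your model's tokenizer before drawing conclusions.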

💤 Skippable, but good to know

  • Satya Nadella "not sure" who said Microsoft wanted addictive AI - Classic corporate speak, but interesting as a reference for where Microsoft is heading with AI. link
  • "I Didn't Become a Developer to Review AI Slop" - A legitimate complaint about AI-generated PRs that nobody asked for. The dark side of coding agents. link
  • SentinelBench: a benchmark for long-running monitoring agents - 100 tasks across 10 synthetic web environments. For those thinking about agents that "watch and wait" rather than "act constantly". link
  • Cloudflare: enforcing the first AS in BGP - How to block some forged-path routing hijacks that RPKI does not catch, using a simple but effective check. Good for understanding network-level security. link
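
The first-AS check from the BGP item can be sketched in a few lines. This is a simplified illustration, not Cloudflare's implementation: it assumes the AS_PATH has already been parsed into a flat list of AS numbers with the leftmost (most recently added) AS first, it ignores AS_SET segments, and the function names and update dictionaries are made up. The ASNs come from the range reserved for documentation.

```python
def first_as_matches(as_path: list[int], peer_asn: int) -> bool:
    """Return True when the leftmost AS in the path is the neighbor's own ASN.

    An empty path is rejected, since an external peer is expected to have
    prepended its own AS before sending the route.
    """
    return bool(as_path) and as_path[0] == peer_asn


def accept_updates(updates: list[dict], peer_asn: int) -> list[dict]:
    """Keep only the updates whose AS_PATH starts with the sending peer's ASN."""
    return [u for u in updates if first_as_matches(u["as_path"], peer_asn)]


# Usage: a session with peer AS 64496 (documentation ASN).
updates = [
    {"prefix": "192.0.2.0/24", "as_path": [64496, 64500]},     # peer prepended itself
    {"prefix": "198.51.100.0/24", "as_path": [64511, 64500]},  # first AS is not the peer
    {"prefix": "203.0.113.0/24", "as_path": []},               # empty path from an eBGP peer
]
for update in accept_updates(updates, peer_asn=64496):
    print("accepted", update["prefix"])
```

A real router applies this only on eBGP sessions and usually needs an exception for route servers, which forward routes without adding their own AS.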

Fetched articles (44)

  • Introducing the Services Track and Partner Hub of the Claude Partner Network
    anthropic-news· 03-jun

    Jun 3, 2026 · Announcements

  • What we learned mapping a year’s worth of AI-enabled cyber threats
    anthropic-news· 03-jun

    Jun 3, 2026 · Policy

  • Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
    arxiv-ai· 05-jun

    arXiv:2606.05334v1 Announce Type: new Abstract: Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Existing PHM approaches support degradation prediction, but often target fixed operating conditions or isolated component benchmarks, while material-fatigue assessment is rarely linked to system-level functional prognosis. This paper addresses this gap for an angle grinder by combining uncertainty-aware functional prediction with component-level fatigue assessment in an instance-specific reliability workflow. The proposed framewor…

  • SentinelBench: A Benchmark for Long-Running Monitoring Agents
    arxiv-ai· 05-jun

    arXiv:2606.05342v1 Announce Type: new Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which are better served by a strategy of sustained attention. Instead, agents should monitor an environment, notice when an external event makes progress possible, then respond promptly without wasting resources while waiting. To measure progress on this class of tasks, we introduce SentinelBench, an open-source benchmark for time-evolving monitoring tasks. SentinelBench contains 100 tasks across 10 synthetic web environments, includin…

  • An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
    arxiv-ai· 05-jun

    arXiv:2606.05357v1 Announce Type: new Abstract: Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We first developed a deep learning framework to predict MOAKS features directly from knee MRIs and incorporated conformal prediction to provide prediction uncertainty quantification. This uncertainty-aware strategy enables explicit filtering of model outputs, retaining only high-confidence MOAKS predictions at the knee level. Second, we applied a longitudinal latent class mixed model (LCMM) to examine associations between key structura…

  • Synthetic Contrastive Reasoning for Multi-Table Q&A
    arxiv-ai· 05-jun

    arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative traces with heterogeneous LLMs. We then use the resulting preference pairs to fine-tune open-weight LLMs with Contrastive Preference Optimization (CPO). Across Qwen3-14B, Mistral-8B, and Llama-3.1-8B, CPO achieves absolute average improvements over Q&A supervised fine-tuning ranging from 9.7…

  • Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
    arxiv-ai· 05-jun

    arXiv:2606.05384v1 Announce Type: new Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show that this assumption does not hold under interaction. We study post-decision manipulability: the extent to which an evaluation outcome can be altered through subsequent conversation with the judge after an initial decision has been made. Across controlled experiments on MT-Bench and AlpacaEval, we find that LLM judges are highly stable under repeated and neutral reevaluation, yet become substantially reversible under targeted post-decision challenge. An anti-baseline challenge protocol shows that stable judgments…

  • Residual Modeling for High-Fidelity Learned Compression of Scientific Data
    arxiv-ai· 05-jun

    arXiv:2606.05389v1 Announce Type: new Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate. We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should …

  • How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
    arxiv-ai· 05-jun

    arXiv:2606.05256v1 Announce Type: new Abstract: This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts engaging users in live debate. After public disclosure, Reddit authorized moderators to release an archive of the AI-generated comments, creating a rare opportunity to examine how large language models operated in an identity-rich deliberative forum without disclosure. We conduct a structured content analysis of this corpus, evaluating identity performance, authority signaling, alignment strategies, and activation of cognitive heuristics. Identity targeting or adoption appears in over two-t…

  • GITCO: Gated Inference-Time Context Optimization in TSFMs
    arxiv-ai· 05-jun

    arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by optimizing the input context rather than modifying model weights. We present GITCO (Gated Inference-Time Context Optimization), a lightweight three-component framework: Gate, Router, and Critic that selectively identifies and suppresses harmful patches without any parameter updates. Evaluated on TimesFM 2.5 across 53 GIFT-Eval datasets under K-fold cross-validation, GITCO achieves an average +1.95% MASE reduction on TimesFM 2.5 while capturing 89.9% of the improvement upper bound. We introduce …

  • What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
    arxiv-ai· 05-jun

    arXiv:2606.05304v1 Announce Type: new Abstract: Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents. Building on this, we propose the PACT (Protocolized Action-state Communication and Transmissi…

  • I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
    arxiv-ai· 05-jun

    arXiv:2606.05316v1 Announce Type: new Abstract: Multimodal memes are dynamic and often require up to date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero shot framework that identifies missing knowledge, retrieves open web evidence, and synthesizes evidence grounded background knowledge for meme understanding and detection. We also introduce a curated meme understanding benchmark of recent memes from 2024 to 2026 with external background knowledge annotations. Experiments on three meme understanding datasets and five meme detection tasks show that our framework improves knowledge…

  • How OpenAI Built Its Data Agent
    bytebytego· 03-jun

    The hardest part of data analysis isn’t writing SQL. It’s finding the right tables to use in the first place and understanding semantically how to use… Jun 3 • ByteByteGo

  • The Path of a Request: A Tour of Modern Web Architecture
    bytebytego· 04-jun

    In this article, we follow the journey of a web request one hop at a time. 15 hrs ago • ByteByteGo

  • A Practical Guide to Becoming an AI-Native Engineer
    bytebytego· 02-jun

    This piece is a working guide for engineers who want to land on the productive side of that split. Jun 2 • ByteByteGo

  • How DoorDash Built a Testing System to Evaluate LLMs
    bytebytego· 30-may

    In this article, we will learn how they built this flywheel and the key takeaways. May 30 • ByteByteGo

  • AI enthusiasts are in a race against time, AI skeptics are in a race against entropy (xpost)
    charity-majors· 02-jun

    Both sides are grappling with a real existential threat, and both sides feel like they are screaming into the void. There is a way to close the gap and get everyone pulling in the same direction. Xposted from Substack. I recently attended a talk where one of the presenters made some pretty…astonishing claims about what they […]

  • Claude Platform
    claude-changelog

    Release notes. Updates to the Claude Platform, including the Claude API, client SDKs, and the Claude Console. For release notes on Claude Apps, see the Release notes for Claude Apps in the Claude Help Center. For updates to Claude Code, see the complete CHANGELOG.md in the claude-code repository. June 2, 2026: The advisor tool now supports a max_tokens parameter to cap the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length advisor responses. Set tools[].max_tokens on the advisor tool definition; see Capping advisor output. On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output. See Streaming refusals for detecting and handling ref…

  • VoidZero is joining Cloudflare
    cloudflare· 04-jun

    VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. Vite stays open source, vendor-agnostic, and built for everyone.

  • Enforcing the First AS in BGP AS_PATHs
    cloudflare· 03-jun

    BGP is vulnerable to routing hijacks and path leaks that negatively impact traffic on the Internet. RPKI helps solve some of these problems, but for some forged paths, we need to rely on a simpler mechanism: First AS enforcement in BGP.

  • How we reduced core unit boot time from hours to minutes
    cloudflare· 01-jun

    We investigated why firmware updates were causing our core servers to take four hours to reboot. By diving into UEFI data structures and iPXE automation, we eliminated unnecessary timeouts and cut boot times back down to minutes.

  • Mark Zuckerberg's longest-serving employee on AI, jobs and her boss
    hn-ai· 05-jun

    Article URL: https://www.bbc.com/news/articles/c5y71106g07o Comments URL: https://news.ycombinator.com/item?id=48408832 Points: 2 # Comments: 0

  • AI Optimists Race Clock; Skeptics Race Decay
    hn-ai· 05-jun

    Article URL: https://charitydotwtf.substack.com/p/ai-enthusiasts-are-in-a-race-against Comments URL: https://news.ycombinator.com/item?id=48408940 Points: 1 # Comments: 0

  • Running an AI-native engineering org
    hn-ai· 05-jun

    Article URL: https://claude.com/blog/running-an-ai-native-engineering-org Comments URL: https://news.ycombinator.com/item?id=48408846 Points: 1 # Comments: 0

  • Show HN: Snill.ai launched – describe your biz – get an internal app in seconds
    hn-ai· 05-jun

    Article URL: https://snill.ai/ Comments URL: https://news.ycombinator.com/item?id=48408894 Points: 1 # Comments: 0

  • Satya Nadella 'Not Sure' Who Said Microsoft Wanted to Make Addictive AI
    hn-ai· 05-jun

    Article URL: https://www.404media.co/satya-nadella-not-sure-who-said-microsoft-wanted-to-make-addictive-ai-is-looking-for-guy-who-did-this/ Comments URL: https://news.ycombinator.com/item?id=48408581 Points: 5 # Comments: 0

  • I Didn't Become a Developer to Review AI Slop
    hn-ai· 05-jun

    Article URL: https://www.builder.io/blog/developers-drowning-in-ai-prs Comments URL: https://news.ycombinator.com/item?id=48408569 Points: 4 # Comments: 0

  • How much value is AI creating?
    hn-ai· 05-jun

    Article URL: https://www.ft.com/content/8e9ae7a4-7209-4e2c-aa36-f3af77d6ce1f Comments URL: https://news.ycombinator.com/item?id=48408560 Points: 2 # Comments: 0

  • Americans lead AI data centre backlash, global poll finds
    hn-ai· 05-jun

    Article URL: https://www.ft.com/content/ed07dc6c-aabe-4e4d-a508-4d2b4f24a852 Comments URL: https://news.ycombinator.com/item?id=48408598 Points: 3 # Comments: 0

  • SpaceX IPO video sells Musk's space, AI, asteroid dreams to mom-n-pop investors
    hn-ai· 05-jun

    Article URL: https://www.latimes.com/business/story/2026-06-04/spacex-ipo-video-sells-elon-musks-space-ai-asteroid-dreams-to-mom-and-pop-investors Comments URL: https://news.ycombinator.com/item?id=48408668 Points: 2 # Comments: 0

  • ZEC drops 30% after Anthropic AI finds Zcash counterfeit vulnerability
    hn-ai· 05-jun

    Article URL: https://www.tradingview.com/news/cointelegraph:52f56f35b094b:0-zec-drops-30-after-anthropic-ai-finds-zcash-counterfeit-vulnerability/ Comments URL: https://news.ycombinator.com/item?id=48408925 Points: 1 # Comments: 0

  • [AINews] not much happened today
    latentspace· 05-jun

    a quiet day

  • Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
    latentspace· 04-jun

    We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.

  • 🔬Scaling Past Informal AI - Carina Hong, Axiom Math
    latentspace· 03-jun

    Verified Generation and Compounding Intelligence

  • ⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build
    latentspace· 03-jun

    The legendary Microsoft CEO makes his first Latent Space appearance!

  • [AINews] Reve 2 and Ideogram 4: Layouts in Imagegen
    latentspace· 04-jun

    a quiet day.

  • Fragments: June 2
    martin-fowler· 02-jun

    Greg Wilson has noticed that lots of folks are using dodgy metrics to figure out if AI tools are worth their costs. Would you measure lines of code generated, or tickets closed? Or would you send out a survey asking whether developers feel more productive? Each of those approaches is flawed in a different way; He lists lots of common metrics, and why they are flawed. Sadly he doesn’t give any suggestions on what would be better. In my view, since we cannot measure productivity, any metrics are weak evidence at the best of times. I do somewhat use one of his flawed measures: “Asking Developers If They Feel More Productive”. While I acknowledge the problems he gives with this measure, I find that in an environment where decent measures are hard to find, even such a dim light is the best we …

  • Dynamic Repartitioning for Time Series Workloads
    netflix-tech· 03-jun

  • High-Throughput Graph Abstraction at Netflix: Part I
    netflix-tech· 29-may

  • From Silos to Service Topology: Why Netflix Built a Real-Time Service Map
    netflix-tech· 29-may

  • Uber Caps Usage of AI Tools Like Claude Code to Manage Costs
    simonw· 03-jun

    Uber Caps Usage of AI Tools Like Claude Code to Manage Costs (https://www.bloomberg.com/news/articles/2026-06-02/uber-caps-usage-of-ai-tools-like-claude-code-to-cut-costs). I wrote the other day (https://simonwillison.net/2026/May/27/product-market-fit/#the-ai-failure-stories-around-this-are-pretty-thin) about Uber blowing its 2026 AI budget in four months, and how that wasn't particularly surprising given they would have set that budget in 2025, before anyone could have predicted how popular token-burning coding agents were about to become. Natalie Lung for Bloomberg: "The rideshare giant is limiting all employees to $1,500 in monthly token spending per AI coding tool, an Uber spokesperson said in response to a Bloomberg N…

  • AI enthusiasts are in a race against time, AI skeptics are in a race against entropy
    simonw· 04-jun

    AI enthusiasts are in a race against time, AI skeptics are in a race against entropy (https://charitydotwtf.substack.com/p/ai-enthusiasts-are-in-a-race-against). Charity Majors neatly captures the dynamic between AI enthusiasts and AI skeptics, both of whom are trying to build great software, often in the same teams: "The enthusiasts are not wrong. We are starting to see real, non-imaginary, discontinuous leaps in capabilities from teams that lean in hard to working with AI. And this does not feel like a normal technology cycle where you can wait for the dust to settle; teams that sit this out while competitors are hustling could be out of business before the dust settles. That’s a real, existential threat. The skeptics…

  • Quoting Emanuel Maiberg, 404 Media
    simonw· 04-jun

    Emanuel Maiberg, 404 Media, in Google Employees Internally Share Memes About How Its AI Sucks (https://www.404media.co/google-employees-internally-share-memes-about-how-its-ai-sucks/): After this story was published Google's spokesperson reached out and asked us to publish a slightly different version of that statement. The new statement no longer stated that "it's critical that we maintain humans in the loop." Tags: ai-ethics, journalism, ai, …