AI Digest

Curated digest

Friday, May 1, 2026·weekly-deep·deep·11,598 tokens

🔥 TOP - what you absolutely have to see

  • LLM 0.32a0: Simon Willison refactors the whole conceptual model of his LLM library and CLI - It moved from "prompt/response" to "conversation/event", with native support for streaming, tool calling, and multimodal. The release is billed as backwards-compatible, but if you use LLM as a tool, the new model shapes how you write plugins and scripts from here on. link
  • Codex CLI 0.128.0 adds /goal: the Ralph loop comes to OpenAI - Codex now keeps looping on its own until it judges the goal complete or the configured token budget is exhausted, the same idea as the Ralph loop. It competes directly with Claude Code on autonomy. link
  • GPT-5.5 is now comparable to Claude Mythos at finding security vulnerabilities - The UK AI Security Institute (AISI) evaluated GPT-5.5 and found it comparable to Mythos, with the key difference that GPT-5.5 is generally available right now. link
  • Cloudflare lets agents create accounts, buy domains, and deploy - Agents can now be Cloudflare customers: they can create an account, start a paid subscription, register a domain, and get back an API token to deploy right away. Humans can stay in the loop to grant permission, but nobody has to visit the dashboard, copy and paste API tokens, or enter card details. This changes the game for fully autonomous deploys. link
  • Warp goes open source, with support for Claude Code, Codex, and Gemini CLI - The agentic development environment that was born out of the terminal is now open source, with OpenAI as founding sponsor of the new repository. You can use its built-in coding agent or bring your own CLI agent, and build.warp.dev has a dashboard where thousands of Oz agents triage issues, write specs, implement changes, and review PRs. If you want a single environment for all your coding agents, Warp is the strongest candidate. link
  • Superpowers: a complete methodology for coding agents - It's not just tooling; it's a development framework that starts by teasing a spec out of you before touching code, shows it to you in digestible chunks, and then generates an implementation plan once you sign off on the design. If you work with Claude Code, this brings order to your workflow. link
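
The /goal item above boils down to a loop with two exits: the agent judges the goal complete, or the token budget runs out. Here is a minimal sketch of that control flow. It is not Codex's implementation: `run_until_goal`, `StepResult`, and the toy `fake_step` / `done_after_three` callbacks are hypothetical names made up for illustration, and a real agent would call a model in both callbacks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    output: str       # what the agent produced in this iteration
    tokens_used: int  # tokens consumed by this iteration


def run_until_goal(
    step: Callable[[str, list[str]], StepResult],
    is_done: Callable[[str, list[str]], bool],
    goal: str,
    token_budget: int,
) -> tuple[str, list[str], int]:
    """Loop until the goal is judged complete or the budget is spent.

    Returns (status, history, tokens_spent), where status is either
    "completed" or "budget_exhausted".
    """
    history: list[str] = []
    spent = 0
    while spent < token_budget:
        result = step(goal, history)
        history.append(result.output)
        spent += result.tokens_used
        if is_done(goal, history):
            return "completed", history, spent
    return "budget_exhausted", history, spent


# Toy stand-ins so the sketch runs without a model (both hypothetical).
def fake_step(goal: str, history: list[str]) -> StepResult:
    return StepResult(output=f"attempt {len(history) + 1}", tokens_used=100)


def done_after_three(goal: str, history: list[str]) -> bool:
    return len(history) >= 3


status, history, spent = run_until_goal(
    fake_step, done_after_three, "make the tests pass", token_budget=1_000
)
print(status, len(history), spent)
```

The budget check sits at the top of the loop, so a run can overshoot the budget by at most one step. A stricter variant would estimate the cost of a step before taking it.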

📦 Claude / Anthropic ecosystem

  • Anthropic publishes research on how people ask Claude for personal guidance - An analysis of usage patterns among people asking Claude for personal advice. Interesting for understanding the UX of personalized assistance. link
  • Craft Agents: craft.do's open-source tool for working with agents - It uses the Claude Agent SDK and the Pi SDK side by side, with a document-centric rather than code-centric workflow. If you like Craft's interface, this gives you the backend. link
  • PrivClaw: an open-source, self-hostable AI plugin marketplace - A self-hosted alternative for Claude plugins, built with FastAPI and Next.js. link

🛠️ Dev tools & coding

  • Structured-Prompt-Driven Development (SPDD) at Thoughtworks - A formalized method that treats prompts as first-class artifacts, kept in version control with the code and used to align development with business needs. Three key developer skills: alignment, abstraction-first, and iterative review. link
  • jcode: a coding agent harness for multi-session workflows - Built to be more performant and resource-efficient than the alternatives (the README compares RAM usage and boot time). Designed to scale to many simultaneous sessions. link
  • Martin Fowler highlights Chris Parsons's updated guide to coding with AI - This is the guide's third update, with concrete advice: keep changes small, build guardrails, document ruthlessly, and make sure every change gets verified before it ships. "Verified" no longer means "read by you". link
  • Simon Willison adds an Atom feed to his tools page (with Claude) - He had Claude add an Atom feed (and icon) to /elsewhere/tools/. Inspired by Matt Webb's post on needing RSS for vibe-coded apps. link
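
For anyone tempted to do the same for their own tools page, an Atom feed is small enough to build with the standard library. The sketch below is not the code from Simon's pull request. The `build_atom_feed` helper, the example.com URLs, the author name, and the tool titles are all placeholders, and the feed carries only the elements Atom requires (id, title, updated, plus an author) and a link per entry.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def _child(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a namespaced Atom child element and return it."""
    el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    el.text = text
    return el


def build_atom_feed(feed_id: str, title: str, author: str, entries: list[dict]) -> str:
    """Return a minimal Atom feed as an XML string.

    Each entry dict needs "url", "title", and "updated" (a timezone-aware datetime).
    """
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(_child(feed, "author"), "name", author)
    # The feed-level timestamp is the most recent entry timestamp.
    latest = max((e["updated"] for e in entries), default=datetime.now(timezone.utc))
    _child(feed, "updated", latest.isoformat())
    for e in entries:
        entry = _child(feed, "entry")
        _child(entry, "id", e["url"])
        _child(entry, "title", e["title"])
        _child(entry, "updated", e["updated"].isoformat())
        _child(entry, "link", href=e["url"])
    return ET.tostring(feed, encoding="unicode")


# Placeholder data: these tools and URLs are invented for the example.
tools = [
    {"url": "https://example.com/tools/json-diff", "title": "JSON diff",
     "updated": datetime(2026, 4, 29, tzinfo=timezone.utc)},
    {"url": "https://example.com/tools/ocr", "title": "OCR",
     "updated": datetime(2026, 4, 30, tzinfo=timezone.utc)},
]
feed_xml = build_atom_feed("https://example.com/tools/", "My tools", "Example Author", tools)
print(feed_xml)
```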

🏗️ Software engineering

  • Scaling Camera File Processing at Netflix - How Netflix handles camera file processing at scale. A technical post from the Netflix Tech Blog. link
  • How Stripe detects fraudulent transactions within 100 ms (ByteByteGo) - Radar's architecture: design decisions, tradeoffs, latency. link
  • The tech stack powering Wise (ByteByteGo) - In 2024 its deployment system automatically blocked hundreds of releases that would have caused production incidents. link
  • How Amazon uses LLMs to recommend products (ByteByteGo) - A look at COSMO, Amazon's LLM-based recommendation system, and the challenges its engineering team faced. link
  • Post-quantum encryption for Cloudflare IPsec is GA - Generally available support for hybrid ML-KEM, with interoperability confirmed with Cisco and Fortinet. link
  • Kubernetes for beginners (ByteByteGo) - Explains K8s as a system of promises in which every component is a small program keeping one of those promises. link
  • Data Warehouse vs Data Lake vs Data Mesh (ByteByteGo) - A guide to deciding where and how to organize data. link
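
The "system of promises" framing in the Kubernetes item is easiest to picture as a reconciliation loop: a small program compares desired state with observed state and acts on the difference. What follows is a toy, in-memory sketch of that pattern and does not touch the Kubernetes API. `ReplicaController` and the pod names are invented for illustration, and the article itself may explain the idea differently.

```python
import itertools


class ReplicaController:
    """Keeps one promise: exactly `desired` pods are running."""

    def __init__(self, desired: int) -> None:
        self.desired = desired
        self.pods: list[str] = []       # observed state
        self._ids = itertools.count(1)  # source of fresh pod names

    def reconcile(self) -> list[str]:
        """Compare observed state with desired state and act on the gap."""
        actions: list[str] = []
        while len(self.pods) < self.desired:
            name = f"pod-{next(self._ids)}"
            self.pods.append(name)
            actions.append(f"start {name}")
        while len(self.pods) > self.desired:
            name = self.pods.pop()
            actions.append(f"stop {name}")
        return actions


controller = ReplicaController(desired=3)
first = controller.reconcile()   # nothing is running yet, so three pods start
controller.pods.remove("pod-2")  # simulate a crash
second = controller.reconcile()  # the promise is restored with a new pod
controller.desired = 1           # the user changes the promise
third = controller.reconcile()   # surplus pods are stopped
print(first, second, third)
```

The controller never records why state drifted. It only looks at the gap on each pass, which is why the same loop handles crashes and configuration changes alike.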

📚 Worth reading

  • "When Your LLM Reaches End-of-Life": a framework for migrating production models - A paper with a Bayesian approach to calibrating automated evals against human judgments, so models can be compared with confidence even when manual evaluation data is limited. Demonstrated on a commercial question-answering system serving 5.3M monthly interactions; the authors say it applies to any enterprise deploying LLM-based products. link
  • "Step-level Optimization for Efficient Computer-use Agents" - A paper arguing that not every step in a GUI task needs the same compute: many steps are routine, while errors concentrate at a few high-risk moments. It proposes routing routine steps to smaller, cheaper policies. Directly relevant to computer-use agents like Claude. link
  • Eugene Yan: "Task-Specific LLM Evals That Do and Don't Work" - A practical guide to which evals actually measure something useful and which are noise. link
  • "LLM Summaries Are Ruining Your Learning" - A well-argued critique of relying on LLM summaries to learn complex concepts. link
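
The step-level routing idea from the computer-use paper fits in a few lines. This is only a sketch of the general technique, not the paper's method: the per-step `risk` score, the 0.5 threshold, the cost numbers, and the `Policy` / `Step` types are all assumptions made for illustration, and the abstract as fetched does not say how risk is actually estimated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    name: str
    cost_per_step: float       # illustrative cost units, not real prices
    act: Callable[[str], str]  # maps a step description to an action


@dataclass
class Step:
    description: str
    risk: float  # assumed score in [0, 1]; how to estimate it is left open


def run_trajectory(
    steps: list[Step], small: Policy, large: Policy, risk_threshold: float = 0.5
) -> tuple[list[tuple[str, str]], float]:
    """Send each step to the cheap policy unless its risk crosses the threshold."""
    log: list[tuple[str, str]] = []
    total_cost = 0.0
    for step in steps:
        policy = large if step.risk >= risk_threshold else small
        policy.act(step.description)
        total_cost += policy.cost_per_step
        log.append((step.description, policy.name))
    return log, total_cost


# Stand-in policies: real ones would call a small and a large model.
small = Policy("small", cost_per_step=1.0, act=lambda d: f"small handles: {d}")
large = Policy("large", cost_per_step=10.0, act=lambda d: f"large handles: {d}")

steps = [
    Step("open the settings menu", risk=0.1),
    Step("type the account name", risk=0.2),
    Step("confirm the payment", risk=0.9),
]
log, total_cost = run_trajectory(steps, small, large)
print(log, total_cost)
```

In this toy run only the payment step goes to the large policy, so the cost is 12 units against 30 if every step used the large one.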

💤 Skippable, but good to know

  • Zig's anti-AI policy: "no writing with LLMs in my house" - Andrew Kelley explains that maintainers can spot AI-generated contributions: human mistakes differ fundamentally from LLM hallucinations, and agentic coders carry a "digital smell" that is obvious to those who abstain. Bun (owned by Anthropic) is written in Zig and makes heavy use of AI assistance, but respects the policy. link
  • Zuckerberg says AI costs contributed to 8,000 layoffs at Meta - A Forbes report. A sign that AI costs remain a problem even for Big Tech. link
  • TradingAgents: a multi-agent financial trading framework built on LLMs - v0.2.4 adds structured-output agents, LangGraph checkpoint resume, and DeepSeek/Qwen/GLM/Azure provider support. If applied multi-agent systems are your thing, it's a good repo to look at. link
  • Maigret: collects dossiers on people by username across 3,000+ sites - An OSINT tool that's useful if you ever need to investigate identities. No API keys required. link
  • TRUST: a decentralized framework for verifying multi-agent systems - Proposes Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing. link
  • Q1 2026 Internet disruptions, according to Cloudflare - Nationwide shutdowns in Uganda and Iran, and drone strikes on cloud infrastructure. link
  • Anthropic: an update on election safeguards - An institutional post, no functional changes. link
  • Vibe coding and students: a paper analyzes 19k interactions - Across 19,418 interaction turns from 110 undergraduates, top performers engaged in instrumental help-seeking (inquiry and exploration), while low performers relied on executive help-seeking (delegating tasks to the AI). link

Fetched articles (48)

  • An update on our election safeguards
    anthropic-news· Apr 24

    Apr 24, 2026 · Announcements

  • Compositional Meta-Learning for Mitigating Task Heterogeneity in Physics-Informed Neural Networks
    arxiv-ai· May 1

    arXiv:2604.26999v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) approximate solutions of partial differential equations (PDEs) by embedding physical laws into the loss function. In parameterized PDE families, variations in coefficients or boundary/initial conditions define distinct tasks. This makes training individual PINNs for each task computationally prohibitive, while cross-task transfer can be sensitive to task heterogeneity. While meta-learning can reduce retraining cost, existing methods often rely on a single global initialization and may suffer from negative transfer, particularly under feature-scarce coordinate inputs and limited training-task availability. We propose the Learning-Affinity Adaptive Modular Physics-Informed Neural Network (LAM-PINN), a c…

  • Binary Spiking Neural Networks as Causal Models
    arxiv-ai· May 1

    arXiv:2604.27007v1 Announce Type: new Abstract: We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output of the network by leveraging logic-based methods. In particular, we show that we can successfully use a SAT as well as a SMT solver to compute abductive explanations from this binary causal model. To illustrate our approach, we trained the BSNN on the standard MNIST dataset and applied our SAT-based and SMT-based methods to finding abductive explanations of the network's classifications based on pixel-level features. We also compared the found explanations against SHAP, a popular method us…

  • When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
    arxiv-ai· May 1

    arXiv:2604.27082v1 Announce Type: new Abstract: We present a framework for migrating production Large Language Model (LLM) based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation metrics against human judgments, enabling confident model comparison even with limited manual evaluation data. We demonstrate this framework on a commercial question-answering system serving 5.3M monthly interactions across six global regions; evaluating correctness, refusal behavior, and stylistic adherence to successfully identify suitable replacement models. The framework is broadly applicable to any enterprise deploying LLM-based products, providing a principled, reproducible methodology f…

  • End-to-end autonomous scientific discovery on a real optical platform
    arxiv-ai· May 1

    arXiv:2604.27092v1 Announce Type: new Abstract: Scientific research has long been human-led, driving new knowledge and transformative technologies through the continual revision of questions, methods and claims as evidence accumulates. Although large language model (LLM)-based agents are beginning to move beyond assisting predefined research workflows, none has yet demonstrated end-to-end autonomous discovery in a real physical system that produces a nontrivial result supported by experimental evidence. Here we introduce Qiushi Discovery Engine, an LLM-based agentic system for end-to-end autonomous scientific discovery on a real optical platform. Qiushi Engine combines nonlinear research phases, Meta-Trace memory and a dual-layer architecture to maintain adaptive and stable research traje…

  • Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI
    arxiv-ai· May 1

    arXiv:2604.27096v1 Announce Type: new Abstract: The purpose of our paper is to develop a unified multi-agent architecture that automates end-to-end machine learning (ML) pipeline generation from datasets and natural-language (NL) goals, improving efficiency, robustness and explainability. A five-agent system is proposed to handle profiling, intent parsing, microservice recommendation, Directed Acyclic Graph (DAG) construction and execution. It integrates code-grounded Retrieval-Augmented Generation (RAG) for microservice understanding, an explainable hybrid recommender combining multiple criteria, a self-healing mechanism using Large Language Model (LLM)-based error interpretation and adaptive learning from execution history. The approach is evaluated on 150 ML tasks across diverse scenar…

  • Unsupervised Electrofacies Classification and Porosity Characterization in the Offshore Keta Basin Using Wireline Logs
    arxiv-ai· May 1

    arXiv:2604.27126v1 Announce Type: new Abstract: This study presents an unsupervised machine learning workflow for electrofacies analysis in the offshore Keta Basin, Ghana, where core data are scarce. Six standard wireline logs from Well C were analysed over a depth interval comprising approximately 11,195 samples. K-means clustering was applied in multivariate log space, with the clustering structure evaluated using inertia and silhouette diagnostics. Four clusters were identified, supported by an average silhouette coefficient of approximately 0.50, indicating moderate but meaningful separation. The resulting electrofacies exhibit systematic, depth-continuous patterns associated with variations in clay content, porosity, and rock framework properties, forming a geological continuum…

  • TRUST: A Framework for Decentralized AI Service v.0.1
    arxiv-ai· May 1

    arXiv:2604.27132v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) and Multi-Agent Systems (MAS) in high-stakes domains demand reliable verification, yet centralized approaches suffer four limitations: (1) Robustness, with single points of failure vulnerable to attacks and bias; (2) Scalability, as reasoning complexity creates bottlenecks; (3) Opacity, as hidden auditing erodes trust; and (4) Privacy, as exposed reasoning traces risk model theft. We introduce TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework with three innovations: (i) Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing; (ii) the DAAN protocol, which projects multi-agen…

  • Unpacking Vibe Coding: Help-Seeking Processes in Student-AI Interactions While Programming
    arxiv-ai· May 1

    arXiv:2604.27134v1 Announce Type: new Abstract: Generative AI is reshaping higher education programming through vibe coding, where students collaborate with AI via natural language rather than writing code line-by-line. We conceptualize this practice as help-seeking, analyzing 19,418 interaction turns from 110 undergraduate students. Using inductive coding and Heterogeneous Transition Network Analysis, we examined interaction sequences to compare top- and low-performing students. Results reveal that top performers engaged in instrumental help-seeking -- inquiry and exploration -- eliciting tutor-like AI responses. In contrast, low performers relied on executive help-seeking, frequently delegating tasks and prompting the AI to assume an executor role focused on ready-made solutions. These …

  • Optimal Stop-Loss and Take-Profit Parameterization for Autonomous Trading Agent Swarm
    arxiv-ai· May 1

    arXiv:2604.27150v1 Announce Type: new Abstract: Autonomous crypto trading systems often spend most of their design effort on finding entries, while exits are left to fixed rules that are rarely tested in a systematic way. This paper examines whether better stop-loss and take-profit settings can improve the performance of an autonomous trading agent swarm. Using more than 900 historical trades, we replay each trade under many alternative exit policies and compare results against the existing production setup. The study finds that exit design matters meaningfully: stronger configurations improve risk-adjusted performance and generally favor tighter loss limits, earlier profit capture, and closer trailing protection. The paper also discusses a key evaluation challenge: a purely chronological…

  • Step-level Optimization for Efficient Computer-use Agents
    arxiv-ai· May 1

    arXiv:2604.27151v1 Announce Type: new Abstract: Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across compute…

  • A Beginner’s Guide to Kubernetes
    bytebytego· Apr 30

    In this article, we will learn how Kubernetes is a system of promises, and that every piece of it is a small program keeping one of those promises.

  • EP212: Data Warehouse vs Data Lake vs Data Mesh
    bytebytego· Apr 25

    Storing data is the easy part. Deciding where and how to organize it is the real challenge.

  • How Amazon Uses LLMs to Recommend Products
    bytebytego· Apr 27

    In this article, we will look at how COSMO works and the challenges the engineering team faced.

  • How Stripe Detects Fraudulent Transactions Within 100 ms
    bytebytego· Apr 28

    In this article, we will look at how Stripe’s Radar does this effectively and the architectural decisions the team took while building it.

  • The Tech Stack Powering Wise
    bytebytego· Apr 29

    In 2024, Wise’s deployment system automatically blocked hundreds of releases that would have caused production incidents.

  • Agents can now create Cloudflare accounts, buy domains, and deploy
    cloudflare· Apr 30

    Starting today, agents can now be Cloudflare customers. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission, but there’s no need to go to the dashboard, copy and paste API tokens, or enter credit card details.

  • Post-quantum encryption for Cloudflare IPsec is generally available
    cloudflare· Apr 30

    Cloudflare IPsec now has generally available support for post-quantum encryption via hybrid ML-KEM. We’ve confirmed interoperability with Cisco and Fortinet.

  • Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions
    cloudflare· Apr 28

    The first quarter of 2026 saw a surge in Internet disruptions, from nationwide shutdowns in Uganda and Iran to unprecedented drone strikes on cloud infrastructure. We explore the data behind these events using Cloudflare Radar.

  • soxoj/maigret
    github-trending

    🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites. Maigret collects a dossier on a person by username only, checking for accounts on a huge number of sites and gathering all the available information from web pages. No API keys required.

    In one minute: ensure you have Python 3.10 or higher, then run pip install maigret followed by maigret YOUR_USERNAME. No install? Try the Telegram bot or a Cloud Shell. Want a web UI? See how to launch it. See also: Quick start.

    Main features: supports 3,000+ sites (see full list). A default run checks the 500 highest-ranked sites by traffic; pass -a to scan everything, or --tags to narrow by category/country. Embeddable in Python projects: import maigret and ru…

  • 1jehuang/jcode
    github-trending

    jcode: the next generation coding agent harness to raise the skill ceiling. Built for multi-session workflows, infinite customizability, and performance.

    Installation (macOS & Linux): curl -fsSL https://raw.githubusercontent.com/1jehuang/jcode/master/scripts/install.sh | bash
    Need Windows, Homebrew, source builds, provider setup, or tell your agent to set it up for you? Jump to detailed installation.

    Performance & Resource Efficiency: jcode is built to be as performant and resource efficient as possible. Every metric is optimized to the bone, which is important for scaling multi-session workflows. Here we sample a few metrics to show the difference: RAM usage and boot up. RAM comparison, 1 active session: To…

  • lukilabs/craft-agents-oss
    github-trending

    Craft Agents. Why Craft Agents was built: Craft Agents is a tool we built so that we (at craft.do) can work effectively with agents. It enables intuitive multitasking, no-fluff connection to any API or Service, sharing sessions, and a more document (vs code) centric workflow, in a beautiful and fluid UI. It uses the Claude Agent SDK and the Pi SDK side by side, building on what we found great and improving areas where we've desired improvements. It's built with Agent Native software principles in mind, and is highly customisable out of the box. One of the first of its kind. Craft Agents is open source under the Apache 2.0 license - s…

  • obra/superpowers
    github-trending

    An agentic skills framework & software development methodology that works. Superpowers is a complete software development methodology for your coding agents, built on top of a set of composable skills and some initial instructions that make sure your agent uses them.

    How it works: it starts from the moment you fire up your coding agent. As soon as it sees that you're building something, it doesn't just jump into trying to write code. Instead, it steps back and asks you what you're really trying to do. Once it's teased a spec out of the conversation, it shows it to you in chunks short enough to actually read and digest. After you've signed off on the design, your agent puts together an implementation plan that's clear enough for an enthusiastic junior engineer with poor taste, n…

  • TauricResearch/TradingAgents
    github-trending

    TradingAgents: Multi-Agents LLM Financial Trading Framework

    News:
    [2026-04] TradingAgents v0.2.4 released with structured-output agents (Research Manager, Trader, Portfolio Manager), LangGraph checkpoint resume, persistent decision log, DeepSeek/Qwen/GLM/Azure provider support, Docker, and a Windows UTF-8 encoding fix. See CHANGELOG.md for the full list.
    [2026-03] TradingAgents v0.2.3 released with multi-language support, GPT-5.4 family models, unified model catalog, backtesting date fidelity, and proxy support.
    [2026-03] TradingAgents v0.2.2 released with GPT-5.4/Gemini 3.1/Claude 4.6 model coverage, five-tier rating scale, OpenAI Responses API, Anthropic effort …

  • warpdotdev/warp
    github-trending

    Warp is an agentic development environment, born out of the terminal. Use Warp's built-in coding agent, or bring your own CLI agent (Claude Code, Codex, Gemini CLI, and others).

    Note: OpenAI is the founding sponsor of the new, open-source Warp repository, and the new agentic management workflows are powered by GPT models.

    Installation: you can download Warp and read our docs for platform-specific instructions.

    Warp Contributions Overview Dashboard: explore build.warp.dev to watch thousands of Oz agents triage issues, write specs, implement changes, and review PRs; view top contributors and in-flight features; track your own issues with GitHub sign-in; C…

  • 10x Faster Real-Time High-Quality AI Video Generation
    hn-ai· May 1

    Article URL: https://tenstorrent.com/solutions/real-time-video Comments URL: https://news.ycombinator.com/item?id=47971377 Points: 1 # Comments: 0

  • AI Tips and Tricks
    hn-ai· May 1

    Article URL: https://www.youtube.com/watch?v=w_m5RmVsmtE Comments URL: https://news.ycombinator.com/item?id=47971410 Points: 2 # Comments: 0

  • LLM Summaries Are Ruining Your Learning
    hn-ai· May 1

    Article URL: https://arpitbhayani.me/blogs/do-not-rely-on-summaries/ Comments URL: https://news.ycombinator.com/item?id=47971373 Points: 1 # Comments: 0

  • Mark Zuckerberg Says AI Costs Contributed to Layoffs of 8k Staffers
    hn-ai· May 1

    Article URL: https://www.forbes.com/sites/antoniopequenoiv/2026/04/30/mark-zuckerberg-says-ai-costs-contributed-to-layoffs-of-8000-staffers-report-says/ Comments URL: https://news.ycombinator.com/item?id=47971309 Points: 2 # Comments: 0

  • PrivClaw – Open-source self-hostable AI plugin marketplace (FastAPI and Next.js)
    hn-ai· May 1

    Article URL: https://github.com/geneleo537-afk/privclaw Comments URL: https://news.ycombinator.com/item?id=47971245 Points: 1 # Comments: 0

  • Ask HN: In the age of AI do we need to learn how to code?
    hn-ai· May 1

    First off, I am a software engineer. I have been a developer working with C and Rust. In the old days we had to learn a lot about these languages, even assembly. In my day we focused only on C while still having to touch assembly a bit. I saw my friend's kid in college learning only Python and a little bit of C. I feel that with LLMs and AI coding, people will skip C in the future. I have doubts about this and think it could cause problems through a lack of fundamentals, but it seems to be the trend. Comments URL: https://news.ycombinator.com/item?id=47971273 Points: 1 # Comments: 1

  • What do you think of people buying Mac mini's to run AI?
    hn-ai· May 1

    Today during Apple's earnings call we heard they're out of Mac minis and back-ordered due to a huge surge in demand. We know barely any decent-size models can run on one. Most of the buyers are buying them to run Openclaw in complete isolation from their regular desktops or laptops. A VPS for a few dollars a month would be sufficient to run tools like Openclaw. What are your thoughts on why people are buying Mac minis? Comments URL: https://news.ycombinator.com/item?id=47971335 Points: 1 # Comments: 3

  • AI and the Future of News 2026
    hn-ai· May 1

    Article URL: https://reutersinstitute.politics.ox.ac.uk/news/ai-and-future-news-2026-what-we-learnt-about-its-impact-newsrooms-fact-checking-and-news Comments URL: https://news.ycombinator.com/item?id=47971364 Points: 2 # Comments: 0

  • How People ask Claude for personal guidance
    hn-ai· May 1

    Article URL: https://www.anthropic.com/research/claude-personal-guidance Comments URL: https://news.ycombinator.com/item?id=47971585 Points: 1 # Comments: 0

  • Task-Specific LLM Evals That Do and Don't Work
    hn-ai· May 1

    Article URL: https://eugeneyan.com/writing/evals/ Comments URL: https://news.ycombinator.com/item?id=47971328 Points: 1 # Comments: 0

  • [AINews] The Inference Inflection
    latentspace· Apr 30

    a quiet day lets us reflect on the growing implications of the inference age

  • [AINews] Agents for Everything Else: Codex for Knowledge Work, Claude for Creative Work
    latentspace· May 1

    a quiet day lets us reflect on coding agents "breaking containment"

  • Fragments: April 29
    martin-fowler· Apr 29

    Chris Parsons has updated his guide on using AI to code. This is his third update, what I like about it is that he gives a lot of concrete information about how he uses AI, with sufficient detail that we can learn from him. His advice also resonates with the better advice I’ve seen out there, so the article makes a good overview of the state of using AI for software development. I wrote the previous version of this post in March 2025, updated it once in August, and it has been linked from almost everything I have written about AI engineering since. The fundamentals from that post still hold: keep changes small, build guardrails, document ruthlessly, and make sure every change gets verified before it ships. One thing has had to move with the volume. “Verified” used to mean “read by you”. W…

  • Structured-Prompt-Driven Development (SPDD)
    martin-fowler· Apr 28

    LLM programming assistants have demonstrated considerable value, but mostly with individual developers. The internal IT organization in Thoughtworks has been using them for their teams and has developed a method and workflow called Structured Prompt-Driven Development (SPDD). Wei Zhang and Jessie Jie Xia describe a simple example of this workflow with details in GitHub. This workflow treats the prompts as a first-class artifact, kept with the code in version control, and used to align development with business needs. They have found that developers need three key skills to be effective: alignment, abstraction-first, and iterative review.

  • Scaling Camera File Processing at Netflix
    netflix-tech· Apr 24

  • The Zig project's rationale for their firm anti-AI contribution policy
    simonw· Apr 30

    Zig (https://ziglang.org/) has one of the most stringent anti-LLM policies (https://ziglang.org/code-of-conduct/) of any major open source project:

        No LLMs for issues.
        No LLMs for pull requests.
        No LLMs for comments on the bug tracker, including translation. English is encouraged, but not required. You are welcome to post in your native language and rely on others to have their own translation tools of choice to interpret your words.

    The most prominent project written in Zig may be the Bun (https://bun.com/) JavaScript runtime, which was acquired by Anthropic (https://bun.com/blog/bun-joins-anthropic) in December 2025 and, unsurprisingly, makes heavy use of AI assistance.

    Bun o…

  • LLM 0.32a0 is a major backwards-compatible refactor
    simonw· Apr 29

    I just released LLM 0.32a0 (https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28), an alpha release of my LLM (https://llm.datasette.io/) Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while.

    Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.

        import llm
        model = llm.get_model("gpt-5.5")
        response = model.…

  • llm 0.32a1
    simonw· Apr 29

    Release: llm 0.32a1 (https://github.com/simonw/llm/releases/tag/0.32a1)

        Fixed a bug in 0.32a0 where tool-calling conversations were not correctly reinflated from SQLite. #1426 (https://github.com/simonw/llm/issues/1426)

    Tags: llm

  • llm 0.32a0
    simonw· Apr 29

    Release: llm 0.32a0 (https://github.com/simonw/llm/releases/tag/0.32a0)

    See the annotated release notes (https://simonwillison.net/2026/Apr/29/llm/).

    Tags: llm

  • Codex CLI 0.128.0 adds /goal
    simonw· Apr 30

    Link: https://github.com/openai/codex/releases/tag/rust-v0.128.0

    The latest version of OpenAI's Codex CLI coding agent adds their own version of the Ralph loop (https://ghuntley.com/ralph/): you can now set a /goal and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.

    It looks like the feature is mainly implemented through the goals/continuation.md (https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md) and https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.m…

  • Quoting Andrew Kelley
    simonw· Apr 30

        It's a common misconception that we can't tell who is using LLM and who is not. I'm sure we didn't catch 100% of LLM-assisted PRs over the past few months, but the kind of mistakes humans make are fundamentally different than LLM hallucinations, making them easy to spot. Furthermore, people who come from the world of agentic coding have a certain digital smell that is not obvious to them but is obvious to those who abstain. It's like when a smoker walks into the room, everybody who doesn't smoke instantly knows it.

        I'm not telling you not to smoke, but I am telling you not to smoke in my house.

    (quoted from https://lobste.rs/s/ifcyr1/contributor_poker_zig_s_ai_ban#c_cbtxub) …

  • Our evaluation of OpenAI's GPT-5.5 cyber capabilities
    simonw· Apr 30

    Link: https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities

    The UK's AI Security Institute previously evaluated Claude Mythos (https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities): now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.

    Tags: ai, openai, generative-ai, llms, …

  • We need RSS for sharing abundant vibe-coded apps
    simonw· Apr 30

    Link: https://interconnected.org/home/2026/04/29/syndicating-vibes

    Matt Webb:

        I would love an RSS web feed for all those various tools and apps pages, each item with an “Install” button. (But install to where?)

        The lesson here is that when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent. Shipping a tool or a micro-app is less like launching a website and more like posting on a blog.

    This inspired me to have Claude (https://github.com/simonw/simonwillisonblog/pull/665) add an Atom feed (and icon) to my /elsewhere/tools/ (https://simonwillison.net/elsewhere/tool/) page, which itself is po…