- Introducing the Services Track and Partner Hub of the Claude Partner Network
anthropic-news· 03-jun
Jun 3, 2026 · Announcements
- What we learned mapping a year’s worth of AI-enabled cyber threats
anthropic-news· 03-jun
Jun 3, 2026 · Policy
- Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
arxiv-ai· 05-jun
arXiv:2606.05334v1 Announce Type: new Abstract: Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Existing PHM approaches support degradation prediction, but often target fixed operating conditions or isolated component benchmarks, while material-fatigue assessment is rarely linked to system-level functional prognosis. This paper addresses this gap for an angle grinder by combining uncertainty-aware functional prediction with component-level fatigue assessment in an instance-specific reliability workflow. The proposed framewor…
- SentinelBench: A Benchmark for Long-Running Monitoring Agents
arxiv-ai· 05-jun
arXiv:2606.05342v1 Announce Type: new Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which are better served by a strategy of sustained attention. Instead, agents should monitor an environment, notice when an external event makes progress possible, then respond promptly without wasting resources while waiting. To measure progress on this class of tasks, we introduce SentinelBench, an open-source benchmark for time-evolving monitoring tasks. SentinelBench contains 100 tasks across 10 synthetic web environments, includin…
- An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
arxiv-ai· 05-jun
arXiv:2606.05357v1 Announce Type: new Abstract: Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We first developed a deep learning framework to predict MOAKS features directly from knee MRIs and incorporated conformal prediction to provide prediction uncertainty quantification. This uncertainty-aware strategy enables explicit filtering of model outputs, retaining only high-confidence MOAKS predictions at the knee level. Second, we applied a longitudinal latent class mixed model (LCMM) to examine associations between key structura…
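A minimal sketch of the uncertainty-based filtering idea. It assumes standard split conformal prediction for a classifier, using the score 1 - p(true label). The abstract does not say which conformal variant or score the authors use, and every function name here is hypothetical. A knee-level prediction is kept only when its conformal set contains exactly one grade.

```python
import math
from typing import List, Sequence, Tuple


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int],
                        alpha: float) -> float:
    """Split-conformal threshold on the nonconformity score 1 - p(true label)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return 1.0  # Too little calibration data: every label enters every set.
    return scores[k - 1]


def prediction_set(probs: Sequence[float], q: float) -> List[int]:
    """All class indices whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= q]


def keep_confident(test_probs: Sequence[Sequence[float]],
                   q: float) -> List[Tuple[int, int]]:
    """Return (sample index, label) only for singleton prediction sets.

    Empty sets and sets with several labels are both treated as not confident.
    """
    kept = []
    for i, probs in enumerate(test_probs):
        s = prediction_set(probs, q)
        if len(s) == 1:
            kept.append((i, s[0]))
    return kept
```

Lowering alpha raises the threshold and widens the sets, so fewer predictions survive the filter; the trade-off is coverage guarantee versus how much data is retained for the downstream statistical model.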
- Synthetic Contrastive Reasoning for Multi-Table Q&A
arxiv-ai· 05-jun
arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative traces with heterogeneous LLMs. We then use the resulting preference pairs to fine-tune open-weight LLMs with Contrastive Preference Optimization (CPO). Across Qwen3-14B, Mistral-8B, and Llama-3.1-8B, CPO achieves absolute average improvements over Q&A supervised fine-tuning ranging from 9.7…
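For reference, the CPO objective is commonly written as a reference-model-free preference term plus a negative log-likelihood term on the preferred output. Here x would be a multi-table question, y_w a validated positive trace, and y_l a plausible negative trace. This is the generic form of CPO, not necessarily the exact loss weighting or β used in the paper.

```latex
\mathcal{L}_{\mathrm{CPO}}
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(\beta \log \pi_\theta(y_w \mid x)
      - \beta \log \pi_\theta(y_l \mid x)\big)\Big]
    \;-\; \mathbb{E}_{(x,\,y_w)}\big[\log \pi_\theta(y_w \mid x)\big]
```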
- Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
arxiv-ai· 05-jun
arXiv:2606.05384v1 Announce Type: new Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show that this assumption does not hold under interaction. We study post-decision manipulability: the extent to which an evaluation outcome can be altered through subsequent conversation with the judge after an initial decision has been made. Across controlled experiments on MT-Bench and AlpacaEval, we find that LLM judges are highly stable under repeated and neutral reevaluation, yet become substantially reversible under targeted post-decision challenge. An anti-baseline challenge protocol shows that stable judgments…
- Residual Modeling for High-Fidelity Learned Compression of Scientific Data
arxiv-ai· 05-jun
arXiv:2606.05389v1 Announce Type: new Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate. We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should …
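A minimal sketch of the GAE-style baseline the abstract describes, not the proposed residual-centric method. It assumes NRMSE is the RMSE divided by the block's value range, and that the basis has orthonormal columns (for example, from an SVD/PCA of training residuals). Coefficients of the residual are added largest-first until the block meets the target. Quantization and entropy coding of the retained coefficients are omitted, and all names are hypothetical.

```python
import numpy as np


def nrmse(original: np.ndarray, approx: np.ndarray) -> float:
    """RMSE normalised by the value range of the original block (assumed definition)."""
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean((original - approx) ** 2)))
    return rmse / value_range if value_range > 0 else rmse


def greedy_residual_correction(block: np.ndarray, recon: np.ndarray,
                               basis: np.ndarray, tol: float):
    """Add basis coefficients of the residual, largest first, until NRMSE <= tol.

    block, recon: flattened original block and learned reconstruction (length d).
    basis: (d, m) matrix with orthonormal columns.
    Returns (retained indices, retained coefficient values, corrected block).
    If the basis cannot represent the residual, the loop stops after using
    every coefficient, and the target may still be missed.
    """
    residual = block - recon
    coeffs = basis.T @ residual              # projection of the residual onto the basis
    order = np.argsort(-np.abs(coeffs))      # largest-magnitude coefficients first
    corrected = recon.copy()
    kept = []
    for idx in order:
        if nrmse(block, corrected) <= tol:
            break
        corrected = corrected + coeffs[idx] * basis[:, idx]
        kept.append(int(idx))
    return kept, coeffs[kept], corrected
```

Tightening tol toward 10^-6 makes the loop retain more coefficients per block. This is the rate growth that the abstract attributes to existing GAE methods in the high-fidelity regime.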
- How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arxiv-ai· 05-jun
arXiv:2606.05256v1 Announce Type: new Abstract: This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts engaging users in live debate. After public disclosure, Reddit authorized moderators to release an archive of the AI-generated comments, creating a rare opportunity to examine how large language models operated in an identity-rich deliberative forum without disclosure. We conduct a structured content analysis of this corpus, evaluating identity performance, authority signaling, alignment strategies, and activation of cognitive heuristics. Identity targeting or adoption appears in over two-t…
- GITCO: Gated Inference-Time Context Optimization in TSFMs
arxiv-ai· 05-jun
arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by optimizing the input context rather than modifying model weights. We present GITCO (Gated Inference-Time Context Optimization), a lightweight framework of three components (Gate, Router, and Critic) that selectively identifies and suppresses harmful patches without any parameter updates. Evaluated on TimesFM 2.5 across 53 GIFT-Eval datasets under K-fold cross-validation, GITCO achieves an average 1.95% MASE reduction on TimesFM 2.5 while capturing 89.9% of the improvement upper bound. We introduce …
- What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
arxiv-ai· 05-jun
arXiv:2606.05304v1 Announce Type: new Abstract: Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents. Building on this, we propose the PACT (Protocolized Action-state Communication and Transmissi…
- I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
arxiv-ai· 05-jun
arXiv:2606.05316v1 Announce Type: new Abstract: Multimodal memes are dynamic and often require up-to-date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero-shot framework that identifies missing knowledge, retrieves open-web evidence, and synthesizes evidence-grounded background knowledge for meme understanding and detection. We also introduce a curated meme understanding benchmark of recent memes from 2024 to 2026 with external background knowledge annotations. Experiments on three meme understanding datasets and five meme detection tasks show that our framework improves knowledge…
- How OpenAI Built Its Data Agent
bytebytego· 03-jun
The hardest part of data analysis isn’t writing SQL. It’s finding the right tables to use in the first place and understanding semantically how to use… Jun 3 • ByteByteGo
- The Path of a Request: A Tour of Modern Web Architecture
bytebytego· 04-jun
In this article, we follow the journey of a web request one hop at a time. 15 hrs ago • ByteByteGo
- A Practical Guide to Becoming an AI-Native Engineer
bytebytego· 02-jun
This piece is a working guide for engineers who want to land on the productive side of that split. Jun 2 • ByteByteGo
- How DoorDash Built a Testing System to Evaluate LLMs
bytebytego· 30-may
In this article, we will learn how they built this flywheel and the key takeaways. May 30 • ByteByteGo
- AI enthusiasts are in a race against time, AI skeptics are in a race against entropy (xpost)
charity-majors· 02-jun
Both sides are grappling with a real existential threat, and both sides feel like they are screaming into the void. There is a way to close the gap and get everyone pulling in the same direction. Xposted from Substack. I recently attended a talk where one of the presenters made some pretty…astonishing claims about what they […]
- Claude Platform
claude-changelog
Release notes: Updates to the Claude Platform, including the Claude API, client SDKs, and the Claude Console. For release notes on Claude Apps, see the Release notes for Claude Apps in the Claude Help Center. For updates to Claude Code, see the complete CHANGELOG.md in the claude-code repository. June 2, 2026: The advisor tool now supports a max_tokens parameter to cap the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length advisor responses. Set tools[].max_tokens on the advisor tool definition; see Capping advisor output. On the Claude API, you are no longer billed for a request when it returns stop_reason: "refusal" without Claude having generated any output. See Streaming refusals for detecting and handling ref…
- VoidZero is joining Cloudflare
cloudflare· 04-jun
VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. Vite stays open source, vendor-agnostic, and built for everyone.
- Enforcing the First AS in BGP AS_PATHs
cloudflare· 03-jun
BGP is vulnerable to routing hijacks and path leaks that negatively impact traffic on the Internet. RPKI helps solve some of these problems, but for some forged paths, we need to rely on a simpler mechanism: First AS enforcement in BGP.
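A minimal sketch of the first-AS check, assuming AS_PATH is a flat AS_SEQUENCE (no AS_SET) and that the check applies to eBGP sessions. RFC 4271 permits a speaker to verify that the leftmost AS in a received AS_PATH equals the sending peer's ASN. The Route type and function names are hypothetical and this is not Cloudflare's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    prefix: str
    as_path: List[int]  # flat AS_SEQUENCE; leftmost entry = most recently prepended AS


def first_as_ok(route: Route, neighbor_asn: int, is_route_server: bool = False) -> bool:
    """Accept a route only if the leftmost AS in AS_PATH is the neighbor's own ASN."""
    if is_route_server:
        # Transparent IXP route servers normally do not prepend their own ASN,
        # so first-AS enforcement is typically disabled on those sessions.
        return True
    return bool(route.as_path) and route.as_path[0] == neighbor_asn


def filter_routes(routes: List[Route], neighbor_asn: int) -> List[Route]:
    """Keep only the routes from this eBGP neighbor that pass the first-AS check."""
    return [r for r in routes if first_as_ok(r, neighbor_asn)]
```

Any path a neighbor forges behind its own ASN still passes this check, so it complements RPKI rather than replacing it.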
- How we reduced core unit boot time from hours to minutes
cloudflare· 01-jun
We investigated why firmware updates were causing our core servers to take four hours to reboot. By diving into UEFI data structures and iPXE automation, we eliminated unnecessary timeouts and cut boot times back down to minutes.
- Mark Zuckerberg's longest-serving employee on AI, jobs and her boss
hn-ai· 05-jun
Article URL: https://www.bbc.com/news/articles/c5y71106g07o Comments URL: https://news.ycombinator.com/item?id=48408832 Points: 2 # Comments: 0
- AI Optimists Race Clock; Skeptics Race Decay
hn-ai· 05-jun
Article URL: https://charitydotwtf.substack.com/p/ai-enthusiasts-are-in-a-race-against Comments URL: https://news.ycombinator.com/item?id=48408940 Points: 1 # Comments: 0
- Running an AI-native engineering org
hn-ai· 05-jun
Article URL: https://claude.com/blog/running-an-ai-native-engineering-org Comments URL: https://news.ycombinator.com/item?id=48408846 Points: 1 # Comments: 0
- Show HN: Snill.ai launched – describe your biz – get an internal app in seconds
hn-ai· 05-jun
Article URL: https://snill.ai/ Comments URL: https://news.ycombinator.com/item?id=48408894 Points: 1 # Comments: 0
- Satya Nadella 'Not Sure' Who Said Microsoft Wanted to Make Addictive AI
hn-ai· 05-jun
Article URL: https://www.404media.co/satya-nadella-not-sure-who-said-microsoft-wanted-to-make-addictive-ai-is-looking-for-guy-who-did-this/ Comments URL: https://news.ycombinator.com/item?id=48408581 Points: 5 # Comments: 0
- I Didn't Become a Developer to Review AI Slop
hn-ai· 05-jun
Article URL: https://www.builder.io/blog/developers-drowning-in-ai-prs Comments URL: https://news.ycombinator.com/item?id=48408569 Points: 4 # Comments: 0
- How much value is AI creating?
hn-ai· 05-jun
Article URL: https://www.ft.com/content/8e9ae7a4-7209-4e2c-aa36-f3af77d6ce1f Comments URL: https://news.ycombinator.com/item?id=48408560 Points: 2 # Comments: 0
- Americans lead AI data centre backlash, global poll finds
hn-ai· 05-jun
Article URL: https://www.ft.com/content/ed07dc6c-aabe-4e4d-a508-4d2b4f24a852 Comments URL: https://news.ycombinator.com/item?id=48408598 Points: 3 # Comments: 0
- SpaceX IPO video sells Musk's space, AI, asteroid dreams to mom-n-pop investors
hn-ai· 05-jun
Article URL: https://www.latimes.com/business/story/2026-06-04/spacex-ipo-video-sells-elon-musks-space-ai-asteroid-dreams-to-mom-and-pop-investors Comments URL: https://news.ycombinator.com/item?id=48408668 Points: 2 # Comments: 0
- ZEC drops 30% after Anthropic AI finds Zcash counterfeit vulnerability
hn-ai· 05-jun
Article URL: https://www.tradingview.com/news/cointelegraph:52f56f35b094b:0-zec-drops-30-after-anthropic-ai-finds-zcash-counterfeit-vulnerability/ Comments URL: https://news.ycombinator.com/item?id=48408925 Points: 1 # Comments: 0
- [AINews] not much happened today
latentspace· 05-jun
a quiet day
- Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
latentspace· 04-jun
We talk with the VendingBench authors on evaling Claudes from Haiku to Mythos, and how they build leading, and lasting, frontier evals from scratch.
- 🔬Scaling Past Informal AI - Carina Hong, Axiom Math
latentspace· 03-jun
Verified Generation and Compounding Intelligence
- ⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build
latentspace· 03-jun
The legendary Microsoft CEO makes his first Latent Space appearance!
- [AINews] Reve 2 and Ideogram 4: Layouts in Imagegen
latentspace· 04-jun
a quiet day.
- Fragments: June 2
martin-fowler· 02-jun
Greg Wilson has noticed that lots of folks are using dodgy metrics to figure out if AI tools are worth their costs. Would you measure lines of code generated, or tickets closed? Or would you send out a survey asking whether developers feel more productive? Each of those approaches is flawed in a different way. He lists lots of common metrics, and why they are flawed. Sadly he doesn’t give any suggestions on what would be better. In my view, since we cannot measure productivity, any metrics are weak evidence at the best of times. I do somewhat use one of his flawed measures: “Asking Developers If They Feel More Productive”. While I acknowledge the problems he gives with this measure, I find that in an environment where decent measures are hard to find, even such a dim light is the best we …
- Dynamic Repartitioning for Time Series Workloads
netflix-tech· 03-jun
- High-Throughput Graph Abstraction at Netflix: Part I
netflix-tech· 29-may
- From Silos to Service Topology: Why Netflix Built a Real-Time Service Map
netflix-tech· 29-may
- Uber Caps Usage of AI Tools Like Claude Code to Manage Costs
simonw· 03-jun
Article URL: https://www.bloomberg.com/news/articles/2026-06-02/uber-caps-usage-of-ai-tools-like-claude-code-to-cut-costs I wrote the other day (https://simonwillison.net/2026/May/27/product-market-fit/#the-ai-failure-stories-around-this-are-pretty-thin) about Uber blowing its 2026 AI budget in four months, and how that wasn't particularly surprising given they would have set that budget in 2025, before anyone could have predicted how popular token-burning coding agents were about to become. Natalie Lung for Bloomberg: “The rideshare giant is limiting all employees to $1,500 in monthly token spending per AI coding tool, an Uber spokesperson said in response to a Bloomberg N…
- AI enthusiasts are in a race against time, AI skeptics are in a race against entropy
simonw· 04-jun
Article URL: https://charitydotwtf.substack.com/p/ai-enthusiasts-are-in-a-race-against Charity Majors neatly captures the dynamic between AI enthusiasts and AI skeptics, both of whom are trying to build great software, often in the same teams: “The enthusiasts are not wrong. We are starting to see real, non-imaginary, discontinuous leaps in capabilities from teams that lean in hard to working with AI. And this does not feel like a normal technology cycle where you can wait for the dust to settle; teams that sit this out while competitors are hustling could be out of business before the dust settles. That’s a real, existential threat. The skeptics…
- Quoting Emanuel Maiberg, 404 Media
simonw· 04-jun
“After this story was published Google's spokesperson reached out and asked us to publish a slightly different version of that statement. The new statement no longer stated that "it's critical that we maintain humans in the loop."” (Emanuel Maiberg, 404 Media, Google Employees Internally Share Memes About How Its AI Sucks, https://www.404media.co/google-employees-internally-share-memes-about-how-its-ai-sucks/) Tags: ai-ethics, journalism, ai, …