AI & LLMs

13 articles · latest first

Claude Fable 5, DiffusionGemma 26B-A4B, Kimi K2.7 Code, NVIDIA 550B inference, Cohere North Mini Code

Anthropic's Claude Fable 5 lands alongside open-weight releases such as DiffusionGemma 26B and Kimi K2.7 Code that push teams toward self-hosting, while heavily optimized large models shift the operational burden onto hardware.

Jun 16, 2026 · 3 min · claude-fable-5 · kimi-k2-7-code

Kimi K2.7 Code: Moonshot's Open-Weight Code Model

Moonshot released Kimi K2.7 Code as an open-weight, code-specialized model. Platform teams must treat models as modular, testable components rather than monoliths.

Jun 14, 2026 · 3 min · open-weight-models · code-generation

GLM-5.1 Community Drop: SWE-Bench Pro Scores Rival Closed Frontier Models

GLM-5.1 community release posts SWE-Bench Pro results rivaling closed frontier models. Platform teams should evaluate open weights and inference stacks now.

Jun 12, 2026 · 4 min · open-weight-models · glm-5.1

June 2026 Model Release Analysis: Nemotron 3 Ultra 550B, Gemma 4 12B, Qwen3.7 Plus, MiniMax-M3

Analysis of the June 1–4, 2026 releases (NVIDIA Nemotron 3 Ultra 550B, Google Gemma 4 12B, Alibaba Qwen3.7 Plus, MiniMax-M3) covering inference tiers, costs, and self-hosting tradeoffs.

Jun 10, 2026 · 6 min · nemotron-3-ultra · gemma-4-12b

GPT-4o mini and gpt-oss variants: weekly model, API, and tooling operational update

Operational roundup: GPT-4o mini and open-weight gpt-oss variants, inference runtime patches, quantization guidance, benchmarks, and Kubernetes rollout steps.

Jun 9, 2026 · 6 min · openai-gpt-4o · gpt-oss-120b

Claude Sonnet 4.6 Becomes the Default Midtier: 1M-Token Beta Context, Agent Improvements, and Operational Guidance

Anthropic's Claude Sonnet 4.6 is now the default midtier model, with a 1M-token context window in beta. Operational guidance for inference, agents, and RAG integration.

Jun 8, 2026 · 6 min · claude-sonnet-4-6 · anthropic

Claude Opus 4.7: What Platform Teams Must Track in Open Checkpoints, Agent Tooling, and Inference Runtimes

With Claude Opus 4.7 as the baseline, platform teams should track OSS checkpoints, lightweight agent tooling, and runtime changes now to keep multi-cloud operations secure.

Jun 6, 2026 · 6 min · claude-opus-4-7 · inference-runtimes

Opus 4.8, Gemma 4 (12B), MiniMax M3 (1M-Token Context): Open-Weight & Enterprise AI Update

Anthropic's Opus 4.8 and the Claude Mythos expansion; Google DeepMind's Gemma 4 (12B, Apache-2.0) on Hugging Face; MiniMax M3 with a 1M-token context; and the operational implications of each.

Jun 5, 2026 · 6 min · llms · open-weight-models

Open-model benchmarks, agent tooling, and inference-efficiency trends shaping AI engineering (Late 2025–Early 2026)

Late-2025/early-2026 trends: open-weight models target agentic coding, long-context and multimodal tasks; engineering focuses on inference efficiency, context quality, and orchestration.

Jun 2, 2026 · 6 min · ai-llms · inference-efficiency

Designing Robust Multi-Provider LLM Platforms: Routing, RAG, and Inference Scaling

Design patterns for multi-provider LLM platforms: model routing, RAG-ready retrievers, replayable agents, observability, SLOs, and inference scaling strategies.

May 29, 2026 · 6 min · ai-architecture · llm-platforms

Inference-Time Scaling, MoE, and Open-Weight LLMs: Practical Guide (2026)

2026 roundup of open-weight LLMs (GLM-5.1, DeepSeek-V4-Pro, Kimi-K2.6, Qwen3.5-397B, Gemma-4) with practical guidance on inference scaling, MoE, and benchmarks.

May 27, 2026 · 6 min · open-source-llms · inference-optimization

Open-weight MoE & Long-Context LLMs Powering Agentic Code Workflows (2025–26)

Open-weight MoE, long-context attention, and advances in inference and post-training shaped 2025–26 LLM engineering for agentic code workflows and platform operations.

May 25, 2026 · 6 min · open-llms · mixture-of-experts
