SenseTime released SenseNova-U1, a family of open multimodal models that unify image understanding and generation using a novel architecture without visual encoders or VAEs, now available on Hugging Face.
DeepSeek, Peking University, and Tsinghua release 'Thinking with Visual Primitives,' a multimodal reasoning framework that elevates spatial tokens—coordinates and bounding boxes—into minimal units of thought, enabling models to 'point' within images during chain-of-thought reasoning.
A unified open-source framework integrates multiple hyperbolic graph embedding methods under a common optimization interface, enabling consistent training, visualization, and evaluation across methods that were previously fragmented.
This work implements speculative decoding in NeMo-RL with a vLLM backend to accelerate RL post-training rollouts for frontier language models, supporting both synchronous and asynchronous pipelines.
Microsoft released VibeVoice, a Whisper-style open-source speech-to-text model with built-in speaker diarization, MIT-licensed and available via MLX for local inference on Macs.
OpenAI releases Symphony, an open-source agent orchestration spec enabling tools like Linear to serve as control planes for coding agents, standardizing multi-agent coordination.
Microsoft released TRELLIS.2, a 4B-parameter open-source image-to-3D model with native 3D VAEs achieving 16× spatial compression to generate high-fidelity PBR-textured assets up to 1536³ resolution.
Xiaomi open-sources MiMo-V2.5 and V2.5-Pro under the MIT license, claiming both are among the most efficient models for agentic 'claw' (UI automation) tasks and positioning them as cost-effective alternatives for automation workloads.
OpenAI releases Symphony, an open-source spec for orchestrating Codex agents that integrates with issue trackers to function as always-on agent systems.
Diffusion Templates is a unified open plugin framework decoupling base-model inference from controllable capability injection, enabling reusable infrastructure across diffusion backbones.
SpecValidator is a classifier, trained with parameter-efficient fine-tuning, that detects three types of defective task descriptions (lexical vagueness, under-specification, syntax-formatting) in code generation prompts and outperforms GPT-5-mini.
DeepSeek releases V4, its new-generation open-source AI models with enhanced reasoning and coding capabilities, its first major release since January's R1 sensation.
TexOCR is a 2B-parameter model for reconstructing scientific PDFs into compilable LaTeX, trained on a new dataset (TexOCR-Train) and refined via RL with LaTeX unit tests that enforce compilability.
An open-source memory layer lets AI agents retain conversation context and user preferences across sessions, matching Claude.ai and ChatGPT memory capabilities.
Qwen3.6-27B is a new 27B dense open-weight model from Alibaba that claims flagship-level coding performance competitive with its own 397B MoE predecessor, now available on Hugging Face in full and quantized formats.
Auto-ART provides a structured synthesis of adversarial robustness research (2020–2026) plus an open-source framework with 50+ attacks, 28 defenses, and compliance mapping to the EU AI Act, NIST AI RMF, and OWASP LLM Top 10.
VLA Foundry is an open-source unified framework for training LLM, VLM, and VLA models, covering every stage from training from scratch through action fine-tuning, released with trained models evaluated on LBM Eval.
SIREN identifies safety neurons via linear probing across internal layers and combines them with adaptive weighting, outperforming open-source guard models using 250x fewer parameters.
CCCL is an in-GPU compression-coupled collective communication library that achieves up to 3x NVLink bandwidth by fusing compression kernels directly into NCCL without user-side code changes.
Cloudflare open-sourced Unweight, a lossless compression system for LLMs that achieves a 15-22% model size reduction and saves ~3GB of VRAM for Llama-3.1-8B on an H100, with GPU kernels on GitHub and an accompanying technical paper.
Alibaba released Qwen3.6-35B-A3B, an open-weight MoE model with 35B total/3B active parameters that claims to match larger dense models on agentic coding tasks.
ChemGraph-XANES is an agentic LangGraph/LangChain framework automating XANES spectroscopy workflows from natural language task specification through FDMNES execution to curated data outputs.
OpenMobile releases an open-source framework for synthesizing mobile agent task instructions and trajectories, including a scalable pipeline using global environment memory and a policy-switching rollout strategy.
TREX automates full LLM fine-tuning via multi-agent collaboration (Researcher + Executor) modeled as a search tree, covering literature research through training and evaluation.
Researchers introduce dual-trace memory encoding for LLM agents, pairing facts with narrative scene traces and achieving 73.7% accuracy on the LongMemEval benchmark versus 53.5% for the comparison baseline.
ClawGUI is an open-source framework providing unified infrastructure for training, evaluating, and deploying GUI agents with validated RL training support and consistent evaluation protocols.
SkVM proposes treating LLM agent skills as compilable code, analyzing 118K skills to build capability profiles that enable portable, consistent execution across different model-harness pairs.