Research

Latest Research on Megadose. AI news ranked, decayed, deduped.

50 recent items

  1. Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success
    ArXiv · AI/CL/LG ·
    A theoretical analysis explains why Muon can outperform Euclidean optimizers for transformer training under heavy-tailed gradients.
  2. MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
    HF Daily Papers ·
    MiniMax's M3 proof model and MaxProof test-time scaling report gold-medal-level performance on IMO and USAMO proof tasks.
  3. WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
    HF Daily Papers ·
    WEAVER introduces a robotic manipulation world model designed for high-fidelity, long-horizon, efficient simulation from limited real-world interaction.
  4. MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
    HF Daily Papers ·
    MoVerse builds an interactive 3D video world from a single narrow-view image using panorama expansion and Gaussian scaffolds.
  5. Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
    ArXiv · AI/CL/LG ·
    DoorDash describes a deployed reinforcement learning system that adapts dispatch objective weights from delayed marketplace feedback.
  6. EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis
    ArXiv · AI/CL/LG ·
    EpiBench evaluates AI agents on verifiable epigenomics workflows and shows leading systems fail most attempts.
  7. Open Reproduction of DeepSeek-R1
    HN · GitHub AI ·
    An open-source effort to reproduce DeepSeek-R1 gives researchers a public path to inspect and rebuild its reasoning pipeline.
  8. New framework for auditing machine unlearning
    Google Research Blog ·
    Google Research introduced a framework for auditing machine unlearning, addressing verification of whether models forget targeted data.
  9. InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
    HF Daily Papers ·
    InternVideo3 introduces an open multimodal foundation-model framework for long-video contextual reasoning, tool use, and agentic video understanding.
  10. Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation
    HF Daily Papers ·
    Pythagoras-Prover releases open-source Lean theorem-proving models and training data focused on lower compute budgets.
  11. DiffusionGemma: 4x Faster Text Generation
    HN · Frontpage AI ·
    DiffusionGemma introduces a diffusion-based Gemma variant claiming substantially faster text generation, drawing strong developer attention.
  12. Kwai Keye-VL-2.0 Technical Report
    HF Daily Papers ·
    Kwai released Keye-VL-2.0-30B-A3B, an open-source multimodal MoE model targeting hour-long video understanding with 256K context.
  13. i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
    HF Daily Papers ·
    i1 releases a fully open text-to-image model recipe with weights, data, code, and large-scale training ablations.
  14. Anthropic researchers say Mythos Preview can now turn publicly disclosed software vulnerabilities, or N-days, into working exploits in hours instead of weeks (Sam Sabin/Axios)
    Techmeme ·
    Anthropic’s Mythos Preview can reportedly convert disclosed vulnerabilities into working exploits within hours, advancing AI-assisted cyber offense capabilities.
  15. Anthropic details its progress toward recursive self-improvement, and its implications, and says 80%+ of the code merged into its codebase is authored by Claude (Anthropic)
    Techmeme ·
    Anthropic published details on Claude-driven code generation in its own development workflow, including its claim that more than 80% of merged code is authored by Claude, and discussed the implications for recursive self-improvement.
  16. Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models
    HF Daily Papers ·
    Ultralytics introduces YOLO26, a real-time vision model family with NMS-free inference, lighter heads, and broader deployment improvements.
  17. Cosmos 3: Omnimodal World Models for Physical AI
    HF Daily Papers ·
    Cosmos 3 introduces omnimodal world models for physical AI, unifying language, image, video, audio, and action generation.
  18. GPIC: A Giant Permissive Image Corpus for Visual Generation
    ArXiv · AI/CL/LG ·
    GPIC releases a 100-million-image permissive corpus with benchmarks and baselines for visual generative modeling.
  19. Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
    ArXiv · AI/CL/LG ·
    Qwen-VLA extends Qwen into a unified vision-language-action foundation model for robotics tasks across embodiments and environments.
  20. Biohub, the Mark Zuckerberg and Priscilla Chan-funded institute, releases a protein-structure prediction model and more, calling it "a world model" of proteins (Ina Fried/Axios)
    Techmeme ·
    Biohub released a protein-structure prediction model positioned as a broader world model for proteins.
  21. The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
    HF Daily Papers ·
    MiniMax introduced M2, a 230B-parameter MoE model series with sparse activation and agent-focused training infrastructure.
  22. Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini
    HF Daily Papers ·
    Google introduces Gemini Embedding 2, a native multimodal embedding model reporting strong results across text, image, audio, and video retrieval.
  23. LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
    HF Daily Papers ·
    LLaVA-OneVision-2 advances open multimodal modeling with adaptive video tokenization for stronger long-video understanding across benchmarks.
  24. Unified Neural Scaling Laws
    HF Daily Papers ·
    Unified Neural Scaling Laws proposes a single functional form for extrapolating model performance across compute, data, parameters, and inference steps.
  25. Looped Diffusion Language Models
    ArXiv · AI/CL/LG ·
    LoopMDM improves masked diffusion language models by looping transformer layers, cutting training compute while enabling inference-time compute scaling.
  26. QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
    HF Daily Papers ·
    QUEST releases open 2B-35B deep research agent models trained on synthetic long-horizon search and report-generation tasks.
  27. Advancing Mathematics Research with AI-Driven Formal Proof Search
    ArXiv · AI/CL/LG ·
    An LLM agent autonomously solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures using Lean formal verification, and has been deployed in active mathematics research.
  28. OpenAI says an internal general-purpose reasoning model has disproved the Erdős unit distance conjecture, a central problem in discrete geometry posed in 1946 (OpenAI)
    Techmeme ·
    OpenAI's unreleased reasoning model generated a proof disproving the 1946 Erdős unit distance conjecture in discrete geometry, marking a notable capability demonstration in mathematical research.
  29. An OpenAI model has disproved a central conjecture in discrete geometry
    OpenAI Blog ·
    An OpenAI model disproved a central conjecture in discrete geometry, resolving an 80-year-old unit distance problem in what the company presents as a milestone for AI-driven mathematical research.
  30. OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]
    r/MachineLearning ·
    OpenAI reported that one of its general-purpose reasoning models found a construction disproving the conjectured n^{1+O(1/log log n)} upper bound in Erdős's planar unit-distance problem, and released the full proof as a PDF.
  31. Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
    HF Daily Papers ·
    The paper proves that DPO's theoretical equivalence to RLHF holds only if the RLHF-optimal policy prefers human-preferred responses, an assumption that is frequently violated and leads to pathological convergence.
  32. HRM-Text: Efficient Pretraining Beyond Scaling
    HF Daily Papers ·
    HRM-Text replaces standard Transformers with a Hierarchical Recurrent Model using slow/fast layers, MagicNorm, and deep credit assignment for efficient pretraining.
  33. SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
    HF Daily Papers ·
    SpecBench identifies reward hacking in long-horizon coding agents by decomposing tasks into specs, visible tests, and held-out composition tests that reveal true capability.
  34. Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
    ArXiv · AI/CL/LG ·
    Equilibrium Reasoners (EqR) formalize test-time compute scaling as learning task-conditioned attractors in latent dynamical systems, enabling generalization via iterative updates without external verifiers.
  35. Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
    ArXiv · AI/CL/LG ·
    Agent JIT compilation compiles task descriptions into executable code with embedded LLM and tool calls, reducing latency and errors in computer-use agents via validated multi-plan generation.
  36. OpenAI claims it solved an 80-year-old math problem — for real this time
    TechCrunch AI ·
    OpenAI says a reasoning model disproved a 1946 geometry conjecture, with outside mathematicians validating the result.
  37. SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
    ArXiv · AI/CL/LG ·
    SpectralEarth-FM is a hierarchical transformer foundation model for multisensor Earth observation (EO) that jointly processes hyperspectral, multispectral, and SAR data, enabling unified EO pretraining across inputs with heterogeneous spectral dimensionality.
  38. Fast-tracking genetic leads to reverse cellular aging
    Google DeepMind ·
    Google DeepMind reported Co-Scientist helped identify novel factors that rejuvenate human cells, extending AI-assisted biological discovery.
  39. GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
    HF Daily Papers ·
    GoLongRL releases 23K RLVR samples and a complete long-context RL training pipeline across 9 task types, with a taxonomy of long-context capabilities guiding data construction.
  40. OpenComputer: Verifiable Software Worlds for Computer-Use Agents
    HF Daily Papers ·
    OpenComputer provides a verifier-grounded framework for computer-use agents with 33 desktop apps and 1,000 machine-checkable tasks, including self-evolving verification and auditable partial-credit rewards.
  41. optimize_anything: A Universal API for Optimizing any Text Parameter
    HF Daily Papers ·
    A unified LLM-based optimization system reports SOTA results across six diverse tasks, discovering agent architectures that triple ARC-AGI accuracy, cutting cloud costs by 40%, and generating competitive CUDA kernels.
  42. Toto 2.0: Time Series Forecasting Enters the Scaling Era
    HF Daily Papers ·
    Toto 2.0 releases five open-weights time series forecasting models (4M–2.5B params) demonstrating scaling laws and setting SOTA on BOOM, GIFT-Eval, and TIME benchmarks under Apache 2.0.
  43. Lance: Unified Multimodal Modeling by Multi-Task Synergy
    HF Daily Papers ·
    Lance is a native unified multimodal model with dual-stream MoE trained from scratch, supporting joint understanding and generation of images and video.
  44. EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
    HF Daily Papers ·
    EnvFactory automates executable environment synthesis and robust RL training for tool-use agents, generating realistic multi-turn interaction data without costly real-world APIs.
  45. GIM: Evaluating models via tasks that integrate multiple cognitive domains
    ArXiv · AI/CL/LG ·
    The GIM benchmark tests grounded integration of cognitive operations across 820 problems, isolating reasoning capability from knowledge demands and from abstract puzzle-solving.
  46. Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees
    ArXiv · AI/CL/LG ·
    Distilling TabICLv2 into XGBoost via stratified out-of-fold (OOF) labeling retains 96.5% of the teacher's AUC at 1.9 ms on CPU, a 38-860x speedup.
  47. Language-Switching Triggers Take a Latent Detour Through Language Models
    ArXiv · AI/CL/LG ·
    Mechanistic circuit analysis of an 8B model reveals a three-phase backdoor: trigger composition, propagation through an orthogonal subspace, and MLP-based language conversion.
  48. Post-Trained MoE Can Skip Half Experts via Self-Distillation
    ArXiv · AI/CL/LG ·
    ZEDA converts a post-trained static MoE into a dynamic MoE via zero-output expert injection and self-distillation, allowing the model to skip half of its experts.
  49. Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)
    ArXiv · AI/CL/LG ·
    Aligned training is a parameter-free SAE reparameterization that eliminates dead features and significantly improves stability across random seeds.
  50. CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
    ArXiv · AI/CL/LG ·
    CrossView Suite provides 450K cross-view instruction samples, a comprehensive benchmark, and an explicit alignment mechanism for MLLM spatial reasoning across viewpoints.