Last updated: 2025-04-16 04:10:37. Maintained by Weisen Jiang.

citations | publish date | title (pdf) | review | authors
581 2024-08-01 SAM 2: Segment Anything in Images and Videos link Nikhila Ravi, Valentin Gabeur,..., Christoph Feichtenhofer
395 2024-04-30 KAN: Kolmogorov–Arnold Networks link Ziming Liu, Yixuan Wang,..., Max Tegmark
388 2023-08-18 WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct link Haipeng Luo, Qingfeng Sun,..., Dongmei Zhang
380 2024-08-06 Scaling Test-Time Compute Optimally Can be More Effective than Scaling LLM Parameters link Charlie Victor Snell, Jaehoon Lee,..., Aviral Kumar
338 2024-08-12 CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer link Zhuoyi Yang, Jiayan Teng,..., Jie Tang
297 2023-11-28 Scalable Extraction of Training Data from Aligned, Production Language Models link Milad Nasr, Javier Rando,..., Katherine Lee
216 2024-03-12 LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code link Naman Jain, King Han,..., Ion Stoica
206 2022-04-25 Trusted Multi-View Classification via Evolutionary Multi-View Fusion link Xinyan Liang, Pinhan Fu,..., Guoqing Liu
169 2024-07-10 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models link Feng Li, Renrui Zhang,..., Chunyuan Li
144 2024-04-02 Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks link Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
135 2024-08-22 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation link Jinheng Xie, Weijia Mao,..., Mike Zheng Shou
125 2024-08-20 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model link Chunting Zhou, LILI YU,..., Omer Levy
115 2024-05-27 NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models link Chankyu Lee, Rajarshi Roy,..., Wei Ping
113 2024-08-27 Generative Verifiers: Reward Modeling as Next-Token Prediction link Lunjun Zhang, Arian Hosseini,..., Rishabh Agarwal
109 2024-06-22 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions link Terry Yue Zhuo, Vu Minh Chien,..., Leandro Von Werra
103 2024-04-02 CameraCtrl: Enabling Camera Control for Text-to-Video Generation link Hao He, Yinghao Xu,..., Ceyuan Yang
102 2024-05-01 Self-Play Preference Optimization for Language Model Alignment link Yue Wu, Zhiqing Sun,..., Quanquan Gu
101 2024-03-28 Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models link Samuel Marks, Can Rager,..., Aaron Mueller
99 2024-10-07 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models link Seyed Iman Mirzadeh, Keivan Alizadeh,..., Mehrdad Farajtabar
96 2023-10-26 JudgeLM: Fine-tuned Large Language Models are Scalable Judges link Lianghui Zhu, Xinggang Wang, Xinlong Wang
95 2024-02-15 Generative Representational Instruction Tuning link Niklas Muennighoff, Hongjin SU,..., Douwe Kiela
94 2024-06-12 Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing link Zhangchen Xu, Fengqing Jiang,..., Bill Yuchen Lin
94 2024-08-01 Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving link Yangzhen Wu, Zhiqing Sun,..., Yiming Yang
91 2024-06-06 Scaling and evaluating sparse autoencoders link Leo Gao, Tom Dupre la Tour,..., Jeffrey Wu
90 2023-12-18 G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model link Jiahui Gao, Renjie Pi,..., Lingpeng Kong
84 2024-04-02 Advancing LLM Reasoning Generalists with Preference Trees link Lifan Yuan, Ganqu Cui,..., Maosong Sun
83 2024-06-07 Mixture-of-Agents Enhances Large Language Model Capabilities link Junlin Wang, Jue WANG,..., James Zou
82 2024-08-09 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models link Jiabo Ye, Haiyang Xu,..., Jingren Zhou
80 2024-09-06 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers link Chenglei Si, Diyi Yang, Tatsunori Hashimoto
79 2024-09-19 Training Language Models to Self-Correct via Reinforcement Learning link Aviral Kumar, Vincent Zhuang,..., Aleksandra Faust
78 2024-07-23 OpenHands: An Open Platform for AI Software Developers as Generalist Agents link Xingyao Wang, Boxuan Li,..., Graham Neubig
76 2023-09-25 Physics of Language Models: Part 3.2, Knowledge Manipulation link Zeyuan Allen-Zhu, Yuanzhi Li
73 2024-06-27 LiveBench: A Challenging, Contamination-Free LLM Benchmark link Colin White, Samuel Dooley,..., Micah Goldblum
70 2024-09-18 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning link Zayne Rea Sprague, Fangcong Yin,..., Greg Durrett
69 2024-03-26 The Unreasonable Ineffectiveness of the Deeper Layers link Andrey Gromov, Kushal Tirumala,..., Dan Roberts
67 2024-07-02 OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation link Kepan Nan, Rui Xie,..., Ying Tai
67 2024-10-02 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second link Alexey Bochkovskiy, Amaël Delaunoy,..., Vladlen Koltun
66 2024-06-10 Safety Alignment Should be Made More Than Just a Few Tokens Deep link Xiangyu Qi, Ashwinee Panda,..., Peter Henderson
61 2024-05-26 SpinQuant: LLM Quantization with Learned Rotations link Zechun Liu, Changsheng Zhao,..., Tijmen Blankevoort
60 2024-08-19 LongVILA: Scaling Long-Context Visual Language Models for Long Videos link Yukang Chen, Fuzhao Xue,..., Song Han
59 2024-10-08 Pyramidal Flow Matching for Efficient Video Generative Modeling link Yang Jin, Zhicheng Sun,..., Zhouchen Lin
58 2024-06-07 WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild link Bill Yuchen Lin, Yuntian Deng,..., Yejin Choi
57 2021-06-07 High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders link Siddharth Ramchandran, Manuel Haussmann, Harri Lähdesmäki
56 2024-10-04 MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion link Junyi Zhang, Charles Herrmann,..., Ming-Hsuan Yang
56 2024-06-26 RouteLLM: Learning to Route LLMs from Preference Data link Isaac Ong, Amjad Almahairi,..., Ion Stoica
56 2024-10-10 RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation link Songming Liu, Lingxuan Wu,..., Jun Zhu
56 2024-09-06 VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation link Yecheng Wu, Zhuoyang Zhang,..., Yao Lu
54 2024-02-13 World Model on Million-Length Video And Language With Blockwise RingAttention link Hao Liu, Wilson Yan,..., Pieter Abbeel
51 2024-04-24 Retrieval Head Mechanistically Explains Long-Context Factuality link Wenhao Wu, Yizhong Wang,..., Yao Fu
49 2024-08-28 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders link Min Shi, Fuxiao Liu,..., Guilin Liu
49 2024-03-25 Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance link Jiasheng Ye, Peiju Liu,..., Xipeng Qiu
49 2024-08-12 Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver link Zhenting Qi, Mingyuan MA,..., Mao Yang
48 2024-10-09 Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think link Sihyun Yu, Sangkyung Kwak,..., Saining Xie
48 2024-04-08 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws link Zeyuan Allen-Zhu, Yuanzhi Li
48 2024-03-05 Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models link Gen Luo, Yiyi Zhou,..., Rongrong Ji
47 2024-09-03 OLMoE: Open Mixture-of-Experts Language Models link Niklas Muennighoff, Luca Soldaini,..., Hannaneh Hajishirzi
47 2024-06-11 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling link Liliang Ren, Yang Liu,..., Weizhu Chen
46 2024-02-12 On the self-verification limitations of large language models on reasoning and planning tasks link Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
46 2024-08-27 Diffusion Models Are Real-Time Game Engines link Dani Valevski, Yaniv Leviathan,..., Shlomi Fruchter
46 2024-09-19 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution link Zuyan Liu, Yuhao Dong,..., Yongming Rao
45 2024-01-25 Deconstructing Denoising Diffusion Models for Self-Supervised Learning link Xinlei Chen, Zhuang Liu,..., Kaiming He
45 2024-06-20 SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal link Tinghao Xie, Xiangyu Qi,..., Prateek Mittal
43 2024-10-02 OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data link Shubham Toshniwal, Wei Du,..., Igor Gitman
43 2024-04-15 HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing link Mude Hui, Siwei Yang,..., Yuyin Zhou
42 2024-07-29 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process link Tian Ye, Zicheng Xu,..., Zeyuan Allen-Zhu
42 2024-06-13 MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding link Fei Wang, Xingyu Fu,..., Muhao Chen
40 2024-03-12 SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression link Xin Wang, Yu Zheng,..., Mi Zhang
39 2024-03-04 Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures link Yuchen Duan, Weiyun Wang,..., Wenhai Wang
38 2024-04-03 Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models link Jingyang Zhang, Jingwei Sun,..., Hai Li
38 2023-10-17 Eliciting Human Preferences with Language Models link Belinda Z. Li, Alex Tamkin,..., Jacob Andreas
38 2024-03-13 Language models scale reliably with over-training and on downstream tasks link Samir Yitzhak Gadre, Georgios Smyrnis,..., Ludwig Schmidt
38 2024-06-14 MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers link Yiwen Chen, Tong He,..., Chi Zhang
38 2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads link Guangxuan Xiao, Jiaming Tang,..., Song Han
37 2024-10-09 MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering link Jun Shern Chan, Neil Chowdhury,..., Lilian Weng
37 2024-10-14 SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers link Enze Xie, Junsong Chen,..., Song Han
37 2024-10-07 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents link Boyu Gou, Ruohan Wang,..., Yu Su
37 2024-10-10 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning link Amrith Setlur, Chirag Nagpal,..., Aviral Kumar
37 2024-07-08 MUSE: Machine Unlearning Six-Way Evaluation for Language Models link Weijia Shi, Jaechan Lee,..., Chiyuan Zhang
37 2024-06-11 Scaling Large Language Model-based Multi-Agent Collaboration link Chen Qian, Zihao Xie,..., Maosong Sun
36 2024-10-03 Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge link Jiayi Ye, Yanbo Wang,..., Xiangliang Zhang
36 2024-05-23 Not All Language Model Features Are Linear link Joshua Engels, Eric J Michaud,..., Max Tegmark
36 2024-07-24 SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency link Yiming Xie, Chun-Han Yao,..., Varun Jampani
36 2024-07-17 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control link Sherwin Bahmani, Ivan Skorokhodov,..., Sergey Tulyakov
35 2024-01-07 Long Context Compression with Activation Beacon link Peitian Zhang, Zheng Liu,..., Zhicheng Dou
35 2024-05-23 AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents link Christopher Rawles, Sarah Clinckemaillie,..., Oriana Riva
33 2024-07-16 BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval link Hongjin SU, Howard Yen,..., Tao Yu
33 2024-06-11 MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance link Xierui Wang, Siming Fu,..., Hao Jiang
32 2024-06-06 Vision-LSTM: xLSTM as Generic Vision Backbone link Benedikt Alkin, Maximilian Beck,..., Johannes Brandstetter
32 2024-06-08 MotionClone: Training-Free Motion Cloning for Controllable Video Generation link Pengyang Ling, Jiazi Bu,..., Yi Jin
31 2024-05-14 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control link Aleksandar Makelov, Georg Lange, Neel Nanda
31 2024-05-27 LLM-Assisted Static Analysis for Detecting Security Vulnerabilities link Ziyang Li, Saikat Dutta, Mayur Naik
31 2024-08-15 Automated Design of Agentic Systems link Shengran Hu, Cong Lu, Jeff Clune
30 2024-10-02 HelpSteer2-Preference: Complementing Ratings with Preferences link Zhilin Wang, Alexander Bukharin,..., Yi Dong
30 2024-08-13 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs link Yushi Bai, Jiajie Zhang,..., Juanzi Li
30 2024-09-26 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction link Jing He, Haodong LI,..., Ying-Cong Chen
29 2024-07-09 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence link Weize Chen, Ziming You,..., Maosong Sun
29 2024-08-29 OmniRe: Omni Urban Scene Reconstruction link Ziyu Chen, Jiawei Yang,..., Yue Wang
29 2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning link Haotian Zhang, Mingfei Gao,..., Yinfei Yang
29 2024-06-05 VideoPhy: Evaluating Physical Commonsense for Video Generation link Hritik Bansal, Zongyu Lin,..., Aditya Grover
29 2024-10-14 Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models link Junyu Chen, Han Cai,..., Song Han
29 2024-08-29 Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling link Hritik Bansal, Arian Hosseini,..., Mehran Kazemi
29 2024-08-29 WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling link Shengpeng Ji, Ziyue Jiang,..., Zhou Zhao
29 2024-03-29 Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want link Weifeng Lin, Xinyu Wei,..., Hongsheng Li
29 2024-10-14 HART: Efficient Visual Generation with Hybrid Autoregressive Transformer link Haotian Tang, Yecheng Wu,..., Song Han
28 2024-09-01 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer link Yuancheng Wang, Haoyue Zhan,..., Zhizheng Wu
28 2024-02-23 Repetition Improves Language Model Embeddings link Jacob Mitchell Springer, Suhas Kotha,..., Aditi Raghunathan
28 2024-06-24 Adam-mini: Use Fewer Learning Rates To Gain More link Yushun Zhang, Congliang Chen,..., Ruoyu Sun
28 2024-09-04 Building Math Agents with Multi-Turn Iterative Preference Learning link Wei Xiong, Chengshuai Shi,..., Tianqi Liu
28 2024-08-22 Real-Time Video Generation with Pyramid Attention Broadcast link Xuanlei Zhao, Xiaolong Jin,..., Yang You
28 2024-09-19 Language Models Learn to Mislead Humans via RLHF link Jiaxin Wen, Ruiqi Zhong,..., Shi Feng
28 2024-08-23 MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? link YiFan Zhang, Huanyu Zhang,..., Rong Jin
27 2024-10-24 Data Scaling Laws in Imitation Learning for Robotic Manipulation link Fanqi Lin, Yingdong Hu,..., Yang Gao
27 2024-06-24 DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation link Yuang Peng, Yuxin Cui,..., Shu-Tao Xia
27 None Scaling Laws for Downstream Task Performance in Machine Translation link Berivan Isik, Natalia Ponomareva,..., Sanmi Koyejo
26 2024-10-14 AFlow: Automating Agentic Workflow Generation link Jiayi Zhang, Jinyu Xiang,..., Chenglin Wu
26 2024-09-04 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency link Jianwen Jiang, Chao Liang,..., Yanbo Zheng
26 2024-07-01 RegMix: Data Mixture as Regression for Language Model Pre-training link Qian Liu, Xiaosen Zheng,..., Min Lin
26 2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads link Hanlin Tang, Yang Lin,..., Gongyi Wang
26 2024-07-11 Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting link Zilong Wang, Zifeng Wang,..., Tomas Pfister
26 2024-09-01 Diffusion Policy Policy Optimization link Allen Z. Ren, Justin Lidard,..., Max Simchowitz
25 2024-09-05 Planning in Natural Language Improves LLM Search for Code Generation link Evan Z Wang, Federico Cassano,..., Hugh Zhang
25 2023-05-31 SafeDiffuser: Safe Planning with Diffusion Probabilistic Models link Wei Xiao, Tsun-Hsuan Wang,..., Daniela Rus
25 2024-05-27 Matryoshka Multimodal Models link Mu Cai, Jianwei Yang,..., Yong Jae Lee
24 2024-10-03 AlphaEdit: Null-Space Constrained Model Editing for Language Models link Junfeng Fang, Houcheng Jiang,..., Tat-Seng Chua
24 2023-11-20 Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model link Chunming He, Chengyu Fang,..., Sina Farsiu
24 2024-10-10 Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models link Bofei Gao, Feifan Song,..., Baobao Chang
24 2024-06-06 Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data link Jingyang Ou, Shen Nie,..., Chongxuan Li
24 2024-06-14 Training-free Camera Control for Video Generation link Chen Hou, Zhibo Chen
24 2024-10-16 JudgeBench: A Benchmark for Evaluating LLM-Based Judges link Sijun Tan, Siyuan Zhuang,..., Ion Stoica
24 2024-06-07 Towards Semantic Equivalence of Tokenization in Multimodal LLM link Shengqiong Wu, Hao Fei,..., Shuicheng YAN
24 2024-07-16 Does Refusal Training in LLMs Generalize to the Past Tense? link Maksym Andriushchenko, Nicolas Flammarion
23 2024-08-15 Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models link Andy K Zhang, Neil Perry,..., Percy Liang
23 2024-03-13 A Decade's Battle on Dataset Bias: Are We There Yet? link Zhuang Liu, Kaiming He
23 2024-05-29 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF link Shicong Cen, Jincheng Mei,..., Bo Dai
23 2024-06-20 Consistency Models Made Easy link Zhengyang Geng, Ashwini Pokle,..., J Zico Kolter
23 2024-04-15 Learn Your Reference Model for Real Good Alignment link Alexey Gorbatovski, Boris Shaposhnikov,..., Daniil Gavrilov
23 2024-09-10 LLaMA-Omni: Seamless Speech Interaction with Large Language Models link Qingkai Fang, Shoutao Guo,..., Yang Feng
22 2024-10-30 OS-ATLAS: Foundation Action Model for Generalist GUI Agents link Zhiyong Wu, Zhenyu Wu,..., Yu Qiao
22 2024-09-24 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts link Xiaoming Shi, Shiyu Wang,..., Ming Jin
22 2024-02-28 RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval link Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
22 2024-06-03 Unlocking Guidance for Discrete State-Space Diffusion and Flow Models link Hunter Nisonoff, Junhao Xiong,..., Jennifer Listgarten
22 2024-06-13 Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning link Bahare Fatemi, Mehran Kazemi,..., Bryan Perozzi
22 2024-07-20 Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data link Xinyi Wang, Antonis Antoniades,..., William Yang Wang
22 2024-07-23 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences link Canyu Zhao, Mingyu Liu,..., Chunhua Shen
22 2024-06-12 CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models link Hyungjin Chung, Jeongsol Kim,..., Jong Chul Ye
21 2024-10-31 No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images link Botao Ye, Sifei Liu,..., Marc Pollefeys
21 2024-08-29 Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems link Tian Ye, Zicheng Xu,..., Zeyuan Allen-Zhu
21 2024-10-03 HELMET: How to Evaluate Long-context Models Effectively and Thoroughly link Howard Yen, Tianyu Gao,..., Danqi Chen
21 2024-07-11 MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine link Renrui Zhang, Xinyu Wei,..., Hongsheng Li
21 2024-06-28 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy link Xiang Li, Cristina Mata,..., Michael S Ryoo
21 2024-07-11 Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist link Zihao Zhou, Shudong Liu,..., Kaizhu Huang
21 2024-07-19 BOND: Aligning LLMs with Best-of-N Distillation link Pier Giuseppe Sessa, Robert Dadashi-Tazehozi,..., Olivier Bachem
21 2024-10-08 T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design link Jiachen Li, Qian Long,..., William Yang Wang
21 2024-09-04 Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling link Kaiwen Zheng, Yongxin Chen,..., Qinsheng Zhang
20 2024-05-27 RB-Modulation: Training-Free Personalization using Stochastic Optimal Control link Litu Rout, Yujia Chen,..., Wen-Sheng Chu
20 2024-09-13 Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control link Carles Domingo-Enrich, Michal Drozdzal,..., Ricky T. Q. Chen
20 2024-08-20 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding link Ranajoy Sadhukhan, Jian Chen,..., Beidi Chen
20 2024-10-03 LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations link Hadas Orgad, Michael Toker,..., Yonatan Belinkov
20 2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix link Yingyu Liang, Jiangxuan Long,..., Yufa Zhou
20 2024-06-11 AI Sandbagging: Language Models can Strategically Underperform on Evaluations link Teun van der Weij, Felix Hofstätter,..., Francis Rhys Ward
20 2024-05-31 Improved Techniques for Optimization-Based Jailbreaking on Large Language Models link Xiaojun Jia, Tianyu Pang,..., Min Lin
20 2024-10-14 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations link Litu Rout, Yujia Chen,..., Wen-Sheng Chu
19 2024-06-12 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text link Qingyun Li, Zhe Chen,..., Jifeng Dai
19 2024-10-02 CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL link Mohammadreza Pourreza, Hailong Li,..., Sercan O Arik
19 2024-06-14 Bootstrapping Language Models with DPO Implicit Rewards link Changyu Chen, Zichen Liu,..., Min Lin
19 2024-10-10 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation link Jiahao Cui, Hui Li,..., Jingdong Wang
19 2024-09-26 EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation link Jiaxiang Tang, Zhaoshuo Li,..., Qinsheng Zhang
19 2024-12-09 Gated Delta Networks: Improving Mamba2 with Delta Rule link Songlin Yang, Jan Kautz, Ali Hatamizadeh
18 2024-04-15 Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model link Han Lin, Jaemin Cho,..., Mohit Bansal
18 2024-10-14 Simplifying, Stabilizing and Scaling Continuous-time Consistency Models link Cheng Lu, Yang Song
18 2024-07-14 Lean-STaR: Learning to Interleave Thinking and Proving link Haohan Lin, Zhiqing Sun,..., Yiming Yang
18 2025-01-07 LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token link Shaolei Zhang, Qingkai Fang,..., Yang Feng
18 2024-10-04 SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? link John Yang, Carlos E Jimenez,..., Ofir Press
18 2024-05-30 Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models link Zachary Ankner, Cody Blakeney,..., Mansheej Paul
18 2023-12-16 Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos link Mingfei Han, Linjie Yang,..., Heng Wang
18 2024-10-14 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents link Shi Yu, Chaoyue Tang,..., Maosong Sun
18 2024-08-06 MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine link Yunfei Xie, Ce Zhou,..., Yuyin Zhou
18 2024-09-17 EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage link Zeyi Liao, Lingbo Mo,..., Huan Sun
18 2024-10-15 Latent Action Pretraining from Videos link Seonghyeon Ye, Joel Jang,..., Minjoon Seo
18 2024-06-04 ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation link Tianchen Zhao, Tongcheng Fang,..., Yu Wang
18 2024-06-14 ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation link Chufan Shi, Cheng Yang,..., Yujiu Yang
18 2024-07-29 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher link Zehui Chen, Kuikun Liu,..., Feng Zhao
17 2024-07-18 Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization link Audrey Huang, Wenhao Zhan,..., Dylan J Foster
17 2024-10-14 When Attention Sink Emerges in Language Models: An Empirical View link Xiangming Gu, Tianyu Pang,..., Min Lin
17 2024-10-24 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark link S Sakshi, Utkarsh Tyagi,..., Dinesh Manocha
17 2024-11-07 SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models link Muyang Li, Yujun Lin,..., Song Han
17 2024-09-24 Making Text Embedders Few-Shot Learners link Chaofan Li, Minghao Qin,..., Zheng Liu
17 2024-06-18 Dissecting Adversarial Robustness of Multimodal LM Agents link Chen Henry Wu, Rishi Rajesh Shah,..., Aditi Raghunathan
17 2024-07-12 Human-like Episodic Memory for Infinite Context LLMs link Zafeirios Fountas, Martin Benfeghoul,..., Jun Wang
17 2024-08-12 VisualAgentBench: Towards Large Multimodal Models as Visual Agents link Xiao Liu, Tianjie Zhang,..., Jie Tang
17 2024-10-28 Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics link Yaniv Nikankin, Anja Reusch,..., Yonatan Belinkov
17 2024-06-11 Image and Video Tokenization with Binary Spherical Quantization link Yue Zhao, Yuanjun Xiong, Philipp Kraehenbuehl
17 2024-10-02 ImageFolder: Autoregressive Image Generation with Folded Tokens link Xiang Li, Kai Qiu,..., Zhe Lin
17 2024-06-27 A Sanity Check for AI-generated Image Detection link Shilin Yan, Ouxiang Li,..., Weidi Xie
17 2024-10-04 AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark link Wenhao Chai, Enxin Song,..., Christopher D Manning
17 2024-10-07 ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery link Ziru Chen, Shijie Chen,..., Huan Sun
16 2024-10-21 RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style link Yantao Liu, Zijun Yao,..., Juanzi Li
16 2024-10-22 LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias link Haian Jin, Hanwen Jiang,..., Zexiang Xu
16 2024-10-03 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration link Jintao Zhang, Jia wei,..., Jianfei Chen
16 2024-06-24 Large Language Models Assume People are More Rational than We Really are link Ryan Liu, Jiayi Geng,..., Thomas L. Griffiths
16 2024-07-10 Deconstructing What Makes a Good Optimizer for Autoregressive Language Models link Rosie Zhao, Depen Morwani,..., Sham M. Kakade
16 2024-02-21 T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching link Zizheng Pan, Bohan Zhuang,..., Anima Anandkumar
16 2023-12-28 MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation link Zhongshen Zeng, Pengguang Chen,..., Jiaya Jia
16 2024-03-11 Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? link Egor Zverev, Sahar Abdelnabi,..., Christoph H. Lampert
16 2024-10-07 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks link Ziyan Jiang, Rui Meng,..., Wenhu Chen
16 2024-10-08 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG link Bowen Jin, Jinsung Yoon,..., Sercan O Arik
16 2024-10-16 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models link Peng Xia, Kangyu Zhu,..., Huaxiu Yao
16 2024-08-04 Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models link Fushuo Huo, Wenchao Xu,..., Peilin Zhao
16 2024-08-30 Safety Layers in Aligned Large Language Models: The Key to LLM Security link Shen Li, Liuyi Yao,..., Yaliang Li
15 2024-07-11 AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories link Yi Zeng, Yu Yang,..., Bo Li
15 2024-09-19 Scaling FP8 training to trillion-token LLMs link Maxim Fishman, Brian Chmiel,..., Daniel Soudry
15 2024-10-10 Agent S: An Open Agentic Framework that Uses Computers Like a Human link Saaket Agashe, Jiuzhou Han,..., Xin Eric Wang
15 2024-10-29 DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models link Chengke Zou, Xingang Guo,..., Huan Zhang
15 2024-03-25 Do LLM Agents Have Regret? A Case Study in Online Learning and Games link Chanwoo Park, Xiangyu Liu,..., Kaiqing Zhang
15 2024-06-05 Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models link Jerry Yao-Chieh Hu, Maojiang Su,..., Han Liu
15 2024-07-08 Variational Best-of-N Alignment link Afra Amini, Tim Vieira,..., Ryan Cotterell
15 2024-04-09 Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis link Guangchen Lan, Dong-Jun Han,..., Christopher Brinton
15 2024-08-05 MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models link Fanqing Meng, Chuanhao Li,..., Wenqi Shao
15 2024-10-01 AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation link Jiafei Duan, Wilbert Pumacay,..., Yijie Guo
15 2024-10-24 Ferret-UI One: Mastering Universal User Interface Understanding Across Platforms link Zhangheng LI, Keen You,..., Zhe Gan
15 2024-10-14 Depth Any Video with Scalable Synthetic Data link Honghui Yang, Di Huang,..., Tong He
14 2024-10-16 One Step Diffusion via Shortcut Models link Kevin Frans, Danijar Hafner,..., Pieter Abbeel
14 2024-10-03 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs link Xiaogeng Liu, Peiran Li,..., Chaowei Xiao
14 2023-11-27 Regularization by Texts for Latent Diffusion Inverse Solvers link Jeongsol Kim, Geon Yeong Park,..., Jong Chul Ye
14 2024-05-29 ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning link Ruchika Chavhan, Da Li, Timothy Hospedales
14 2024-09-17 Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models link Orion Weller, Benjamin Van Durme,..., Jack Hessel
14 2024-02-28 Diffusion-based Neural Network Weights Generation link Bedionita Soro, Bruno Andreis,..., Sung Ju Hwang
14 2024-07-19 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities link Peng Xu, Wei Ping,..., Bryan Catanzaro
14 2024-07-19 System 1.x: Learning to Balance Fast and Slow Planning with Language Models link Swarnadeep Saha, Archiki Prasad,..., Mohit Bansal
14 2024-09-09 Improving Pretraining Data Using Perplexity Correlations link Tristan Thrush, Christopher Potts, Tatsunori Hashimoto
14 2024-05-06 Language-Image Models with 3D Understanding link Jang Hyun Cho, Boris Ivanovic,..., Marco Pavone
14 2024-06-10 How efficient is LLM-generated code? A rigorous & high-standard benchmark link Ruizhong Qiu, Weiliang Will Zeng,..., Hanghang Tong
14 2024-06-11 Towards Realistic Data Generation for Real-World Super-Resolution link Long Peng, Wenbo Li,..., Zheng-Jun Zha
14 2024-06-06 Interpreting the Second-Order Effects of Neurons in CLIP link Yossi Gandelsman, Alexei A Efros, Jacob Steinhardt
14 2024-09-19 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines link Dongzhi Jiang, Renrui Zhang,..., Hongsheng Li
14 2024-10-03 FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models link Zhipei Xu, Xuanyu Zhang,..., Jian Zhang
14 None Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos link Isabella Liu, Hao Su, Xiaolong Wang
14 2024-06-17 DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors link Keon Lee, Dong Won Kim,..., Jaewoong Cho
13 2024-06-30 Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning link Yuheng Zhang, Dian Yu,..., Dong Yu
13 2024-06-17 Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI link Robert Hönig, Javier Rando,..., Florian Tramèr
13 2023-05-24 gRNAde: Geometric Deep Learning for 3D RNA inverse design link Chaitanya K. Joshi, Arian Rokkum Jamasb,..., Pietro Lio
13 2024-07-25 LoRA-Pro: Are Low-Rank Adapters Properly Optimized? link Zhengbo Wang, Jian Liang,..., Tieniu Tan
13 2024-06-19 4K4DGen: Panoramic 4D Generation at 4K Resolution link Renjie Li, Panwang Pan,..., Zhiwen Fan
13 2024-10-04 CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control link Guy Tevet, Sigal Raab,..., Michiel van de Panne
13 2024-11-20 Hymba: A Hybrid-head Architecture for Small Language Models link Xin Dong, Yonggan Fu,..., Pavlo Molchanov
13 2024-02-27 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems link Zhenting Qi, Hanlin Zhang,..., Himabindu Lakkaraju
13 2024-06-12 Real2Code: Reconstruct Articulated Objects via Code Generation link Zhao Mandi, Yijia Weng,..., Shuran Song
13 2024-12-10 On Evaluating the Durability of Safeguards for Open-Weight LLMs link Xiangyu Qi, Boyi Wei,..., Peter Henderson
13 2024-09-30 ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer link Zhen Han, Zeyinzi Jiang,..., Jingren Zhou
13 2024-10-25 Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning link Yu Fu, Zefan Cai,..., Wen Xiao
13 2024-05-08 Preble: Efficient Distributed Prompt Scheduling for LLM Serving link Vikranth Srivatsa, Zijian He,..., Yiying Zhang
13 2024-07-01 MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs link Yusu Qian, Hanrong Ye,..., Zhe Gan
13 2024-09-12 DSBench: How Far Are Data Science Agents from Becoming Data Science Experts? link Liqiang Jing, Zhehui Huang,..., Dong Yu
13 2024-08-20 To Code or Not To Code? Exploring Impact of Code in Pre-training link Viraat Aryabumi, Yixuan Su,..., Sara Hooker
13 2024-06-05 A-Bench: Are LMMs Masters at Evaluating AI-generated Images? link Zicheng Zhang, Haoning Wu,..., Guangtao Zhai
13 2024-05-28 EG4D: Explicit Generation of 4D Object without Score Distillation link Qi Sun, Zhiyang Guo,..., Houqiang Li
13 2024-10-16 SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation link Jaehong Yoon, Shoubin Yu,..., Mohit Bansal
13 2024-06-25 Point-SAM: Promptable 3D Segmentation Model for Point Clouds link Yuchen Zhou, Jiayuan Gu,..., Hao Su
13 2024-11-07 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation link Koichi Namekata, Sherwin Bahmani,..., David B. Lindell
13 2024-05-24 Diffusion Bridge Implicit Models link Kaiwen Zheng, Guande He,..., Jun Zhu
13 2024-12-18 Autoregressive Video Generation without Vector Quantization link Haoge Deng, Ting Pan,..., Xinlong Wang
13 2024-05-30 Is In-Context Learning Sufficient for Instruction Following in LLMs? link Hao Zhao, Maksym Andriushchenko,..., Nicolas Flammarion
13 2024-06-01 MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos link Qingming LIU, Yuan Liu,..., Junhui Hou
13 2024-10-03 Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations link Nicholas Jiang, Anish Kachinthaya,..., Yossi Gandelsman
13 2024-10-10 Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis link Jinbin Bai, Tian Ye,..., Shuicheng YAN
12 2024-11-12 Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows link Fangyu Lei, Jixuan Chen,..., Tao Yu
12 2024-11-07 Scaling Laws for Precision link Tanishq Kumar, Zachary Ankner,..., Aditi Raghunathan
12 2024-09-03 Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation link Tiansheng Huang, Sihao Hu,..., Ling Liu
12 2024-07-08 $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning link Mintong Kang, Bo Li
12 2023-10-07 Targeted Attack Improves Protection against Unauthorized Diffusion Customization link Boyang Zheng, Chumeng Liang, Xiaoyu Wu
12 2024-06-24 Theory on Mixture-of-Experts in Continual Learning link Hongbo Li, Sen Lin,..., Ness Shroff
12 2024-05-24 DEEM: Diffusion models serve as the eyes of large language models for image perception link Run Luo, Yunshui Li,..., Binyuan Hui
12 2024-06-19 Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models link Guanting Dong, Keming Lu,..., Jingren Zhou
12 2024-07-30 ThinK: Thinner Key Cache by Query-Driven Pruning link Yuhui Xu, Zhanming Jie,..., Doyen Sahoo
12 2025-01-18 Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments link Hongjin SU, Ruoxi Sun,..., Sercan O Arik
12 2024-11-04 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning link Zehan Qi, Xiao Liu,..., Yuxiao Dong
12 2024-05-27 BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments link Yusuf H Roohani, Andrew H. Lee,..., Jure Leskovec
12 2024-08-13 Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents link Kexun Zhang, Weiran Yao,..., Caiming Xiong
12 2024-10-08 Round and Round We Go! What makes Rotary Positional Encodings useful? link Federico Barbero, Alex Vitvitskyi,..., Petar Veličković
12 2024-03-21 AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation link Yuning Cui, Syed Waqas Zamir,..., Fahad Shahbaz Khan
12 2024-10-03 Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents link Hanrong Zhang, Jingyuan Huang,..., Yongfeng Zhang
12 2024-10-24 Why Does the Effective Context Length of LLMs Fall Short? link Chenxin An, Jun Zhang,..., Lingpeng Kong
12 2024-10-15 Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws link Yiding Jiang, Allan Zhou,..., J Zico Kolter
12 2024-05-28 Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning link Seanie Lee, Minsu Kim,..., Moksh Jain
12 2024-10-08 TRACE: Temporal Grounding Video LLM via Causal Event Modeling link Yongxin Guo, Jingyu Liu,..., Xi Chen
12 2024-10-09 MMEgo: Towards Building Egocentric Multimodal LLMs link Hanrong Ye, Haotian Zhang,..., Bowen Zhang
12 2024-07-29 Diffusion Feedback Helps CLIP See Better link Wenxuan Wang, Quan Sun,..., Xinlong Wang
12 2024-07-21 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models link Zheng Chong, Xiao Dong,..., Xiaodan Liang
12 2024-10-14 Animate-X: Universal Character Image Animation with Enhanced Motion Representation link Shuai Tan, Biao Gong,..., Ming Yang
12 2024-11-26 Scaling Speech-Text Pre-training with Synthetic Interleaved Data link Aohan Zeng, Zhengxiao Du,..., Jie Tang
11 2024-07-01 Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs link Nguyen Nhat Minh, Andrew Baker,..., Ravid Shwartz-Ziv
11 2024-10-08 Restructuring Vector Quantization with the Rotation Trick link Christopher Fifty, Ronald Guenther Junkins,..., Christopher Re
11 2024-06-26 On Scaling Up 3D Gaussian Splatting Training link Hexu Zhao, Haoyang Weng,..., Saining Xie
11 2024-04-29 LLM-SR: Scientific Equation Discovery via Programming with Large Language Models link Parshin Shojaee, Kazem Meidani,..., Chandan K. Reddy
11 2024-03-05 Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking link Cassidy Laidlaw, Shivam Singhal, Anca Dragan
11 2023-12-13 CBQ: Cross-Block Quantization for Large Language Models link Xin Ding, Xiaoyu Liu,..., Yunhe Wang
11 2024-10-21 MagicPIG: LSH Sampling for Efficient LLM Generation link Zhuoming Chen, Ranajoy Sadhukhan,..., Beidi Chen
11 2024-09-06 Programming Refusal with Conditional Activation Steering link Bruce W. Lee, Inkit Padhi,..., Amit Dhurandhar
11 2024-10-17 DPLM-2: A Multimodal Diffusion Protein Language Model link Xinyou Wang, Zaixiang Zheng,..., Quanquan Gu
11 2024-10-17 Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation link Hyungjoo Chae, Namyoung Kim,..., Jinyoung Yeo
11 None Palu: KV-Cache Compression with Low-Rank Projection link Chi-Chih Chang, Wei-Cheng Lin,..., Kai-Chiang Wu
11 2024-10-24 Scaling up Masked Diffusion Models on Text link Shen Nie, Fengqi Zhu,..., Chongxuan Li
11 2024-07-10 Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization link Junkang Wu, Yuexiang Xie,..., Xiangnan He
11 2024-06-25 Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon link USVSN Sai Prashanth, Alvin Deng,..., Naomi Saphra
11 2024-11-05 Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent link Yangning Li, Yinghui Li,..., Philip S. Yu
11 2024-10-15 Improving Instruction-Following in Language Models through Activation Steering link Alessandro Stolfo, Vidhisha Balachandran,..., Besmira Nushi
11 2024-05-24 Emergence of a High-Dimensional Abstraction Phase in Language Transformers link Emily Cheng, Diego Doimo,..., Marco Baroni
11 2024-03-26 AgentStudio: A Toolkit for Building General Virtual Agents link Longtao Zheng, Zhiyuan Huang,..., Shuicheng YAN
11 2024-06-13 MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs link Xuannan Liu, Zekun Li,..., Zhaofeng He
11 2024-10-08 From Tokens to Words: On the Inner Lexicon of LLMs link Guy Kaplan, Matanel Oren,..., Roy Schwartz
11 2024-05-27 PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance link Haohan Weng, Yikai Wang,..., Jun Zhu
11 2024-03-10 What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? link Guangkai Xu, Yongtao Ge,..., Chunhua Shen
11 2024-05-27 Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs link Qi Wu, Yubo Zhao,..., Chi-Keung Tang
11 2024-10-05 Accelerating Diffusion Transformers with Token-wise Feature Caching link Chang Zou, Xuyang Liu,..., Linfeng Zhang
11 2024-06-17 Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization link Wenkai Yang, Shiqi Shen,..., Ji-Rong Wen
11 2024-10-04 Dynamic Diffusion Transformer link Wangbo Zhao, Yizeng Han,..., Yang You
11 2024-10-09 Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow link Fu-Yun Wang, Ling Yang,..., Hongsheng Li
11 2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning link Xiangyu Zeng, Kunchang Li,..., Limin Wang
11 2024-06-27 From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data link Zheyang Xiong, Vasilis Papageorgiou,..., Dimitris Papailiopoulos
11 None GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation link Yushi LAN, Shangchen Zhou,..., Chen Change Loy
11 2024-10-03 ControlAR: Controllable Image Generation with Autoregressive Models link Zongming Li, Tianheng Cheng,..., Xinggang Wang
11 2024-11-08 Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks link Chien-yu Huang, Wei-Chih Chen,..., Hung-yi Lee
10 2025-01-27 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding link Wei Chow, Jiageng Mao,..., Yue Wang
10 2024-08-15 Can Large Language Models Understand Symbolic Graphics Programs? link Zeju Qiu, Weiyang Liu,..., Bernhard Schölkopf
10 2024-07-05 Simplifying Deep Temporal Difference Learning link Matteo Gallici, Mattie Fellows,..., Mario Martin
10 2024-09-26 How Feature Learning Can Improve Neural Scaling Laws link Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
10 2024-10-17 AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents link Ke Yang, Yao Liu,..., Huzefa Rangwala
10 2024-06-12 CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery link Xiaoshuai Song, Muxi Diao,..., Weiran Xu
10 2024-10-21 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages link Xiang Yue, Yueqi Song,..., Graham Neubig
10 2024-10-17 Looking Inward: Language Models Can Learn About Themselves by Introspection link Felix Jedidja Binder, James Chua,..., Owain Evans
10 2024-10-23 Scaling Diffusion Language Models via Adaptation from Autoregressive Models link Shansan Gong, Shivam Agarwal,..., Lingpeng Kong
10 2024-09-06 Theory, Analysis, and Best Practices for Sigmoid Self-Attention link Jason Ramapuram, Federico Danieli,..., Russell Webb
10 2024-10-11 Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization link Noam Razin, Sadhika Malladi,..., Boris Hanin
10 2024-11-26 On Statistical Rates of Conditional Diffusion Transformer: Approximation and Estimation link Jerry Yao-Chieh Hu, Weimin Wu,..., Han Liu
10 2024-06-23 Efficient Evolutionary Search Over Chemical Space with Large Language Models link Haorui Wang, Marta Skreta,..., Chao Zhang
10 2024-03-26 Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models link Zhenyu Pan, Haozheng Luo,..., Han Liu
10 2024-06-03 Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation link Mingyuan Zhou, Zhendong Wang,..., Hai Huang
10 2024-10-18 Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning link Jiacheng Ye, Jiahui Gao,..., Lingpeng Kong
10 2024-11-04 GenXD: Generating Any 3D and 4D Scenes link Yuyang Zhao, Chung-Ching Lin,..., Lijuan Wang
10 2024-07-21 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models link Rylan Schaeffer, Dan Valentine,..., Ethan Perez
10 2024-08-22 A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language link Ekdeep Singh Lubana, Kyogo Kawaguchi,..., Hidenori Tanaka
10 2024-06-11 McEval: Massively Multilingual Code Evaluation link Linzheng Chai, Shukai Liu,..., Zhoujun Li
10 2024-10-09 SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection link Han Shen, Pin-Yu Chen,..., Tianyi Chen
10 2024-12-13 TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies link Ruijie Zheng, Yongyuan Liang,..., Jianwei Yang
9 2024-10-06 Inference Scaling for Long-Context Retrieval Augmented Generation link Zhenrui Yue, Honglei Zhuang,..., Michael Bendersky
9 2024-11-19 Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues link Riccardo Grazzi, Julien Siems,..., Massimiliano Pontil
9 2024-10-10 From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions link Changle Qu, Sunhao Dai,..., Ji-Rong Wen
9 2024-08-21 EmbodiedSAM: Online Segment Any 3D Thing in Real Time link Xiuwei Xu, Huangxing Chen,..., Jiwen Lu
9 2024-07-25 Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement link Jaehun Jung, Faeze Brahman, Yejin Choi
9 2024-10-03 Better Instruction-Following Through Minimum Bayes Risk link Ian Wu, Patrick Fernandes,..., Graham Neubig
9 2024-09-05 Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation link Slava Elizarov, Ciara Rowles, Simon Donné
9 2024-03-21 Physics-Informed Diffusion Models link Jan-Hendrik Bastek, WaiChing Sun, Dennis Kochmann
9 2024-07-06 Progress or Regress? Self-Improvement Reversal in Post-training link Ting Wu, Xuefeng Li, Pengfei Liu
9 2024-11-07 The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities link Zhaofeng Wu, Xinyan Velocity Yu,..., Yoon Kim
9 2024-12-18 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models link Yinlam Chow, Guy Tennenholtz,..., Aleksandra Faust
9 2024-06-12 MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos link Xuehai He, Weixi Feng,..., Xin Eric Wang
9 2024-10-03 RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph link Siru Ouyang, Wenhao Yu,..., Dong Yu
9 2024-07-01 Eliminating Position Bias of Language Models: A Mechanistic Approach link Ziqi Wang, Hanlin Zhang,..., Heng Ji
9 2024-10-18 DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent link Taiyi Wang, Zhihao Wu,..., Kun Shao
9 2024-03-22 A Transfer Attack to Image Watermarks link Yuepeng Hu, Zhengyuan Jiang,..., Neil Zhenqiang Gong
9 2024-09-30 FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" link Yifei Ming, Senthil Purushwalkam,..., Shafiq Joty
9 2024-02-19 CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark link Jian Wu, Linyi Yang,..., Yue Zhang
9 2024-10-01 TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark link Kush Jain, Gabriel Synnaeve, Baptiste Roziere
9 2024-05-23 Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model link Tudor Ioan Cebere, Aurélien Bellet, Nicolas Papernot
9 2024-03-13 Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data link Asad Aali, Giannis Daras,..., Jon Tamir
9 2024-05-28 Hierarchical World Models as Visual Whole-Body Humanoid Controllers link Nicklas Hansen, Jyothir S V,..., Hao Su
9 2024-05-23 Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning link Chongjie Si, Xuehui Wang,..., Wei Shen
9 2024-11-11 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision link Cong Wei, Zheyang Xiong,..., Wenhu Chen
9 2024-12-02 InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences link Chenyang Zhu, Kai Li,..., Xiu Li
9 2024-05-24 OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code link Maxence Faldor, Jenny Zhang,..., Jeff Clune
9 2024-08-16 Visual Agents as Fast and Slow Thinkers link Guangyan Sun, Mingyu Jin,..., Dongfang Liu
9 2024-08-27 Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation link Xiaojuan Wang, Boyang Zhou,..., Steve Seitz
9 2024-10-02 Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding link Yao Teng, Han Shi,..., Xihui Liu
9 2024-06-28 ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents link Haiyang SHEN, Yue Li,..., Yun Ma
9 2024-10-09 Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology link Xiangyu Wang, Donglin Yang,..., Si Liu
9 2024-05-23 Can LLMs Solve Long Math Word Problems Better? link Xin Xu, Tong Xiao,..., Yang Wang
9 2024-12-10 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation link Xiao FU, Xian Liu,..., Dahua Lin
9 2024-09-23 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions link Weifeng Lin, Xinyu Wei,..., Hongsheng Li
8 2024-10-11 Transformers Provably Solve Parity Efficiently with Chain of Thought link Juno Kim, Taiji Suzuki
8 2024-01-18 RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything link Shilin Xu, Haobo Yuan,..., Ming-Hsuan Yang
8 2024-06-28 PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration link Yuxuan Sun, Yunlong Zhang,..., Lin Yang
8 2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior link Hanyu Wang, Saksham Suri,..., Abhinav Shrivastava
8 2024-02-28 Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes link Georg Manten, Cecilia Casolo,..., Niki Kilbertus
8 2024-10-10 Uncovering Overfitting in Large Language Model Editing link Mengqi Zhang, Xiaotian Ye,..., Zhumin Chen
8 2024-05-23 Graph Sparsification via Mixture of Graphs link Guibin Zhang, Xiangguo Sun,..., Shirui Pan
8 2024-08-27 Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models link Wenxuan Zhang, Philip Torr,..., Adel Bibi
8 2024-02-04 DeLLMa: Decision Making Under Uncertainty with Large Language Models link Ollie Liu, Deqing Fu,..., Willie Neiswanger
8 2024-10-21 Can Knowledge Editing Really Correct Hallucinations? link Baixiang Huang, Canyu Chen,..., Kai Shu
8 2025-02-05 SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models link Daniel Levy, Siba Smarak Panigrahi,..., Siamak Ravanbakhsh
8 2024-10-09 KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks link Kaijing Ma, Xeron Du,..., Ge Zhang
8 2024-10-10 What Makes Large Language Models Reason in (Multi-Turn) Code Generation? link Kunhao Zheng, Juliette Decugis,..., Gabriel Synnaeve
8 2024-10-13 RMB: Comprehensively benchmarking reward models in LLM alignment link Enyu Zhou, Guodong Zheng,..., Xuanjing Huang
8 2024-11-29 Scaling Transformers for Low-Bitrate High-Quality Speech Coding link Julian D Parker, Anton Smirnov,..., Xubo Liu
8 2024-10-14 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory link Di Wu, Hongwei Wang,..., Dong Yu
8 2024-10-22 Self-Evolving Multi-Agent Networks for Software Development link Yue Hu, Yuzhu Cai,..., Siheng Chen
8 2024-09-05 The AdEMAMix Optimizer: Better, Faster, Older link Matteo Pagliardini, Pierre Ablin, David Grangier