Last updated: 2025-05-19 23:34:32. Maintained by Weisen Jiang.

citation publish date title (pdf) review authors
720 2024-08-01 SAM 2: Segment Anything in Images and Videos link Nikhila Ravi, Valentin Gabeur,..., Christoph Feichtenhofer
478 2024-08-06 Scaling Test-Time Compute Optimally Can be More Effective than Scaling LLM Parameters link Charlie Victor Snell, Jaehoon Lee,..., Aviral Kumar
473 2024-04-30 KAN: Kolmogorov–Arnold Networks link Ziming Liu, Yixuan Wang,..., Max Tegmark
408 2023-08-18 WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct link Haipeng Luo, Qingfeng Sun,..., Dongmei Zhang
395 2024-08-12 CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer link Zhuoyi Yang, Jiayan Teng,..., Jie Tang
320 2023-11-28 Scalable Extraction of Training Data from Aligned, Production Language Models link Milad Nasr, Javier Rando,..., Katherine Lee
273 2024-03-12 LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code link Naman Jain, King Han,..., Ion Stoica
217 2022-04-25 Trusted Multi-View Classification via Evolutionary Multi-View Fusion link Xinyan Liang, Pinhan Fu,..., Guoqing Liu
192 2024-07-10 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models link Feng Li, Renrui Zhang,..., Chunyuan Li
160 2024-08-22 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation link Jinheng Xie, Weijia Mao,..., Mike Zheng Shou
159 2024-04-02 Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks link Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
147 2024-08-20 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model link Chunting Zhou, LILI YU,..., Omer Levy
139 2024-05-27 NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models link Chankyu Lee, Rajarshi Roy,..., Wei Ping
134 2024-08-27 Generative Verifiers: Reward Modeling as Next-Token Prediction link Lunjun Zhang, Arian Hosseini,..., Rishabh Agarwal
131 2024-06-22 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions link Terry Yue Zhuo, Vu Minh Chien,..., Leandro Von Werra
130 2024-10-07 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models link Seyed Iman Mirzadeh, Keivan Alizadeh,..., Mehrdad Farajtabar
129 2024-08-01 Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving link Yangzhen Wu, Zhiqing Sun,..., Yiming Yang
118 2024-04-02 CameraCtrl: Enabling Camera Control for Text-to-Video Generation link Hao He, Yinghao Xu,..., Ceyuan Yang
117 2024-06-12 Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing link Zhangchen Xu, Fengqing Jiang,..., Bill Yuchen Lin
113 2024-06-06 Scaling and evaluating sparse autoencoders link Leo Gao, Tom Dupre la Tour,..., Jeffrey Wu
113 2024-05-01 Self-Play Preference Optimization for Language Model Alignment link Yue Wu, Zhiqing Sun,..., Quanquan Gu
112 2024-09-19 Training Language Models to Self-Correct via Reinforcement Learning link Aviral Kumar, Vincent Zhuang,..., Aleksandra Faust
112 2024-03-28 Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models link Samuel Marks, Can Rager,..., Aaron Mueller
108 2023-10-26 JudgeLM: Fine-tuned Large Language Models are Scalable Judges link Lianghui Zhu, Xinggang Wang, Xinlong Wang
108 2024-09-06 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers link Chenglei Si, Diyi Yang, Tatsunori Hashimoto
104 2024-02-15 Generative Representational Instruction Tuning link Niklas Muennighoff, Hongjin SU,..., Douwe Kiela
99 2024-06-07 Mixture-of-Agents Enhances Large Language Model Capabilities link Junlin Wang, Jue WANG,..., James Zou
96 2024-04-02 Advancing LLM Reasoning Generalists with Preference Trees link Lifan Yuan, Ganqu Cui,..., Maosong Sun
96 2023-12-18 G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model link Jiahui Gao, Renjie Pi,..., Lingpeng Kong
94 2024-08-09 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models link Jiabo Ye, Haiyang Xu,..., Jingren Zhou
93 None LiveBench: A Challenging, Contamination-Free LLM Benchmark link Colin White, Samuel Dooley,..., Micah Goldblum
91 2024-10-02 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second link Alexey Bochkovskiy, Amaël Delaunoy,..., Vladlen Koltun
89 2024-07-23 OpenHands: An Open Platform for AI Software Developers as Generalist Agents link Xingyao Wang, Boxuan Li,..., Graham Neubig
86 2024-09-18 To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning link Zayne Rea Sprague, Fangcong Yin,..., Greg Durrett
84 2023-09-25 Physics of Language Models: Part 3.2, Knowledge Manipulation link Zeyuan Allen-Zhu, Yuanzhi Li
84 2024-05-26 SpinQuant: LLM Quantization with Learned Rotations link Zechun Liu, Changsheng Zhao,..., Tijmen Blankevoort
79 2024-03-26 The Unreasonable Ineffectiveness of the Deeper Layers link Andrey Gromov, Kushal Tirumala,..., Dan Roberts
77 2024-10-10 RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation link Songming Liu, Lingxuan Wu,..., Jun Zhu
72 2024-06-10 Safety Alignment Should be Made More Than Just a Few Tokens Deep link Xiangyu Qi, Ashwinee Panda,..., Peter Henderson
72 2024-06-26 RouteLLM: Learning to Route LLMs from Preference Data link Isaac Ong, Amjad Almahairi,..., Ion Stoica
71 2024-08-19 LongVILA: Scaling Long-Context Visual Language Models for Long Videos link Yukang Chen, Fuzhao Xue,..., Song Han
70 2024-10-04 MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion link Junyi Zhang, Charles Herrmann,..., Ming-Hsuan Yang
68 2024-06-07 WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild link Bill Yuchen Lin, Yuntian Deng,..., Yejin Choi
68 2024-07-02 OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation link Kepan Nan, Rui Xie,..., Ying Tai
66 2024-09-06 VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation link Yecheng Wu, Zhuoyang Zhang,..., Yao Lu
65 2024-10-08 Pyramidal Flow Matching for Efficient Video Generative Modeling link Yang Jin, Zhicheng Sun,..., Zhouchen Lin
64 2024-10-09 Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think link Sihyun Yu, Sangkyung Kwak,..., Saining Xie
62 2024-03-25 Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance link Jiasheng Ye, Peiju Liu,..., Xipeng Qiu
61 2024-04-24 Retrieval Head Mechanistically Explains Long-Context Factuality link Wenhao Wu, Yizhong Wang,..., Yao Fu
61 2024-08-12 Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver link Zhenting Qi, Mingyuan MA,..., Mao Yang
59 2024-02-13 World Model on Million-Length Video And Language With Blockwise RingAttention link Hao Liu, Wilson Yan,..., Pieter Abbeel
59 2021-06-07 High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders link Siddharth Ramchandran, Manuel Haussmann, Harri Lähdesmäki
59 2024-08-27 Diffusion Models Are Real-Time Game Engines link Dani Valevski, Yaniv Leviathan,..., Shlomi Fruchter
56 2024-06-11 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling link Liliang Ren, Yang Liu,..., Weizhu Chen
54 2024-09-03 OLMoE: Open Mixture-of-Experts Language Models link Niklas Muennighoff, Luca Soldaini,..., Hannaneh Hajishirzi
54 2024-04-08 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws link Zeyuan Allen-Zhu, Yuanzhi Li
54 2024-01-25 Deconstructing Denoising Diffusion Models for Self-Supervised Learning link Xinlei Chen, Zhuang Liu,..., Kaiming He
54 2024-09-19 Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution link Zuyan Liu, Yuhao Dong,..., Yongming Rao
54 2024-03-05 Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models link Gen Luo, Yiyi Zhou,..., Rongrong Ji
54 2024-04-15 HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing link Mude Hui, Siwei Yang,..., Yuyin Zhou
53 2024-08-28 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders link Min Shi, Fuxiao Liu,..., Guilin Liu
51 2024-06-20 SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal link Tinghao Xie, Xiangyu Qi,..., Prateek Mittal
49 2024-10-14 SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers link Enze Xie, Junsong Chen,..., Song Han
49 2024-10-07 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents link Boyu Gou, Ruohan Wang,..., Yu Su
49 2024-10-02 OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data link Shubham Toshniwal, Wei Du,..., Igor Gitman
49 2024-02-12 On the self-verification limitations of large language models on reasoning and planning tasks link Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
48 2024-10-09 MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering link Jun Shern Chan, Neil Chowdhury,..., Lilian Weng
48 2024-06-14 MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers link Yiwen Chen, Tong He,..., Chi Zhang
47 2024-06-13 MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding link Fei Wang, Xingyu Fu,..., Muhao Chen
45 2024-07-08 MUSE: Machine Unlearning Six-Way Evaluation for Language Models link Weijia Shi, Jaechan Lee,..., Chiyuan Zhang
45 2024-10-03 Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge link Jiayi Ye, Yanbo Wang,..., Xiangliang Zhang
44 2024-03-04 Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures link Yuchen Duan, Weiyun Wang,..., Wenhai Wang
44 2024-07-29 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process link Tian Ye, Zicheng Xu,..., Zeyuan Allen-Zhu
44 2024-05-23 AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents link Christopher Rawles, Sarah Clinckemaillie,..., Oriana Riva
44 2024-05-27 LLM-Assisted Static Analysis for Detecting Security Vulnerabilities link Ziyang Li, Saikat Dutta, Mayur Naik
44 2024-06-11 Scaling Large Language Model-based Multi-Agent Collaboration link Chen Qian, Zihao Xie,..., Maosong Sun
43 2023-10-17 Eliciting Human Preferences with Language Models link Belinda Z. Li, Alex Tamkin,..., Jacob Andreas
43 2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads link Guangxuan Xiao, Jiaming Tang,..., Song Han
43 2024-06-06 Vision-LSTM: xLSTM as Generic Vision Backbone link Benedikt Alkin, Maximilian Beck,..., Johannes Brandstetter
43 2024-03-12 SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression link Xin Wang, Yu Zheng,..., Mi Zhang
42 2024-10-10 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning link Amrith Setlur, Chirag Nagpal,..., Aviral Kumar
42 2024-07-17 VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control link Sherwin Bahmani, Ivan Skorokhodov,..., Sergey Tulyakov
41 2024-04-03 Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models link Jingyang Zhang, Jingwei Sun,..., Hai Li
41 2024-10-02 HelpSteer2-Preference: Complementing Ratings with Preferences link Zhilin Wang, Alexander Bukharin,..., Yi Dong
41 2024-06-11 MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance link Xierui Wang, Siming Fu,..., Hao Jiang
41 2024-08-15 Automated Design of Agentic Systems link Shengran Hu, Cong Lu, Jeff Clune
40 2024-07-01 RegMix: Data Mixture as Regression for Language Model Pre-training link Qian Liu, Xiaosen Zheng,..., Min Lin
40 2024-03-13 Language models scale reliably with over-training and on downstream tasks link Samir Yitzhak Gadre, Georgios Smyrnis,..., Ludwig Schmidt
40 2024-09-26 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction link Jing He, Haodong LI,..., Ying-Cong Chen
39 2024-07-24 SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency link Yiming Xie, Chun-Han Yao,..., Varun Jampani
39 2024-08-13 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs link Yushi Bai, Jiajie Zhang,..., Juanzi Li
38 2024-09-01 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer link Yuancheng Wang, Haoyue Zhan,..., Zhizheng Wu
38 2024-10-16 JudgeBench: A Benchmark for Evaluating LLM-Based Judges link Sijun Tan, Siyuan Zhuang,..., Ion Stoica
37 2024-10-10 Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models link Bofei Gao, Feifan Song,..., Baobao Chang
37 2024-08-29 Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling link Hritik Bansal, Arian Hosseini,..., Mehran Kazemi
36 2024-10-24 Data Scaling Laws in Imitation Learning for Robotic Manipulation link Fanqi Lin, Yingdong Hu,..., Yang Gao
36 2024-07-09 Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence link Weize Chen, Ziming You,..., Maosong Sun
36 2024-07-16 BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval link Hongjin SU, Howard Yen,..., Tao Yu
36 2024-06-05 VideoPhy: Evaluating Physical Commonsense for Video Generation link Hritik Bansal, Zongyu Lin,..., Aditya Grover
36 2024-06-24 Adam-mini: Use Fewer Learning Rates To Gain More link Yushun Zhang, Congliang Chen,..., Ruoyu Sun
36 2024-10-14 Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models link Junyu Chen, Han Cai,..., Song Han
36 2024-05-23 Not All Language Model Features Are Linear link Joshua Engels, Eric J Michaud,..., Max Tegmark
36 2024-09-01 Diffusion Policy Policy Optimization link Allen Z. Ren, Justin Lidard,..., Max Simchowitz
36 2024-08-23 MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? link YiFan Zhang, Huanyu Zhang,..., Rong Jin
34 2024-09-19 Language Models Learn to Mislead Humans via RLHF link Jiaxin Wen, Ruiqi Zhong,..., Shi Feng
34 2024-06-08 MotionClone: Training-Free Motion Cloning for Controllable Video Generation link Pengyang Ling, Jiazi Bu,..., Yi Jin
33 2024-10-03 AlphaEdit: Null-Space Constrained Model Editing for Language Models link Junfeng Fang, Houcheng Jiang,..., Tat-Seng Chua
33 2024-08-29 OmniRe: Omni Urban Scene Reconstruction link Ziyu Chen, Jiawei Yang,..., Yue Wang
33 2024-01-07 Long Context Compression with Activation Beacon link Peitian Zhang, Zheng Liu,..., Zhicheng Dou
33 2024-05-14 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control link Aleksandar Makelov, Georg Lange, Neel Nanda
33 2024-08-29 WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling link Shengpeng Ji, Ziyue Jiang,..., Zhou Zhao
33 2024-03-29 Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want link Weifeng Lin, Xinyu Wei,..., Hongsheng Li
33 2024-10-14 HART: Efficient Visual Generation with Hybrid Autoregressive Transformer link Haotian Tang, Yecheng Wu,..., Song Han
32 2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning link Haotian Zhang, Mingfei Gao,..., Yinfei Yang
32 2024-09-04 Building Math Agents with Multi-Turn Iterative Preference Learning link Wei Xiong, Chengshuai Shi,..., Tianqi Liu
32 2024-07-11 Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting link Zilong Wang, Zifeng Wang,..., Tomas Pfister
32 2024-08-22 Real-Time Video Generation with Pyramid Attention Broadcast link Xuanlei Zhao, Xiaolong Jin,..., Yang You
32 2024-06-07 Towards Semantic Equivalence of Tokenization in Multimodal LLM link Shengqiong Wu, Hao Fei,..., Shuicheng YAN
31 2024-10-14 AFlow: Automating Agentic Workflow Generation link Jiayi Zhang, Jinyu Xiang,..., Chenglin Wu
31 2024-09-04 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency link Jianwen Jiang, Chao Liang,..., Yanbo Zheng
31 2024-10-30 OS-ATLAS: Foundation Action Model for Generalist GUI Agents link Zhiyong Wu, Zhenyu Wu,..., Yu Qiao
31 2024-06-24 DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation link Yuang Peng, Yuxin Cui,..., Shu-Tao Xia
31 2024-06-03 Unlocking Guidance for Discrete State-Space Diffusion and Flow Models link Hunter Nisonoff, Junhao Xiong,..., Jennifer Listgarten
31 2024-06-14 Training-free Camera Control for Video Generation link Chen Hou, Zhibo Chen
31 2024-12-09 Gated Delta Networks: Improving Mamba2 with Delta Rule link Songlin Yang, Jan Kautz, Ali Hatamizadeh
30 2024-09-10 LLaMA-Omni: Seamless Speech Interaction with Large Language Models link Qingkai Fang, Shoutao Guo,..., Yang Feng
29 2024-02-23 Repetition Improves Language Model Embeddings link Jacob Mitchell Springer, Suhas Kotha,..., Aditi Raghunathan
29 2024-06-06 Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data link Jingyang Ou, Shen Nie,..., Chongxuan Li
28 2024-10-21 RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style link Yantao Liu, Zijun Yao,..., Juanzi Li
28 2024-08-15 Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models link Andy K Zhang, Neil Perry,..., Percy Liang
28 2024-03-13 A Decade's Battle on Dataset Bias: Are We There Yet? link Zhuang Liu, Kaiming He
28 2024-09-24 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts link Xiaoming Shi, Shiyu Wang,..., Ming Jin
28 None Scaling Laws for Downstream Task Performance in Machine Translation link Berivan Isik, Natalia Ponomareva,..., Sanmi Koyejo
27 2024-09-05 Planning in Natural Language Improves LLM Search for Code Generation link Evan Z Wang, Federico Cassano,..., Hugh Zhang
27 2023-05-31 SafeDiffuser: Safe Planning with Diffusion Probabilistic Models link Wei Xiao, Tsun-Hsuan Wang,..., Daniela Rus
27 2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads link Hanlin Tang, Yang Lin,..., Gongyi Wang
27 2024-06-18 Dissecting Adversarial Robustness of Multimodal LM Agents link Chen Henry Wu, Rishi Rajesh Shah,..., Aditi Raghunathan
27 2024-09-17 EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage link Zeyi Liao, Lingbo Mo,..., Huan Sun
27 2024-10-15 Latent Action Pretraining from Videos link Seonghyeon Ye, Joel Jang,..., Minjoon Seo
27 2024-07-16 Does Refusal Training in LLMs Generalize to the Past Tense? link Maksym Andriushchenko, Nicolas Flammarion
26 2023-11-20 Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model link Chunming He, Chengyu Fang,..., Sina Farsiu
26 2024-04-15 Learn Your Reference Model for Real Good Alignment link Alexey Gorbatovski, Boris Shaposhnikov,..., Daniil Gavrilov
26 2024-06-13 Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning link Bahare Fatemi, Mehran Kazemi,..., Bryan Perozzi
26 2024-07-23 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences link Canyu Zhao, Mingyu Liu,..., Chunhua Shen
26 2024-06-12 CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models link Hyungjin Chung, Jeongsol Kim,..., Jong Chul Ye
25 2025-01-07 LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token link Shaolei Zhang, Qingkai Fang,..., Yang Feng
25 2024-08-29 Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems link Tian Ye, Zicheng Xu,..., Zeyuan Allen-Zhu
25 2024-10-03 HELMET: How to Evaluate Long-context Models Effectively and Thoroughly link Howard Yen, Tianyu Gao,..., Danqi Chen
25 2024-10-03 LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations link Hadas Orgad, Michael Toker,..., Yonatan Belinkov
25 2024-07-11 MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine link Renrui Zhang, Xinyu Wei,..., Hongsheng Li
25 2024-06-28 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy link Xiang Li, Cristina Mata,..., Michael S Ryoo
25 2024-07-20 Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data link Xinyi Wang, Antonis Antoniades,..., William Yang Wang
25 2024-05-27 Matryoshka Multimodal Models link Mu Cai, Jianwei Yang,..., Yong Jae Lee
25 2024-10-04 AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark link Wenhao Chai, Enxin Song,..., Christopher D Manning
24 2024-09-13 Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control link Carles Domingo-Enrich, Michal Drozdzal,..., Ricky T. Q. Chen
24 2024-02-28 RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval link Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
24 2024-06-20 Consistency Models Made Easy link Zhengyang Geng, Ashwini Pokle,..., J Zico Kolter
24 2024-05-30 Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models link Zachary Ankner, Cody Blakeney,..., Mansheej Paul
24 2024-08-06 MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine link Yunfei Xie, Ce Zhou,..., Yuyin Zhou
24 2024-10-08 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG link Bowen Jin, Jinsung Yoon,..., Sercan O Arik
24 2024-09-04 Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling link Kaiwen Zheng, Yongxin Chen,..., Qinsheng Zhang
24 2024-06-04 ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation link Tianchen Zhao, Tongcheng Fang,..., Yu Wang
23 2024-05-27 RB-Modulation: Training-Free Personalization using Stochastic Optimal Control link Litu Rout, Yujia Chen,..., Wen-Sheng Chu
23 2024-07-14 Lean-STaR: Learning to Interleave Thinking and Proving link Haohan Lin, Zhiqing Sun,..., Yiming Yang
23 2024-09-24 Making Text Embedders Few-Shot Learners link Chaofan Li, Minghao Qin,..., Zheng Liu
23 2024-05-29 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF link Shicong Cen, Jincheng Mei,..., Bo Dai
23 2024-10-02 CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL link Mohammadreza Pourreza, Hailong Li,..., Sercan O Arik
23 2024-07-11 Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist link Zihao Zhou, Shudong Liu,..., Kaizhu Huang
23 2024-06-11 AI Sandbagging: Language Models can Strategically Underperform on Evaluations link Teun van der Weij, Felix Hofstätter,..., Francis Rhys Ward
23 2024-10-14 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents link Shi Yu, Chaoyue Tang,..., Maosong Sun
23 2024-06-14 Bootstrapping Language Models with DPO Implicit Rewards link Changyu Chen, Zichen Liu,..., Min Lin
23 2024-07-19 BOND: Aligning LLMs with Best-of-N Distillation link Pier Giuseppe Sessa, Robert Dadashi-Tazehozi,..., Olivier Bachem
23 2024-10-14 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations link Litu Rout, Yujia Chen,..., Wen-Sheng Chu
23 2024-09-26 EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation link Jiaxiang Tang, Zhaoshuo Li,..., Qinsheng Zhang
23 2024-10-14 Animate-X: Universal Character Image Animation with Enhanced Motion Representation link Shuai Tan, Biao Gong,..., Ming Yang
22 2024-10-24 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark link S Sakshi, Utkarsh Tyagi,..., Dinesh Manocha
22 2024-11-07 SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models link Muyang Li, Yujun Lin,..., Song Han
22 2024-08-20 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding link Ranajoy Sadhukhan, Jian Chen,..., Beidi Chen
22 2024-10-04 SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? link John Yang, Carlos E Jimenez,..., Ofir Press
22 2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix link Yingyu Liang, Jiangxuan Long,..., Yufa Zhou
22 2023-12-16 Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos link Mingfei Han, Linjie Yang,..., Heng Wang
22 2024-08-12 VisualAgentBench: Towards Large Multimodal Models as Visual Agents link Xiao Liu, Tianjie Zhang,..., Jie Tang
22 2024-10-10 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation link Jiahao Cui, Hui Li,..., Jingdong Wang
22 2024-05-31 Improved Techniques for Optimization-Based Jailbreaking on Large Language Models link Xiaojun Jia, Tianyu Pang,..., Min Lin
22 2024-10-08 T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design link Jiachen Li, Qian Long,..., William Yang Wang
22 2024-06-14 ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation link Chufan Shi, Cheng Yang,..., Yujiu Yang
22 2024-10-07 ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery link Ziru Chen, Shijie Chen,..., Huan Sun
21 2024-11-12 Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows link Fangyu Lei, Jixuan Chen,..., Tao Yu
21 2024-10-31 No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images link Botao Ye, Sifei Liu,..., Marc Pollefeys
21 2024-10-14 When Attention Sink Emerges in Language Models: An Empirical View link Xiangming Gu, Tianyu Pang,..., Min Lin
21 2024-11-20 Hymba: A Hybrid-head Architecture for Small Language Models link Xin Dong, Yonggan Fu,..., Pavlo Molchanov
21 2024-10-29 DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models link Chengke Zou, Xingang Guo,..., Huan Zhang
21 2024-10-28 Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics link Yaniv Nikankin, Anja Reusch,..., Yonatan Belinkov
21 2024-10-16 MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models link Peng Xia, Kangyu Zhu,..., Huaxiu Yao
20 2024-04-15 Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model link Han Lin, Jaemin Cho,..., Mohit Bansal
20 2024-10-16 One Step Diffusion via Shortcut Models link Kevin Frans, Danijar Hafner,..., Pieter Abbeel
20 2024-09-19 Scaling FP8 training to trillion-token LLMs link Maxim Fishman, Brian Chmiel,..., Daniel Soudry
20 2024-06-12 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text link Qingyun Li, Zhe Chen,..., Jifeng Dai
20 2024-10-10 Agent S: An Open Agentic Framework that Uses Computers Like a Human link Saaket Agashe, Jiuzhou Han,..., Xin Eric Wang
20 2024-10-03 Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents link Hanrong Zhang, Jingyuan Huang,..., Yongfeng Zhang
20 2024-06-27 A Sanity Check for AI-generated Image Detection link Shilin Yan, Ouxiang Li,..., Weidi Xie
19 2024-10-14 Simplifying, Stabilizing and Scaling Continuous-time Consistency Models link Cheng Lu, Yang Song
19 2024-03-11 Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? link Egor Zverev, Sahar Abdelnabi,..., Christoph H. Lampert
19 2024-10-07 VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks link Ziyan Jiang, Rui Meng,..., Wenhu Chen
19 2024-11-11 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision link Cong Wei, Zheyang Xiong,..., Wenhu Chen
19 2024-08-04 Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models link Fushuo Huo, Wenchao Xu,..., Peilin Zhao
19 2024-08-30 Safety Layers in Aligned Large Language Models: The Key to LLM Security link Shen Li, Liuyi Yao,..., Yaliang Li
19 2024-10-01 AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation link Jiafei Duan, Wilbert Pumacay,..., Yijie Guo
18 2024-10-22 LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias link Haian Jin, Hanwen Jiang,..., Zexiang Xu
18 2024-07-11 AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories link Yi Zeng, Yu Yang,..., Bo Li
18 2024-10-03 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration link Jintao Zhang, Jia wei,..., Jianfei Chen
18 2024-02-27 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems link Zhenting Qi, Hanlin Zhang,..., Himabindu Lakkaraju
18 2024-07-12 Human-like Episodic Memory for Infinite Context LLMs link Zafeirios Fountas, Martin Benfeghoul,..., Jun Wang
18 2024-10-13 RMB: Comprehensively benchmarking reward models in LLM alignment link Enyu Zhou, Guodong Zheng,..., Xuanjing Huang
18 2024-08-20 To Code or Not To Code? Exploring Impact of Code in Pre-training link Viraat Aryabumi, Yixuan Su,..., Sara Hooker
18 2023-12-28 MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation link Zhongshen Zeng, Pengguang Chen,..., Jiaya Jia
18 2024-07-08 Variational Best-of-N Alignment link Afra Amini, Tim Vieira,..., Ryan Cotterell
18 2024-08-05 MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models link Fanqing Meng, Chuanhao Li,..., Wenqi Shao
18 2024-10-02 ImageFolder: Autoregressive Image Generation with Folded Tokens link Xiang Li, Kai Qiu,..., Zhe Lin
18 2024-07-29 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher link Zehui Chen, Kuikun Liu,..., Feng Zhao
18 2024-10-24 Ferret-UI One: Mastering Universal User Interface Understanding Across Platforms link Zhangheng LI, Keen You,..., Zhe Gan
17 2024-10-03 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs link Xiaogeng Liu, Peiran Li,..., Chaowei Xiao
17 2024-07-18 Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization link Audrey Huang, Wenhao Zhan,..., Dylan J Foster
17 2024-10-17 AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents link Ke Yang, Yao Liu,..., Huzefa Rangwala
17 2024-03-25 Do LLM Agents Have Regret? A Case Study in Online Learning and Games link Chanwoo Park, Xiangyu Liu,..., Kaiqing Zhang
17 2024-07-10 Deconstructing What Makes a Good Optimizer for Autoregressive Language Models link Rosie Zhao, Depen Morwani,..., Sham M. Kakade
17 2024-09-17 Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models link Orion Weller, Benjamin Van Durme,..., Jack Hessel
17 2024-09-30 ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer link Zhen Han, Zeyinzi Jiang,..., Jingren Zhou
17 2024-02-21 T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching link Zizheng Pan, Bohan Zhuang,..., Anima Anandkumar
17 2024-09-09 Improving Pretraining Data Using Perplexity Correlations link Tristan Thrush, Christopher Potts, Tatsunori Hashimoto
17 2024-03-21 AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation link Yuning Cui, Syed Waqas Zamir,..., Fahad Shahbaz Khan
17 2024-05-08 Preble: Efficient Distributed Prompt Scheduling for LLM Serving link Vikranth Srivatsa, Zijian He,..., Yiying Zhang
17 2024-06-11 Image and Video Tokenization with Binary Spherical Quantization link Yue Zhao, Yuanjun Xiong, Philipp Kraehenbuehl
17 2024-06-25 Point-SAM: Promptable 3D Segmentation Model for Point Clouds link Yuchen Zhou, Jiayuan Gu,..., Hao Su
17 2024-12-18 Autoregressive Video Generation without Vector Quantization link Haoge Deng, Ting Pan,..., Xinlong Wang
17 2024-07-21 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models link Zheng Chong, Xiao Dong,..., Xiaodan Liang
17 2024-12-13 TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies link Ruijie Zheng, Yongyuan Liang,..., Jianwei Yang
16 2024-07-01 Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs link Nguyen Nhat Minh, Andrew Baker,..., Ravid Shwartz-Ziv
16 2024-06-17 Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI link Robert Hönig, Javier Rando,..., Florian Tramèr
16 2024-10-21 MagicPIG: LSH Sampling for Efficient LLM Generation link Zhuoming Chen, Ranajoy Sadhukhan,..., Beidi Chen
16 2024-06-19 4K4DGen: Panoramic 4D Generation at 4K Resolution link Renjie Li, Panwang Pan,..., Zhiwen Fan
16 2024-06-12 Real2Code: Reconstruct Articulated Objects via Code Generation link Zhao Mandi, Yijia Weng,..., Shuran Song
16 2024-06-24 Large Language Models Assume People are More Rational than We Really are link Ryan Liu, Jiayi Geng,..., Thomas L. Griffiths
16 2024-08-13 Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents link Kexun Zhang, Weiran Yao,..., Caiming Xiong
16 2024-10-08 Round and Round We Go! What makes Rotary Positional Encodings useful? link Federico Barbero, Alex Vitvitskyi,..., Petar Veličković
16 2024-05-06 Language-Image Models with 3D Understanding link Jang Hyun Cho, Boris Ivanovic,..., Marco Pavone
16 2024-09-30 FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" link Yifei Ming, Senthil Purushwalkam,..., Shafiq Joty
16 2024-06-05 A-Bench: Are LMMs Masters at Evaluating AI-generated Images? link Zicheng Zhang, Haoning Wu,..., Guangtao Zhai
16 2024-10-24 Why Does the Effective Context Length of LLMs Fall Short? link Chenxin An, Jun Zhang,..., Lingpeng Kong
16 2024-06-11 Towards Realistic Data Generation for Real-World Super-Resolution link Long Peng, Wenbo Li,..., Zheng-Jun Zha
16 2024-06-05 Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models link Jerry Yao-Chieh Hu, Maojiang Su,..., Han Liu
16 2024-06-06 Interpreting the Second-Order Effects of Neurons in CLIP link Yossi Gandelsman, Alexei A Efros, Jacob Steinhardt
16 2024-09-19 MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines link Dongzhi Jiang, Renrui Zhang,..., Hongsheng Li
16 2024-06-01 MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos link Qingming LIU, Yuan Liu,..., Junhui Hou
16 2024-10-03 FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models link Zhipei Xu, Xuanyu Zhang,..., Jian Zhang
16 2024-10-14 Depth Any Video with Scalable Synthetic Data link Honghui Yang, Di Huang,..., Tong He
15 2024-06-30 Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning link Yuheng Zhang, Dian Yu,..., Dong Yu
15 2024-12-19 Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation link Yang Tian, Sizhe Yang,..., Jiangmiao Pang
15 2024-08-21 EmbodiedSAM: Online Segment Any 3D Thing in Real Time link Xiuwei Xu, Huangxing Chen,..., Jiwen Lu
15 2024-04-29 LLM-SR: Scientific Equation Discovery via Programming with Large Language Models link Parshin Shojaee, Kazem Meidani,..., Chandan K. Reddy
15 2025-01-27 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding link Wei Chow, Jiageng Mao,..., Yue Wang
15 2024-05-24 DEEM: Diffusion models serve as the eyes of large language models for image perception link Run Luo, Yunshui Li,..., Binyuan Hui
15 2024-07-25 LoRA-Pro: Are Low-Rank Adapters Properly Optimized? link Zhengbo Wang, Jian Liang,..., Tieniu Tan
15 2024-10-04 CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control link Guy Tevet, Sigal Raab,..., Michiel van de Panne
15 2024-05-29 ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning link Ruchika Chavhan, Da Li, Timothy Hospedales
15 2025-01-18 Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments link Hongjin SU, Ruoxi Sun,..., Sercan O Arik
15 2024-10-21 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages link Xiang Yue, Yueqi Song,..., Graham Neubig
15 2024-02-28 Diffusion-based Neural Network Weights Generation link Bedionita Soro, Bruno Andreis,..., Sung Ju Hwang
15 2024-11-05 Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent link Yangning Li, Yinghui Li,..., Philip S. Yu
15 2024-07-19 System 1.x: Learning to Balance Fast and Slow Planning with Language Models link Swarnadeep Saha, Archiki Prasad,..., Mohit Bansal
15 2024-10-25 Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning link Yu Fu, Zefan Cai,..., Wen Xiao
15 2024-10-23 Scaling Diffusion Language Models via Adaptation from Autoregressive Models link Shansan Gong, Shivam Agarwal,..., Lingpeng Kong
15 2024-06-10 How efficient is LLM-generated code? A rigorous & high-standard benchmark link Ruizhong Qiu, Weiliang Will Zeng,..., Hanghang Tong
15 2024-10-15 Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws link Yiding Jiang, Allan Zhou,..., J Zico Kolter
15 2024-10-16 SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation link Jaehong Yoon, Shoubin Yu,..., Mohit Bansal
15 2024-10-18 Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning link Jiacheng Ye, Jiahui Gao,..., Lingpeng Kong
15 2024-04-09 Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis link Guangchen Lan, Dong-Jun Han,..., Christopher Brinton
15 2024-10-03 Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations link Nicholas Jiang, Anish Kachinthaya,..., Yossi Gandelsman
15 2024-10-15 Process Reward Model with Q-value Rankings link Wendi Li, Yixuan Li
14 2024-09-03 Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation link Tiansheng Huang, Sihao Hu,..., Ling Liu
14 2024-07-05 Simplifying Deep Temporal Difference Learning link Matteo Gallici, Mattie Fellows,..., Mario Martin
14 2024-06-24 Theory on Mixture-of-Experts in Continual Learning link Hongbo Li, Sen Lin,..., Ness Shroff
14 2023-11-27 Regularization by Texts for Latent Diffusion Inverse Solvers link Jeongsol Kim, Geon Yeong Park,..., Jong Chul Ye
14 2024-07-30 ThinK: Thinner Key Cache by Query-Driven Pruning link Yuhui Xu, Zhanming Jie,..., Doyen Sahoo
14 2024-10-17 Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation link Hyungjoo Chae, Namyoung Kim,..., Jinyoung Yeo
14 2024-11-04 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning link Zehan Qi, Xiao Liu,..., Yuxiao Dong
14 2024-07-19 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities link Peng Xu, Wei Ping,..., Bryan Catanzaro
14 2024-12-10 On Evaluating the Durability of Safeguards for Open-Weight LLMs link Xiangyu Qi, Boyi Wei,..., Peter Henderson
14 2024-10-15 Improving Instruction-Following in Language Models through Activation Steering link Alessandro Stolfo, Vidhisha Balachandran,..., Besmira Nushi
14 2024-05-25 Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection link Yun Zhu, Jia-Chen Gu,..., Jindong Chen
14 2024-09-06 Theory, Analysis, and Best Practices for Sigmoid Self-Attention link Jason Ramapuram, Federico Danieli,..., Russell Webb
14 2024-10-11 Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization link Noam Razin, Sadhika Malladi,..., Boris Hanin
14 2024-09-12 DSBench: How Far Are Data Science Agents from Becoming Data Science Experts? link Liqiang Jing, Zhehui Huang,..., Dong Yu
14 2024-03-26 AgentStudio: A Toolkit for Building General Virtual Agents link Longtao Zheng, Zhiyuan Huang,..., Shuicheng YAN
14 2024-06-19 InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales link Zhepei Wei, Wei-Lin Chen, Yu Meng
14 2024-03-10 What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? link Guangkai Xu, Yongtao Ge,..., Chunhua Shen
14 2024-11-04 GenXD: Generating Any 3D and 4D Scenes link Yuyang Zhao, Chung-Ching Lin,..., Lijuan Wang
14 2024-10-08 TRACE: Temporal Grounding Video LLM via Causal Event Modeling link Yongxin Guo, Jingyu Liu,..., Xi Chen
14 2024-07-29 Diffusion Feedback Helps CLIP See Better link Wenxuan Wang, Quan Sun,..., Xinlong Wang
14 2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning link Xiangyu Zeng, Kunchang Li,..., Limin Wang
14 None GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation link Yushi LAN, Shangchen Zhou,..., Chen Change Loy
14 None Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos link Isabella Liu, Hao Su, Xiaolong Wang
14 2024-10-03 ControlAR: Controllable Image Generation with Autoregressive Models link Zongming Li, Tianheng Cheng,..., Xinggang Wang
14 2024-06-17 DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors link Keon Lee, Dong Won Kim,..., Jaewoong Cho
13 2024-11-07 Scaling Laws for Precision link Tanishq Kumar, Zachary Ankner,..., Aditi Raghunathan
13 2023-05-24 gRNAde: Geometric Deep Learning for 3D RNA inverse design link Chaitanya K. Joshi, Arian Rokkum Jamasb,..., Pietro Lio
13 2023-12-13 CBQ: Cross-Block Quantization for Large Language Models link Xin Ding, Xiaoyu Liu,..., Yunhe Wang
13 2024-06-19 Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models link Guanting Dong, Keming Lu,..., Jingren Zhou
13 2024-09-06 Programming Refusal with Conditional Activation Steering link Bruce W. Lee, Inkit Padhi,..., Amit Dhurandhar
13 2024-10-17 D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement link Yansong Peng, Hebei Li,..., Feng Wu
13 2024-10-17 DPLM-2: A Multimodal Diffusion Protein Language Model link Xinyou Wang, Zaixiang Zheng,..., Quanquan Gu
13 2024-10-24 Scaling up Masked Diffusion Models on Text link Shen Nie, Fengqi Zhu,..., Chongxuan Li
13 2024-07-10 Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization link Junkang Wu, Yuexiang Xie,..., Xiangnan He
13 2024-10-14 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory link Di Wu, Hongwei Wang,..., Dong Yu
13 2024-05-27 BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments link Yusuf H Roohani, Andrew H. Lee,..., Jure Leskovec
13 2024-12-18 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models link Yinlam Chow, Guy Tennenholtz,..., Aleksandra Faust
13 2024-10-18 DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent link Taiyi Wang, Zhihao Wu,..., Kun Shao
13 2024-07-01 MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs link Yusu Qian, Hanrong Ye,..., Zhe Gan
13 2024-06-13 MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs link Xuannan Liu, Zekun Li,..., Zhaofeng He
13 2024-05-27 PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance link Haohan Weng, Yikai Wang,..., Jun Zhu
13 2024-03-26 Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models link Zhenyu Pan, Haozheng Luo,..., Han Liu
13 2024-05-28 EG4D: Explicit Generation of 4D Object without Score Distillation link Qi Sun, Zhiyang Guo,..., Houqiang Li
13 2024-06-20 DeciMamba: Exploring the Length Extrapolation Potential of Mamba link Assaf Ben-Kish, Itamar Zimerman,..., Raja Giryes
13 2024-11-07 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation link Koichi Namekata, Sherwin Bahmani,..., David B. Lindell
13 2024-05-24 Diffusion Bridge Implicit Models link Kaiwen Zheng, Guande He,..., Jun Zhu
13 2024-10-05 Accelerating Diffusion Transformers with Token-wise Feature Caching link Chang Zou, Xuyang Liu,..., Linfeng Zhang
13 2024-06-17 Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization link Wenkai Yang, Shiqi Shen,..., Ji-Rong Wen
13 2024-09-20 RRM: Robust Reward Model Training Mitigates Reward Hacking link Tianqi Liu, Wei Xiong,..., Mohammad Saleh
13 2024-08-16 Visual Agents as Fast and Slow Thinkers link Guangyan Sun, Mingyu Jin,..., Dongfang Liu
13 2024-10-09 Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow link Fu-Yun Wang, Ling Yang,..., Hongsheng Li
13 2024-10-09 Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology link Xiangyu Wang, Donglin Yang,..., Si Liu
13 2024-10-10 Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis link Jinbin Bai, Tian Ye,..., Shuicheng YAN
13 2024-11-26 Scaling Speech-Text Pre-training with Synthetic Interleaved Data link Aohan Zeng, Zhengxiao Du,..., Jie Tang
12 2024-06-26 On Scaling Up 3D Gaussian Splatting Training link Hexu Zhao, Haoyang Weng,..., Saining Xie
12 2024-07-25 Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement link Jaehun Jung, Faeze Brahman, Yejin Choi
12 2024-07-08 $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning link Mintong Kang, Bo Li
12 2024-09-26 How Feature Learning Can Improve Neural Scaling Laws link Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
12 2024-10-18 How to Evaluate Reward Models for RLHF link Evan Frick, Tianle Li,..., Ion Stoica
12 None Palu: KV-Cache Compression with Low-Rank Projection link Chi-Chih Chang, Wei-Cheng Lin,..., Kai-Chiang Wu
12 2025-02-05 SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models link Daniel Levy, Siba Smarak Panigrahi,..., Siamak Ravanbakhsh
12 2024-10-11 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization link Zhuoqun Li, Xuanang Chen,..., Yongbin Li
12 2024-06-25 Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon link USVSN Sai Prashanth, Alvin Deng,..., Naomi Saphra
12 2024-10-17 Looking Inward: Language Models Can Learn About Themselves by Introspection link Felix Jedidja Binder, James Chua,..., Owain Evans
12 2024-06-12 MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos link Xuehai He, Weixi Feng,..., Xin Eric Wang
12 2024-10-03 RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph link Siru Ouyang, Wenhao Yu,..., Dong Yu
12 2024-07-01 Eliminating Position Bias of Language Models: A Mechanistic Approach link Ziqi Wang, Hanlin Zhang,..., Heng Ji
12 2024-10-08 From Tokens to Words: On the Inner Lexicon of LLMs link Guy Kaplan, Matanel Oren,..., Roy Schwartz
12 2024-06-23 Efficient Evolutionary Search Over Chemical Space with Large Language Models link Haorui Wang, Marta Skreta,..., Chao Zhang
12 2024-11-25 Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency link Jerry Yao-Chieh Hu, Wei-Po Wang,..., Han Liu
12 2024-10-16 CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching link Xingjian Wu, Xiangfei Qiu,..., Bin Yang
12 2024-05-28 Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning link Seanie Lee, Minsu Kim,..., Moksh Jain
12 2024-10-09 MMEgo: Towards Building Egocentric Multimodal LLMs link Hanrong Ye, Haotian Zhang,..., Bowen Zhang
12 2024-04-16 COMBO: Compositional World Models for Embodied Multi-Agent Cooperation link Hongxin Zhang, Zeyuan Wang,..., Chuang Gan
12 2024-12-10 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation link Xiao FU, Xian Liu,..., Dahua Lin
12 2024-11-08 Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks link Chien-yu Huang, Wei-Chih Chen,..., Hung-yi Lee
12 2024-10-28 CycleResearcher: Improving Automated Research via Automated Review link Yixuan Weng, Minjun Zhu,..., Linyi Yang
11 2024-10-08 Restructuring Vector Quantization with the Rotation Trick link Christopher Fifty, Ronald Guenther Junkins,..., Christopher Re
11 2024-11-19 Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues link Riccardo Grazzi, Julien Siems,..., Massimiliano Pontil
11 2024-06-28 PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration link Yuxuan Sun, Yunlong Zhang,..., Lin Yang
11 2024-09-11 Synthetic continued pretraining link Zitong Yang, Neil Band,..., Tatsunori Hashimoto
11 2024-10-05 AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text link Ximing Lu, Melanie Sclar,..., Yejin Choi
11 2024-10-22 Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities link Zheyuan Zhang, Fengyuan Hu,..., Ziqiao Ma
11 2024-03-05 Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking link Cassidy Laidlaw, Shivam Singhal, Anca Dragan
11 2024-08-15 Can Large Language Models Understand Symbolic Graphics Programs? link Zeju Qiu, Weiyang Liu,..., Bernhard Schölkopf
11 2023-10-07 Targeted Attack Improves Protection against Unauthorized Diffusion Customization link Boyang Zheng, Chumeng Liang, Xiaoyu Wu
11 2024-07-22 ALLaM: Large Language Models for Arabic and English link M Saiful Bari, Yazeed Alnumay,..., Haidar Khan
11 2024-06-12 CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery link Xiaoshuai Song, Muxi Diao,..., Weiran Xu
11 2024-03-21 Physics-Informed Diffusion Models link Jan-Hendrik Bastek, WaiChing Sun, Dennis Kochmann
11 2024-05-24 Emergence of a High-Dimensional Abstraction Phase in Language Transformers link Emily Cheng, Diego Doimo,..., Marco Baroni
11 2024-10-15 Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming link Yilun Hao, Yang Zhang, Chuchu Fan
11 2024-11-26 On Statistical Rates of Conditional Diffusion Transformer: Approximation and Estimation link Jerry Yao-Chieh Hu, Weimin Wu,..., Han Liu
11 2024-06-11 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward link Yuzi Yan, Yibo Miao,..., Dong Yan
11 2024-05-27 Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs link Qi Wu, Yubo Zhao,..., Chi-Keung Tang
11 2024-08-27 NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals link Weibang Jiang, Yansen Wang,..., Dongsheng Li
11 2024-04-29 U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models link Song Mei
11 2024-05-24 OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code link Maxence Faldor, Jenny Zhang,..., Jeff Clune
11 2024-05-30 Is In-Context Learning Sufficient for Instruction Following in LLMs? link Hao Zhao, Maksym Andriushchenko,..., Nicolas Flammarion
11 2024-10-04 Dynamic Diffusion Transformer link Wangbo Zhao, Yizeng Han,..., Yang You
11 2024-06-11 McEval: Massively Multilingual Code Evaluation link Linzheng Chai, Shukai Liu,..., Zhoujun Li
11 2024-10-09 SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection link Han Shen, Pin-Yu Chen,..., Tianyi Chen
11 2024-10-09 Does Spatial Cognition Emerge in Frontier Models? link Santhosh Kumar Ramakrishnan, Erik Wijmans,..., Vladlen Koltun
11 2024-06-27 From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data link Zheyang Xiong, Vasilis Papageorgiou,..., Dimitris Papailiopoulos
10 2024-10-06 Inference Scaling for Long-Context Retrieval Augmented Generation link Zhenrui Yue, Honglei Zhuang,..., Michael Bendersky
10 2024-10-11 Transformers Provably Solve Parity Efficiently with Chain of Thought link Juno Kim, Taiji Suzuki
10 2024-11-21 Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models link Javier Ferrando, Oscar Balcells Obeso,..., Neel Nanda
10 2024-10-14 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models link Peng Xia, Siwei Han,..., Huaxiu Yao
10 2024-08-26 Training-Free Activation Sparsity in Large Language Models link James Liu, Pragaash Ponnusamy,..., Ben Athiwaratkun
10 2024-10-09 An undetectable watermark for generative image models link Sam Gunn, Xuandong Zhao, Dawn Song
10 2025-01-15 Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians link Ishan Amin, Sanjeev Raja, Aditi S. Krishnapriyan
10 2024-10-09 KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks link Kaijing Ma, Xeron Du,..., Ge Zhang
10 2024-10-04 GraphRouter: A Graph-based Router for LLM Selections link Tao Feng, Yanzhen Shen, Jiaxuan You
10 2024-08-04 Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid link Mingxin Huang, Yuliang Liu,..., Xiang Bai
10 2024-10-22 Self-Evolving Multi-Agent Networks for Software Development link Yue Hu, Yuzhu Cai,..., Siheng Chen
10 2024-11-07 The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities link Zhaofeng Wu, Xinyan Velocity Yu,..., Yoon Kim
10 2024-04-09 MuPT: A Generative Symbolic Music Pretrained Transformer link Xingwei Qu, Yuelin Bai,..., Ge Zhang
10 2024-03-22 A Transfer Attack to Image Watermarks link Yuepeng Hu, Zhengyuan Jiang,..., Neil Zhenqiang Gong
10 2024-09-30 Robust LLM safeguarding via refusal feature adversarial training link Lei Yu, Virginie Do,..., Nicola Cancedda
10 2024-10-01 TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark link Kush Jain, Gabriel Synnaeve, Baptiste Roziere
10 2024-06-11 On the Relation between Trainability and Dequantization of Variational Quantum Learning Models link Elies Gil-Fuster, Casper Gyurik,..., Vedran Dunjko
10 2024-12-10 CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding link Jiquan Wang, Sha Zhao,..., Gang Pan
10 2024-10-02 On the expressiveness and spectral bias of KANs link Yixuan Wang, Jonathan W. Siegel,..., Thomas Y. Hou
10 2024-05-28 Hierarchical World Models as Visual Whole-Body Humanoid Controllers link Nicklas Hansen, Jyothir S V,..., Hao Su
10 2024-12-01 Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification link Wenxuan Huang, Zijie Zhai,..., Shaohui Lin
10 2024-10-03 Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model link Long Le, Jason Xie,..., Eric Eaton
10 2024-09-27 O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions link Gen Li, Yuling Yan
10 2024-05-28 Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention link Weitai Kang, Mengxue Qu,..., Yan Yan