Last updated: 2024-12-09 08:30:24. Maintained by Weisen Jiang.
citation | date | review | title (pdf) | authors |
---|---|---|---|---|
1341 | 2023-12-01 | link | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Albert Gu; Tri Dao |
432 | None | link | AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework | Qingyun Wu; Gagan Bansal; Jieyu Zhang; Yiran Wu; Beibin Li; Erkang Zhu; Li Jiang; Xiaoyun Zhang; Shaokun Zhang; Jiale Liu; Ahmed Hassan Awadallah; Ryen W White; Doug Burger; Chi Wang |
279 | 2023-10-25 | link | Zephyr: Direct Distillation of LM Alignment | Lewis Tunstall; Edward Emanuel Beeching; Nathan Lambert; Nazneen Rajani; Kashif Rasul; Younes Belkada; Shengyi Huang; Leandro Von Werra; Clémentine Fourrier; Nathan Habib; Nathan Sarrazin; Omar Sanseviero; Alexander M Rush; Thomas Wolf |
165 | 2024-04-09 | link | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Shengding Hu; Yuge Tu; Xu Han; Ganqu Cui; Chaoqun He; Weilin Zhao; Xiang Long; Zhi Zheng; Yewei Fang; Yuxiang Huang; Xinrong Zhang; Zhen Leng Thai; Chongyi Wang; Yuan Yao; Chenyang Zhao; Jie Zhou; Jie Cai; Zhongwu Zhai; Ning Ding; Chao Jia; Guoyang Zeng; dahai li; Zhiyuan Liu; Maosong Sun |
144 | 2023-06-28 | link | Towards Measuring the Representation of Subjective Global Opinions in Language Models | Esin DURMUS; Karina Nguyen; Thomas Liao; Nicholas Schiefer; Amanda Askell; Anton Bakhtin; Carol Chen; Zac Hatfield-Dodds; Danny Hernandez; Nicholas Joseph; Liane Lovitt; Sam McCandlish; Orowa Sikder; Alex Tamkin; Janel Thamkul; Jared Kaplan; Jack Clark; Deep Ganguli |
131 | 2023-11-20 | link | GPQA: A Graduate-Level Google-Proof Q&A Benchmark | David Rein; Betty Li Hou; Asa Cooper Stickland; Jackson Petty; Richard Yuanzhe Pang; Julien Dirani; Julian Michael; Samuel R. Bowman |
129 | 2023-07-25 | link | LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | Chengsong Huang; Qian Liu; Bill Yuchen Lin; Tianyu Pang; Chao Du; Min Lin |
120 | 2023-09-06 | link | Certifying LLM Safety against Adversarial Prompting | Aounon Kumar; Chirag Agarwal; Suraj Srinivas; Aaron Jiaxun Li; Soheil Feizi; Himabindu Lakkaraju |
112 | 2023-10-10 | link | The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets | Samuel Marks; Max Tegmark |
110 | 2024-04-09 | link | RULER: What's the Real Context Size of Your Long-Context Language Models? | Cheng-Ping Hsieh; Simeng Sun; Samuel Kriman; Shantanu Acharya; Dima Rekesh; Fei Jia; Boris Ginsburg |
103 | 2024-03-15 | link | RAFT: Adapting Language Model to Domain Specific RAG | Tianjun Zhang; Shishir G Patil; Naman Jain; Sheng Shen; Matei Zaharia; Ion Stoica; Joseph E. Gonzalez |
101 | 2023-10-05 | link | A Long Way to Go: Investigating Length Correlations in RLHF | Prasann Singhal; Tanya Goyal; Jiacheng Xu; Greg Durrett |
87 | 2024-04-09 | link | LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Parishad BehnamGhader; Vaibhav Adlakha; Marius Mosbach; Dzmitry Bahdanau; Nicolas Chapados; Siva Reddy |
81 | 2024-01-11 | link | TOFU: A Task of Fictitious Unlearning for LLMs | Pratyush Maini; Zhili Feng; Avi Schwarzschild; Zachary Chase Lipton; J Zico Kolter |
78 | 2024-04-18 | link | From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function | Rafael Rafailov; Joey Hejna; Ryan Park; Chelsea Finn |
75 | 2024-02-27 | link | Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Duarte Miguel Alves; José Pombal; Nuno M Guerreiro; Pedro Henrique Martins; João Alves; Amin Farajian; Ben Peters; Ricardo Rei; Patrick Fernandes; Sweta Agrawal; Pierre Colombo; José G. C. de Souza; Andre Martins |
63 | 2024-04-08 | link | Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | Ruiqi Zhang; Licong Lin; Yu Bai; Song Mei |
60 | 2023-11-17 | link | A Language Agent for Autonomous Driving | Jiageng Mao; Junjie Ye; Yuxi Qian; Marco Pavone; Yue Wang |
57 | 2023-04-03 | link | Inspecting and Editing Knowledge Representations in Language Models | Evan Hernandez; Belinda Z. Li; Jacob Andreas |
57 | 2023-10-16 | link | OpenAgents: An Open Platform for Language Agents in the Wild | Tianbao Xie; Fan Zhou; Zhoujun Cheng; Peng Shi; Luoxuan Weng; Yitao Liu; Toh Jing Hua; Junning Zhao; Qian Liu; Che Liu; Zeyu Liu; Yiheng Xu; Hongjin SU; Dongchan Shin; Caiming Xiong; Tao Yu |
53 | 2024-01-12 | link | Fine-grained Hallucination Detection and Editing for Language Models | Abhika Mishra; Akari Asai; Vidhisha Balachandran; Yizhong Wang; Graham Neubig; Yulia Tsvetkov; Hannaneh Hajishirzi |
53 | 2023-12-14 | link | Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking | Jacob Eisenstein; Chirag Nagpal; Alekh Agarwal; Ahmad Beirami; Alexander Nicholas D'Amour; Krishnamurthy Dj Dvijotham; Adam Fisch; Katherine A Heller; Stephen Robert Pfohl; Deepak Ramachandran; Peter Shaw; Jonathan Berant |
53 | 2023-12-11 | link | LLM360: Towards Fully Transparent Open-Source LLMs | Zhengzhong Liu; Aurick Qiao; Willie Neiswanger; Hongyi Wang; Bowen Tan; Tianhua Tao; Junbo Li; Yuqi Wang; Suqi Sun; Omkar Pangarkar; Richard Fan; Yi Gu; Victor Miller; Yonghao Zhuang; Guowei He; Haonan Li; Fajri Koto; Liping Tang; Nikhil Ranjan; Zhiqiang Shen; Roberto Iriondo; Cun Mu; Zhiting Hu; Mark Schulze; Preslav Nakov; Timothy Baldwin; Eric P. Xing |
50 | 2024-02-12 | link | Do Membership Inference Attacks Work on Large Language Models? | Michael Duan; Anshuman Suri; Niloofar Mireshghallah; Sewon Min; Weijia Shi; Luke Zettlemoyer; Yulia Tsvetkov; Yejin Choi; David Evans; Hannaneh Hajishirzi |
49 | 2024-02-09 | link | V-STaR: Training Verifiers for Self-Taught Reasoners | Arian Hosseini; Xingdi Yuan; Nikolay Malkin; Aaron Courville; Alessandro Sordoni; Rishabh Agarwal |
49 | 2024-03-14 | link | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Eric Zelikman; Georges Raif Harik; Yijia Shao; Varuna Jayasiri; Nick Haber; Noah Goodman |
46 | 2024-04-11 | link | AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Zeyi Liao; Huan Sun |
45 | 2023-09-26 | link | VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | Han Lin; Abhay Zala; Jaemin Cho; Mohit Bansal |
43 | 2023-11-16 | link | HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs | Junying Chen; Xidong Wang; Ke Ji; Anningzhe Gao; Feng Jiang; Shunian Chen; Hongbo Zhang; Song Dingjie; Wenya Xie; Chuyi Kong; Jianquan Li; Xiang Wan; Haizhou Li; Benyou Wang |
42 | 2024-03-12 | link | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Sainbayar Sukhbaatar; Olga Golovneva; Vasu Sharma; Hu Xu; Xi Victoria Lin; Baptiste Roziere; Jacob Kahn; Shang-Wen Li; Wen-tau Yih; Jason E Weston; Xian Li |
41 | 2023-02-11 | link | A Reparameterized Discrete Diffusion Model for Text Generation | Lin Zheng; Jianbo Yuan; Lei Yu; Lingpeng Kong |
37 | 2023-09-27 | link | Large Language Model Routing with Benchmark Datasets | Tal Shnitzer; Anthony Ou; Mírian Silva; Kate Soule; Yuekai Sun; Justin Solomon; Neil Thompson; Mikhail Yurochkin |
35 | 2023-10-03 | link | Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation | Eric Zelikman; Eliana Lorch; Lester Mackey; Adam Tauman Kalai |
34 | 2024-04-08 | link | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Bo Peng; Daniel Goldstein; Quentin Gregory Anthony; Alon Albalak; Eric Alcaide; Stella Biderman; Eugene Cheah; Teddy Ferdinan; Kranthi Kiran GV; Haowen Hou; Satyapriya Krishna; Ronald McClelland Jr.; Niklas Muennighoff; Fares Obeid; Atsushi Saito; Guangyu Song; Haoqin Tu; Ruichong Zhang; Bingchen Zhao; Qihang Zhao; Jian Zhu; Rui-Jie Zhu |
34 | 2024-03-25 | link | Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators | Yinhong Liu; Han Zhou; Zhijiang Guo; Ehsan Shareghi; Ivan Vulić; Anna Korhonen; Nigel Collier |
33 | 2024-02-27 | link | Massive Activations in Large Language Models | Mingjie Sun; Xinlei Chen; J Zico Kolter; Zhuang Liu |
32 | 2024-04-01 | link | Mapping the Increasing Use of LLMs in Scientific Papers | Weixin Liang; Yaohui Zhang; Zhengxuan Wu; Haley Lepp; Wenlong Ji; Xuandong Zhao; Hancheng Cao; Sheng Liu; Siyu He; Zhi Huang; Diyi Yang; Christopher Potts; Christopher D Manning; James Y. Zou |
32 | 2024-02-21 | link | Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Lucas Lehnert; Sainbayar Sukhbaatar; DiJia Su; Qinqing Zheng; Paul McVay; Michael Rabbat; Yuandong Tian |
32 | 2024-01-27 | link | MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | Yixuan Tang; Yi Yang |
31 | 2024-04-24 | link | Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | Jacob Pfau; William Merrill; Samuel R. Bowman |
31 | 2024-04-27 | link | Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities | Kazuki Fujii; Taishi Nakamura; Mengsay Loem; Hiroki Iida; Masanari Ohi; Kakeru Hattori; Hirai Shota; Sakae Mizuki; Rio Yokota; Naoaki Okazaki |
31 | 2024-01-16 | link | Tuning Language Models by Proxy | Alisa Liu; Xiaochuang Han; Yizhong Wang; Yulia Tsvetkov; Yejin Choi; Noah A. Smith |
31 | 2024-03-31 | link | RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation | Chi-Min Chan; Chunpu Xu; Ruibin Yuan; Hongyin Luo; Wei Xue; Yike Guo; Jie Fu |
30 | 2023-10-16 | link | CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization | Bodhisattwa Prasad Majumder; Bhavana Dalvi Mishra; Peter Jansen; Oyvind Tafjord; Niket Tandon; Li Zhang; Chris Callison-Burch; Peter Clark |
29 | 2023-10-23 | link | AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models | Sicheng Zhu; Ruiyi Zhang; Bang An; Gang Wu; Joe Barrow; Zichao Wang; Furong Huang; Ani Nenkova; Tong Sun |
29 | 2023-09-30 | link | Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration | Qiushi Sun; Zhangyue Yin; Xiang Li; Zhiyong Wu; Xipeng Qiu; Lingpeng Kong |
28 | 2024-04-01 | link | LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models | Yadong Zhang; Shaoguang Mao; Tao Ge; Xun Wang; Yan Xia; Wenshan Wu; Ting Song; Man Lan; Furu Wei |
28 | 2024-04-01 | link | Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Matthias Gerstgrasser; Rylan Schaeffer; Apratim Dey; Rafael Rafailov; Tomasz Korbak; Henry Sleight; Rajashree Agrawal; John Hughes; Dhruv Bhandarkar Pai; Andrey Gromov; Dan Roberts; Diyi Yang; David L. Donoho; Sanmi Koyejo |
27 | 2024-01-24 | link | MambaByte: Token-free Selective State Space Model | Junxiong Wang; Tushaar Gangavarapu; Jing Nathan Yan; Alexander M Rush |
27 | 2024-04-09 | link | Autonomous Evaluation and Refinement of Digital Agents | Jiayi Pan; Yichi Zhang; Nicholas Tomlin; Yifei Zhou; Sergey Levine; Alane Suhr |
27 | 2023-08-15 | link | RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models | Jie Huang; Wei Ping; Peng Xu; Mohammad Shoeybi; Kevin Chang; Bryan Catanzaro |
26 | 2024-04-11 | link | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Haotian Zhang; Haoxuan You; Philipp Dufter; Bowen Zhang; Chen Chen; Hong-You Chen; Tsu-Jui Fu; William Yang Wang; Shih-Fu Chang; Zhe Gan; Yinfei Yang |
25 | 2023-10-18 | link | Understanding Retrieval Augmentation for Long-Form Question Answering | Hung-Ting Chen; Fangyuan Xu; Shane Arora; Eunsol Choi |
25 | 2024-04-09 | link | VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? | Junpeng Liu; Yifan Song; Bill Yuchen Lin; Wai Lam; Graham Neubig; Yuanzhi Li; Xiang Yue |
25 | 2024-04-04 | link | Locating and Editing Factual Associations in Mamba | Arnab Sen Sharma; David Atkinson; David Bau |
25 | 2023-07-13 | link | Effective Prompt Extraction from Language Models | Yiming Zhang; Nicholas Carlini; Daphne Ippolito |
25 | 2023-05-10 | link | Bot or Human? Detecting ChatGPT Imposters with A Single Question | Hong Wang; Xuan Luo; Weizhi Wang; Melody Yu; Xifeng Yan |
25 | 2024-04-18 | link | TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Hanshi Sun; Zhuoming Chen; Xinyu Yang; Yuandong Tian; Beidi Chen |
23 | 2024-01-30 | link | Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens | Jiacheng Liu; Sewon Min; Luke Zettlemoyer; Yejin Choi; Hannaneh Hajishirzi |
22 | 2024-04-29 | link | MileBench: Benchmarking MLLMs in Long Context | Song Dingjie; Shunian Chen; Guiming Hardy Chen; Fei Yu; Xiang Wan; Benyou Wang |
22 | 2024-02-14 | link | LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset | Botao Yu; Frazier N. Baker; Ziqi Chen; Xia Ning; Huan Sun |
22 | 2024-04-01 | link | What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Luxi He; Mengzhou Xia; Peter Henderson |
21 | 2024-04-11 | link | HGRN2: Gated Linear RNNs with State Expansion | Zhen Qin; Songlin Yang; Weixuan Sun; Xuyang Shen; Dong Li; Weigao Sun; Yiran Zhong |
21 | 2024-04-01 | link | FABLES: Evaluating faithfulness and content selection in book-length summarization | Yekyung Kim; Yapei Chang; Marzena Karpinska; Aparna Garimella; Varun Manjunatha; Kyle Lo; Tanya Goyal; Mohit Iyyer |
20 | 2024-07-16 | link | SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning | Chenyang Zhao; Xueying Jia; Vijay Viswanathan; Graham Neubig; Tongshuang Wu |
19 | 2024-04-02 | link | Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey | Philipp Mondorf; Barbara Plank |
19 | 2024-03-20 | link | Reverse Training to Nurse the Reversal Curse | Olga Golovneva; Zeyuan Allen-Zhu; Jason E Weston; Sainbayar Sukhbaatar |
18 | 2023-07-12 | link | Instruction Mining: Instruction Data Selection for Tuning Large Language Models | Yihan Cao; Yanbin Kang; Chi Wang; Lichao Sun |
17 | 2024-04-15 | link | Compression Represents Intelligence Linearly | Yuzhen Huang; Jinghan Zhang; Zifei Shan; Junxian He |
17 | 2024-02-13 | link | On Limitations of the Transformer Architecture | Binghui Peng; Srini Narayanan; Christos Papadimitriou |
17 | 2024-04-07 | link | How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse | Mohamed El Amine Seddik; Suei-Wen Chen; Soufiane Hayou; Pierre Youssef; Merouane Abdelkader DEBBAH |
16 | 2023-12-02 | link | Eliciting Latent Knowledge from Quirky Language Models | Alex Troy Mallen; Madeline Brumley; Julia Kharchenko; Nora Belrose |
16 | 2024-03-18 | link | What Are Tools Anyway? A Survey from the Language Model Perspective | Zhiruo Wang; Zhoujun Cheng; Hao Zhu; Daniel Fried; Graham Neubig |
16 | 2024-05-10 | link | LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | Li-Chun Lu; Shou-Jen Chen; Tsung-Min Pai; Chan-Hung Yu; Hung-yi Lee; Shao-Hua Sun |
16 | 2024-03-26 | link | Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms | Michael Hanna; Sandro Pezzelle; Yonatan Belinkov |
16 | 2024-03-27 | link | Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback | Hongshen Xu; Zichen Zhu; Situo Zhang; Da Ma; Shuai Fan; Lu Chen; Kai Yu |
15 | 2024-03-14 | link | Logits of API-Protected LLMs Leak Proprietary Information | Matthew Finlayson; Xiang Ren; Swabha Swayamdipta |
15 | 2024-03-24 | link | The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization | Shengyi Huang; Michael Noukhovitch; Arian Hosseini; Kashif Rasul; Weixun Wang; Lewis Tunstall |
15 | 2024-07-22 | link | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu; Zhenmei Shi; Yingyu Liang |
15 | 2024-03-28 | link | STaR-GATE: Teaching Language Models to Ask Clarifying Questions | Chinmaya Andukuri; Jan-Philipp Fränken; Tobias Gerstenberg; Noah Goodman |
13 | 2023-10-05 | link | SteP: Stacked LLM Policies for Web Actions | Paloma Sodhi; S.R.K Branavan; Yoav Artzi; Ryan McDonald |
13 | 2024-04-15 | link | A Survey on Deep Learning for Theorem Proving | Zhaoyu Li; Jialiang Sun; Logan Murphy; Qidong Su; Zenan Li; Xian Zhang; Kaiyu Yang; Xujie Si |
13 | 2023-06-05 | link | Early Weight Averaging meets High Learning Rates for LLM Pre-training | Sunny Sanyal; Atula Tejaswi Neerkaje; Jean Kaddour; Abhishek Kumar; sujay sanghavi |
13 | 2024-04-11 | link | From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Robert Vacareanu; Vlad Andrei Negru; Vasile Suciu; Mihai Surdeanu |
13 | 2024-04-04 | link | How Easily do Irrelevant Inputs Skew the Responses of Large Language Models? | Siye Wu; Jian Xie; Jiangjie Chen; Tinghui Zhu; Kai Zhang; Yanghua Xiao |
12 | 2024-02-07 | link | Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding | Zachary Ankner; Rishab Parthasarathy; Aniruddha Nrusimha; Christopher Rinard; Jonathan Ragan-Kelley; William Brandon |
12 | 2024-03-17 | link | StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows | Yiran Wu; Tianwei Yue; Shaokun Zhang; Chi Wang; Qingyun Wu |
12 | 2024-04-03 | link | Auxiliary task demands mask the capabilities of smaller language models | Jennifer Hu; Michael Frank |
12 | 2024-04-01 | link | Do language models plan ahead for future tokens? | Wilson Wu; John Xavier Morris; Lionel Levine |
11 | 2024-02-06 | link | "Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors | Lin Guan; Yifan Zhou; Denis Liu; Yantian Zha; Heni Ben Amor; Subbarao Kambhampati |
11 | 2024-03-31 | link | The Larger the Better? Improved LLM Code-Generation via Budget Reallocation | Michael Hassid; Tal Remez; Jonas Gehring; Roy Schwartz; Yossi Adi |
11 | 2024-04-09 | link | Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | Sebastian Bordt; Harsha Nori; Vanessa Cristiny Rodrigues Vasconcelos; Besmira Nushi; Rich Caruana |
11 | 2024-03-30 | link | Multi-hop Question Answering under Temporal Knowledge Editing | Keyuan Cheng; Gang Lin; Haoyang Fei; Yuxuan Zhai; Lu Yu; Muhammad Asif Ali; Lijie Hu; Di Wang |
11 | 2023-12-11 | link | Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions | Federico Cassano; Luisa Li; Akul Sethi; Noah Shinn; Abby Brennan-Jones; Jacob Ginesin; Edward Berman; George Chakhnashvili; Anton Lozhkov; Carolyn Jane Anderson; Arjun Guha |
11 | 2024-01-21 | link | With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation | Yan Wang; Dongyang Ma; Deng Cai |
11 | 2023-10-06 | link | An In-Context Learning Agent for Formal Theorem-Proving | Amitayush Thakur; George Tsoukalas; Yeming Wen; Jimmy Xin; Swarat Chaudhuri |
10 | 2024-02-26 | link | StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | Alex Zhuang; Ge Zhang; Tianyu Zheng; Xinrun Du; Junjie Wang; Weiming Ren; Wenhao Huang; Jie Fu; Xiang Yue; Wenhu Chen |
10 | 2024-03-28 | link | Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Chunqiu Steven Xia; Yinlin Deng; LINGMING ZHANG |
10 | 2024-04-15 | link | Impact of Preference Noise on the Alignment Performance of Generative Language Models | Yang Gao; Dana Alon; Donald Metzler |
10 | 2024-03-21 | link | Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models | Adam Karvonen |
9 | 2024-04-16 | link | Can Language Models Solve Olympiad Programming? | Ben Shi; Michael Tang; Karthik R Narasimhan; Shunyu Yao |
9 | 2024-04-16 | link | CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting | Huihan Li; Liwei Jiang; Nouha Dziri; Xiang Ren; Yejin Choi |
9 | 2024-04-08 | link | Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data | Tim Baumgärtner; Yang Gao; Dana Alon; Donald Metzler |
9 | 2024-04-01 | link | Stream of Search (SoS): Learning to Search in Language | Kanishk Gandhi; Denise H J Lee; Gabriel Grand; Muxin Liu; Winson Cheng; Archit Sharma; Noah Goodman |
9 | 2023-10-02 | link | Resolving Knowledge Conflicts in Large Language Models | Yike Wang; Shangbin Feng; Heng Wang; Weijia Shi; Vidhisha Balachandran; Tianxing He; Yulia Tsvetkov |
9 | 2023-06-19 | link | SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design | Carl Edwards; Aakanksha Naik; Tushar Khot; Martin D. Burke; Heng Ji; Tom Hope |
9 | 2024-05-10 | link | Linearizing Large Language Models | Jean Mercat; Igor Vasiljevic; Sedrick Scott Keh; Kushal Arora; Achal Dave; Adrien Gaidon; Thomas Kollar |
9 | 2024-04-05 | link | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xeron Du; Zhouliang Yu; Songyang Gao; Ding Pan; Cheng Yuyang; Ziyang Ma; Ruibin Yuan; Xingwei Qu; Jiaheng Liu; Tianyu Zheng; Xinchen Luo; Guorui Zhou; Wenhu Chen; Ge Zhang |
8 | 2023-10-09 | link | Guiding Language Model Reasoning with Planning Tokens | Xinyi Wang; Lucas Caccia; Oleksiy Ostapenko; Xingdi Yuan; William Yang Wang; Alessandro Sordoni |
8 | 2024-02-22 | link | Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image | Zefeng Wang; Zhen Han; Shuo Chen; Fan Xue; Zifeng Ding; Xun Xiao; Volker Tresp; Philip Torr; Jindong Gu |
8 | 2024-04-02 | link | Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation | Declan Grabb; Max Lamparth; Nina Vasan |
8 | 2023-10-05 | link | DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | Dacheng Li; Rulin Shao; Anze Xie; Eric P. Xing; Xuezhe Ma; Ion Stoica; Joseph E. Gonzalez; Hao Zhang |
8 | 2024-06-05 | link | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney; Mansheej Paul; Brett W. Larsen; Sean Owen; Jonathan Frankle |
8 | 2024-05-10 | link | SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models | Haojie Duanmu; Zhihang Yuan; Xiuhong Li; Jiangfei Duan; Xingcheng ZHANG; Dahua Lin |
8 | 2024-04-17 | link | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | Costas Mavromatis; Petros Karypis; George Karypis |
8 | 2024-03-07 | link | How Far Are We from Intelligent Visual Deductive Reasoning? | Yizhe Zhang; Richard He Bai; Ruixiang ZHANG; Jiatao Gu; Shuangfei Zhai; Joshua M. Susskind; Navdeep Jaitly |
8 | 2024-04-01 | link | Source-Aware Training Enables Knowledge Attribution in Language Models | Muhammad Khalifa; David Wadden; Emma Strubell; Honglak Lee; Lu Wang; Iz Beltagy; Hao Peng |
8 | 2024-04-04 | link | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi; Sarkar Snigdha Sarathi Das; Renze Lou; Jihyun Janice Ahn; Yilun Zhao; Xiaoxin Lu; Nan Zhang; Yusen Zhang; Haoran Ranran Zhang; Sujeeth Reddy Vummanthala; Salika Dave; Shaobo Qin; Arman Cohan; Wenpeng Yin; Rui Zhang |
8 | 2023-05-22 | link | Should We Attend More or Less? Modulating Attention for Fairness | Abdelrahman Zayed; Goncalo Mordido; Samira Shabanian; Sarath Chandar |
8 | 2024-04-01 | link | IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Deqing Fu; Ruohao Guo; Ghazal Khalighinejad; Ollie Liu; Bhuwan Dhingra; Dani Yogatama; Robin Jia; Willie Neiswanger |
8 | 2023-12-01 | link | Instruction-tuning Aligns LLMs to the Human Brain | Khai Loong Aw; Syrielle Montariol; Badr AlKhamissi; Martin Schrimpf; Antoine Bosselut |
7 | 2024-04-05 | link | Prompt Public Large Language Models to Synthesize Data for Private On-device Applications | Shanshan Wu; Zheng Xu; Yanxiang Zhang; Yuanbo Zhang; Daniel Ramage |
7 | 2024-01-22 | link | The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models | Kian Ahrabian; Zhivar Sourati; Kexuan Sun; Jiarui Zhang; Yifan Jiang; Fred Morstatter; Jay Pujara |
7 | 2024-01-31 | link | Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning | Tinghui Zhu; Kai Zhang; Jian Xie; Yu Su |
7 | 2024-03-13 | link | Scattered Mixture-of-Experts Implementation | Shawn Tan; Yikang Shen; Rameswar Panda; Aaron Courville |
7 | 2024-07-11 | link | Automata-based constraints for language model decoding | Terry Koo; Frederick Liu; Luheng He |
7 | 2024-04-01 | link | Exploring the Mystery of Influential Data for Mathematical Reasoning | Xinzhe Ni; Yeyun Gong; Zhibin Gou; yelong shen; Yujiu Yang; Nan Duan; Weizhu Chen |
7 | 2024-05-06 | link | Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training | Zexuan Zhong; Mengzhou Xia; Danqi Chen; Mike Lewis |
7 | 2024-05-01 | link | AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts | Zefang Liu; Jiahua Luo |
7 | 2024-04-08 | link | LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Shibo Hao; Yi Gu; Haotian Luo; Tianyang Liu; Xiyan Shao; Xinyuan Wang; Shuhua Xie; Haodi Ma; Adithya Samavedhi; Qiyue Gao; Zhen Wang; Zhiting Hu |
7 | 2024-07-25 | link | Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | Shi Luohe; Hongyi Zhang; Yao Yao; Zuchao Li; hai zhao |
7 | 2024-04-25 | link | List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | An Yan; Zhengyuan Yang; Junda Wu; Wanrong Zhu; Jianwei Yang; Linjie Li; Kevin Lin; Jianfeng Wang; Julian McAuley; Jianfeng Gao; Lijuan Wang |
6 | 2023-08-24 | link | CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias | Vipul Gupta; Pranav Narayanan Venkit; Hugo Laurençon; Shomir Wilson; Rebecca J. Passonneau |
6 | 2024-04-04 | link | Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations | Mahjabin Nahar; Haeseung Seo; Eun-Ju Lee; Aiping Xiong; Dongwon Lee |
6 | 2023-11-15 | link | Towards Verifiable Text Generation with Symbolic References | Lucas Torroba Hennigen; Zejiang Shen; Aniruddha Nrusimha; Bernhard Gapp; David Sontag; Yoon Kim |
6 | 2024-04-01 | link | Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs | Pengda Wang; Zilin Xiao; Hanjie Chen; Frederick L. Oswald |
6 | 2024-06-20 | link | Timo: Towards Better Temporal Reasoning for Language Models | Zhaochen Su; Jun Zhang; Tong Zhu; Xiaoye Qu; Juntao Li; Min zhang; Yu Cheng |
6 | 2023-11-07 | link | Uncovering Intermediate Variables in Transformers using Circuit Probing | Michael A. Lepori; Thomas Serre; Ellie Pavlick |
6 | 2024-05-19 | link | Hummer: Towards Limited Competitive Preference Dataset | Yusen Wu; Li Jiang; Junwu Xiong; Jingqing Ruan; Yichuan Ding; Qingpei Guo; zujie wen; JUN ZHOU; Xiaotie Deng |
6 | 2023-05-24 | link | Using Natural Language Explanations to Rescale Human Judgments | Manya Wadhwa; Jifan Chen; Junyi Jessy Li; Greg Durrett |
6 | 2024-08-12 | link | Evaluating Language Models for Efficient Code Generation | Jiawei Liu; Songrun Xie; Junhao Wang; Yuxiang Wei; Yifeng Ding; LINGMING ZHANG |
6 | 2023-11-09 | link | Efficient Parallelization Layouts for Large-Scale Distributed Model Training | Johannes Hagemann; Samuel Weinbach; Konstantin Dobler; Maximilian Schall; Gerard de Melo |
5 | 2024-03-30 | link | ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction | Mingyu Jin; Haochen Xue; Zhenting Wang; Boming Kang; Ruosong Ye; Kaixiong Zhou; Mengnan Du; Yongfeng Zhang |
5 | 2023-09-26 | link | Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding | Jiacheng Liu; Andrew Cohen; Ramakanth Pasunuru; Yejin Choi; Hannaneh Hajishirzi; Asli Celikyilmaz |
5 | 2023-10-27 | link | TarGEN: Targeted Data Generation with Large Language Models | Himanshu Gupta; Kevin Scaria; Ujjwala Anantheswaran; Shreyas Verma; Mihir Parmar; Saurabh Arjun Sawant; Chitta Baral; Swaroop Mishra |
5 | 2024-02-24 | link | Empowering Large Language Model Agents through Action Learning | Haiteng Zhao; Chang Ma; Guoyin Wang; Jing Su; Lingpeng Kong; Jingjing Xu; Zhi-Hong Deng; Hongxia Yang |
5 | 2024-05-10 | link | LMD3: Language Model Data Density Dependence | John Kirchenbauer; Garrett Honke; Gowthami Somepalli; Jonas Geiping; Katherine Lee; Daphne Ippolito; Tom Goldstein; David Andre |
5 | 2024-03-13 | link | PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models | Siddharth Mishra-Sharma; YIDING SONG; Jesse Thaler |
5 | 2024-03-19 | link | Dated Data: Tracing Knowledge Cutoffs in Large Language Models | Jeffrey Cheng; Marc Marone; Orion Weller; Dawn Lawrie; Daniel Khashabi; Benjamin Van Durme |
5 | 2024-06-20 | link | Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning | Lynn Chua; Badih Ghazi; Yangsibo Huang; Pritish Kamath; Ravi Kumar; Daogao Liu; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang |
5 | 2024-04-11 | link | Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | Nathan Godey; Éric Villemonte de la Clergerie; Benoît Sagot |
5 | 2024-04-16 | link | Forcing Diffuse Distributions out of Language Models | Yiming Zhang; Avi Schwarzschild; Nicholas Carlini; J Zico Kolter; Daphne Ippolito |
5 | 2024-04-05 | link | Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers | Andy Yang; David Chiang |
5 | 2024-08-17 | link | How Susceptible are LLMs to Influence in Prompts? | Sotiris Anagnostidis; Jannis Bulian |
4 | 2024-01-29 | link | NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Manav Singhal; Tushar Aggarwal; Abhijeet Awasthi; Nagarajan Natarajan; Aditya Kanade |
4 | 2024-06-11 | link | Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? | Xingyu Fu; Muyu He; Yujie Lu; William Yang Wang; Dan Roth |
4 | 2024-03-31 | link | Learning to Plan for Language Modeling from Unlabeled Data | Nathan Cornille; Marie-Francine Moens; Florian Mai |
4 | 2024-07-22 | link | Benchmarks as Microscopes: A Call for Model Metrology | Michael Saxon; Ari Holtzman; Peter West; William Yang Wang; Naomi Saphra |
4 | 2024-09-01 | link | Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models | Bang An; Sicheng Zhu; Ruiyi Zhang; Michael-Andrei Panaitescu-Liess; Yuancheng Xu; Furong Huang |
4 | 2024-08-16 | link | See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses | Yulong Chen; Yang Liu; Jianhao Yan; Xuefeng Bai; Ming Zhong; Yinghao Yang; Ziyi Yang; Chenguang Zhu; Yue Zhang |
4 | 2024-02-19 | link | ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs | Pengrui Han; Rafal Dariusz Kocielnik; Adhithya Prakash Saravanan; Roy Luoyao Jiang; Or Sharir; Anima Anandkumar |
4 | 2024-04-01 | link | Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | Harry Dong; Beidi Chen; Yuejie Chi |
4 | 2023-10-18 | link | DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning | Abhay Zala; Han Lin; Jaemin Cho; Mohit Bansal |
4 | 2024-05-03 | link | Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection | Guillem Ramírez; Alexandra Birch; Ivan Titov |
3 | 2024-05-02 | link | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen; Zhilin Wang; Olivier Delalleau; Jiaqi Zeng; Yi Dong; Daniel Egert; Shengyang Sun; Jimmy J. Zhang; Sahil Jain; Ali Taghibakhshi; Markel Sanz Ausin; Ashwath Aithal; Oleksii Kuchaiev |
3 | 2024-03-29 | link | MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models | Peng Ding; Jiading Fang; Peng Li; Kangrui Wang; Xiaochen Zhou; Mo Yu; Jing Li; Hongyuan Mei; Matthew Walter |
3 | 2024-04-21 | link | Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images | Ali Naseh; Katherine Thai; Mohit Iyyer; Amir Houmansadr |
3 | 2024-04-18 | link | AmbigDocs: Reasoning across Documents on Different Entities under the Same Name | Yoonsang Lee; Xi Ye; Eunsol Choi |
3 | 2024-03-31 | link | CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs | Jingzhe Shi; Jialuo Li; Qinwei Ma; Zaiwen Yang; Huan Ma; Lei Li |
3 | 2024-04-01 | link | Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided | Hongli Zhan; Allen Zheng; Yoon Kyung Lee; Jina Suh; Junyi Jessy Li; Desmond Ong |
3 | None | link | ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training | Zonghan Yang; Peng Li; Ming Yan; Ji Zhang; Fei Huang; Yang Liu |
3 | 2024-05-02 | link | D2PO: Discriminator-Guided DPO with Response Evaluation Models | Prasann Singhal; Nathan Lambert; Scott Niekum; Tanya Goyal; Greg Durrett |
3 | 2024-02-02 | link | Can MLLMs Perform Text-to-Image In-Context Learning? | Yuchen Zeng; Wonjun Kang; Yicong Chen; Hyung Il Koo; Kangwook Lee |
3 | 2024-01-24 | link | TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance | Haorui Wang; Rongzhi Zhang; Yinghao Li; Lingkai Kong; Yuchen Zhuang; Xiusi Chen; Chao Zhang |
3 | 2024-08-08 | link | Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | François Remy; Pieter Delobelle; Hayastan Avetisyan; Alfiya Khabibullina; Miryam de Lhoneux; Thomas Demeester |
3 | 2024-08-13 | link | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok; Michal Bravansky; Lewis Griffin |
3 | 2024-05-04 | link | Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness | Xinran Zhao; Tong Chen; Sihao Chen; Hongming Zhang; Tongshuang Wu |
3 | 2024-03-23 | link | IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Haz Sameen Shahgir; Khondker Salman Sayeed; Abhik Bhattacharjee; Wasi Uddin Ahmad; Yue Dong; Rifat Shahriyar |
3 | 2024-05-15 | link | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Devansh Jain; Priyanshu Kumar; Samuel Gehman; Xuhui Zhou; Thomas Hartvigsen; Maarten Sap |
3 | 2023-05-21 | link | Description-Based Text Similarity | Shauli Ravfogel; Valentina Pyatkin; Amir David Nissan Cohen; Avshalom Manevich; Yoav Goldberg |
3 | 2024-04-03 | link | From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives | Shuxian Fan; Adam Visokay; Kentaro Hoffman; Stephen Salerno; Li Liu; Jeffrey T. Leek; Tyler McCormick |
3 | 2024-08-27 | link | Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations | Yize Zhao; Tina Behnia; Vala Vakilian; Christos Thrampoulidis |
3 | 2024-03-18 | link | EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | Abhay Zala; Jaemin Cho; Han Lin; Jaehong Yoon; Mohit Bansal |
2 | 2024-05-01 | link | WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting | Olly Styles; Sam Miller; Patricio Cerda-Mardini; Tanaya Guha; Victor Sanchez; Bertie Vidgen |
2 | 2024-08-02 | link | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua; Yoav Artzi |
2 | 2024-08-05 | link | LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory | Jillian Ross; Yoon Kim; Andrew Lo |
2 | 2024-05-27 | link | On Fairness of Low-Rank Adaptation of Large Models | Zhoujie Ding; Ken Liu; Pura Peetathawatchai; Berivan Isik; Sanmi Koyejo |
2 | 2024-04-12 | link | Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Xinpeng Wang; Chengzhi Hu; Bolei Ma; Paul Rottger; Barbara Plank |
2 | 2024-01-04 | link | SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval | Griffin Thomas Adams; Jason Zucker; Noémie Elhadad |
2 | 2024-04-04 | link | Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph | Marco Bronzini; Carlo Nicolini; Bruno Lepri; Jacopo Staiano; Andrea Passerini |
2 | 2024-08-10 | link | Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers | MohammadReza Ebrahimi; Sunny Panchal; Roland Memisevic |
2 | 2024-03-20 | link | Information-Theoretic Distillation for Reference-less Summarization | Jaehun Jung; Ximing Lu; Liwei Jiang; Faeze Brahman; Peter West; Pang Wei Koh; Yejin Choi |
2 | 2024-04-19 | link | Stronger Random Baselines for In-Context Learning | Gregory Yauney; David Mimno |
2 | 2024-04-03 | link | Scalable Model Editing via Customized Expert Networks | Zihan Yao; Yu He; Tianyu Qi; Ming Li |
2 | 2024-04-06 | link | Language Models as Critical Thinking Tools: A Case Study of Philosophers | Andre Ye; Jared Moore; Rose Novick; Amy X Zhang |
2 | 2024-03-22 | link | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan; Brenden Lake; Mengye Ren |
2 | 2023-07-13 | link | Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? | Bo-Ru Lu; Nikita Haduong; Chia-Hsuan Lee; Zeqiu Wu; Hao Cheng; Paul Koester; Jean Utke; Tao Yu; Noah A. Smith; Mari Ostendorf |
2 | 2024-04-15 | link | Personalized Collaborative Fine-Tuning for On-Device Large Language Models | Nicolas Wagner; Dongyang Fan; Martin Jaggi |
2 | 2024-04-02 | link | Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game | Silin Du; Xiaowei Zhang |
2 | 2024-02-05 | link | UniMem: Towards a Unified View of Long-Context Large Language Models | Junjie Fang; Likai Tang; Hongzhe Bi; Yujia Qin; Si Sun; Zhenyu Li; Haolun Li; Yongjian Li; Xin Cong; Yankai Lin; Yukun Yan; Xiaodong Shi; Sen Song; Zhiyuan Liu; Maosong Sun |
2 | 2024-08-09 | link | How Well Do LLMs Identify Cultural Unity in Diversity? | Jialin Li; Junli Wang; Junjie Hu; Ming Jiang |
2 | 2024-04-03 | link | An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models | Emmy Liu; Graham Neubig; Jacob Andreas |
2 | 2024-08-26 | link | Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks? | Urja Khurana; Eric Nalisnick; Antske Fokkens; Swabha Swayamdipta |
1 | 2024-10-07 | link | Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates | Avanika Narayan; Mayee F Chen; Kush Bhatia; Christopher Re |
1 | 2024-08-09 | link | Tabular Transfer Learning via Prompting LLMs | Jaehyun Nam; Woomin Song; Seong Hyeon Park; Jihoon Tack; Sukmin Yun; Jaehyung Kim; Kyu Hwan Oh; Jinwoo Shin |
1 | 2024-11-25 | link | Predicting Emergent Capabilities by Finetuning | Charlie Victor Snell; Eric Wallace; Dan Klein; Sergey Levine |
1 | 2024-05-30 | link | How Multilingual Are Large Language Models Fine-Tuned for Translation? | Aquia Richburg; Marine Carpuat |
1 | None | link | What makes a good metric? Evaluating automatic metrics for text-to-image consistency | Candace Ross; Melissa Hall; Adriana Romero-Soriano; Adina Williams |
1 | 2024-08-15 | link | Web Retrieval Agents for Evidence-Based Misinformation Detection | Jacob-Junqi Tian; Hao Yu; Yury Orlovskiy; Tyler Vergho; Mauricio Rivera; Mayank Goel; Zachary Yang; Jean-François Godbout; Reihaneh Rabbany; Kellin Pelrine |
1 | 2024-07-12 | link | Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation | Biqing Qi; Kaiyan Zhang; Kai Tian; Haoxiang Li; Zhang-Ren Chen; Sihang Zeng; Ermo Hua; Hu Jinfang; Bowen Zhou |
1 | 2024-04-17 | link | AgentKit: Structured LLM Reasoning with Dynamic Graphs | Yue Wu; Yewen Fan; So Yeon Min; Shrimai Prabhumoye; Stephen Marcus McAleer; Ruslan Salakhutdinov; Yonatan Bisk; Yuanzhi Li; Tom Mitchell |
1 | 2023-07-15 | link | CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices | Weilin Zhao; Yuxiang Huang; Xu Han; Zhiyuan Liu; Zhengyan Zhang; Kuai Li; Chen Chen; TAO YANG; Maosong Sun |
1 | 2024-02-13 | link | Measuring and Controlling Instruction (In)Stability in Language Model Dialogs | Kenneth Li; Tianle Liu; Naomi Bashkansky; David Bau; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg |
1 | 2023-11-14 | link | AI-generated text boundary detection with RoFT | Laida Kushnareva; Tatiana Gaintseva; Dmitry Abulkhanov; Kristian Kuznetsov; German Magai; Eduard Tulchinskii; Serguei Barannikov; Sergey Nikolenko; Irina Piontkovskaya |
1 | 2024-07-18 | link | Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | Charles Jin |
1 | 2024-04-09 | link | Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports | Tianyu Cao; Natraj Raman; Danial Dervovic; Chenhao Tan |
1 | 2024-09-17 | link | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao; Renjie Pi; Tianyang Han; Han Wu; Lanqing HONG; Lingpeng Kong; Xin Jiang; Zhenguo Li |
1 | 2024-08-23 | link | LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models | Chongyan Sun; Ken Lin; Shiwei Wang; Hulong Wu; Chengfei Fu; Zhen Wang |
1 | 2024-04-27 | link | Building a Large Japanese Web Corpus for Large Language Models | Naoaki Okazaki; Kakeru Hattori; Hirai Shota; Hiroki Iida; Masanari Ohi; Kazuki Fujii; Taishi Nakamura; Mengsay Loem; Rio Yokota; Sakae Mizuki |
1 | 2024-05-30 | link | Reasoning about concepts with LLMs: Inconsistencies abound | Rosario Uceda Sosa; Karthikeyan Natesan Ramamurthy; Maria Chang; Moninder Singh |
1 | 2024-04-16 | link | DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion | Yu Li; Han Jiang; Chuanyang Gong; Zhihua Wei |
1 | 2024-04-04 | link | PRobELM: Plausibility Ranking Evaluation for Language Models | Moy Yuan; Eric Chamoun; Rami Aly; Chenxi Whitehouse; Andreas Vlachos |
0 | 2024-03-11 | link | 3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation | Huaisheng Zhu; Teng Xiao; Vasant G Honavar |
0 | 2024-05-08 | link | ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation | Ana Brassard; Benjamin Heinzerling; Keito Kudo; Keisuke Sakaguchi; Kentaro Inui |
0 | 2024-04-08 | link | GeniL: A Multilingual Dataset on Generalizing Language | Aida Mostafazadeh Davani; Sagar Gubbi Venkatesh; Sunipa Dev; Shachi Dave; Vinodkumar Prabhakaran |
0 | 2024-04-30 | link | Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics | James Michaelov; Catherine Arnett; Ben Bergen |
0 | 2024-02-28 | link | Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore | Sheikh Shafayat; Eunsu Kim; Juhyun Oh; Alice Oh |
0 | 2024-03-04 | link | CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text | Zhenru Lin; Yiqun Yao; Yang Yuan |
0 | 2024-07-13 | link | Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | KuanChao Chu; Yi-Pei Chen; Hideki Nakayama |
0 | 2024-04-01 | link | Forklift: An Extensible Neural Lifter | Jordi Armengol-Estapé; Rodrigo C. O. Rocha; Jackson Woodruff; Pasquale Minervini; Michael O'Boyle |
0 | 2024-08-06 | link | Data Checklist: On Unit-Testing Datasets with Usable Information | Heidi Chenyu Zhang; Shabnam Behzad; Kawin Ethayarajh; Dan Jurafsky |
0 | 2024-10-08 | link | Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective | Xueying Bai; Yifan Sun; Niranjan Balasubramanian |
0 | 2024-08-12 | link | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh; Fangyuan Xu; Yi-Hao Peng; Chongyan Chen; Hansika Murugu; Danna Gurari; Eunsol Choi; Amy Pavel |
0 | 2024-06-11 | link | MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs | Vera Neplenbroek; Arianna Bisazza; Raquel Fernández |
0 | None | link | Redesigning Information Markets in the Era of Language Models | Martin Weiss; Nasim Rahaman; Manuel Wuthrich; Yoshua Bengio; Li Erran Li; Bernhard Schölkopf; Christopher Pal |
0 | 2023-07-27 | link | Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners | Jihyeon Lee; Dain Kim; Doohae Jung; Boseop Kim; Kyoung-Woon On |
0 | 2024-04-01 | link | LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Haoran Li; Junqi Liu; Zexian Wang; Shiyuan Luo; Xiaowei Jia; Huaxiu Yao |
0 | 2024-05-17 | link | Prompt Exploration with Prompt Regression | Michael Feffer; Ronald Xu; Yuekai Sun; Mikhail Yurochkin |
0 | 2024-08-06 | link | LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification | Zhen Qin; Junru Wu; Jiaming Shen; Tianqi Liu; Xuanhui Wang |
0 | 2024-07-16 | link | InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification | Yujia Hu; Zhiqiang Hu; Chun Wei Seah; Roy Ka-Wei Lee |
0 | 2024-07-12 | link | Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers | Jong Myoung Kim; Young-Jun Lee; Yong-Jin Han; Ho-Jin Choi; Sangkeun Jung |
0 | 2024-08-10 | link | Investigating Instruction Tuning Large Language Models on Graphs | Kerui Zhu; Bo-Wei Huang; Bowen Jin; Yizhu Jiao; Ming Zhong; Kevin Chang; Shou-De Lin; Jiawei Han |
0 | 2024-09-03 | link | Unforgettable Generalization in Language Models | Eric Zhang; Leshem Choshen; Jacob Andreas |
0 | 2024-07-11 | link | HDT: Hierarchical Document Transformer | Haoyu He; Markus Flicke; Jan Buchmann; Iryna Gurevych; Andreas Geiger |
0 | 2024-04-24 | link | Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents | Evgenii Kortukov; Alexander Rubinstein; Elisa Nguyen; Seong Joon Oh |
0 | 2024-03-29 | link | Measuring Taiwanese Mandarin Language Understanding | Po-Heng Chen; Sijia Cheng; Wei-Lin Chen; Yen-Ting Lin; Yun-Nung Chen |
0 | 2024-04-11 | link | Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning | Quanyu Long; Yin Wu; Wenya Wang; Sinno Jialin Pan |
0 | 2023-12-01 | link | Nonparametric Variational Regularisation of Pretrained Transformers | Fabio James Fehr; James Henderson |
0 | 2024-08-09 | link | FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers | Joshua Nathaniel Williams; J Zico Kolter |
0 | 2023-10-22 | link | O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models | Yuchen Xiao; Yanchao Sun; Mengda Xu; Udari Madhushani Sehwag; Jared Vann; Deepeka Garg; Sumitra Ganesh |
0 | 2024-08-05 | link | ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning | Yuxuan Wang; Alan Yuille; Zhuowan Li; Zilong Zheng |
0 | 2024-08-14 | link | Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability | Jiri Hron; Laura A Culp; Gamaleldin Fathy Elsayed; Rosanne Liu; Jasper Snoek; Simon Kornblith; Alex Rizkowsky; Isabelle Simpson; Jascha Sohl-Dickstein; Noah Fiedel; Aaron T Parisi; Alexander A Alemi; Azade Nova; Ben Adlam; Bernd Bohnet; Gaurav Mishra; Hanie Sedghi; Izzeddin Gur; Jaehoon Lee; John D Co-Reyes; Kathleen Kenealy; Kelvin Xu; Kevin Swersky; Igor Mordatch; Lechao Xiao; Maxwell Bileschi; Peter J Liu; Roman Novak; Sharad Vikram; Tris Warkentin; Jeffrey Pennington |
0 | None | link | Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge | Mohammad Saqib Hasan; Sayontan Ghosh; Dhruv Verma; Geoff Kuenning; Erez Zadok; Scott Smolka; Niranjan Balasubramanian |