4688 | 2023-04-05 | Segment Anything
2833 | 2023-02-10 | Adding Conditional Control to Text-to-Image Diffusion Models
1116 | 2022-12-19 | Scalable Diffusion Models with Transformers
786 | 2023-03-20 | Zero-1-to-3: Zero-shot One Image to 3D Object
531 | 2022-12-22 | Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
464 | 2023-03-27 | Sigmoid Loss for Language Image Pre-Training
445 | 2023-03-24 | Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
403 | 2023-03-23 | Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
403 | 2023-02-06 | Structure and Content-Guided Video Synthesis with Diffusion Models
347 | 2022-11-17 | DiffusionDet: Diffusion Model for Object Detection
341 | 2023-03-14 | ViperGPT: Visual Inference via Python Execution for Reasoning
290 | 2022-06-02 | PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
285 | 2023-04-17 | MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
283 | 2022-12-08 | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
277 | 2023-03-22 | Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
262 | 2023-03-24 | Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
252 | 2023-03-16 | LERF: Language Embedded Radiance Fields
247 | 2023-02-27 | ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
238 | 2023-06-23 | LightGlue: Local Feature Matching at Light Speed
212 | 2022-11-22 | DETRs with Collaborative Hybrid Assignments Training
198 | 2022-05-30 | Prompt-aligned Gradient for Prompt Tuning
197 | 2023-03-13 | Erasing Concepts from Diffusion Models
196 | 2023-03-20 | SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
192 | 2023-04-05 | Generative Novel View Synthesis with 3D-Aware Diffusion Models
187 | 2023-03-22 | Pix2Video: Video Editing using Image Diffusion
185 | 2023-05-17 | Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
184 | 2023-03-23 | DreamBooth3D: Subject-Driven Text-to-3D Generation
183 | 2022-12-10 | NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
179 | 2022-12-05 | PhysDiff: Physics-Guided Human Motion Diffusion Model
161 | 2023-03-03 | Unleashing Text-to-Image Diffusion Models for Visual Perception
160 | 2022-09-07 | What does a platypus look like? Generating customized prompts for zero-shot image classification
151 | 2023-03-21 | TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
151 | 2023-03-20 | Text2Tex: Text-driven Texture Synthesis via Diffusion Models
145 | 2023-03-16 | Large Selective Kernel Network for Remote Sensing Object Detection
143 | 2023-03-16 | SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
141 | 2023-03-21 | Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
139 | 2023-07-20 | BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
139 | 2023-03-21 | Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
134 | 2023-01-02 | CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
132 | 2023-03-16 | DiffIR: Efficient Diffusion Model for Image Restoration
130 | 2022-02-08 | DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
126 | 2023-04-13 | Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
126 | 2023-03-16 | DIRE for Diffusion-Generated Image Detection
125 | 2023-03-28 | SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis
124 | 2023-03-23 | Ablating Concepts in Text-to-Image Diffusion Models
123 | 2023-03-27 | The Stable Signature: Rooting Watermarks in Latent Diffusion Models
122 | 2023-04-11 | OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
119 | 2023-08-18 | StableVideo: Text-driven Consistency-aware Diffusion Video Editing
113 | 2023-03-21 | VAD: Vectorized Scene Representation for Efficient Autonomous Driving
112 | 2022-12-30 | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
112 | 2022-12-15 | Rethinking Vision Transformers for MobileNet Size and Speed
111 | 2022-10-03 | CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
110 | 2023-03-16 | Efficient Diffusion Training via Min-SNR Weighting Strategy
110 | 2023-08-22 | ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes
109 | 2023-03-17 | FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
108 | 2023-03-25 | Masked Diffusion Transformer is a Strong Image Synthesizer
108 | 2023-06-08 | Tracking Everything Everywhere All at Once
105 | 2023-03-07 | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
104 | 2023-03-28 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models
103 | 2023-04-03 | ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
100 | 2023-06-05 | Scene as Occupancy
99 | 2023-03-30 | DDP: Diffusion Model for Dense Visual Prediction
98 | 2023-03-21 | DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
98 | 2023-08-01 | FLatten Transformer: Vision Transformer using Focused Linear Attention
98 | 2023-02-03 | MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
97 | 2023-08-07 | Dual Aggregation Transformer for Image Super-Resolution
97 | 2023-02-08 | Q-Diffusion: Quantizing Diffusion Models
97 | 2023-03-29 | HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion
96 | 2022-11-21 | PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
96 | 2023-07-21 | Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
96 | 2023-06-14 | TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
95 | 2023-03-03 | Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
93 | 2023-04-13 | What does CLIP know about a red circle? Visual prompt engineering for VLMs
92 | 2023-04-09 | Point-SLAM: Dense Neural Point Cloud-based SLAM
90 | 2023-08-24 | Dense Text-to-Image Generation with Attention Modulation
90 | 2023-03-24 | FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
88 | 2023-09-07 | Tracking Anything with Decoupled Video Segmentation
87 | 2023-03-13 | DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
86 | 2022-11-09 | Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
86 | 2023-07-20 | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image
86 | 2022-07-26 | Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
84 | 2023-03-20 | Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
83 | 2022-08-23 | Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition
82 | 2022-10-20 | SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
80 | 2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
80 | 2023-06-13 | Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
77 | 2023-03-09 | Rethinking Range View Representation for LiDAR Segmentation
75 | 2023-04-14 | Delta Denoising Score
75 | 2023-08-15 | StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
74 | 2023-08-04 | Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation
74 | 2022-10-12 | MotionBERT: A Unified Perspective on Learning Human Motion Representations
73 | 2022-10-03 | Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
72 | 2023-08-31 | InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
72 | 2023-07-27 | PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking
72 | 2023-03-20 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
71 | 2023-08-18 | SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
71 | 2023-07-31 | UniVTG: Towards Unified Video-Language Temporal Grounding
70 | 2023-03-30 | Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
69 | 2023-04-23 | Score-Based Diffusion Models as Principled Priors for Inverse Imaging
69 | 2023-06-06 | ATT3D: Amortized Text-to-3D Object Synthesis
67 | 2022-07-04 | I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
66 | 2023-03-10 | GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
65 | 2023-02-23 | Teaching CLIP to Count to Ten
65 | 2023-09-28 | MotionLM: Multi-Agent Motion Forecasting as Language Modeling
64 | 2023-03-27 | Zero-Shot Composed Image Retrieval with Textual Inversion
63 | 2023-04-02 | UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
62 | 2023-04-13 | Expressive Text-to-Image Generation with Rich Text
62 | 2023-08-16 | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
61 | 2023-03-21 | Vox-E: Text-guided Voxel Editing of 3D Objects
61 | 2023-03-30 | AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
61 | 2023-08-25 | Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
61 | 2023-08-23 | CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
61 | 2023-04-13 | DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
60 | 2023-03-14 | Editing Implicit Assumptions in Text-to-Image Diffusion Models
59 | 2023-03-30 | Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
59 | 2023-02-28 | DREAM: Efficient Dataset Distillation by Representative Matching
59 | 2023-04-09 | HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
59 | 2023-08-08 | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
59 | 2022-01-03 | Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning
58 | 2023-04-11 | HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
58 | 2023-03-23 | End-to-End Diffusion Latent Optimization Improves Classifier Guidance
56 | 2023-03-13 | Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
56 | 2022-12-30 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
55 | 2023-03-21 | Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
55 | 2023-05-02 | TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis
55 | 2023-03-21 | Implicit Neural Representation for Cooperative Low-light Image Enhancement
54 | 2023-03-17 | A Unified Continual Learning Framework with General Parameter-Efficient Tuning
54 | 2022-03-24 | MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
54 | 2023-03-14 | Adaptive Rotated Convolution for Rotated Object Detection
54 | 2023-09-07 | ProPainter: Improving Propagation and Transformer for Video Inpainting
53 | 2022-11-25 | BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
53 | 2023-08-04 | FB-BEV: BEV Representation from Forward-Backward View Transformations
53 | 2023-03-19 | SKED: Sketch-guided Text-based 3D Editing
52 | 2023-03-31 | Diffusion Action Segmentation
52 | 2023-04-13 | Verbs in Action: Improving verb understanding in video-language models
52 | 2022-04-06 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
51 | 2023-03-31 | 3D-aware Image Generation using 2D Diffusion Models
51 | 2023-04-03 | Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
51 | 2023-03-22 | SHERF: Generalizable Human NeRF from a Single Image
51 | 2023-01-03 | Rethinking Mobile Block for Efficient Attention-based Models
50 | 2023-04-12 | UniverSeg: Universal Medical Image Segmentation
50 | 2023-07-11 | EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
49 | 2023-02-27 | Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution
49 | 2022-12-16 | RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
49 | 2023-03-15 | VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
48 | 2023-06-12 | Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
48 | 2023-06-23 | Zero-shot spatial layout conditioning for text-to-image diffusion models
48 | 2023-08-27 | MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
48 | 2023-08-31 | PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction
47 | 2023-06-29 | Towards Zero-Shot Scale-Aware Monocular Depth Estimation
47 | 2022-11-21 | Parametric Classification for Generalized Category Discovery: A Baseline Study
46 | 2023-03-27 | SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
45 | 2023-07-17 | Scale-Aware Modulation Meet Transformer
45 | 2021-11-18 | One-Shot Generative Domain Adaptation
45 | 2023-08-01 | DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
45 | 2023-06-27 | PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
45 | 2023-04-03 | CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception
45 | 2023-03-19 | NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
44 | 2023-09-28 | MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
44 | 2023-03-12 | Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
44 | 2022-11-29 | DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models
44 | 2023-08-29 | Learning to Upsample by Learning to Sample
44 | 2023-03-23 | From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels
43 | 2022-11-22 | DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction
43 | 2023-03-17 | Denoising Diffusion Autoencoders are Unified Self-supervised Learners
43 | 2023-07-21 | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
42 | 2023-05-10 | Perpetual Humanoid Control for Real-time Simulated Avatars
41 | 2023-07-26 | Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
41 | 2023-03-15 | Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
41 | 2022-10-18 | Perceptual Grouping in Contrastive Vision-Language Models
41 | 2022-12-30 | Imitator: Personalized Speech-driven 3D Facial Animation
41 | 2023-08-11 | Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
40 | 2023-05-21 | Synthesizing Diverse Human Motions in 3D Indoor Scenes
40 | 2023-03-30 | NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation
40 | 2023-04-04 | Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
40 | 2023-03-17 | No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier
40 | 2022-12-04 | Multiscale Structure Guided Diffusion for Image Deblurring
40 | 2022-12-09 | Audiovisual Masked Autoencoders
40 | 2023-01-23 | InfiniCity: Infinite-Scale City Synthesis
40 | 2022-10-02 | IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis
39 | 2023-08-21 | EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
39 | 2023-07-18 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
39 | 2023-08-15 | UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation
39 | 2023-05-03 | AG3D: Learning to Generate 3D Avatars from 2D Image Collections
39 | 2023-08-19 | Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
38 | 2023-06-13 | Hidden Biases of End-to-End Driving Models
38 | 2022-12-01 | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
38 | 2023-04-07 | V3Det: Vast Vocabulary Visual Detection Dataset
38 | 2023-04-07 | Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
38 | 2023-03-20 | Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection
38 | 2023-09-11 | ITI-Gen: Inclusive Text-to-Image Generation
37 | 2023-09-02 | AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
37 | 2023-09-27 | NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
37 | 2023-09-11 | Multi3DRefer: Grounding Text Description to Multiple 3D Objects
37 | 2023-03-29 | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding
37 | 2023-03-16 | Robust Evaluation of Diffusion-Based Adversarial Purification
37 | 2023-08-21 | UnLoc: A Unified Framework for Video Localization Tasks
37 | 2023-02-22 | Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
37 | 2023-03-09 | ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
37 | 2023-03-17 | DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
36 | 2023-05-08 | Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
36 | 2023-04-05 | TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
36 | 2023-04-12 | SiLK: Simple Learned Keypoints
36 | 2023-09-26 | Structure Invariant Transformation for better Adversarial Transferability
36 | 2023-08-24 | NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
36 | 2023-03-30 | Going Beyond Nouns With Vision & Language Models Using Synthetic Data
36 | 2023-07-26 | ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
35 | 2023-04-19 | Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra
35 | 2023-03-23 | ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting
35 | 2023-03-15 | Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models
35 | 2023-08-21 | Texture Generation on 3D Meshes with Point-UV Diffusion
35 | 2022-11-15 | Will Large-scale Generative Models Corrupt Future Datasets?
35 | 2023-04-27 | SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
35 | 2023-07-24 | PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
35 | 2023-05-15 | Document Understanding Dataset and Evaluation (DUDE)
35 | 2023-09-10 | Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
34 | 2023-04-20 | HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer
34 | 2023-04-14 | Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
34 | 2023-07-26 | Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception
34 | 2022-03-15 | ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
34 | 2022-11-20 | Normalizing Flows for Human Pose Anomaly Detection
34 | 2022-11-17 | SPACE: Speech-driven Portrait Animation with Controllable Expression
34 | 2023-01-03 | Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
34 | 2023-09-21 | TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
34 | 2023-01-05 | All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
34 | 2023-04-24 | Enhancing Fine-Tuning based Backdoor Defense with Sharpness-Aware Minimization
34 | 2023-07-27 | PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
34 | 2023-02-07 | HumanMAC: Masked Motion Completion for Human Motion Prediction
34 | 2023-08-30 | Introducing Language Guidance in Prompt-based Continual Learning
34 | 2023-08-16 | ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
34 | 2022-11-18 | LVOS: A Benchmark for Long-term Video Object Segmentation
34 | 2023-06-06 | DVIS: Decoupled Video Instance Segmentation Framework
33 | 2022-12-07 | FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
33 | 2023-03-22 | RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration
33 | 2023-08-18 | Robust Monocular Depth Estimation under Challenging Conditions
33 | 2023-03-28 | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
33 | 2023-04-20 | SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
33 | 2023-09-12 | Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
33 | 2023-04-09 | ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
33 | 2023-04-10 | Instance Neural Radiance Field
33 | 2023-07-28 | Scaling Data Generation in Vision-and-Language Navigation
33 | 2023-07-14 | Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
33 | 2023-04-06 | Diffusion Models as Masked Autoencoders
33 | 2023-07-20 | Urban Radiance Field Representation with Deformable Neural Mesh Primitives
33 | 2023-03-21 | CC3D: Layout-Conditioned Generation of Compositional 3D Scenes
33 | 2023-10-01 | PADCLIP: Pseudo-labeling with Adaptive Debiasing in CLIP for Unsupervised Domain Adaptation
33 | 2023-06-26 | A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
33 | 2022-11-10 | High Quality Entity Segmentation
33 | 2022-11-30 | CLIPascene: Scene Sketching with Different Types and Levels of Abstraction
33 | 2023-07-27 | NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
33 | 2022-11-19 | TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
32 | 2023-09-03 | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
32 | 2023-03-15 | Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
32 | 2023-05-18 | Going Denser with Open-Vocabulary Part Segmentation
32 | 2023-08-21 | Diffusion Model as Representation Learner
32 | 2023-07-19 | DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
32 | 2023-03-23 | First Session Adaptation: A Strong Replay-Free Baseline for Class-Incremental Learning
32 | 2023-03-22 | Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding
31 | 2022-12-05 | SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
31 | 2023-07-24 | Less is More: Focus Attention for Efficient DETR
31 | 2022-12-26 | MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
31 | 2022-11-19 | MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception
30 | 2023-07-17 | Revisiting Scene Text Recognition: A Data Perspective
30 | 2023-06-13 | Efficient 3D Semantic Segmentation with Superpoint Transformer
30 | 2023-08-29 | Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
30 | 2023-03-22 | FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
30 | 2023-03-06 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
30 | 2023-09-18 | Unified Coarse-to-Fine Alignment for Video-Text Retrieval
30 | 2023-05-16 | Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
30 | 2022-12-11 | COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec
30 | 2023-08-18 | Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning
30 | 2023-05-02 | Neural LiDAR Fields for Novel View Synthesis
30 | 2022-12-20 | Full-Body Articulated Human-Object Interaction
30 | 2023-08-28 | Priority-Centric Human Motion Generation in Discrete Latent Space
29 | 2023-03-28 | Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
29 | 2023-04-03 | Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
29 | 2023-04-21 | Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
29 | 2023-03-24 | Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction
29 | 2023-08-09 | Robust Object Modeling for Visual Tracking
29 | 2023-03-12 | Traj-MAE: Masked Autoencoders for Trajectory Prediction
29 | 2022-12-07 | Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
29 | 2022-12-10 | Source-free Depth for Object Pop-out
29 | 2023-09-22 | Cross-Modal Translation and Alignment for Survival Analysis
29 | 2022-12-06 | Adaptive Testing of Computer Vision Models
29 | 2023-05-18 | Inspecting the Geographical Representativeness of Images from Text-to-Image Models
29 | 2023-07-25 | Spectrum-guided Multi-granularity Referring Video Object Segmentation
29 | 2023-07-27 | Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining
29 | 2023-03-30 | SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
28 | 2022-11-15 | A Low-Shot Object Counting Network With Iterative Prototype Adaptation
28 | 2023-04-11 | SATR: Zero-Shot Semantic Segmentation of 3D Shapes
28 | 2022-12-03 | StegaNeRF: Embedding Invisible Information within Neural Radiance Fields
28 | 2023-07-28 | Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification
28 | 2023-08-23 | SG-Former: Self-guided Transformer with Evolving Token Reallocation
28 | 2023-03-27 | DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition
28 | 2023-09-29 | Forward Flow for Novel View Synthesis of Dynamic Scenes
28 | 2023-06-10 | Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception
28 | 2022-11-26 | Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
28 | 2023-04-03 | Navigating to Objects Specified by Images
28 | 2023-03-15 | Stochastic Segmentation with Conditional Categorical Diffusion Models
28 | 2023-03-21 | SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation
28 | 2023-09-11 | UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
28 | 2023-08-15 | Memory-and-Anticipation Transformer for Online Action Understanding
28 | 2023-03-20 | Open-vocabulary Panoptic Segmentation with Embedding Modulation
28 | 2023-07-24 | GridMM: Grid Memory Map for Vision-and-Language Navigation
28 | 2023-05-29 | DiffRate : Differentiable Compression Rate for Efficient Vision Transformers
27 | 2023-03-12 | DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration
27 | 2023-04-19 | Reference-guided Controllable Inpainting of Neural Radiance Fields
27 | 2023-07-18 | OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
27 | 2023-03-21 | LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
27 | 2023-03-16 | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
27 | 2023-07-25 | E2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
27 | 2023-09-19 | AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
26 | 2023-02-02 | Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
26 | 2023-07-20 | HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
26 | 2023-07-14 | Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
26 | 2023-06-14 | Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
26 | 2023-01-16 | UATVR: Uncertainty-Adaptive Text-Video Retrieval
26 | 2023-06-21 | HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
26 | 2023-07-26 | AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
26 | 2023-06-20 | Dynamic Perceiver for Efficient Visual Recognition
26 | 2023-07-12 | GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video
26 | 2023-08-09 | Bird’s-Eye-View Scene Graph for Vision-Language Navigation
26 | 2023-04-10 | Detection Transformer with Stable Matching
26 | 2023-01-02 | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
26 | 2023-02-17 | Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
26 | 2022-11-26 | RbA: Segmenting Unknown Regions Rejected by All
26 | 2023-01-24 | Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation
26 | 2023-02-16 | Parallax-Tolerant Unsupervised Deep Image Stitching
26 | 2022-12-05 | One-shot Implicit Animatable Avatars with Model-based Priors
26 | 2023-09-16 | AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose
26 | 2023-04-20 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition
26 | 2022-11-22 | CASSPR: Cross Attention Single Scan Place Recognition
25 | 2023-04-03 | NeMF: Inverse Volume Rendering with Neural Microflake Field
25 | 2023-03-08 | CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation
25 | 2023-08-31 | FACET: Fairness in Computer Vision Evaluation Benchmark
25 | 2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild
25 | 2023-09-11 | Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
25 | 2023-07-25 | Unmasking Anomalies in Road-Scene Segmentation
25 | 2023-07-20 | General Image-to-Image Translation with One-Shot Image Guidance
25 | 2023-03-23 | Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
|
25 |
2023-02-14 |
link |
VQ3D: Learning a 3D-Aware Generative Model on ImageNet |
|
25 |
2023-01-09 |
link |
Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments |
|
25 |
2023-03-09 |
link |
GPGait: Generalized Pose-based Gait Recognition |
|
25 |
2023-04-04 |
link |
Black Box Few-Shot Adaptation for Vision-Language models |
|
25 |
2023-09-29 |
link |
HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World |
|
25 |
2023-08-19 |
link |
SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM |
|
25 |
2023-07-31 |
link |
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy |
|
25 |
2023-08-27 |
link |
Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks |
|
25 |
2023-09-28 |
link |
FLIP: Cross-domain Face Anti-spoofing with Language Guidance |
|
24 |
2023-08-14 |
link |
Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking |
|
24 |
2023-08-11 |
link |
Exploring Predicate Visual Context in Detecting of Human–Object Interactions |
|
24 |
2023-08-28 |
link |
HoloFusion: Towards Photo-realistic 3D Generative Modeling |
|
24 |
2023-07-24 |
link |
CTVIS: Consistent Training for Online Video Instance Segmentation |
|
24 |
2023-03-16 |
link |
DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars |
|
24 |
2023-08-21 |
link |
When Prompt-based Incremental Learning Does Not Meet Strong Pretraining |
|
24 |
2023-07-16 |
link |
Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections |
|
24 |
2023-09-19 |
link |
NDDepth: Normal-Distance Assisted Monocular Depth Estimation |
|
24 |
2023-01-05 |
link |
Event Camera Data Pre-training |
|
24 |
2022-10-03 |
link |
Masked Spiking Transformer |
|
24 |
2023-07-31 |
link |
CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification |
|
24 |
2023-08-21 |
link |
STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning |
|
24 |
2023-09-15 |
link |
Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion |
|
24 |
2023-08-18 |
link |
MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection |
|
23 |
2022-09-12 |
link |
PreSTU: Pre-Training for Scene-Text Understanding |
|
23 |
2023-10-01 |
link |
MasQCLIP for Open-Vocabulary Universal Image Segmentation |
|
23 |
2023-03-15 |
link |
Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution |
|
23 |
2023-07-09 |
link |
Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers |
|
23 |
2023-07-19 |
link |
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation |
|
23 |
2022-10-05 |
link |
Bayesian Prompt Learning for Image-Language Model Generalization |
|
23 |
2023-03-18 |
link |
Grounding 3D Object Affordance from 2D Interactions in Images |
|
23 |
2023-03-10 |
link |
Overwriting Pretrained Bias with Finetuning Data |
|
23 |
2023-08-26 |
link |
Beyond One-to-One: Rethinking the Referring Image Segmentation |
|
23 |
2022-08-19 |
link |
SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability |
|
23 |
2023-06-30 |
link |
FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis |
|
23 |
2023-08-07 |
link |
Part-Aware Transformer for Generalizable Person Re-identification |
|
23 |
2023-06-15 |
link |
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories |
|
23 |
2023-08-26 |
link |
Point-Query Quadtree for Crowd Counting, Localization, and More |
|
23 |
2022-06-18 |
link |
Gender Artifacts in Visual Datasets |
|
23 |
2022-11-17 |
link |
DETRDistill: A Universal Knowledge Distillation Framework for DETR-families |
|
23 |
2023-08-18 |
link |
RLIPv2: Fast Scaling of Relational Language-Image Pre-training |
|
22 |
2023-08-28 |
link |
R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras |
|
22 |
2022-11-23 |
link |
ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field |
|
22 |
2023-03-16 |
link |
Among Us: Adversarially Robust Collaborative Perception by Consensus |
|
22 |
2022-10-10 |
link |
FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training |
|
22 |
2022-11-17 |
link |
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones |
|
22 |
2023-07-16 |
link |
EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes |
|
22 |
2023-02-28 |
link |
Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks |
|
22 |
2023-08-27 |
link |
High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net |
|
22 |
2023-08-08 |
link |
LATR: 3D Lane Detection from Monocular Images with Transformer |
|
22 |
2023-08-21 |
link |
Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction |
|
22 |
2023-07-29 |
link |
XMem++: Production-level Video Segmentation From Few Annotated Frames |
|
22 |
2023-04-21 |
link |
Deep Multiview Clustering by Contrasting Cluster Assignments |
|
22 |
2023-08-23 |
link |
Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack |
|
22 |
2023-02-28 |
link |
BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View Images |
|
22 |
2023-05-10 |
link |
Relightify: Relightable 3D Faces from a Single Image via Diffusion Models |
|
22 |
2023-07-20 |
link |
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization |
|
22 |
2023-01-17 |
link |
A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction |
|
22 |
2023-10-01 |
link |
GET: Group Event Transformer for Event-Based Vision |
|
22 |
2023-04-26 |
link |
Neural-PBIR Reconstruction of Shape, Material, and Illumination |
|
22 |
2023-07-15 |
link |
ExposureDiffusion: Learning to Expose for Low-light Image Enhancement |
|
22 |
2023-08-14 |
link |
S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields |
|
22 |
2023-10-01 |
link |
Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models |
|
22 |
2023-07-24 |
link |
A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation |
|
22 |
2023-10-01 |
link |
ContactGen: Generative Contact Modeling for Grasp Generation |
|
22 |
2023-08-05 |
link |
Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation |
|
22 |
2023-08-21 |
link |
MGMAE: Motion Guided Masking for Video Masked Autoencoding |
|
22 |
2023-09-26 |
link |
Generating Visual Scenes from Touch |
|
21 |
2023-06-15 |
link |
Evaluating Data Attribution for Text-to-Image Models |
|
21 |
2023-08-22 |
link |
ReFit: Recurrent Fitting Network for 3D Human Recovery |
|
21 |
2023-08-05 |
link |
An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability |
|
21 |
2023-08-14 |
link |
Masked Motion Predictors are Strong 3D Action Representation Learners |
|
21 |
2023-07-16 |
link |
Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer |
|
21 |
2023-08-09 |
link |
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization |
|
21 |
2023-08-01 |
link |
Online Prototype Learning for Online Continual Learning |
|
21 |
2023-09-02 |
link |
Contrastive Feature Masking Open-Vocabulary Vision Transformer |
|
21 |
2023-03-21 |
link |
Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation |
|
21 |
2023-06-09 |
link |
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds |
|
21 |
2023-07-21 |
link |
Core: Cooperative Reconstruction for Multi-Agent Perception |
|
21 |
2023-06-08 |
link |
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models |
|
21 |
2023-09-08 |
link |
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion |
|
21 |
2023-09-26 |
link |
Nearest Neighbor Guidance for Out-of-Distribution Detection |
|
21 |
2023-09-12 |
link |
Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning |
|
21 |
2023-07-27 |
link |
Clustering based Point Cloud Representation Learning for 3D Analysis |
|
21 |
2023-08-28 |
link |
Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor |
|
21 |
2023-08-16 |
link |
Membrane Potential Batch Normalization for Spiking Neural Networks |
|
21 |
2023-09-15 |
link |
Deformable Neural Radiance Fields using RGB and Event Cameras |
|
20 |
2023-04-03 |
link |
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network |
|
20 |
2023-04-05 |
link |
Dynamic Point Fields |
|
20 |
2022-12-26 |
link |
Generalized Differentiable RANSAC |
|
20 |
2023-03-28 |
link |
CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution |
|
20 |
2023-06-08 |
link |
LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs |
|
20 |
2023-03-16 |
link |
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion |
|
20 |
2023-07-19 |
link |
Generative Prompt Model for Weakly Supervised Object Localization |
|
20 |
2023-07-20 |
link |
BlendFace: Re-designing Identity Encoders for Face-Swapping |
|
20 |
2023-01-05 |
link |
CiT: Curation in Training for Effective Vision-Language Data |
|
20 |
2023-08-23 |
link |
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields |
|
20 |
2023-08-01 |
link |
Improving Pixel-based MIM by Reducing Wasted Modeling Capability |
|
20 |
2023-07-14 |
link |
RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World |
|
20 |
2022-07-22 |
link |
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization |
|
20 |
2023-08-18 |
link |
Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events |
|
20 |
2023-08-24 |
link |
Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation |
|
20 |
2022-12-08 |
link |
Graph Matching with Bi-level Noisy Correspondence |
|
20 |
2022-11-04 |
link |
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis |
|
20 |
2023-08-15 |
link |
ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces |
|
20 |
2023-04-18 |
link |
Fast Neural Scene Flow |
|
20 |
2023-09-05 |
link |
Empowering Low-Light Image Enhancer through Customized Learnable Priors |
|
20 |
2023-03-09 |
link |
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition |
|
20 |
2023-07-27 |
link |
Diverse Inpainting and Editing with GAN Inversion |
|
20 |
2023-04-24 |
link |
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video |
|
20 |
2023-08-20 |
link |
March in Chat: Interactive Prompting for Remote Embodied Referring Expression |
|
20 |
2023-07-29 |
link |
CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation |
|
19 |
2023-09-04 |
link |
Mask-Attention-Free Transformer for 3D Instance Segmentation |
|
19 |
2023-09-05 |
link |
RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image |
|
19 |
2023-09-03 |
link |
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection |
|
19 |
2023-09-27 |
link |
SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations |
|
19 |
2023-07-26 |
link |
Adaptive Frequency Filters As Efficient Global Token Mixers |
|
19 |
2023-03-22 |
link |
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation |
|
19 |
2023-09-25 |
link |
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving |
|
19 |
2023-04-04 |
link |
NPC: Neural Point Characters from Video |
|
19 |
2023-08-14 |
link |
Dreamwalker: Mental Planning for Continuous Vision-Language Navigation |
|
19 |
2023-01-11 |
link |
LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis |
|
19 |
2023-04-04 |
link |
Towards Open-Vocabulary Video Instance Segmentation |
|
19 |
2022-12-09 |
link |
Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet |
|
19 |
2023-01-06 |
link |
Object as Query: Lifting any 2D Object Detector to 3D Detection |
|
19 |
2023-09-29 |
link |
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study |
|
19 |
2022-05-19 |
link |
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection |
|
19 |
2023-03-21 |
link |
TMA: Temporal Motion Aggregation for Event-based Optical Flow |
|
19 |
2023-09-24 |
link |
LogicSeg: Parsing Visual Semantics with Neural Logic Learning and Reasoning |
|
19 |
2023-07-26 |
link |
Human-centric Scene Understanding for 3D Large-scale Scenarios |
|
19 |
2023-04-17 |
link |
Pretrained Language Models as Visual Planners for Human Assistance |
|
18 |
2023-06-09 |
link |
Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction |
|
18 |
2023-03-22 |
link |
UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework |
|
18 |
2023-07-19 |
link |
What do neural networks learn in image classification? A frequency shortcut perspective |
|
18 |
2023-08-11 |
link |
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods |
|
18 |
2023-07-27 |
link |
Online Clustered Codebook |
|
18 |
2023-04-12 |
link |
Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views |
|
18 |
2023-09-11 |
link |
Class-Incremental Grouping Network for Continual Audio-Visual Learning |
|
18 |
2022-05-27 |
link |
Semi-supervised Semantics-guided Adversarial Training for Robust Trajectory Prediction |
|
18 |
2022-07-18 |
link |
UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird’s-Eye-View |
|
18 |
2023-08-28 |
link |
Referring Image Segmentation Using Text Supervision |
|
18 |
2023-08-18 |
link |
Self-Calibrated Cross Attention Network for Few-Shot Segmentation |
|
18 |
2023-07-21 |
link |
SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection |
|
18 |
2023-09-12 |
link |
Modality Unifying Network for Visible-Infrared Person Re-Identification |
|
18 |
2023-07-31 |
link |
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning |
|
18 |
2023-09-16 |
link |
ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images |
|
18 |
2022-11-14 |
link |
BiViT: Extremely Compressed Binary Vision Transformers |
|
18 |
2023-04-05 |
link |
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules |
|
18 |
2022-08-29 |
link |
SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection |
|
18 |
2023-03-24 |
link |
Anomaly Detection under Distribution Shift |
|
18 |
2023-08-25 |
link |
IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization |
|
17 |
2023-08-06 |
link |
Source-free Domain Adaptive Human Pose Estimation |
|
17 |
2023-09-03 |
link |
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models |
|
17 |
2023-07-20 |
link |
Cascade-DETR: Delving into High-Quality Universal Object Detection |
|
17 |
2023-07-20 |
link |
Lighting up NeRF via Unsupervised Decomposition and Enhancement |
|
17 |
2022-12-07 |
link |
Domain generalization of 3D semantic segmentation in autonomous driving |
|
17 |
2023-08-19 |
link |
VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations |
|
17 |
2023-05-03 |
link |
DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion |
|
17 |
2023-08-14 |
link |
ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion |
|
17 |
2023-10-01 |
link |
Social Diffusion: Long-term Multiple Human Motion Anticipation |
|
17 |
2023-08-16 |
link |
Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network |
|
17 |
2023-04-24 |
link |
Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis |
|
17 |
2022-10-11 |
link |
Multi-Object Navigation with dynamically learned neural implicit representations |
|
17 |
2023-05-19 |
link |
Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models |
|
17 |
2023-08-13 |
link |
Compositional Feature Augmentation for Unbiased Scene Graph Generation |
|
17 |
2023-06-15 |
link |
Rosetta Neurons: Mining the Common Units in a Model Zoo |
|
17 |
2023-08-14 |
link |
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects |
|
17 |
2023-08-10 |
link |
Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation |
|
17 |
2023-02-02 |
link |
Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors |
|
17 |
2022-11-01 |
link |
Self-supervised Character-to-Character Distillation for Text Recognition |
|
17 |
2023-04-26 |
link |
From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection |
|
17 |
2023-03-15 |
link |
Re-ReND: Real-time Rendering of NeRFs across Devices |
|
17 |
2023-07-18 |
link |
EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting |
|
16 |
2022-11-21 |
link |
Video Background Music Generation: Dataset, Method and Evaluation |
|
16 |
2023-08-06 |
link |
Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection |
|
16 |
2023-09-08 |
link |
Dynamic Mesh-Aware Radiance Fields |
|
16 |
2023-06-28 |
link |
Subclass-balancing Contrastive Learning for Long-tailed Recognition |
|
16 |
2023-08-22 |
link |
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts |
|
16 |
2023-08-07 |
link |
Heterogeneous Forgetting Compensation for Class-Incremental Learning |
|
16 |
2023-03-11 |
link |
DETA: Denoised Task Adaptation for Few-Shot Learning |
|
16 |
2023-03-31 |
link |
DIME-FM : DIstilling Multimodal and Efficient Foundation Models |
|
16 |
2023-07-26 |
link |
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution |
|
16 |
2023-08-11 |
link |
Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation |
|
16 |
2023-04-12 |
link |
Mesh2Tex: Generating Mesh Textures from Image Queries |
|
16 |
2023-09-21 |
link |
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance |
|
16 |
2023-09-18 |
link |
GEDepth: Ground Embedding for Monocular Depth Estimation |
|
16 |
2023-07-31 |
link |
DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation |
|
16 |
2023-09-26 |
link |
Pre-training-free Image Manipulation Localization through Non-Mutually Exclusive Contrastive Learning |
|
16 |
2023-07-21 |
link |
Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition |
|
16 |
2022-11-28 |
link |
H3WB: Human3.6M 3D WholeBody Dataset and Benchmark |
|
16 |
2023-08-19 |
link |
Skill Transformer: A Monolithic Policy for Mobile Manipulation |
|
16 |
2022-12-22 |
link |
DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders |
|
16 |
2023-03-16 |
link |
Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less |
|
16 |
2023-08-07 |
link |
From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal |
|
16 |
2023-08-14 |
link |
Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents |
|
16 |
2023-01-05 |
link |
DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for Image Super-Resolution |
|
16 |
2023-03-20 |
link |
Robustifying Token Attention for Vision Transformers |
|
16 |
2023-09-14 |
link |
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning |
|
16 |
2023-05-25 |
link |
Action Sensitivity Learning for Temporal Action Localization |
|
16 |
2023-02-10 |
link |
Leveraging Inpainting for Single-Image Shadow Removal |
|
16 |
2023-08-20 |
link |
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer |
|
16 |
2023-10-01 |
link |
Lighting Every Darkness in Two Pairs : A Calibration-Free Pipeline for RAW Denoising |
|
16 |
2023-08-18 |
link |
Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning |
|
16 |
2023-07-18 |
link |
Object-aware Gaze Target Detection |
|
16 |
2023-07-19 |
link |
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions |
|
15 |
2023-08-22 |
link |
Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape |
|
15 |
2023-03-29 |
link |
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation |
|
15 |
2023-04-24 |
link |
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection |
|
15 |
2023-09-28 |
link |
Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis |
|
15 |
2023-07-31 |
link |
Random Sub-Samples Generation for Self-Supervised Real Image Denoising |
|
15 |
2023-08-13 |
link |
RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks |
|
15 |
2023-08-28 |
link |
CLNeRF: Continual Learning Meets NeRF |
|
15 |
2023-07-20 |
link |
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection |
|
15 |
2023-03-10 |
link |
GECCO: Geometrically-Conditioned Point Diffusion Models |
|
15 |
2023-08-02 |
link |
Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation |
|
15 |
2023-08-17 |
link |
Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling |
|
15 |
2023-08-02 |
link |
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation |
|
15 |
2023-08-20 |
link |
DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization |
|
15 |
2023-08-23 |
link |
Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification |
|
15 |
2023-04-27 |
link |
ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs |
|
15 |
2023-03-24 |
link |
UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields |
|
15 |
2023-09-15 |
link |
PoseFix: Correcting 3D Human Poses with Natural Language |
|
15 |
2023-08-10 |
link |
Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation |
|
15 |
2023-08-15 |
link |
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval |
|
15 |
2023-09-26 |
link |
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation |
|
15 |
2023-07-27 |
link |
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models |
|
15 |
2022-10-16 |
link |
Scratching Visual Transformer's Back with Uniform Attention |
|
15 |
2023-08-20 |
link |
DomainAdaptor: A Novel Approach to Test-time Adaptation |
|
15 |
2023-08-19 |
link |
Calibrating Uncertainty for Semi-Supervised Crowd Counting |
|
15 |
2023-07-27 |
link |
P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds |
|
15 |
2023-08-01 |
link |
ELFNet: Evidential Local-global Fusion for Stereo Matching |
|
15 |
2023-08-19 |
link |
3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation |
|
15 |
2023-03-09 |
link |
MBPTrack: Improving 3D Point Cloud Tracking with Memory networks and Box Priors |
|
15 |
2023-07-20 |
link |
See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data |
|