Last updated: 2024-12-09 08:49:40. Maintained by Weisen Jiang.

citation date review title (pdf) authors
4688 2023-04-05 link Segment Anything
2833 2023-02-10 link Adding Conditional Control to Text-to-Image Diffusion Models
1116 2022-12-19 link Scalable Diffusion Models with Transformers
786 2023-03-20 link Zero-1-to-3: Zero-shot One Image to 3D Object
531 2022-12-22 link Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
464 2023-03-27 link Sigmoid Loss for Language Image Pre-Training
445 2023-03-24 link Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
403 2023-03-23 link Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
403 2023-02-06 link Structure and Content-Guided Video Synthesis with Diffusion Models
347 2022-11-17 link DiffusionDet: Diffusion Model for Object Detection
341 2023-03-14 link ViperGPT: Visual Inference via Python Execution for Reasoning
290 2022-06-02 link PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
285 2023-04-17 link MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
283 2022-12-08 link LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
277 2023-03-22 link Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
262 2023-03-24 link Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
252 2023-03-16 link LERF: Language Embedded Radiance Fields
247 2023-02-27 link ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
238 2023-06-23 link LightGlue: Local Feature Matching at Light Speed
212 2022-11-22 link DETRs with Collaborative Hybrid Assignments Training
198 2022-05-30 link Prompt-aligned Gradient for Prompt Tuning
197 2023-03-13 link Erasing Concepts from Diffusion Models
196 2023-03-20 link SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
192 2023-04-05 link Generative Novel View Synthesis with 3D-Aware Diffusion Models
187 2023-03-22 link Pix2Video: Video Editing using Image Diffusion
185 2023-05-17 link Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
184 2023-03-23 link DreamBooth3D: Subject-Driven Text-to-3D Generation
183 2022-12-10 link NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
179 2022-12-05 link PhysDiff: Physics-Guided Human Motion Diffusion Model
161 2023-03-03 link Unleashing Text-to-Image Diffusion Models for Visual Perception
160 2022-09-07 link What does a platypus look like? Generating customized prompts for zero-shot image classification
151 2023-03-21 link TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
151 2023-03-20 link Text2Tex: Text-driven Texture Synthesis via Diffusion Models
145 2023-03-16 link Large Selective Kernel Network for Remote Sensing Object Detection
143 2023-03-16 link SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
141 2023-03-21 link Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
139 2023-07-20 link BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
139 2023-03-21 link Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
134 2023-01-02 link CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
132 2023-03-16 link DiffIR: Efficient Diffusion Model for Image Restoration
130 2022-02-08 link DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
126 2023-04-13 link Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
126 2023-03-16 link DIRE for Diffusion-Generated Image Detection
125 2023-03-28 link SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis
124 2023-03-23 link Ablating Concepts in Text-to-Image Diffusion Models
123 2023-03-27 link The Stable Signature: Rooting Watermarks in Latent Diffusion Models
122 2023-04-11 link OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
119 2023-08-18 link StableVideo: Text-driven Consistency-aware Diffusion Video Editing
113 2023-03-21 link VAD: Vectorized Scene Representation for Efficient Autonomous Driving
112 2022-12-30 link Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
112 2022-12-15 link Rethinking Vision Transformers for MobileNet Size and Speed
111 2022-10-03 link CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
110 2023-03-16 link Efficient Diffusion Training via Min-SNR Weighting Strategy
110 2023-08-22 link ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes
109 2023-03-17 link FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
108 2023-03-25 link Masked Diffusion Transformer is a Strong Image Synthesizer
108 2023-06-08 link Tracking Everything Everywhere All at Once
105 2023-03-07 link OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
104 2023-03-28 link Unmasked Teacher: Towards Training-Efficient Video Foundation Models
103 2023-04-03 link ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
100 2023-06-05 link Scene as Occupancy
99 2023-03-30 link DDP: Diffusion Model for Dense Visual Prediction
98 2023-03-21 link DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
98 2023-08-01 link FLatten Transformer: Vision Transformer using Focused Linear Attention
98 2023-02-03 link MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
97 2023-08-07 link Dual Aggregation Transformer for Image Super-Resolution
97 2023-02-08 link Q-Diffusion: Quantizing Diffusion Models
97 2023-03-29 link HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion
96 2022-11-21 link PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
96 2023-07-21 link Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
96 2023-06-14 link TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
95 2023-03-03 link Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
93 2023-04-13 link What does CLIP know about a red circle? Visual prompt engineering for VLMs
92 2023-04-09 link Point-SLAM: Dense Neural Point Cloud-based SLAM
90 2023-08-24 link Dense Text-to-Image Generation with Attention Modulation
90 2023-03-24 link FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
88 2023-09-07 link Tracking Anything with Decoupled Video Segmentation
87 2023-03-13 link DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
86 2022-11-09 link Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
86 2023-07-20 link Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image
86 2022-07-26 link Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
84 2023-03-20 link Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
83 2022-08-23 link Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition
82 2022-10-20 link SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
80 2023-09-05 link GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
80 2023-06-13 link Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
77 2023-03-09 link Rethinking Range View Representation for LiDAR Segmentation
75 2023-04-14 link Delta Denoising Score
75 2023-08-15 link StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
74 2023-08-04 link Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation
74 2022-10-12 link MotionBERT: A Unified Perspective on Learning Human Motion Representations
73 2022-10-03 link Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
72 2023-08-31 link InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
72 2023-07-27 link PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking
72 2023-03-20 link EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation
71 2023-08-18 link SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
71 2023-07-31 link UniVTG: Towards Unified Video-Language Temporal Grounding
70 2023-03-30 link Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
69 2023-04-23 link Score-Based Diffusion Models as Principled Priors for Inverse Imaging
69 2023-06-06 link ATT3D: Amortized Text-to-3D Object Synthesis
67 2022-07-04 link I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
66 2023-03-10 link GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
65 2023-02-23 link Teaching CLIP to Count to Ten
65 2023-09-28 link MotionLM: Multi-Agent Motion Forecasting as Language Modeling
64 2023-03-27 link Zero-Shot Composed Image Retrieval with Textual Inversion
63 2023-04-02 link UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
62 2023-04-13 link Expressive Text-to-Image Generation with Rich Text
62 2023-08-16 link MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
61 2023-03-21 link Vox-E: Text-guided Voxel Editing of 3D Objects
61 2023-03-30 link AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
61 2023-08-25 link Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
61 2023-08-23 link CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
61 2023-04-13 link DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
60 2023-03-14 link Editing Implicit Assumptions in Text-to-Image Diffusion Models
59 2023-03-30 link Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
59 2023-02-28 link DREAM: Efficient Dataset Distillation by Representative Matching
59 2023-04-09 link HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
59 2023-08-08 link 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
59 2022-01-03 link Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning
58 2023-04-11 link HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
58 2023-03-23 link End-to-End Diffusion Latent Optimization Improves Classifier Guidance
56 2023-03-13 link Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
56 2022-12-30 link HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
55 2023-03-21 link Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
55 2023-05-02 link TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis
55 2023-03-21 link Implicit Neural Representation for Cooperative Low-light Image Enhancement
54 2023-03-17 link A Unified Continual Learning Framework with General Parameter-Efficient Tuning
54 2022-03-24 link MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
54 2023-03-14 link Adaptive Rotated Convolution for Rotated Object Detection
54 2023-09-07 link ProPainter: Improving Propagation and Transformer for Video Inpainting
53 2022-11-25 link BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
53 2023-08-04 link FB-BEV: BEV Representation from Forward-Backward View Transformations
53 2023-03-19 link SKED: Sketch-guided Text-based 3D Editing
52 2023-03-31 link Diffusion Action Segmentation
52 2023-04-13 link Verbs in Action: Improving verb understanding in video-language models
52 2022-04-06 link Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
51 2023-03-31 link 3D-aware Image Generation using 2D Diffusion Models
51 2023-04-03 link Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
51 2023-03-22 link SHERF: Generalizable Human NeRF from a Single Image
51 2023-01-03 link Rethinking Mobile Block for Efficient Attention-based Models
50 2023-04-12 link UniverSeg: Universal Medical Image Segmentation
50 2023-07-11 link EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
49 2023-02-27 link Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution
49 2022-12-16 link RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
49 2023-03-15 link VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
48 2023-06-12 link Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
48 2023-06-23 link Zero-shot spatial layout conditioning for text-to-image diffusion models
48 2023-08-27 link MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
48 2023-08-31 link PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction
47 2023-06-29 link Towards Zero-Shot Scale-Aware Monocular Depth Estimation
47 2022-11-21 link Parametric Classification for Generalized Category Discovery: A Baseline Study
46 2023-03-27 link SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
45 2023-07-17 link Scale-Aware Modulation Meet Transformer
45 2021-11-18 link One-Shot Generative Domain Adaptation
45 2023-08-01 link DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
45 2023-06-27 link PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
45 2023-04-03 link CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception
45 2023-03-19 link NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
44 2023-09-28 link MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
44 2023-03-12 link Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
44 2022-11-29 link DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models
44 2023-08-29 link Learning to Upsample by Learning to Sample
44 2023-03-23 link From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels
43 2022-11-22 link DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction
43 2023-03-17 link Denoising Diffusion Autoencoders are Unified Self-supervised Learners
43 2023-07-21 link Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
42 2023-05-10 link Perpetual Humanoid Control for Real-time Simulated Avatars
41 2023-07-26 link Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
41 2023-03-15 link Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
41 2022-10-18 link Perceptual Grouping in Contrastive Vision-Language Models
41 2022-12-30 link Imitator: Personalized Speech-driven 3D Facial Animation
41 2023-08-11 link Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
40 2023-05-21 link Synthesizing Diverse Human Motions in 3D Indoor Scenes
40 2023-03-30 link NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation
40 2023-04-04 link Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
40 2023-03-17 link No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier
40 2022-12-04 link Multiscale Structure Guided Diffusion for Image Deblurring
40 2022-12-09 link Audiovisual Masked Autoencoders
40 2023-01-23 link InfiniCity: Infinite-Scale City Synthesis
40 2022-10-02 link IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis
39 2023-08-21 link EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
39 2023-07-18 link Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
39 2023-08-15 link UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation
39 2023-05-03 link AG3D: Learning to Generate 3D Avatars from 2D Image Collections
39 2023-08-19 link Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
38 2023-06-13 link Hidden Biases of End-to-End Driving Models
38 2022-12-01 link UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
38 2023-04-07 link V3Det: Vast Vocabulary Visual Detection Dataset
38 2023-04-07 link Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
38 2023-03-20 link Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection
38 2023-09-11 link ITI-Gen: Inclusive Text-to-Image Generation
37 2023-09-02 link AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
37 2023-09-27 link NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
37 2023-09-11 link Multi3DRefer: Grounding Text Description to Multiple 3D Objects
37 2023-03-29 link ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding
37 2023-03-16 link Robust Evaluation of Diffusion-Based Adversarial Purification
37 2023-08-21 link UnLoc: A Unified Framework for Video Localization Tasks
37 2023-02-22 link Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
37 2023-03-09 link ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
37 2023-03-17 link DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
36 2023-05-08 link Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
36 2023-04-05 link TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
36 2023-04-12 link SiLK: Simple Learned Keypoints
36 2023-09-26 link Structure Invariant Transformation for better Adversarial Transferability
36 2023-08-24 link NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
36 2023-03-30 link Going Beyond Nouns With Vision & Language Models Using Synthetic Data
36 2023-07-26 link ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation
35 2023-04-19 link Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra
35 2023-03-23 link ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting
35 2023-03-15 link Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models
35 2023-08-21 link Texture Generation on 3D Meshes with Point-UV Diffusion
35 2022-11-15 link Will Large-scale Generative Models Corrupt Future Datasets?
35 2023-04-27 link SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
35 2023-07-24 link PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
35 2023-05-15 link Document Understanding Dataset and Evaluation (DUDE)
35 2023-09-10 link Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
34 2023-04-20 link HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer
34 2023-04-14 link Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
34 2023-07-26 link Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception
34 2022-03-15 link ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
34 2022-11-20 link Normalizing Flows for Human Pose Anomaly Detection
34 2022-11-17 link SPACE: Speech-driven Portrait Animation with Controllable Expression
34 2023-01-03 link Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
34 2023-09-21 link TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
34 2023-01-05 link All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
34 2023-04-24 link Enhancing Fine-Tuning based Backdoor Defense with Sharpness-Aware Minimization
34 2023-07-27 link PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
34 2023-02-07 link HumanMAC: Masked Motion Completion for Human Motion Prediction
34 2023-08-30 link Introducing Language Guidance in Prompt-based Continual Learning
34 2023-08-16 link ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
34 2022-11-18 link LVOS: A Benchmark for Long-term Video Object Segmentation
34 2023-06-06 link DVIS: Decoupled Video Instance Segmentation Framework
33 2022-12-07 link FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
33 2023-03-22 link RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration
33 2023-08-18 link Robust Monocular Depth Estimation under Challenging Conditions
33 2023-03-28 link X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
33 2023-04-20 link SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation
33 2023-09-12 link Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
33 2023-04-09 link ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
33 2023-04-10 link Instance Neural Radiance Field
33 2023-07-28 link Scaling Data Generation in Vision-and-Language Navigation
33 2023-07-14 link Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
33 2023-04-06 link Diffusion Models as Masked Autoencoders
33 2023-07-20 link Urban Radiance Field Representation with Deformable Neural Mesh Primitives
33 2023-03-21 link CC3D: Layout-Conditioned Generation of Compositional 3D Scenes
33 2023-10-01 link PADCLIP: Pseudo-labeling with Adaptive Debiasing in CLIP for Unsupervised Domain Adaptation
33 2023-06-26 link A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
33 2022-11-10 link High Quality Entity Segmentation
33 2022-11-30 link CLIPascene: Scene Sketching with Different Types and Levels of Abstraction
33 2023-07-27 link NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
33 2022-11-19 link TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
32 2023-09-03 link EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
32 2023-03-15 link Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
32 2023-05-18 link Going Denser with Open-Vocabulary Part Segmentation
32 2023-08-21 link Diffusion Model as Representation Learner
32 2023-07-19 link DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
32 2023-03-23 link First Session Adaptation: A Strong Replay-Free Baseline for Class-Incremental Learning
32 2023-03-22 link Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding
31 2022-12-05 link SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
31 2023-07-24 link Less is More: Focus Attention for Efficient DETR
31 2022-12-26 link MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
31 2022-11-19 link MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception
30 2023-07-17 link Revisiting Scene Text Recognition: A Data Perspective
30 2023-06-13 link Efficient 3D Semantic Segmentation with Superpoint Transformer
30 2023-08-29 link Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
30 2023-03-22 link FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
30 2023-03-06 link CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
30 2023-09-18 link Unified Coarse-to-Fine Alignment for Video-Text Retrieval
30 2023-05-16 link Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
30 2022-12-11 link COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec
30 2023-08-18 link Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning
30 2023-05-02 link Neural LiDAR Fields for Novel View Synthesis
30 2022-12-20 link Full-Body Articulated Human-Object Interaction
30 2023-08-28 link Priority-Centric Human Motion Generation in Discrete Latent Space
29 2023-03-28 link Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
29 2023-04-03 link Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
29 2023-04-21 link Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
29 2023-03-24 link Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction
29 2023-08-09 link Robust Object Modeling for Visual Tracking
29 2023-03-12 link Traj-MAE: Masked Autoencoders for Trajectory Prediction
29 2022-12-07 link Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
29 2022-12-10 link Source-free Depth for Object Pop-out
29 2023-09-22 link Cross-Modal Translation and Alignment for Survival Analysis
29 2022-12-06 link Adaptive Testing of Computer Vision Models
29 2023-05-18 link Inspecting the Geographical Representativeness of Images from Text-to-Image Models
29 2023-07-25 link Spectrum-guided Multi-granularity Referring Video Object Segmentation
29 2023-07-27 link Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining
29 2023-03-30 link SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
28 2022-11-15 link A Low-Shot Object Counting Network With Iterative Prototype Adaptation
28 2023-04-11 link SATR: Zero-Shot Semantic Segmentation of 3D Shapes
28 2022-12-03 link StegaNeRF: Embedding Invisible Information within Neural Radiance Fields
28 2023-07-28 link Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification
28 2023-08-23 link SG-Former: Self-guided Transformer with Evolving Token Reallocation
28 2023-03-27 link DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition
28 2023-09-29 link Forward Flow for Novel View Synthesis of Dynamic Scenes
28 2023-06-10 link Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception
28 2022-11-26 link Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
28 2023-04-03 link Navigating to Objects Specified by Images
28 2023-03-15 link Stochastic Segmentation with Conditional Categorical Diffusion Models
28 2023-03-21 link SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation
28 2023-09-11 link UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
28 2023-08-15 link Memory-and-Anticipation Transformer for Online Action Understanding
28 2023-03-20 link Open-vocabulary Panoptic Segmentation with Embedding Modulation
28 2023-07-24 link GridMM: Grid Memory Map for Vision-and-Language Navigation
28 2023-05-29 link DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
27 2023-03-12 link DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration
27 2023-04-19 link Reference-guided Controllable Inpainting of Neural Radiance Fields
27 2023-07-18 link OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
27 2023-03-21 link LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
27 2023-03-16 link Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
27 2023-07-25 link E2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
27 2023-09-19 link AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
26 2023-02-02 link Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion
26 2023-07-20 link HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
26 2023-07-14 link Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
26 2023-06-14 link Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
26 2023-01-16 link UATVR: Uncertainty-Adaptive Text-Video Retrieval
26 2023-06-21 link HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
26 2023-07-26 link AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
26 2023-06-20 link Dynamic Perceiver for Efficient Visual Recognition
26 2023-07-12 link GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video
26 2023-08-09 link Bird’s-Eye-View Scene Graph for Vision-Language Navigation
26 2023-04-10 link Detection Transformer with Stable Matching
26 2023-01-02 link Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
26 2023-02-17 link Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
26 2022-11-26 link RbA: Segmenting Unknown Regions Rejected by All
26 2023-01-24 link Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation
26 2023-02-16 link Parallax-Tolerant Unsupervised Deep Image Stitching
26 2022-12-05 link One-shot Implicit Animatable Avatars with Model-based Priors
26 2023-09-16 link AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose
26 2023-04-20 link Implicit Temporal Modeling with Learnable Alignment for Video Recognition
26 2022-11-22 link CASSPR: Cross Attention Single Scan Place Recognition
25 2023-04-03 link NeMF: Inverse Volume Rendering with Neural Microflake Field
25 2023-03-08 link CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation
25 2023-08-31 link FACET: Fairness in Computer Vision Evaluation Benchmark
25 2023-08-31 link EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild
25 2023-09-11 link Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
25 2023-07-25 link Unmasking Anomalies in Road-Scene Segmentation
25 2023-07-20 link General Image-to-Image Translation with One-Shot Image Guidance
25 2023-03-23 link Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
25 2023-02-14 link VQ3D: Learning a 3D-Aware Generative Model on ImageNet
25 2023-01-09 link Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments
25 2023-03-09 link GPGait: Generalized Pose-based Gait Recognition
25 2023-04-04 link Black Box Few-Shot Adaptation for Vision-Language models
25 2023-09-29 link HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
25 2023-08-19 link SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM
25 2023-07-31 link Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
25 2023-08-27 link Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks
25 2023-09-28 link FLIP: Cross-domain Face Anti-spoofing with Language Guidance
24 2023-08-14 link Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
24 2023-08-11 link Exploring Predicate Visual Context in Detecting of Human–Object Interactions
24 2023-08-28 link HoloFusion: Towards Photo-realistic 3D Generative Modeling
24 2023-07-24 link CTVIS: Consistent Training for Online Video Instance Segmentation
24 2023-03-16 link DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human
Avatars
24 2023-08-21 link When Prompt-based Incremental Learning Does Not Meet Strong Pretraining
24 2023-07-16 link Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained
Image Collections
24 2023-09-19 link NDDepth: Normal-Distance Assisted Monocular Depth Estimation
24 2023-01-05 link Event Camera Data Pre-training
24 2022-10-03 link Masked Spiking Transformer
24 2023-07-31 link CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification
24 2023-08-21 link STEERER: Resolving Scale Variations for Counting and Localization via
Selective Inheritance Learning
24 2023-09-15 link Robust e-NeRF: NeRF from Sparse & Noisy Events under
Non-Uniform Motion
24 2023-08-18 link MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection
23 2022-09-12 link PreSTU: Pre-Training for Scene-Text Understanding
23 2023-10-01 link MasQCLIP for Open-Vocabulary Universal Image Segmentation
23 2023-03-15 link Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution
23 2023-07-09 link Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers
23 2023-07-19 link Implicit Identity Representation Conditioned Memory Compensation Network for Talking
Head Video Generation
23 2022-10-05 link Bayesian Prompt Learning for Image-Language Model Generalization
23 2023-03-18 link Grounding 3D Object Affordance from 2D Interactions in Images
23 2023-03-10 link Overwriting Pretrained Bias with Finetuning Data
23 2023-08-26 link Beyond One-to-One: Rethinking the Referring Image Segmentation
23 2022-08-19 link SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability
23 2023-06-30 link FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis
23 2023-08-07 link Part-Aware Transformer for Generalizable Person Re-identification
23 2023-06-15 link Encyclopedic VQA: Visual questions about detailed properties of fine-grained
categories
23 2023-08-26 link Point-Query Quadtree for Crowd Counting, Localization, and More
23 2022-06-18 link Gender Artifacts in Visual Datasets
23 2022-11-17 link DETRDistill: A Universal Knowledge Distillation Framework for DETR-families
23 2023-08-18 link RLIPv2: Fast Scaling of Relational Language-Image Pre-training
22 2023-08-28 link R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple
Cameras
22 2022-11-23 link ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field
22 2023-03-16 link Among Us: Adversarially Robust Collaborative Perception by Consensus
22 2022-10-10 link FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training
22 2022-11-17 link EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
22 2023-07-16 link EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes
22 2023-02-28 link Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural
Networks
22 2023-08-27 link High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset
and A Frequency-Aware Shadow Erasing Net
22 2023-08-08 link LATR: 3D Lane Detection from Monocular Images with Transformer
22 2023-08-21 link Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction
22 2023-07-29 link XMem++: Production-level Video Segmentation From Few Annotated Frames
22 2023-04-21 link Deep Multiview Clustering by Contrasting Cluster Assignments
22 2023-08-23 link Does Physical Adversarial Example Really Matter to Autonomous Driving?
Towards System-Level Effect of Adversarial Object Evasion Attack
22 2023-02-28 link BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View
Images
22 2023-05-10 link Relightify: Relightable 3D Faces from a Single Image via
Diffusion Models
22 2023-07-20 link EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
22 2023-01-17 link A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel
View Synthesis and Implicit Scene Reconstruction
22 2023-10-01 link GET: Group Event Transformer for Event-Based Vision
22 2023-04-26 link Neural-PBIR Reconstruction of Shape, Material, and Illumination
22 2023-07-15 link ExposureDiffusion: Learning to Expose for Low-light Image Enhancement
22 2023-08-14 link S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for
Neural Fields
22 2023-10-01 link Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models
22 2023-07-24 link A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative
Learning for Semantic Segmentation
22 2023-10-01 link ContactGen: Generative Contact Modeling for Grasp Generation
22 2023-08-05 link Sketch and Text Guided Diffusion Model for Colored Point
Cloud Generation
22 2023-08-21 link MGMAE: Motion Guided Masking for Video Masked Autoencoding
22 2023-09-26 link Generating Visual Scenes from Touch
21 2023-06-15 link Evaluating Data Attribution for Text-to-Image Models
21 2023-08-22 link ReFit: Recurrent Fitting Network for 3D Human Recovery
21 2023-08-05 link An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial
Transferability
21 2023-08-14 link Masked Motion Predictors are Strong 3D Action Representation Learners
21 2023-07-16 link Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View
Transformer
21 2023-08-09 link GIFD: A Generative Gradient Inversion Method with Feature Domain
Optimization
21 2023-08-01 link Online Prototype Learning for Online Continual Learning
21 2023-09-02 link Contrastive Feature Masking Open-Vocabulary Vision Transformer
21 2023-03-21 link Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation
21 2023-06-09 link DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential
Point Clouds
21 2023-07-21 link Core: Cooperative Reconstruction for Multi-Agent Perception
21 2023-06-08 link Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
21 2023-09-08 link The Power of Sound (TPoS): Audio Reactive Video Generation
with Stable Diffusion
21 2023-09-26 link Nearest Neighbor Guidance for Out-of-Distribution Detection
21 2023-09-12 link Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning
21 2023-07-27 link Clustering based Point Cloud Representation Learning for 3D Analysis
21 2023-08-28 link Multi-Modal Neural Radiance Field for Monocular Dense SLAM with
a Light-Weight ToF Sensor
21 2023-08-16 link Membrane Potential Batch Normalization for Spiking Neural Networks
21 2023-09-15 link Deformable Neural Radiance Fields using RGB and Event Cameras
20 2023-04-03 link Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
20 2023-04-05 link Dynamic Point Fields
20 2022-12-26 link Generalized Differentiable RANSAC
20 2023-03-28 link CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image
Arbitrary-Scale Super Resolution
20 2023-06-08 link LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed
NeRFs
20 2023-03-16 link DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
20 2023-07-19 link Generative Prompt Model for Weakly Supervised Object Localization
20 2023-07-20 link BlendFace: Re-designing Identity Encoders for Face-Swapping
20 2023-01-05 link CiT: Curation in Training for Effective Vision-Language Data
20 2023-08-23 link Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
20 2023-08-01 link Improving Pixel-based MIM by Reducing Wasted Modeling Capability
20 2023-07-14 link RFLA: A Stealthy Reflected Light Adversarial Attack in the
Physical World
20 2022-07-22 link Divide and Conquer: 3D Point Cloud Instance Segmentation With
Point-Wise Binarization
20 2023-08-18 link Label-Free Event-based Object Recognition via Joint Learning with Image
Reconstruction from Events
20 2023-08-24 link Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation
20 2022-12-08 link Graph Matching with Bi-level Noisy Correspondence
20 2022-11-04 link Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis
20 2023-08-15 link ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces
20 2023-04-18 link Fast Neural Scene Flow
20 2023-09-05 link Empowering Low-Light Image Enhancer through Customized Learnable Priors
20 2023-03-09 link MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual
Speech Translation and Recognition
20 2023-07-27 link Diverse Inpainting and Editing with GAN Inversion
20 2023-04-24 link HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single
Video
20 2023-08-20 link March in Chat: Interactive Prompting for Remote Embodied Referring
Expression
20 2023-07-29 link CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation
19 2023-09-04 link Mask-Attention-Free Transformer for 3D Instance Segmentation
19 2023-09-05 link RawHDR: High Dynamic Range Image Reconstruction from a Single
Raw Image
19 2023-09-03 link CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
19 2023-09-27 link SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations
19 2023-07-26 link Adaptive Frequency Filters As Efficient Global Token Mixers
19 2023-03-22 link LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
19 2023-09-25 link Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous
Driving
19 2023-04-04 link NPC: Neural Point Characters from Video
19 2023-08-14 link Dreamwalker: Mental Planning for Continuous Vision-Language Navigation
19 2023-01-11 link LinkGAN: Linking GAN Latents to Pixels for Controllable Image
Synthesis
19 2023-04-04 link Towards Open-Vocabulary Video Instance Segmentation
19 2022-12-09 link Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious
Features in ImageNet
19 2023-01-06 link Object as Query: Lifting any 2D Object Detector to
3D Detection
19 2023-09-29 link Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A
Pilot Study
19 2022-05-19 link Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
19 2023-03-21 link TMA: Temporal Motion Aggregation for Event-based Optical Flow
19 2023-09-24 link LogicSeg: Parsing Visual Semantics with Neural Logic Learning and
Reasoning
19 2023-07-26 link Human-centric Scene Understanding for 3D Large-scale Scenarios
19 2023-04-17 link Pretrained Language Models as Visual Planners for Human Assistance
18 2023-06-09 link Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction
18 2023-03-22 link UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception
Framework
18 2023-07-19 link What do neural networks learn in image classification? A
frequency shortcut perspective
18 2023-08-11 link FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis
of Explainable AI Methods
18 2023-07-27 link Online Clustered Codebook
18 2023-04-12 link Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric
Views
18 2023-09-11 link Class-Incremental Grouping Network for Continual Audio-Visual Learning
18 2022-05-27 link Semi-supervised Semantics-guided Adversarial Training for Robust Trajectory Prediction
18 2022-07-18 link UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in
Bird’s-Eye-View
18 2023-08-28 link Referring Image Segmentation Using Text Supervision
18 2023-08-18 link Self-Calibrated Cross Attention Network for Few-Shot Segmentation
18 2023-07-21 link SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object
Detection
18 2023-09-12 link Modality Unifying Network for Visible-Infrared Person Re-Identification
18 2023-07-31 link Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
18 2023-09-16 link ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images
18 2022-11-14 link BiViT: Extremely Compressed Binary Vision Transformers
18 2023-04-05 link ChartReader: A Unified Framework for Chart Derendering and Comprehension
without Heuristic Rules
18 2022-08-29 link SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection
18 2023-03-24 link Anomaly Detection under Distribution Shift
18 2023-08-25 link IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and
Outliers Utilization
17 2023-08-06 link Source-free Domain Adaptive Human Pose Estimation
17 2023-09-03 link LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts
for Vision-Language Models
17 2023-07-20 link Cascade-DETR: Delving into High-Quality Universal Object Detection
17 2023-07-20 link Lighting up NeRF via Unsupervised Decomposition and Enhancement
17 2022-12-07 link Domain generalization of 3D semantic segmentation in autonomous driving
17 2023-08-19 link VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning
Decoupled Rotations on the Spherical Representations
17 2023-05-03 link DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross
Diffusion
17 2023-08-14 link ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal
and Robust Vehicle Evasion
17 2023-10-01 link Social Diffusion: Long-term Multiple Human Motion Anticipation
17 2023-08-16 link Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete
Image Modelling Network
17 2023-04-24 link Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
17 2022-10-11 link Multi-Object Navigation with dynamically learned neural implicit representations
17 2023-05-19 link Chupa: Carving 3D Clothed Humans from Skinned Shape Priors
using 2D Diffusion Probabilistic Models
17 2023-08-13 link Compositional Feature Augmentation for Unbiased Scene Graph Generation
17 2023-06-15 link Rosetta Neurons: Mining the Common Units in a Model
Zoo
17 2023-08-14 link PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
17 2023-08-10 link Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for
Panoramic Semantic Segmentation
17 2023-02-02 link Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using
Pixel-aligned Reconstruction Priors
17 2022-11-01 link Self-supervised Character-to-Character Distillation for Text Recognition
17 2023-04-26 link From Chaos Comes Order: Ordering Event Representations for Object
Recognition and Detection
17 2023-03-15 link Re-ReND: Real-time Rendering of NeRFs across Devices
17 2023-07-18 link EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting
16 2022-11-21 link Video Background Music Generation: Dataset, Method and Evaluation
16 2023-08-06 link Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image
Anomaly Detection
16 2023-09-08 link Dynamic Mesh-Aware Radiance Fields
16 2023-06-28 link Subclass-balancing Contrastive Learning for Long-tailed Recognition
16 2023-08-22 link Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer
with Mixture-of-View-Experts
16 2023-08-07 link Heterogeneous Forgetting Compensation for Class-Incremental Learning
16 2023-03-11 link DETA: Denoised Task Adaptation for Few-Shot Learning
16 2023-03-31 link DIME-FM: DIstilling Multimodal and Efficient Foundation Models
16 2023-07-26 link ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
16 2023-08-11 link Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
16 2023-04-12 link Mesh2Tex: Generating Mesh Textures from Image Queries
16 2023-09-21 link A Sentence Speaks a Thousand Images: Domain Generalization through
Distilling CLIP with Language Guidance
16 2023-09-18 link GEDepth: Ground Embedding for Monocular Depth Estimation
16 2023-07-31 link DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation
16 2023-09-26 link Pre-training-free Image Manipulation Localization through Non-Mutually Exclusive Contrastive Learning
16 2023-07-21 link Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for
Occluded Facial Expression Recognition
16 2022-11-28 link H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
16 2023-08-19 link Skill Transformer: A Monolithic Policy for Mobile Manipulation
16 2022-12-22 link DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
16 2023-03-16 link Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget
Less
16 2023-08-07 link From Sky to the Ground: A Large-scale Benchmark and
Simple Baseline Towards Real Rain Removal
16 2023-08-14 link Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied
Agents
16 2023-01-05 link DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for
Image Super-Resolution
16 2023-03-20 link Robustifying Token Attention for Vision Transformers
16 2023-09-14 link Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer
Learning
16 2023-05-25 link Action Sensitivity Learning for Temporal Action Localization
16 2023-02-10 link Leveraging Inpainting for Single-Image Shadow Removal
16 2023-08-20 link ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy
in Transformer
16 2023-10-01 link Lighting Every Darkness in Two Pairs: A Calibration-Free
Pipeline for RAW Denoising
16 2023-08-18 link Online Class Incremental Learning on Stochastic Blurry Task Boundary
via Mask and Visual Prompt Tuning
16 2023-07-18 link Object-aware Gaze Target Detection
16 2023-07-19 link MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions
15 2023-08-22 link Animal3D: A Comprehensive Dataset of 3D Animal Pose and
Shape
15 2023-03-29 link Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for
3D Human Pose Estimation
15 2023-04-24 link Once Detected, Never Lost: Surpassing Human Performance in Offline
LiDAR based 3D Object Detection
15 2023-09-28 link Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution
Face Synthesis
15 2023-07-31 link Random Sub-Samples Generation for Self-Supervised Real Image Denoising
15 2023-08-13 link RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks
15 2023-08-28 link CLNeRF: Continual Learning Meets NeRF
15 2023-07-20 link AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
15 2023-03-10 link GECCO: Geometrically-Conditioned Point Diffusion Models
15 2023-08-02 link Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient
Agreement Augmentation
15 2023-08-17 link Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
15 2023-08-02 link Dynamic Token Pruning in Plain Vision Transformers for Semantic
Segmentation
15 2023-08-20 link DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization
15 2023-08-23 link Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification
15 2023-04-27 link ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs
15 2023-03-24 link UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature
Fields
15 2023-09-15 link PoseFix: Correcting 3D Human Poses with Natural Language
15 2023-08-10 link Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting
for Robust 6D Object Pose Estimation
15 2023-08-15 link Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
15 2023-09-26 link DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge
Distillation
15 2023-07-27 link Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
15 2022-10-16 link Scratching Visual Transformer's Back with Uniform Attention
15 2023-08-20 link DomainAdaptor: A Novel Approach to Test-time Adaptation
15 2023-08-19 link Calibrating Uncertainty for Semi-Supervised Crowd Counting
15 2023-07-27 link P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds
15 2023-08-01 link ELFNet: Evidential Local-global Fusion for Stereo Matching
15 2023-08-19 link 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human
Pose Estimation
15 2023-03-09 link MBPTrack: Improving 3D Point Cloud Tracking with Memory networks
and Box Priors
15 2023-07-20 link See More and Know More: Zero-shot Point Cloud Segmentation
via Multi-modal Visual Data