Last updated: 2024-12-09 08:45:54. Maintained by Weisen Jiang.

citation date review title (pdf) authors
4650 2022-07-06 link YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
2084 2022-08-25 link DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
1247 2022-11-17 link InstructPix2Pix: Learning to Follow Image Editing Instructions
916 2022-11-18 link Magic3D: High-Resolution Text-to-3D Content Creation
862 2022-10-17 link Imagic: Text-Based Real Image Editing with Diffusion Models
738 2023-04-18 link Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
631 2023-05-09 link ImageBind: One Embedding Space to Bind Them All
627 2022-12-08 link Multi-Concept Customization of Text-to-Image Diffusion
614 2022-12-15 link Objaverse: A Universe of Annotated 3D Objects
539 2022-11-14 link EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
533 2022-12-14 link Reproducible Scaling Laws for Contrastive Language-Image Learning
489 2022-11-10 link InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
489 2023-01-02 link ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
467 2022-11-22 link Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
445 2023-01-17 link GLIGEN: Open-Set Grounded Text-to-Image Generation
442 2022-12-01 link Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
438 2023-03-07 link Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
433 2022-05-09 link Activating More Pixels in Image Super-Resolution Transformer
406 2022-12-20 link Planning-oriented Autonomous Driving
388 2023-01-24 link K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
376 2022-10-06 link On Distillation of Guided Diffusion Models
375 2022-10-06 link MaPLe: Multi-modal Prompt Learning
373 2022-11-14 link Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
366 2022-03-27 link Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
354 2023-03-09 link Scaling up GANs for Text-to-Image Synthesis
336 2022-10-09 link Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
320 2023-03-15 link BiFormer: Vision Transformer with Bi-Level Routing Attention
312 2022-11-18 link Visual Programming: Compositional visual reasoning without training
311 2022-11-23 link Paint by Example: Exemplar-based Image Editing with Diffusion Models
311 2023-01-23 link HexPlane: A Fast Representation for Dynamic Scenes
300 2022-06-06 link Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
280 2023-03-08 link Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
272 2023-06-01 link Neuralangelo: High-Fidelity Neural Surface Reconstruction
266 2022-07-30 link MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
261 2022-12-01 link Scaling Language-Image Pre-Training via Masking
251 2022-11-28 link OpenScene: 3D Scene Understanding with Open Vocabularies
250 2023-03-29 link VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
243 2022-12-07 link Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
243 2022-11-10 link OneFormer: One Transformer to Rule Universal Image Segmentation
236 2022-12-08 link Executing your Commands via Motion Diffusion in Latent Space
231 2022-12-12 link RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
222 2023-01-15 link Generating Human Motion from Textual Descriptions with Discrete Representations
221 2023-01-19 link Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
215 2023-06-01 link Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks
206 2022-11-18 link BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
201 2023-03-13 link FreeNeRF: Improving Few-Shot Neural Rendering with Free Frequency Regularization
199 2023-02-15 link Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
199 2022-11-30 link 3D Neural Field Generation Using Triplane Diffusion
198 2022-09-25 link All are Worth Words: A ViT Backbone for Diffusion Models
198 2022-12-05 link Images Speak in Images: A Generalist Painter for In-Context Visual Learning
196 2022-12-08 link SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
194 2022-12-21 link Generalized Decoding for Pixel, Image, and Language
193 2022-11-09 link Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
192 2022-11-26 link CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
190 2022-04-14 link Neighborhood Attention Transformer
188 2023-03-11 link High-resolution image reconstruction with latent diffusion models from human brain activity
183 2023-02-23 link Side Adapter Network for Open-Vocabulary Semantic Segmentation
181 2022-12-14 link NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
179 2022-11-23 link CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning
178 2023-01-30 link DepGraph: Towards Any Structural Pruning
176 2022-11-23 link Inversion-based Style Transfer with Diffusion Models
175 2022-03-14 link All in One: Exploring Unified Video-Language Pre-Training
173 2022-12-01 link SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction
169 2022-12-02 link MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
167 2022-11-25 link SpaText: Spatio-Textual Representation for Controllable Image Generation
167 2022-12-09 link SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
166 2023-05-11 link EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
165 2022-11-19 link EDGE: Editable Dance Generation From Music
160 2022-12-02 link DiffRF: Rendering-Guided 3D Radiance Field Diffusion
159 2023-03-20 link VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
157 2023-03-29 link Implicit Diffusion Models for Continuous Super-Resolution
155 2022-11-22 link SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
154 2023-02-27 link Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
154 2023-01-15 link Diffusion-based Generation, Optimization, and Planning in 3D Scenes
152 2023-03-21 link Learning A Sparse Transformer Network for Effective Image Deraining
152 2022-07-26 link DETRs with Hybrid Matching
151 2023-01-18 link OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
151 2022-08-21 link Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
150 2022-12-28 link Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
150 2022-11-21 link Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
148 2022-12-10 link MAGVIT: Masked Generative Video Transformer
147 2022-12-06 link NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors
142 2023-03-27 link Learned Image Compression with Mixed Transformer-CNN Architectures
140 2023-02-15 link Video Probabilistic Diffusion Models in Projected Latent Space
139 2023-03-27 link SimpleNet: A Simple Network for Image Anomaly Detection and Localization
138 2022-11-20 link DynIBaR: Neural Dynamic Image-Based Rendering
138 2022-06-04 link PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers
135 2023-04-27 link Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
134 2022-11-29 link NeuralLift-360: Lifting an in-the-Wild 2D Photo to A 3D Object with 360° Views
133 2023-03-03 link Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
133 2022-11-21 link Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering
132 2023-06-01 link UniSim: A Neural Closed-Loop Sensor Simulator
132 2022-12-08 link SINE: SINgle Image Editing with Text-to-Image Diffusion Models
132 2022-11-22 link EDICT: Exact Diffusion Inversion via Coupled Transformations
131 2022-12-13 link Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
129 2022-12-08 link MoFusion: A Framework for Denoising-Diffusion-Based Motion Synthesis
129 2023-03-12 link Universal Instance Perception as Object Discovery and Retrieval
129 2023-01-26 link Cut and Learn for Unsupervised Object Detection and Instance Segmentation
129 2022-11-17 link RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
128 2023-02-20 link Towards Universal Fake Image Detectors that Generalize Across Generative Models
127 2023-03-01 link Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
127 2022-12-07 link ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
126 2023-03-30 link LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
124 2023-04-03 link Generative Diffusion Prior for Unified Image Restoration and Enhancement
123 2023-03-23 link Visual-Language Prompt Tuning with Knowledge-Guided Context Optimization
122 2022-12-19 link MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
121 2022-12-16 link Fake it Till You Make it: Learning Transferable Representations from Synthetic ImageNet Clones
120 2022-11-14 link Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
118 2022-11-29 link PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
118 2022-12-14 link ECON: Explicit Clothed humans Optimized via Normal integration
116 2022-08-25 link MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
116 2023-06-01 link SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
115 2023-03-27 link Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
115 2023-06-01 link Query-Centric Trajectory Prediction
115 2022-12-06 link Fine-tuned CLIP Models are Efficient Video Learners
114 2022-05-26 link Revealing the Dark Secrets of Masked Image Modeling
114 2023-04-17 link Affordances from Human Videos as a Versatile Representation for Robotics
113 2022-11-21 link ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields
113 2023-03-26 link WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
112 2023-03-12 link Iterative Geometry Encoding Volume for Stereo Matching
112 2023-06-01 link Camouflaged Object Detection with Feature Decomposition and Edge Reconstruction
112 2021-02-23 link Deep Deterministic Uncertainty: A New Simple Baseline
112 2022-11-23 link Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
112 2023-03-10 link MVImgNet: A Large-scale Dataset of Multi-view Images
111 2022-04-28 link ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
111 2023-01-05 link Robust Dynamic Radiance Fields
110 2022-11-23 link ReCo: Region-Controlled Text-to-Image Generation
109 2023-03-28 link F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
109 2022-12-01 link Finetune like you pretrain: Improved finetuning of zero-shot vision models
108 2022-12-16 link PointAvatar: Deformable Point-Based Head Avatars from Videos
108 2022-06-09 link MobileOne: An Improved One millisecond Mobile Backbone
107 2023-03-20 link Visual Prompt Multi-Modal Tracking
107 2023-03-14 link V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception
106 2023-01-06 link CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
106 2022-03-20 link CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
106 2022-12-13 link Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
105 2022-11-16 link MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
103 2022-12-08 link Generating Holistic 3D Human Motion from Speech
103 2023-01-12 link CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
102 2022-11-28 link Post-Training Quantization on Diffusion Models
101 2022-12-06 link Diffusion-SDF: Text-to-Shape via Voxelized Diffusion
100 2022-12-19 link Panoptic Lifting for 3D Scene Understanding with Neural Fields
99 2022-11-17 link MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
98 2022-12-09 link Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
97 2023-03-24 link BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
97 2022-11-21 link Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
96 2023-06-01 link GRES: Generalized Referring Expression Segmentation
95 2023-01-05 link WIRE: Wavelet Implicit Neural Representations
95 2023-03-29 link HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images
95 2023-03-20 link Explicit Visual Prompting for Low-Level Structure Segmentations
95 2023-02-14 link PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
94 2023-03-22 link Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
92 2022-11-12 link OpenGait: Revisiting Gait Recognition Toward Better Practicality
92 2023-03-22 link Spherical Transformer for LiDAR-Based 3D Recognition
91 2023-04-17 link Interactive and Explainable Region-guided Radiology Report Generation
91 2022-05-16 link BBDM: Image-to-Image Translation with Brownian Bridge Diffusion Models
91 2022-11-22 link SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields
91 2023-03-13 link TriDet: Temporal Action Detection with Relative Boundary Modeling
90 2023-06-01 link MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion
88 2023-06-01 link BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
88 2023-04-24 link TensoIR: Tensorial Inverse Rendering
86 2022-12-05 link Unifying Vision, Text, and Layout for Universal Document Processing
86 2023-03-23 link CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
86 2022-11-22 link Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring
85 2023-01-15 link DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
85 2022-06-02 link Siamese Image Modeling for Self-Supervised Vision Representation Learning
85 2023-06-01 link Autoregressive Visual Tracking
85 2023-01-05 link HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
84 2022-12-16 link CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
84 2023-02-23 link DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
84 2022-11-20 link Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation
83 2022-11-22 link Instant Volumetric Head Avatars
83 2023-03-30 link Consistent View Synthesis with Pose-Guided Diffusion Models
83 2023-02-27 link Aligning Bag of Regions for Open-Vocabulary Object Detection
83 2022-06-16 link OmniMAE: Single Model Masked Pretraining on Images and Videos
83 2023-03-18 link Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation
82 2023-03-25 link SUDS: Scalable Urban Dynamic Scenes
81 2022-12-11 link Recurrent Vision Transformers for Object Detection with Event Cameras
81 2023-03-23 link SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
80 2023-03-14 link LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
80 2023-01-30 link GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
80 2023-03-16 link Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
79 2022-12-20 link InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
79 2023-03-24 link Robust Test-Time Adaptation in Dynamic Scenarios
79 2022-11-21 link DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
78 2023-03-24 link Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing
78 2022-11-19 link Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
78 2022-12-16 link Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
77 2023-04-04 link Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
77 2023-03-30 link Hierarchical Fine-Grained Image Forgery Detection and Localization
77 2023-03-04 link Virtual Sparse Convolution for Multimodal 3D Object Detection
77 2023-03-24 link Progressively Optimized Local Radiance Fields for Robust View Synthesis
77 2023-03-22 link Dense Distinct Query for End-to-End Object Detection
77 2023-03-20 link Leapfrog Diffusion Model for Stochastic Trajectory Prediction
76 2023-04-20 link Collaborative Diffusion for Multi-Modal Face Generation and Editing
76 2023-05-18 link 3D Registration with Maximal Cliques
75 2022-11-22 link Person Image Synthesis via Denoising Diffusion Model
75 2022-11-27 link Diffusion Probabilistic Model Made Slim
75 2023-05-01 link Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
74 2023-04-14 link DCFace: Synthetic Face Generation with Dual Condition Diffusion Model
73 2023-02-22 link Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
73 2022-12-08 link Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
73 2022-07-20 link FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
72 2023-06-01 link Rethinking Federated Learning with Domain Shift: A Prototype View
72 2022-10-26 link Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
72 2022-12-15 link FlexiViT: One Model for All Patch Sizes
72 2022-11-17 link CRAFT: Concept Recursive Activation FacTorization for Explainability
72 2023-04-25 link CompletionFormer: Depth Completion with Convolutions and Vision Transformers
71 2023-01-10 link DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
71 2022-02-01 link DKM: Dense Kernelized Feature Matching for Geometry Estimation
71 2023-03-26 link Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
71 2023-03-02 link UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
71 2022-06-14 link LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
71 2023-04-10 link Ambiguous Medical Image Segmentation Using Diffusion Models
71 2022-07-21 link Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
70 2023-06-01 link TryOnDiffusion: A Tale of Two UNets
70 2023-02-06 link Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
70 2022-04-17 link NICO++: Towards Better Benchmarking for Domain Generalization
69 2023-03-28 link One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
69 2023-03-30 link Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
69 2022-06-24 link Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
69 2022-12-10 link Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
69 2022-12-09 link ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal
69 2023-03-25 link Selective Structured State-Spaces for Long-Form Video Understanding
68 2023-01-04 link PACO: Parts and Attributes of Common Objects
68 2022-08-02 link ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
68 2022-11-10 link GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
67 2023-03-24 link Grid-guided Neural Radiance Fields for Large Urban Scenes
67 2023-01-05 link Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
67 2023-03-02 link FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
67 2022-11-19 link Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
67 2023-02-03 link vMAP: Vectorised Object Mapping for Neural Field SLAM
66 2023-01-16 link Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
66 2022-12-09 link Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
66 2022-12-09 link VindLU: A Recipe for Effective Video-and-Language Pretraining
66 2022-06-30 link LaserMix for Semi-Supervised LiDAR Semantic Segmentation
66 2022-12-11 link How to Backdoor Diffusion Models?
66 2023-04-06 link Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes
66 2023-03-06 link Multimodal Prompting with Missing Modalities for Visual Recognition
65 2023-06-01 link Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
65 2022-11-21 link VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
65 2022-12-01 link Learning to Generate Text-Grounded Mask for Open-World Semantic Segmentation from Only Image-Text Pairs
65 2022-10-03 link Visual Prompt Tuning for Generative Transfer Learning
65 2022-11-23 link Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
64 2023-03-20 link EqMotion: Equivariant Multi-Agent Motion Prediction with Invariant Interaction Reasoning
64 2023-03-30 link FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
64 2023-05-10 link Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
64 2023-06-01 link Multi-Label Compound Expression Recognition: C-EXPR Database & Network
64 2023-03-20 link Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
64 2022-11-28 link High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization
64 2022-09-30 link Smallcap: Lightweight Image Captioning Prompted with Retrieval Augmentation
64 2023-03-24 link GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
63 2023-03-02 link Token Contrast for Weakly-Supervised Semantic Segmentation
63 2023-06-01 link Revisiting Reverse Distillation for Anomaly Detection
63 2023-03-21 link Detecting Everything in the Open World: Towards Universal Object Detection
63 2023-04-13 link iDisc: Internal Discretization for Monocular Depth Estimation
63 2022-06-23 link EventNeRF: Neural Radiance Fields from a Single Colour Event Camera
63 2023-03-29 link Generalized Relation Modeling for Transformer Tracking
62 2023-03-30 link DiffCollage: Parallel Generation of Large Content with Diffusion Models
62 2023-05-02 link Generalizing Dataset Distillation via Deep Generative Prior
62 2022-11-22 link MagicPony: Learning Articulated 3D Animals in the Wild
62 2023-04-02 link Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
62 2023-03-29 link Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
62 2023-06-01 link Implicit Identity Driven Deepfake Face Swapping Detection
62 2022-11-29 link Wavelet Diffusion Models are fast and scalable Image Generators
62 2021-12-13 link CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability
62 2023-03-03 link EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
62 2023-05-10 link V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
61 2023-04-10 link DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
61 2023-03-02 link Delivering Arbitrary-Modal Semantic Segmentation
61 2023-02-24 link Decoupling Human and Camera Motion from Videos in the Wild
61 2023-03-07 link LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
60 2022-12-21 link TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization
60 2023-03-25 link Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
59 2023-03-01 link Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
59 2023-03-13 link Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
59 2022-11-22 link PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces Using Permutohedral Lattices
59 2023-06-01 link Improved Distribution Matching for Dataset Condensation
58 2023-04-19 link AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
58 2023-03-17 link A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
58 2023-03-23 link PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
58 2023-04-04 link MonoHuman: Animatable Human Neural Field from Monocular Video
58 2023-03-23 link CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
58 2023-04-06 link Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
58 2023-04-17 link Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
57 2023-03-22 link CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
57 2023-03-25 link NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
57 2022-11-21 link Understanding and Improving Visual Prompting: A Label-Mapping Perspective
57 2023-01-19 link Multiview Compressive Coding for 3D Reconstruction
57 2023-03-18 link MotionTrack: Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking
57 2023-03-21 link Affordance Diffusion: Synthesizing Hand-Object Interactions
57 2023-06-01 link End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve
57 2022-11-18 link Task Residual for Tuning Vision-Language Models
57 2023-03-01 link Multimodal Industrial Anomaly Detection via Hybrid Fusion
57 2022-12-09 link A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
57 2022-11-21 link Teaching Structured Vision & Language Concepts to Vision & Language Models
56 2023-06-01 link Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection
56 2022-12-28 link Multi-Realism Image Compression with a Conditional Generator
56 2023-03-18 link Sharpness-Aware Gradient Matching for Domain Generalization
56 2022-06-21 link LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
56 2023-06-01 link Ingredient-oriented Multi-Degradation Learning for Image Restoration
55 2022-11-19 link DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
55 2023-04-05 link METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
55 2022-12-15 link Vision Transformers are Parameter-Efficient Audio-Visual Learners
55 2023-03-30 link PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
54 2023-04-08 link RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
54 2023-05-17 link ReasonNet: End-to-End Driving with Temporal and Global Reasoning
54 2022-11-21 link N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
54 2022-11-23 link Make-A-Story: Visual Memory Conditioned Consistent Story Generation
54 2023-04-02 link DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
54 2023-06-01 link ObjectStitch: Object Compositing with Diffusion Model
53 2022-12-03 link PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models
53 2023-04-20 link Omni Aggregation Networks for Lightweight Image Super-Resolution
53 2023-03-26 link Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
53 2023-03-23 link Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
53 2022-11-23 link SVFormer: Semi-supervised Video Transformer for Action Recognition
53 2023-04-14 link Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
53 2023-05-11 link Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
53 2023-06-01 link FFCV: Accelerating Training by Removing Data Bottlenecks
53 2023-06-01 link PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification
53 2022-09-04 link An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
53 2023-06-01 link Multi-Modal Learning with Missing Modality via Shared-Specific Feature Modelling
53 2023-02-02 link RobustNeRF: Ignoring Distractors with Robust Losses
52 2023-03-14 link PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
52 2022-11-29 link Compressing Volumetric Radiance Fields to 1 MB
52 2023-03-31 link 3D Human Pose Estimation via Intuitive Physics
52 2022-03-29 link Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
52 2023-01-20 link FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
52 2023-04-17 link Learning to Render Novel Views from Wide-Baseline Stereo Pairs
51 2023-03-15 link BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection
51 2022-09-07 link MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
51 2023-03-13 link DR2: Diffusion-Based Robust Degradation Remover for Blind Face Restoration
51 2023-04-19 link NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
51 2023-02-21 link PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
51 2023-05-15 link Identity-Preserving Talking Face Generation with Landmark and Appearance Priors
51 2022-11-25 link NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces with Arbitrary Topologies
51 2022-11-11 link Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection
51 2022-12-06 link GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
50 2023-03-14 link Rotation-Invariant Transformer for Point Cloud Matching
50 2023-03-10 link TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
50 2023-03-13 link SCPNet: Semantic Scene Completion on Point Cloud
50 2023-03-25 link Freestyle Layout-to-Image Synthesis
50 2022-11-17 link Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
50 2023-03-06 link Referring Multi-Object Tracking
50 2022-12-22 link DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
49 2023-03-10 link Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
49 2022-12-12 link Accelerating Dataset Distillation via Model Augmentation
49 2023-04-05 link HNeRV: A Hybrid Neural Representation for Videos
49 2023-06-01 link Good is Bad: Causality Inspired Cloth-debiasing for Cloth-changing Person Re-identification
49 2023-03-10 link CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
49 2023-04-04 link OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
49 2023-06-01 link Dynamic Graph Learning with Content-guided Spatial-Frequency Relation Reasoning for Deepfake Detection
49 2023-01-13 link CLIP the Gap: A Single Domain Generalization Approach for Object Detection
49 2023-03-21 link Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary
48 2023-01-23 link LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
48 2023-06-01 link MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding
from Object Detection
48 2023-05-10 link Low-Light Image Enhancement via Structure Modeling and Guidance
48 2022-12-16 link GFPose: Learning 3D Human Pose Prior with Gradient Fields
48 2023-03-15 link MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving
48 2022-12-06 link Rethinking Video ViTs: Sparse Video Tubes for Joint Image
and Video Learning
48 2023-01-04 link Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
48 2023-04-12 link Constructing Deep Spiking Neural Networks from Artificial Neural Networks
with Knowledge Distillation
48 2022-11-23 link BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields
48 2022-12-14 link PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
47 2022-11-21 link Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
47 2023-04-03 link Neural Volumetric Memory for Visual Locomotion Control
47 2023-05-08 link PillarNeXt: Rethinking Network Designs for 3D Object Detection in
LiDAR Point Clouds
47 2022-12-05 link Prototypical Residual Networks for Anomaly Detection and Localization
47 2023-03-24 link Decoupled Multimodal Distilling for Emotion Recognition
47 2023-03-15 link Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology
Whole Slide Image Classification
47 2022-12-22 link Removing Objects From Neural Radiance Fields
47 2022-12-15 link VolRecon: Volume Rendering of Signed Ray Distance Functions for
Generalizable Multi-View Reconstruction
46 2023-03-30 link C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient
Source Free Domain Adaptation
46 2022-11-26 link Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head
Synthesis
46 2022-12-11 link PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized
Novel Category Discovery
46 2022-11-23 link HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with
Discrete and Continuous Denoising
46 2023-01-24 link RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in
Autonomous Driving
46 2023-04-04 link Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
46 2023-03-20 link Computationally Budgeted Continual Learning: What Does Matter?
46 2023-04-21 link Joint Token Pruning and Squeezing Towards More Aggressive Compression
of Vision Transformers
46 2022-12-02 link PROB: Probabilistic Objectness for Open World Object Detection
46 2022-11-21 link NeRF-RPN: A general framework for object detection in NeRFs
46 2023-06-01 link KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
46 2023-02-28 link A Hierarchical Representation Network for Accurate and Detailed Face
Reconstruction from In-The-Wild Images
45 2023-06-01 link AltFreezing for More General Video Face Forgery Detection
45 2023-03-13 link MP-Former: Mask-Piloted Transformer for Image Segmentation
45 2022-12-12 link ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation
45 2023-06-01 link Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
45 2023-04-11 link One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
45 2023-03-01 link Renderable Neural Radiance Map for Visual Navigation
45 2023-03-26 link You Only Segment Once: Towards Real-Time Panoptic Segmentation
45 2023-06-01 link Learning Weather-General and Weather-Specific Features for Image Restoration Under
Multiple Adverse Weather Conditions
45 2023-03-19 link StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields
44 2023-06-01 link 3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention
44 2023-03-23 link Collaboration Helps Camera Overtake LiDAR in 3D Detection
44 2022-07-04 link Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly
Detection
44 2023-05-23 link Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using
RGB and Poses
44 2023-03-28 link DisWOT: Student Architecture Search for Distillation WithOut Training
44 2023-01-18 link PIRLNav: Pretraining with Imitation and RL Finetuning for OBJECTNAV
44 2023-03-27 link Label-Free Liver Tumor Segmentation
44 2023-04-06 link Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
44 2022-12-15 link Unsupervised Object Localization: Observing the Background to Discover Objects
44 2023-03-30 link Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
44 2023-03-26 link OTAvatar: One-Shot Talking Face Avatar with Controllable Tri-Plane Rendering
44 2023-06-01 link Federated Domain Generalization with Generalization Adjustment
44 2022-11-12 link MARLIN: Masked Autoencoder for facial video Representation LearnINg
43 2022-12-15 link MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
43 2023-04-04 link Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target
Detection with Single Point Supervision
43 2022-11-29 link UDE: A Unified Driving Engine for Human Motion Generation
43 2022-12-06 link Unifying Short and Long-Term Tracking with Graph Hierarchies
43 2023-03-26 link BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
43 2023-03-14 link Diversity-Aware Meta Visual Prompting
43 2023-03-13 link Align and Attend: Multimodal Summarization with Dual Contrastive Losses
43 2022-11-23 link Texts as Images in Prompt Tuning for Multi-Label Image
Recognition
43 2022-06-09 link On Data Scaling in Masked Image Modeling
43 2022-07-16 link Clover: Towards A Unified Video-Language Alignment and Fusion Model
43 2022-12-09 link Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
43 2023-05-31 link Neural Kernel Surface Reconstruction
43 2023-04-03 link Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
43 2022-08-08 link Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
42 2023-04-13 link Representing Volumetric Videos as Dynamic MLP Maps
42 2022-11-22 link DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors
42 2022-11-02 link CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from
Natural Language
42 2023-01-06 link 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
42 2023-03-07 link Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain
Adaptation
42 2023-05-15 link NIKI: Neural Inverse Kinematics with Invertible Neural Networks for
3D Human Pose and Shape Estimation
42 2022-11-30 link 3D GAN Inversion with Facial Symmetry Prior
42 2023-03-05 link PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
42 2022-12-15 link Real-Time Neural Light Field on Mobile Devices
42 2022-12-06 link Semantic-Conditional Diffusion Networks for Image Captioning
42 2023-04-11 link Video Event Restoration Based on Keyframes for Video Anomaly
Detection
42 2023-06-01 link DISC: Learning from Noisy Labels via Dynamic Instance-Specific Selection
and Correction
42 2023-03-09 link Diversity-Measurable Anomaly Detection
42 2023-06-01 link Change-Aware Sampling and Contrastive Learning for Satellite Images
42 2022-12-27 link Interactive Segmentation of Radiance Fields
41 2023-01-30 link Shape-Aware Text-Driven Layered Video Editing
41 2022-12-09 link Ego-Body Pose Estimation via Ego-Head Pose Estimation
41 2022-12-15 link DeepLSD: Line Segment Detection and Refinement with Deep Image
Gradients
41 2023-04-13 link DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
41 2023-03-25 link Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal
Representation Learning
41 2023-06-01 link All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative
Filters for Specific Degradations
41 2023-04-17 link Neural Map Prior for Autonomous Driving
41 2023-06-01 link TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras
in 3D Environments
41 2023-03-20 link Zero-Shot Noise2Noise: Efficient Image Denoising without any Data
41 2023-03-22 link Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly
Detection
41 2023-03-06 link Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene
Representation from 2D Supervision
41 2022-12-31 link Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained
Vision-Language Models
40 2022-11-21 link Vision Transformer with Super Token Sampling
40 2022-12-01 link Hyperbolic Contrastive Learning for Visual Representations beyond Objects
40 2023-03-05 link Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial
Scenes
40 2023-04-03 link Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
40 2022-12-08 link Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly
Supervised Video Anomaly Detection
40 2022-12-14 link HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
40 2023-03-25 link DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
40 2022-06-16 link MoDi: Unconditional Motion Synthesis from Diverse Data
40 2022-11-25 link Fine-Grained Face Swapping Via Regional GAN Inversion
40 2023-03-18 link DeAR: Debiasing Vision-Language Models with Additive Residuals
39 2022-12-22 link Re-basin via implicit Sinkhorn differentiation
39 2023-03-20 link 3D Concept Learning and Reasoning from Multi-View Images
39 2023-03-23 link Backdoor Defense via Adaptively Splitting Poisoned Dataset
39 2023-04-16 link SeaThru-NeRF: Neural Radiance Fields in Scattering Media
39 2023-02-28 link Turning a CLIP Model into a Scene Text Detector
39 2022-12-21 link PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields
39 2023-06-01 link Boundary-enhanced Co-training for Weakly Supervised Semantic Segmentation
38 2023-03-31 link Zero-shot Referring Image Segmentation with Global-Local Context Features
38 2023-02-18 link Temporal Interpolation is all You Need for Dynamic Neural
Radiance Fields
38 2023-03-30 link NeRF-Supervised Deep Stereo
38 2023-06-01 link Multi-Level Logit Distillation
38 2023-05-04 link LayoutDM: Transformer-based Diffusion Model for Layout Generation
38 2023-03-30 link Mixed Autoencoder for Self-Supervised Visual Representation Learning
38 2023-03-23 link Masked Image Training for Generalizable Deep Image Denoising
38 2022-03-27 link UV Volumes for Real-time Rendering of Editable Free-view Human
Performance
38 2023-05-09 link StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based
Generator
38 2023-04-10 link Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
38 2022-05-23 link PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D
Detection
38 2022-12-08 link MIME: Human-Aware 3D Scene Generation
38 2023-04-06 link Continual Detection Transformer for Incremental Object Detection
38 2023-04-12 link Hard Patches Mining for Masked Image Modeling
38 2023-06-01 link Slimmable Dataset Condensation
38 2023-03-25 link CFA: Class-Wise Calibrated Fair Adversarial Training
38 2023-03-20 link Make Landscape Flatter in Differentially Private Federated Learning
37 2023-01-18 link NeRF in the Palm of Your Hand: Corrective Augmentation
for Robotics via Novel-View Synthesis
37 2023-02-07 link Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera
3D Multi-Object Tracking
37 2023-06-01 link Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models
to Learn Any Unseen Style
37 2022-12-09 link PIVOT: Prompting for Video Continual Learning
37 2022-09-04 link Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
37 None link Clothing-Change Feature Augmentation for Person Re-Identification
37 2022-02-15 link Don't Lie to Me! Robust and Efficient Explainability with
Verified Perturbation Analysis
37 2023-04-20 link NeUDF: Leaning Neural Unsigned Distance Fields with Volume Rendering
37 2023-03-24 link VILA: Learning Image Aesthetics from User Comments with Vision-Language
Pretraining
37 2022-12-15 link CLIPPO: Image-and-Language Understanding from Pixels Only
37 2023-03-23 link Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
37 2023-03-25 link Adaptive Sparse Convolutional Networks with Global Context Enhancement for
Faster Object Detection on Drone Images
37 2023-04-02 link Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot
Class-Incremental Learning
37 2023-01-03 link Understanding Imbalanced Semantic Segmentation Through Neural Collapse
37 2023-01-26 link Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring
37 2022-12-05 link I2MVFormer: Large Language Model Generated Multi-View Document Supervision for
Zero-Shot Image Classification
37 2023-03-29 link Self-Positioning Point-Based Transformer for Point Cloud Understanding
36 2023-04-23 link Evading DeepFake Detectors via Adversarial Statistical Consistency
36 2023-04-09 link CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
36 2023-06-01 link BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud
Registration
36 2023-03-08 link X-Avatar: Expressive Human Avatars
36 2022-07-04 link PVO: Panoptic Visual Odometry
36 2023-03-06 link Continuous Sign Language Recognition with Correlation Network
36 2023-04-10 link PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning
36 2022-06-22 link A Simple Baseline for Video Restoration with Grouped Spatial-Temporal
Shift
36 2023-03-30 link SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
36 2023-06-01 link Diffusion-Based Signed Distance Fields for 3D Shape Generation
36 2023-06-01 link Efficient RGB-T Tracking via Cross-Modality Distillation
36 2023-06-01 link ScaleFL: Resource-Adaptive Federated Learning with Heterogeneous Clients
36 2022-11-16 link T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection
36 2023-03-13 link Twin Contrastive Learning with Noisy Labels
36 2022-12-08 link CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
36 2022-11-21 link SceneComposer: Any-Level Semantic Image Synthesis
35 2023-01-18 link Behind the Scenes: Density Fields for Single View Reconstruction
35 2022-11-16 link A Generalized Framework for Video Instance Segmentation
35 2023-03-24 link Class-Incremental Exemplar Compression for Class-Incremental Learning
35 2023-02-17 link MixNeRF: Modeling a Ray with Mixture Density for Novel
View Synthesis from Sparse Inputs
35 2023-06-01 link BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual
Camera via Key-Points
35 2023-03-23 link Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
35 2022-06-09 link Simple Cues Lead to a Strong Multi-Object Tracker
35 2022-03-23 link Deep Frequency Filtering for Domain Generalization
35 2023-06-01 link MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
35 2023-04-17 link OVTrack: Open-Vocabulary Multiple Object Tracking
35 2023-03-28 link Improving the Transferability of Adversarial Samples by Path-Augmented Method
35 2023-03-08 link X-Pruner: eXplainable Pruning for Vision Transformers
35 2023-01-05 link HierVL: Learning Hierarchical Video-Language Embeddings
35 2022-12-06 link Learning Neural Parametric Head Models
35 2023-06-01 link Color Backdoor: A Robust Poisoning Attack in Color Space
35 2023-05-04 link Contrastive Mean Teacher for Domain Adaptive Object Detectors
35 2023-04-09 link Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
35 2022-11-14 link PMR: Prototypical Modal Rebalance for Multimodal Learning
35 2022-10-03 link LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision
& Language Models
35 2023-03-21 link ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
35 2023-04-05 link Detecting and Grounding Multi-Modal Media Manipulation
34 2023-03-06 link UniHCP: A Unified Model for Human-Centric Perceptions
34 2022-08-27 link TrojViT: Trojan Insertion in Vision Transformers
34 2023-06-01 link GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class
Incremental Task
34 2023-03-03 link Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene
Flow, Optical Flow and Stereo
34 2023-03-28 link HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language
Models
34 2023-03-21 link Joint Visual Grounding and Tracking with Natural Language Specification
34 2023-03-21 link Context De-Confounded Emotion Recognition
34 2023-02-28 link Attention-Based Point Cloud Edge Sampling
34 2023-06-01 link Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation
34 2023-04-04 link On the Stability-Plasticity Dilemma of Class-Incremental Learning
34 2023-01-12 link ViTs for SITS: Vision Transformers for Satellite Image Time
Series
34 2022-11-26 link Meta Architecture for Point Cloud Analysis
34 2022-11-29 link SparsePose: Sparse-View Camera Pose Regression and Refinement
34 2023-06-01 link Context-aware Alignment and Mutual Masking for 3D-Language Pre-training
34 2022-05-26 link MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of
Hierarchical Vision Transformers
34 2022-10-06 link A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions
and Imitation Learning
34 2023-04-11 link Continual Semantic Segmentation with Automatic Memory Sample Selection
34 2023-04-10 link Improved Test-Time Adaptation for Domain Generalization
34 2023-01-06 link End-to-End 3D Dense Captioning with Vote2Cap-DETR
34 2023-01-02 link NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
34 2023-06-01 link Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation
Learning
34 2023-06-01 link Rethinking the Correlation in Few-Shot Segmentation: A Buoys View
34 2023-04-23 link TransFlow: Transformer as Flow Learner
34 2023-03-21 link Human Pose as Compositional Tokens
33 2022-11-29 link NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
33 2023-05-09 link PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
33 2023-03-26 link CelebV-Text: A Large-Scale Facial Text-Video Dataset
33 2023-02-28 link Backdoor Attacks Against Deep Image Compression via Adaptive Frequency
Trigger
33 2022-11-17 link Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
33 2023-06-01 link Spatial-Frequency Mutual Learning for Face Super-Resolution
33 2022-09-30 link PyPose: A Library for Robot Learning with Physics-based Optimization
33 2023-04-06 link Micron-BERT: BERT-Based Facial Micro-Expression Recognition
33 2023-06-01 link GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection
33 2023-03-14 link I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing
in Neural SDFs
33 2023-02-28 link ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing
Part Sensitive Transformer
33 2022-11-19 link LidarGait: Benchmarking 3D Gait Recognition with Point Clouds
33 2022-12-29 link Efficient Movie Scene Detection using State-Space Transformers
33 2022-07-07 link Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
33 2023-03-27 link Hi4D: 4D Instance Segmentation of Close Human Interaction
33 2022-12-06 link Visual Query Tuning: Towards Effective Usage of Intermediate Representations
for Parameter and Memory Efficient Transfer Learning
33 2022-11-23 link Data-Driven Feature Tracking for Event Cameras
33 2023-03-09 link Masked Image Modeling with Local Multi-Scale Reconstruction
33 2023-06-01 link Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation
33 2023-03-25 link CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation
Synthesis
33 2023-02-25 link Point Cloud Forecasting as a Proxy for 4D Occupancy
Forecasting
33 2023-04-18 link Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
33 2023-03-21 link Data-Efficient Large Scale Place Recognition with Graded Similarity Supervision
33 2023-03-20 link Feature Alignment and Uniformity for Test Time Adaptation
33 2023-04-09 link Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
33 2023-03-23 link Detecting Backdoors in Pre-trained Encoders
33 2023-04-17 link ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object
Interaction Detection
32 2023-03-29 link Fair Federated Medical Image Segmentation via Client Contribution Estimation