4650 |
2022-07-06 |
link |
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors |
|
2084 |
2022-08-25 |
link |
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation |
|
1247 |
2022-11-17 |
link |
InstructPix2Pix: Learning to Follow Image Editing Instructions |
|
916 |
2022-11-18 |
link |
Magic3D: High-Resolution Text-to-3D Content Creation |
|
862 |
2022-10-17 |
link |
Imagic: Text-Based Real Image Editing with Diffusion Models |
|
738 |
2023-04-18 |
link |
Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models |
|
631 |
2023-05-09 |
link |
ImageBind: One Embedding Space to Bind Them All |
|
627 |
2022-12-08 |
link |
Multi-Concept Customization of Text-to-Image Diffusion |
|
614 |
2022-12-15 |
link |
Objaverse: A Universe of Annotated 3D Objects |
|
539 |
2022-11-14 |
link |
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
|
533 |
2022-12-14 |
link |
Reproducible Scaling Laws for Contrastive Language-Image Learning |
|
489 |
2022-11-10 |
link |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
|
489 |
2023-01-02 |
link |
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders |
|
467 |
2022-11-22 |
link |
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation |
|
445 |
2023-01-17 |
link |
GLIGEN: Open-Set Grounded Text-to-Image Generation |
|
442 |
2022-12-01 |
link |
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation |
|
438 |
2023-03-07 |
link |
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks |
|
433 |
2022-05-09 |
link |
Activating More Pixels in Image Super-Resolution Transformer |
|
406 |
2022-12-20 |
link |
Planning-oriented Autonomous Driving |
|
388 |
2023-01-24 |
link |
K-Planes: Explicit Radiance Fields in Space, Time, and Appearance |
|
376 |
2022-10-06 |
link |
On Distillation of Guided Diffusion Models |
|
375 |
2022-10-06 |
link |
MaPLe: Multi-modal Prompt Learning |
|
373 |
2022-11-14 |
link |
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures |
|
366 |
2022-03-27 |
link |
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking |
|
354 |
2023-03-09 |
link |
Scaling up GANs for Text-to-Image Synthesis |
|
336 |
2022-10-09 |
link |
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP |
|
320 |
2023-03-15 |
link |
BiFormer: Vision Transformer with Bi-Level Routing Attention |
|
312 |
2022-11-18 |
link |
Visual Programming: Compositional visual reasoning without training |
|
311 |
2022-11-23 |
link |
Paint by Example: Exemplar-based Image Editing with Diffusion Models |
|
311 |
2023-01-23 |
link |
HexPlane: A Fast Representation for Dynamic Scenes |
|
300 |
2022-06-06 |
link |
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
|
280 |
2023-03-08 |
link |
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models |
|
272 |
2023-06-01 |
link |
Neuralangelo: High-Fidelity Neural Surface Reconstruction |
|
266 |
2022-07-30 |
link |
MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures |
|
261 |
2022-12-01 |
link |
Scaling Language-Image Pre-Training via Masking |
|
251 |
2022-11-28 |
link |
OpenScene: 3D Scene Understanding with Open Vocabularies |
|
250 |
2023-03-29 |
link |
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
|
243 |
2022-12-07 |
link |
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models |
|
243 |
2022-11-10 |
link |
OneFormer: One Transformer to Rule Universal Image Segmentation |
|
236 |
2022-12-08 |
link |
Executing your Commands via Motion Diffusion in Latent Space |
|
231 |
2022-12-12 |
link |
RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion |
|
222 |
2023-01-15 |
link |
Generating Human Motion from Textual Descriptions with Discrete Representations |
|
221 |
2023-01-19 |
link |
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture |
|
215 |
2023-06-01 |
link |
Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks |
|
206 |
2022-11-18 |
link |
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision |
|
201 |
2023-03-13 |
link |
FreeNeRF: Improving Few-Shot Neural Rendering with Free Frequency Regularization |
|
199 |
2023-02-15 |
link |
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction |
|
199 |
2022-11-30 |
link |
3D Neural Field Generation Using Triplane Diffusion |
|
198 |
2022-09-25 |
link |
All are Worth Words: A ViT Backbone for Diffusion Models |
|
198 |
2022-12-05 |
link |
Images Speak in Images: A Generalist Painter for In-Context Visual Learning |
|
196 |
2022-12-08 |
link |
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation |
|
194 |
2022-12-21 |
link |
Generalized Decoding for Pixel, Image, and Language |
|
193 |
2022-11-09 |
link |
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models |
|
192 |
2022-11-26 |
link |
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion |
|
190 |
2022-04-14 |
link |
Neighborhood Attention Transformer |
|
188 |
2023-03-11 |
link |
High-resolution image reconstruction with latent diffusion models from human brain activity |
|
183 |
2023-02-23 |
link |
Side Adapter Network for Open-Vocabulary Semantic Segmentation |
|
181 |
2022-12-14 |
link |
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior |
|
179 |
2022-11-23 |
link |
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning |
|
178 |
2023-01-30 |
link |
DepGraph: Towards Any Structural Pruning |
|
176 |
2022-11-23 |
link |
Inversion-based Style Transfer with Diffusion Models |
|
175 |
2022-03-14 |
link |
All in One: Exploring Unified Video-Language Pre-Training |
|
173 |
2022-12-01 |
link |
SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction |
|
169 |
2022-12-02 |
link |
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation |
|
167 |
2022-11-25 |
link |
SpaText: Spatio-Textual Representation for Controllable Image Generation |
|
167 |
2022-12-09 |
link |
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model |
|
166 |
2023-05-11 |
link |
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention |
|
165 |
2022-11-19 |
link |
EDGE: Editable Dance Generation From Music |
|
160 |
2022-12-02 |
link |
DiffRF: Rendering-Guided 3D Radiance Field Diffusion |
|
159 |
2023-03-20 |
link |
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking |
|
157 |
2023-03-29 |
link |
Implicit Diffusion Models for Continuous Super-Resolution |
|
155 |
2022-11-22 |
link |
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation |
|
154 |
2023-02-27 |
link |
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning |
|
154 |
2023-01-15 |
link |
Diffusion-based Generation, Optimization, and Planning in 3D Scenes |
|
152 |
2023-03-21 |
link |
Learning A Sparse Transformer Network for Effective Image Deraining |
|
152 |
2022-07-26 |
link |
DETRs with Hybrid Matching |
|
151 |
2023-01-18 |
link |
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation |
|
151 |
2022-08-21 |
link |
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation |
|
150 |
2022-12-28 |
link |
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models |
|
150 |
2022-11-21 |
link |
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification |
|
148 |
2022-12-10 |
link |
MAGVIT: Masked Generative Video Transformer |
|
147 |
2022-12-06 |
link |
NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors |
|
142 |
2023-03-27 |
link |
Learned Image Compression with Mixed Transformer-CNN Architectures |
|
140 |
2023-02-15 |
link |
Video Probabilistic Diffusion Models in Projected Latent Space |
|
139 |
2023-03-27 |
link |
SimpleNet: A Simple Network for Image Anomaly Detection and Localization |
|
138 |
2022-11-20 |
link |
DynIBaR: Neural Dynamic Image-Based Rendering |
|
138 |
2022-06-04 |
link |
PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers |
|
135 |
2023-04-27 |
link |
Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM |
|
134 |
2022-11-29 |
link |
NeuralLift-360: Lifting an in-the-Wild 2D Photo to A 3D Object with 360° Views |
|
133 |
2023-03-03 |
link |
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners |
|
133 |
2022-11-21 |
link |
Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering |
|
132 |
2023-06-01 |
link |
UniSim: A Neural Closed-Loop Sensor Simulator |
|
132 |
2022-12-08 |
link |
SINE: SINgle Image Editing with Text-to-Image Diffusion Models |
|
132 |
2022-11-22 |
link |
EDICT: Exact Diffusion Inversion via Coupled Transformations |
|
131 |
2022-12-13 |
link |
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting |
|
129 |
2022-12-08 |
link |
MoFusion: A Framework for Denoising-Diffusion-Based Motion Synthesis |
|
129 |
2023-03-12 |
link |
Universal Instance Perception as Object Discovery and Retrieval |
|
129 |
2023-01-26 |
link |
Cut and Learn for Unsupervised Object Detection and Instance Segmentation |
|
129 |
2022-11-17 |
link |
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation |
|
128 |
2023-02-20 |
link |
Towards Universal Fake Image Detectors that Generalize Across Generative Models |
|
127 |
2023-03-01 |
link |
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration |
|
127 |
2022-12-07 |
link |
ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation |
|
126 |
2023-03-30 |
link |
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation |
|
124 |
2023-04-03 |
link |
Generative Diffusion Prior for Unified Image Restoration and Enhancement |
|
123 |
2023-03-23 |
link |
Visual-Language Prompt Tuning with Knowledge-Guided Context Optimization |
|
122 |
2022-12-19 |
link |
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation |
|
121 |
2022-12-16 |
link |
Fake it Till You Make it: Learning Transferable Representations from Synthetic ImageNet Clones |
|
120 |
2022-11-14 |
link |
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding |
|
118 |
2022-11-29 |
link |
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding |
|
118 |
2022-12-14 |
link |
ECON: Explicit Clothed humans Optimized via Normal integration |
|
116 |
2022-08-25 |
link |
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining |
|
116 |
2023-06-01 |
link |
SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy |
|
115 |
2023-03-27 |
link |
Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective |
|
115 |
2023-06-01 |
link |
Query-Centric Trajectory Prediction |
|
115 |
2022-12-06 |
link |
Fine-tuned CLIP Models are Efficient Video Learners |
|
114 |
2022-05-26 |
link |
Revealing the Dark Secrets of Masked Image Modeling |
|
114 |
2023-04-17 |
link |
Affordances from Human Videos as a Versatile Representation for Robotics |
|
113 |
2022-11-21 |
link |
ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields |
|
113 |
2023-03-26 |
link |
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation |
|
112 |
2023-03-12 |
link |
Iterative Geometry Encoding Volume for Stereo Matching |
|
112 |
2023-06-01 |
link |
Camouflaged Object Detection with Feature Decomposition and Edge Reconstruction |
|
112 |
2021-02-23 |
link |
Deep Deterministic Uncertainty: A New Simple Baseline |
|
112 |
2022-11-23 |
link |
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation |
|
112 |
2023-03-10 |
link |
MVImgNet: A Large-scale Dataset of Multi-view Images |
|
111 |
2022-04-28 |
link |
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation |
|
111 |
2023-01-05 |
link |
Robust Dynamic Radiance Fields |
|
110 |
2022-11-23 |
link |
ReCo: Region-Controlled Text-to-Image Generation |
|
109 |
2023-03-28 |
link |
F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories |
|
109 |
2022-12-01 |
link |
Finetune like you pretrain: Improved finetuning of zero-shot vision models |
|
108 |
2022-12-16 |
link |
PointAvatar: Deformable Point-Based Head Avatars from Videos |
|
108 |
2022-06-09 |
link |
MobileOne: An Improved One millisecond Mobile Backbone |
|
107 |
2023-03-20 |
link |
Visual Prompt Multi-Modal Tracking |
|
107 |
2023-03-14 |
link |
V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception |
|
106 |
2023-01-06 |
link |
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior |
|
106 |
2022-03-20 |
link |
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation |
|
106 |
2022-12-13 |
link |
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders |
|
105 |
2022-11-16 |
link |
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis |
|
103 |
2022-12-08 |
link |
Generating Holistic 3D Human Motion from Speech |
|
103 |
2023-01-12 |
link |
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP |
|
102 |
2022-11-28 |
link |
Post-Training Quantization on Diffusion Models |
|
101 |
2022-12-06 |
link |
Diffusion-SDF: Text-to-Shape via Voxelized Diffusion |
|
100 |
2022-12-19 |
link |
Panoptic Lifting for 3D Scene Understanding with Neural Fields |
|
99 |
2022-11-17 |
link |
MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors |
|
98 |
2022-12-09 |
link |
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets |
|
97 |
2023-03-24 |
link |
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects |
|
97 |
2022-11-21 |
link |
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars |
|
96 |
2023-06-01 |
link |
GRES: Generalized Referring Expression Segmentation |
|
95 |
2023-01-05 |
link |
WIRE: Wavelet Implicit Neural Representations |
|
95 |
2023-03-29 |
link |
HOLODIFFUSION: Training a 3D Diffusion Model Using 2D Images |
|
95 |
2023-03-20 |
link |
Explicit Visual Prompting for Low-Level Structure Segmentations |
|
95 |
2023-02-14 |
link |
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation |
|
94 |
2023-03-22 |
link |
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval |
|
92 |
2022-11-12 |
link |
OpenGait: Revisiting Gait Recognition Toward Better Practicality |
|
92 |
2023-03-22 |
link |
Spherical Transformer for LiDAR-Based 3D Recognition |
|
91 |
2023-04-17 |
link |
Interactive and Explainable Region-guided Radiology Report Generation |
|
91 |
2022-05-16 |
link |
BBDM: Image-to-Image Translation with Brownian Bridge Diffusion Models |
|
91 |
2022-11-22 |
link |
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields |
|
91 |
2023-03-13 |
link |
TriDet: Temporal Action Detection with Relative Boundary Modeling |
|
90 |
2023-06-01 |
link |
MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion |
|
88 |
2023-06-01 |
link |
BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion |
|
88 |
2023-04-24 |
link |
TensoIR: Tensorial Inverse Rendering |
|
86 |
2022-12-05 |
link |
Unifying Vision, Text, and Layout for Universal Document Processing |
|
86 |
2023-03-23 |
link |
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching |
|
86 |
2022-11-22 |
link |
Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring |
|
85 |
2023-01-15 |
link |
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets |
|
85 |
2022-06-02 |
link |
Siamese Image Modeling for Self-Supervised Vision Representation Learning |
|
85 |
2023-06-01 |
link |
Autoregressive Visual Tracking |
|
85 |
2023-01-05 |
link |
HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling |
|
84 |
2022-12-16 |
link |
CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation |
|
84 |
2023-02-23 |
link |
DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models |
|
84 |
2022-11-20 |
link |
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation |
|
83 |
2022-11-22 |
link |
Instant Volumetric Head Avatars |
|
83 |
2023-03-30 |
link |
Consistent View Synthesis with Pose-Guided Diffusion Models |
|
83 |
2023-02-27 |
link |
Aligning Bag of Regions for Open-Vocabulary Object Detection |
|
83 |
2022-06-16 |
link |
OmniMAE: Single Model Masked Pretraining on Images and Videos |
|
83 |
2023-03-18 |
link |
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation |
|
82 |
2023-03-25 |
link |
SUDS: Scalable Urban Dynamic Scenes |
|
81 |
2022-12-11 |
link |
Recurrent Vision Transformers for Object Detection with Event Cameras |
|
81 |
2023-03-23 |
link |
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field |
|
80 |
2023-03-14 |
link |
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation |
|
80 |
2023-01-30 |
link |
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis |
|
80 |
2023-03-16 |
link |
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation |
|
79 |
2022-12-20 |
link |
InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds |
|
79 |
2023-03-24 |
link |
Robust Test-Time Adaptation in Dynamic Scenarios |
|
79 |
2022-11-21 |
link |
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection |
|
78 |
2023-03-24 |
link |
Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing |
|
78 |
2022-11-19 |
link |
Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models |
|
78 |
2022-12-16 |
link |
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models |
|
77 |
2023-04-04 |
link |
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion |
|
77 |
2023-03-30 |
link |
Hierarchical Fine-Grained Image Forgery Detection and Localization |
|
77 |
2023-03-04 |
link |
Virtual Sparse Convolution for Multimodal 3D Object Detection |
|
77 |
2023-03-24 |
link |
Progressively Optimized Local Radiance Fields for Robust View Synthesis |
|
77 |
2023-03-22 |
link |
Dense Distinct Query for End-to-End Object Detection |
|
77 |
2023-03-20 |
link |
Leapfrog Diffusion Model for Stochastic Trajectory Prediction |
|
76 |
2023-04-20 |
link |
Collaborative Diffusion for Multi-Modal Face Generation and Editing |
|
76 |
2023-05-18 |
link |
3D Registration with Maximal Cliques |
|
75 |
2022-11-22 |
link |
Person Image Synthesis via Denoising Diffusion Model |
|
75 |
2022-11-27 |
link |
Diffusion Probabilistic Model Made Slim |
|
75 |
2023-05-01 |
link |
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation |
|
74 |
2023-04-14 |
link |
DCFace: Synthetic Face Generation with Dual Condition Diffusion Model |
|
73 |
2023-02-22 |
link |
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition |
|
73 |
2022-12-08 |
link |
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
|
73 |
2022-07-20 |
link |
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning |
|
72 |
2023-06-01 |
link |
Rethinking Federated Learning with Domain Shift: A Prototype View |
|
72 |
2022-10-26 |
link |
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization |
|
72 |
2022-12-15 |
link |
FlexiViT: One Model for All Patch Sizes |
|
72 |
2022-11-17 |
link |
CRAFT: Concept Recursive Activation FacTorization for Explainability |
|
72 |
2023-04-25 |
link |
CompletionFormer: Depth Completion with Convolutions and Vision Transformers |
|
71 |
2023-01-10 |
link |
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation |
|
71 |
2022-02-01 |
link |
DKM: Dense Kernelized Feature Matching for Geometry Estimation |
|
71 |
2023-03-26 |
link |
Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers |
|
71 |
2023-03-02 |
link |
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy |
|
71 |
2022-06-14 |
link |
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling |
|
71 |
2023-04-10 |
link |
Ambiguous Medical Image Segmentation Using Diffusion Models |
|
71 |
2022-07-21 |
link |
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild |
|
70 |
2023-06-01 |
link |
TryOnDiffusion: A Tale of Two UNets |
|
70 |
2023-02-06 |
link |
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval |
|
70 |
2022-04-17 |
link |
NICO++: Towards Better Benchmarking for Domain Generalization |
|
69 |
2023-03-28 |
link |
One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer |
|
69 |
2023-03-30 |
link |
Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks |
|
69 |
2022-06-24 |
link |
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning |
|
69 |
2022-12-10 |
link |
Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory |
|
69 |
2022-12-09 |
link |
ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal |
|
69 |
2023-03-25 |
link |
Selective Structured State-Spaces for Long-Form Video Understanding |
|
68 |
2023-01-04 |
link |
PACO: Parts and Attributes of Common Objects |
|
68 |
2022-08-02 |
link |
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries |
|
68 |
2022-11-10 |
link |
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts |
|
67 |
2023-03-24 |
link |
Grid-guided Neural Radiance Fields for Large Urban Scenes |
|
67 |
2023-01-05 |
link |
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training |
|
67 |
2023-03-02 |
link |
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation |
|
67 |
2022-11-19 |
link |
Parallel Diffusion Models of Operator and Image for Blind Inverse Problems |
|
67 |
2023-02-03 |
link |
vMAP: Vectorised Object Mapping for Neural Field SLAM |
|
66 |
2023-01-16 |
link |
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models |
|
66 |
2022-12-09 |
link |
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning |
|
66 |
2022-12-09 |
link |
VindLU: A Recipe for Effective Video-and-Language Pretraining |
|
66 |
2022-06-30 |
link |
LaserMix for Semi-Supervised LiDAR Semantic Segmentation |
|
66 |
2022-12-11 |
link |
How to Backdoor Diffusion Models? |
|
66 |
2023-04-06 |
link |
Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes |
|
66 |
2023-03-06 |
link |
Multimodal Prompting with Missing Modalities for Visual Recognition |
|
65 |
2023-06-01 |
link |
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images |
|
65 |
2022-11-21 |
link |
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models |
|
65 |
2022-12-01 |
link |
Learning to Generate Text-Grounded Mask for Open-World Semantic Segmentation from Only Image-Text Pairs |
|
65 |
2022-10-03 |
link |
Visual Prompt Tuning for Generative Transfer Learning |
|
65 |
2022-11-23 |
link |
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation |
|
64 |
2023-03-20 |
link |
EqMotion: Equivariant Multi-Agent Motion Prediction with Invariant Interaction Reasoning |
|
64 |
2023-03-30 |
link |
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation |
|
64 |
2023-05-10 |
link |
Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving |
|
64 |
2023-06-01 |
link |
Multi-Label Compound Expression Recognition: C-EXPR Database & Network |
|
64 |
2023-03-20 |
link |
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving |
|
64 |
2022-11-28 |
link |
High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization |
|
64 |
2022-09-30 |
link |
Smallcap: Lightweight Image Captioning Prompted with Retrieval Augmentation |
|
64 |
2023-03-24 |
link |
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning |
|
63 |
2023-03-02 |
link |
Token Contrast for Weakly-Supervised Semantic Segmentation |
|
63 |
2023-06-01 |
link |
Revisiting Reverse Distillation for Anomaly Detection |
|
63 |
2023-03-21 |
link |
Detecting Everything in the Open World: Towards Universal Object Detection |
|
63 |
2023-04-13 |
link |
iDisc: Internal Discretization for Monocular Depth Estimation |
|
63 |
2022-06-23 |
link |
EventNeRF: Neural Radiance Fields from a Single Colour Event Camera |
|
63 |
2023-03-29 |
link |
Generalized Relation Modeling for Transformer Tracking |
|
62 |
2023-03-30 |
link |
DiffCollage: Parallel Generation of Large Content with Diffusion Models |
|
62 |
2023-05-02 |
link |
Generalizing Dataset Distillation via Deep Generative Prior |
|
62 |
2022-11-22 |
link |
MagicPony: Learning Articulated 3D Animals in the Wild |
|
62 |
2023-04-02 |
link |
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild |
|
62 |
2023-03-29 |
link |
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert |
|
62 |
2023-06-01 |
link |
Implicit Identity Driven Deepfake Face Swapping Detection |
|
62 |
2022-11-29 |
link |
Wavelet Diffusion Models are fast and scalable Image Generators |
|
62 |
2021-12-13 |
link |
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability |
|
62 |
2023-03-03 |
link |
EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization |
|
62 |
2023-05-10 |
link |
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting |
|
61 |
2023-04-10 |
link |
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment |
|
61 |
2023-03-02 |
link |
Delivering Arbitrary-Modal Semantic Segmentation |
|
61 |
2023-02-24 |
link |
Decoupling Human and Camera Motion from Videos in the Wild |
|
61 |
2023-03-07 |
link |
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion |
|
60 |
2022-12-21 |
link |
TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization |
|
60 |
2023-03-25 |
link |
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification |
|
59 |
2023-03-01 |
link |
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation |
|
59 |
2023-03-13 |
link |
Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images |
|
59 |
2022-11-22 |
link |
PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces Using Permutohedral Lattices |
|
59 |
2023-06-01 |
link |
Improved Distribution Matching for Dataset Condensation |
|
58 |
2023-04-19 |
link |
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation |
|
58 |
2023-03-17 |
link |
A Dynamic Multi-Scale Voxel Flow Network for Video Prediction |
|
58 |
2023-03-23 |
link |
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° |
|
58 |
2023-04-04 |
link |
MonoHuman: Animatable Human Neural Field from Monocular Video |
|
58 |
2023-03-23 |
link |
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not |
|
58 |
2023-04-06 |
link |
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting |
|
58 |
2023-04-17 |
link |
Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model |
|
57 |
2023-03-22 |
link |
CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data |
|
57 |
2023-03-25 |
link |
NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects |
|
57 |
2022-11-21 |
link |
Understanding and Improving Visual Prompting: A Label-Mapping Perspective |
|
57 |
2023-01-19 |
link |
Multiview Compressive Coding for 3D Reconstruction |
|
57 |
2023-03-18 |
link |
MotionTrack: Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking |
|
57 |
2023-03-21 |
link |
Affordance Diffusion: Synthesizing Hand-Object Interactions |
|
57 |
2023-06-01 |
link |
End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve |
|
57 |
2022-11-18 |
link |
Task Residual for Tuning Vision-Language Models |
|
57 |
2023-03-01 |
link |
Multimodal Industrial Anomaly Detection via Hybrid Fusion |
|
57 |
2022-12-09 |
link |
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others |
|
57 |
2022-11-21 |
link |
Teaching Structured Vision & Language Concepts to Vision & Language Models |
|
56 |
2023-06-01 |
link |
Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection |
|
56 |
2022-12-28 |
link |
Multi-Realism Image Compression with a Conditional Generator |
|
56 |
2023-03-18 |
link |
Sharpness-Aware Gradient Matching for Domain Generalization |
|
56 |
2022-06-21 |
link |
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs |
|
56 |
2023-06-01 |
link |
Ingredient-oriented Multi-Degradation Learning for Image Restoration |
|
55 |
2022-11-19 |
link |
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting |
|
55 |
2023-04-05 |
link |
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens |
|
55 |
2022-12-15 |
link |
Vision Transformers are Parameter-Efficient Audio-Visual Learners |
|
55 |
2023-03-30 |
link |
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation |
|
54 |
2023-04-08 |
link |
RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors |
|
54 |
2023-05-17 |
link |
ReasonNet: End-to-End Driving with Temporal and Global Reasoning |
|
54 |
2022-11-21 |
link |
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution |
|
54 |
2022-11-23 |
link |
Make-A-Story: Visual Memory Conditioned Consistent Story Generation |
|
54 |
2023-04-02 |
link |
DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks |
|
54 |
2023-06-01 |
link |
ObjectStitch: Object Compositing with Diffusion Model |
|
53 |
2022-12-03 |
link |
PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models |
|
53 |
2023-04-20 |
link |
Omni Aggregation Networks for Lightweight Image Super-Resolution |
|
53 |
2023-03-26 |
link |
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation |
|
53 |
2023-03-23 |
link |
Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective |
|
53 |
2022-11-23 |
link |
SVFormer: Semi-supervised Video Transformer for Action Recognition |
|
53 |
2023-04-14 |
link |
Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement |
|
53 |
2023-05-11 |
link |
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers |
|
53 |
2023-06-01 |
link |
FFCV: Accelerating Training by Removing Data Bottlenecks |
|
53 |
2023-06-01 |
link |
PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification |
|
53 |
2022-09-04 |
link |
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling |
|
53 |
2023-06-01 |
link |
Multi-Modal Learning with Missing Modality via Shared-Specific Feature Modelling |
|
53 |
2023-02-02 |
link |
RobustNeRF: Ignoring Distractors with Robust Losses |
|
52 |
2023-03-14 |
link |
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection |
|
52 |
2022-11-29 |
link |
Compressing Volumetric Radiance Fields to 1 MB |
|
52 |
2023-03-31 |
link |
3D Human Pose Estimation via Intuitive Physics |
|
52 |
2022-03-29 |
link |
Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection |
|
52 |
2023-01-20 |
link |
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer |
|
52 |
2023-04-17 |
link |
Learning to Render Novel Views from Wide-Baseline Stereo Pairs |
|
51 |
2023-03-15 |
link |
BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection |
|
51 |
2022-09-07 |
link |
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection |
|
51 |
2023-03-13 |
link |
DR2: Diffusion-Based Robust Degradation Remover for Blind Face Restoration |
|
51 |
2023-04-19 |
link |
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models |
|
51 |
2023-02-21 |
link |
PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction |
|
51 |
2023-05-15 |
link |
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors |
|
51 |
2022-11-25 |
link |
NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces with Arbitrary Topologies |
|
51 |
2022-11-11 |
link |
Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection |
|
51 |
2022-12-06 |
link |
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds |
|
50 |
2023-03-14 |
link |
Rotation-Invariant Transformer for Point Cloud Matching |
|
50 |
2023-03-10 |
link |
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets |
|
50 |
2023-03-13 |
link |
SCPNet: Semantic Scene Completion on Point Cloud |
|
50 |
2023-03-25 |
link |
Freestyle Layout-to-Image Synthesis |
|
50 |
2022-11-17 |
link |
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks |
|
50 |
2023-03-06 |
link |
Referring Multi-Object Tracking |
|
50 |
2022-12-22 |
link |
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis |
|
49 |
2023-03-10 |
link |
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection |
|
49 |
2022-12-12 |
link |
Accelerating Dataset Distillation via Model Augmentation |
|
49 |
2023-04-05 |
link |
HNeRV: A Hybrid Neural Representation for Videos |
|
49 |
2023-06-01 |
link |
Good is Bad: Causality Inspired Cloth-debiasing for Cloth-changing Person Re-identification |
|
49 |
2023-03-10 |
link |
CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment |
|
49 |
2023-04-04 |
link |
OrienterNet: Visual Localization in 2D Public Maps with Neural Matching |
|
49 |
2023-06-01 |
link |
Dynamic Graph Learning with Content-guided Spatial-Frequency Relation Reasoning for Deepfake Detection |
|
49 |
2023-01-13 |
link |
CLIP the Gap: A Single Domain Generalization Approach for Object Detection |
|
49 |
2023-03-21 |
link |
Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary |
|
48 |
2023-01-23 |
link |
LEGO-Net: Learning Regular Rearrangements of Objects in Rooms |
|
48 |
2023-06-01 |
link |
MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding from Object Detection |
|
48 |
2023-05-10 |
link |
Low-Light Image Enhancement via Structure Modeling and Guidance |
|
48 |
2022-12-16 |
link |
GFPose: Learning 3D Human Pose Prior with Gradient Fields |
|
48 |
2023-03-15 |
link |
MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving |
|
48 |
2022-12-06 |
link |
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
|
48 |
2023-01-04 |
link |
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection |
|
48 |
2023-04-12 |
link |
Constructing Deep Spiking Neural Networks from Artificial Neural Networks with Knowledge Distillation |
|
48 |
2022-11-23 |
link |
BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields |
|
48 |
2022-12-14 |
link |
PD-Quant: Post-Training Quantization Based on Prediction Difference Metric |
|
47 |
2022-11-21 |
link |
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields |
|
47 |
2023-04-03 |
link |
Neural Volumetric Memory for Visual Locomotion Control |
|
47 |
2023-05-08 |
link |
PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds |
|
47 |
2022-12-05 |
link |
Prototypical Residual Networks for Anomaly Detection and Localization |
|
47 |
2023-03-24 |
link |
Decoupled Multimodal Distilling for Emotion Recognition |
|
47 |
2023-03-15 |
link |
Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification |
|
47 |
2022-12-22 |
link |
Removing Objects From Neural Radiance Fields |
|
47 |
2022-12-15 |
link |
VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction |
|
46 |
2023-03-30 |
link |
C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation |
|
46 |
2022-11-26 |
link |
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis |
|
46 |
2022-12-11 |
link |
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery |
|
46 |
2022-11-23 |
link |
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising |
|
46 |
2023-01-24 |
link |
RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving |
|
46 |
2023-04-04 |
link |
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation |
|
46 |
2023-03-20 |
link |
Computationally Budgeted Continual Learning: What Does Matter? |
|
46 |
2023-04-21 |
link |
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers |
|
46 |
2022-12-02 |
link |
PROB: Probabilistic Objectness for Open World Object Detection |
|
46 |
2022-11-21 |
link |
NeRF-RPN: A General Framework for Object Detection in NeRFs |
|
46 |
2023-06-01 |
link |
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation |
|
46 |
2023-02-28 |
link |
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images |
|
45 |
2023-06-01 |
link |
AltFreezing for More General Video Face Forgery Detection |
|
45 |
2023-03-13 |
link |
MP-Former: Mask-Piloted Transformer for Image Segmentation |
|
45 |
2022-12-12 |
link |
ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation |
|
45 |
2023-06-01 |
link |
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners |
|
45 |
2023-04-11 |
link |
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field |
|
45 |
2023-03-01 |
link |
Renderable Neural Radiance Map for Visual Navigation |
|
45 |
2023-03-26 |
link |
You Only Segment Once: Towards Real-Time Panoptic Segmentation |
|
45 |
2023-06-01 |
link |
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions |
|
45 |
2023-03-19 |
link |
StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields |
|
44 |
2023-06-01 |
link |
3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention |
|
44 |
2023-03-23 |
link |
Collaboration Helps Camera Overtake LiDAR in 3D Detection |
|
44 |
2022-07-04 |
link |
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection |
|
44 |
2023-05-23 |
link |
Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses |
|
44 |
2023-03-28 |
link |
DisWOT: Student Architecture Search for Distillation WithOut Training |
|
44 |
2023-01-18 |
link |
PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav |
|
44 |
2023-03-27 |
link |
Label-Free Liver Tumor Segmentation |
|
44 |
2023-04-06 |
link |
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias |
|
44 |
2022-12-15 |
link |
Unsupervised Object Localization: Observing the Background to Discover Objects |
|
44 |
2023-03-30 |
link |
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery |
|
44 |
2023-03-26 |
link |
OTAvatar: One-Shot Talking Face Avatar with Controllable Tri-Plane Rendering |
|
44 |
2023-06-01 |
link |
Federated Domain Generalization with Generalization Adjustment |
|
44 |
2022-11-12 |
link |
MARLIN: Masked Autoencoder for facial video Representation LearnINg |
|
43 |
2022-12-15 |
link |
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation |
|
43 |
2023-04-04 |
link |
Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision |
|
43 |
2022-11-29 |
link |
UDE: A Unified Driving Engine for Human Motion Generation |
|
43 |
2022-12-06 |
link |
Unifying Short and Long-Term Tracking with Graph Hierarchies |
|
43 |
2023-03-26 |
link |
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning |
|
43 |
2023-03-14 |
link |
Diversity-Aware Meta Visual Prompting |
|
43 |
2023-03-13 |
link |
Align and Attend: Multimodal Summarization with Dual Contrastive Losses |
|
43 |
2022-11-23 |
link |
Texts as Images in Prompt Tuning for Multi-Label Image Recognition |
|
43 |
2022-06-09 |
link |
On Data Scaling in Masked Image Modeling |
|
43 |
2022-07-16 |
link |
Clover: Towards A Unified Video-Language Alignment and Fusion Model |
|
43 |
2022-12-09 |
link |
Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation |
|
43 |
2023-05-31 |
link |
Neural Kernel Surface Reconstruction |
|
43 |
2023-04-03 |
link |
Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising |
|
43 |
2022-08-08 |
link |
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature |
|
42 |
2023-04-13 |
link |
Representing Volumetric Videos as Dynamic MLP Maps |
|
42 |
2022-11-22 |
link |
DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors |
|
42 |
2022-11-02 |
link |
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language |
|
42 |
2023-01-06 |
link |
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars |
|
42 |
2023-03-07 |
link |
Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation |
|
42 |
2023-05-15 |
link |
NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation |
|
42 |
2022-11-30 |
link |
3D GAN Inversion with Facial Symmetry Prior |
|
42 |
2023-03-05 |
link |
PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow |
|
42 |
2022-12-15 |
link |
Real-Time Neural Light Field on Mobile Devices |
|
42 |
2022-12-06 |
link |
Semantic-Conditional Diffusion Networks for Image Captioning |
|
42 |
2023-04-11 |
link |
Video Event Restoration Based on Keyframes for Video Anomaly Detection |
|
42 |
2023-06-01 |
link |
DISC: Learning from Noisy Labels via Dynamic Instance-Specific Selection and Correction |
|
42 |
2023-03-09 |
link |
Diversity-Measurable Anomaly Detection |
|
42 |
2023-06-01 |
link |
Change-Aware Sampling and Contrastive Learning for Satellite Images |
|
42 |
2022-12-27 |
link |
Interactive Segmentation of Radiance Fields |
|
41 |
2023-01-30 |
link |
Shape-Aware Text-Driven Layered Video Editing |
|
41 |
2022-12-09 |
link |
Ego-Body Pose Estimation via Ego-Head Pose Estimation |
|
41 |
2022-12-15 |
link |
DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients |
|
41 |
2023-04-13 |
link |
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing |
|
41 |
2023-03-25 |
link |
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning |
|
41 |
2023-06-01 |
link |
All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations |
|
41 |
2023-04-17 |
link |
Neural Map Prior for Autonomous Driving |
|
41 |
2023-06-01 |
link |
TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments |
|
41 |
2023-03-20 |
link |
Zero-Shot Noise2Noise: Efficient Image Denoising without any Data |
|
41 |
2023-03-22 |
link |
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection |
|
41 |
2023-03-06 |
link |
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision |
|
41 |
2022-12-31 |
link |
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |
|
40 |
2022-11-21 |
link |
Vision Transformer with Super Token Sampling |
|
40 |
2022-12-01 |
link |
Hyperbolic Contrastive Learning for Visual Representations beyond Objects |
|
40 |
2023-03-05 |
link |
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes |
|
40 |
2023-04-03 |
link |
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation |
|
40 |
2022-12-08 |
link |
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection |
|
40 |
2022-12-14 |
link |
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics |
|
40 |
2023-03-25 |
link |
DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields |
|
40 |
2022-06-16 |
link |
MoDi: Unconditional Motion Synthesis from Diverse Data |
|
40 |
2022-11-25 |
link |
Fine-Grained Face Swapping Via Regional GAN Inversion |
|
40 |
2023-03-18 |
link |
DeAR: Debiasing Vision-Language Models with Additive Residuals |
|
39 |
2022-12-22 |
link |
Re-basin via Implicit Sinkhorn Differentiation |
|
39 |
2023-03-20 |
link |
3D Concept Learning and Reasoning from Multi-View Images |
|
39 |
2023-03-23 |
link |
Backdoor Defense via Adaptively Splitting Poisoned Dataset |
|
39 |
2023-04-16 |
link |
SeaThru-NeRF: Neural Radiance Fields in Scattering Media |
|
39 |
2023-02-28 |
link |
Turning a CLIP Model into a Scene Text Detector |
|
39 |
2022-12-21 |
link |
PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields |
|
39 |
2023-06-01 |
link |
Boundary-enhanced Co-training for Weakly Supervised Semantic Segmentation |
|
38 |
2023-03-31 |
link |
Zero-shot Referring Image Segmentation with Global-Local Context Features |
|
38 |
2023-02-18 |
link |
Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields |
|
38 |
2023-03-30 |
link |
NeRF-Supervised Deep Stereo |
|
38 |
2023-06-01 |
link |
Multi-Level Logit Distillation |
|
38 |
2023-05-04 |
link |
LayoutDM: Transformer-based Diffusion Model for Layout Generation |
|
38 |
2023-03-30 |
link |
Mixed Autoencoder for Self-Supervised Visual Representation Learning |
|
38 |
2023-03-23 |
link |
Masked Image Training for Generalizable Deep Image Denoising |
|
38 |
2022-03-27 |
link |
UV Volumes for Real-time Rendering of Editable Free-view Human Performance |
|
38 |
2023-05-09 |
link |
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator |
|
38 |
2023-04-10 |
link |
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos |
|
38 |
2022-05-23 |
link |
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection |
|
38 |
2022-12-08 |
link |
MIME: Human-Aware 3D Scene Generation |
|
38 |
2023-04-06 |
link |
Continual Detection Transformer for Incremental Object Detection |
|
38 |
2023-04-12 |
link |
Hard Patches Mining for Masked Image Modeling |
|
38 |
2023-06-01 |
link |
Slimmable Dataset Condensation |
|
38 |
2023-03-25 |
link |
CFA: Class-Wise Calibrated Fair Adversarial Training |
|
38 |
2023-03-20 |
link |
Make Landscape Flatter in Differentially Private Federated Learning |
|
37 |
2023-01-18 |
link |
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis |
|
37 |
2023-02-07 |
link |
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking |
|
37 |
2023-06-01 |
link |
Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models to Learn Any Unseen Style |
|
37 |
2022-12-09 |
link |
PIVOT: Prompting for Video Continual Learning |
|
37 |
2022-09-04 |
link |
Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection |
|
37 |
None |
link |
Clothing-Change Feature Augmentation for Person Re-Identification |
|
37 |
2022-02-15 |
link |
Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis |
|
37 |
2023-04-20 |
link |
NeUDF: Leaning Neural Unsigned Distance Fields with Volume Rendering |
|
37 |
2023-03-24 |
link |
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining |
|
37 |
2022-12-15 |
link |
CLIPPO: Image-and-Language Understanding from Pixels Only |
|
37 |
2023-03-23 |
link |
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment |
|
37 |
2023-03-25 |
link |
Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images |
|
37 |
2023-04-02 |
link |
Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning |
|
37 |
2023-01-03 |
link |
Understanding Imbalanced Semantic Segmentation Through Neural Collapse |
|
37 |
2023-01-26 |
link |
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring |
|
37 |
2022-12-05 |
link |
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification |
|
37 |
2023-03-29 |
link |
Self-Positioning Point-Based Transformer for Point Cloud Understanding |
|
36 |
2023-04-23 |
link |
Evading DeepFake Detectors via Adversarial Statistical Consistency |
|
36 |
2023-04-09 |
link |
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model |
|
36 |
2023-06-01 |
link |
BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration |
|
36 |
2023-03-08 |
link |
X-Avatar: Expressive Human Avatars |
|
36 |
2022-07-04 |
link |
PVO: Panoptic Visual Odometry |
|
36 |
2023-03-06 |
link |
Continuous Sign Language Recognition with Correlation Network |
|
36 |
2023-04-10 |
link |
PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning |
|
36 |
2022-06-22 |
link |
A Simple Baseline for Video Restoration with Grouped Spatial-Temporal Shift |
|
36 |
2023-03-30 |
link |
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer |
|
36 |
2023-06-01 |
link |
Diffusion-Based Signed Distance Fields for 3D Shape Generation |
|
36 |
2023-06-01 |
link |
Efficient RGB-T Tracking via Cross-Modality Distillation |
|
36 |
2023-06-01 |
link |
ScaleFL: Resource-Adaptive Federated Learning with Heterogeneous Clients |
|
36 |
2022-11-16 |
link |
T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection |
|
36 |
2023-03-13 |
link |
Twin Contrastive Learning with Noisy Labels |
|
36 |
2022-12-08 |
link |
CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution |
|
36 |
2022-11-21 |
link |
SceneComposer: Any-Level Semantic Image Synthesis |
|
35 |
2023-01-18 |
link |
Behind the Scenes: Density Fields for Single View Reconstruction |
|
35 |
2022-11-16 |
link |
A Generalized Framework for Video Instance Segmentation |
|
35 |
2023-03-24 |
link |
Class-Incremental Exemplar Compression for Class-Incremental Learning |
|
35 |
2023-02-17 |
link |
MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs |
|
35 |
2023-06-01 |
link |
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points |
|
35 |
2023-03-23 |
link |
Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection |
|
35 |
2022-06-09 |
link |
Simple Cues Lead to a Strong Multi-Object Tracker |
|
35 |
2022-03-23 |
link |
Deep Frequency Filtering for Domain Generalization |
|
35 |
2023-06-01 |
link |
MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation |
|
35 |
2023-04-17 |
link |
OVTrack: Open-Vocabulary Multiple Object Tracking |
|
35 |
2023-03-28 |
link |
Improving the Transferability of Adversarial Samples by Path-Augmented Method |
|
35 |
2023-03-08 |
link |
X-Pruner: eXplainable Pruning for Vision Transformers |
|
35 |
2023-01-05 |
link |
HierVL: Learning Hierarchical Video-Language Embeddings |
|
35 |
2022-12-06 |
link |
Learning Neural Parametric Head Models |
|
35 |
2023-06-01 |
link |
Color Backdoor: A Robust Poisoning Attack in Color Space |
|
35 |
2023-05-04 |
link |
Contrastive Mean Teacher for Domain Adaptive Object Detectors |
|
35 |
2023-04-09 |
link |
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention |
|
35 |
2022-11-14 |
link |
PMR: Prototypical Modal Rebalance for Multimodal Learning |
|
35 |
2022-10-03 |
link |
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models |
|
35 |
2023-03-21 |
link |
ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals |
|
35 |
2023-04-05 |
link |
Detecting and Grounding Multi-Modal Media Manipulation |
|
34 |
2023-03-06 |
link |
UniHCP: A Unified Model for Human-Centric Perceptions |
|
34 |
2022-08-27 |
link |
TrojViT: Trojan Insertion in Vision Transformers |
|
34 |
2023-06-01 |
link |
GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task |
|
34 |
2023-03-03 |
link |
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo |
|
34 |
2023-03-28 |
link |
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models |
|
34 |
2023-03-21 |
link |
Joint Visual Grounding and Tracking with Natural Language Specification |
|
34 |
2023-03-21 |
link |
Context De-Confounded Emotion Recognition |
|
34 |
2023-02-28 |
link |
Attention-Based Point Cloud Edge Sampling |
|
34 |
2023-06-01 |
link |
Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation |
|
34 |
2023-04-04 |
link |
On the Stability-Plasticity Dilemma of Class-Incremental Learning |
|
34 |
2023-01-12 |
link |
ViTs for SITS: Vision Transformers for Satellite Image Time Series |
|
34 |
2022-11-26 |
link |
Meta Architecture for Point Cloud Analysis |
|
34 |
2022-11-29 |
link |
SparsePose: Sparse-View Camera Pose Regression and Refinement |
|
34 |
2023-06-01 |
link |
Context-aware Alignment and Mutual Masking for 3D-Language Pre-training |
|
34 |
2022-05-26 |
link |
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers |
|
34 |
2022-10-06 |
link |
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning |
|
34 |
2023-04-11 |
link |
Continual Semantic Segmentation with Automatic Memory Sample Selection |
|
34 |
2023-04-10 |
link |
Improved Test-Time Adaptation for Domain Generalization |
|
34 |
2023-01-06 |
link |
End-to-End 3D Dense Captioning with Vote2Cap-DETR |
|
34 |
2023-01-02 |
link |
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory |
|
34 |
2023-06-01 |
link |
Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning |
|
34 |
2023-06-01 |
link |
Rethinking the Correlation in Few-Shot Segmentation: A Buoys View |
|
34 |
2023-04-23 |
link |
TransFlow: Transformer as Flow Learner |
|
34 |
2023-03-21 |
link |
Human Pose as Compositional Tokens |
|
33 |
2022-11-29 |
link |
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers |
|
33 |
2023-05-09 |
link |
PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces |
|
33 |
2023-03-26 |
link |
CelebV-Text: A Large-Scale Facial Text-Video Dataset |
|
33 |
2023-02-28 |
link |
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger |
|
33 |
2022-11-17 |
link |
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information |
|
33 |
2023-06-01 |
link |
Spatial-Frequency Mutual Learning for Face Super-Resolution |
|
33 |
2022-09-30 |
link |
PyPose: A Library for Robot Learning with Physics-based Optimization |
|
33 |
2023-04-06 |
link |
Micron-BERT: BERT-Based Facial Micro-Expression Recognition |
|
33 |
2023-06-01 |
link |
GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection |
|
33 |
2023-03-14 |
link |
I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs |
|
33 |
2023-02-28 |
link |
ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer |
|
33 |
2022-11-19 |
link |
LidarGait: Benchmarking 3D Gait Recognition with Point Clouds |
|
33 |
2022-12-29 |
link |
Efficient Movie Scene Detection using State-Space Transformers |
|
33 |
2022-07-07 |
link |
Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption |
|
33 |
2023-03-27 |
link |
Hi4D: 4D Instance Segmentation of Close Human Interaction |
|
33 |
2022-12-06 |
link |
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning |
|
33 |
2022-11-23 |
link |
Data-Driven Feature Tracking for Event Cameras |
|
33 |
2023-03-09 |
link |
Masked Image Modeling with Local Multi-Scale Reconstruction |
|
33 |
2023-06-01 |
link |
Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation |
|
33 |
2023-03-25 |
link |
CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis |
|
33 |
2023-02-25 |
link |
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting |
|
33 |
2023-04-18 |
link |
Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection |
|
33 |
2023-03-21 |
link |
Data-Efficient Large Scale Place Recognition with Graded Similarity Supervision |
|
33 |
2023-03-20 |
link |
Feature Alignment and Uniformity for Test Time Adaptation |
|
33 |
2023-04-09 |
link |
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification |
|
33 |
2023-03-23 |
link |
Detecting Backdoors in Pre-trained Encoders |
|
33 |
2023-04-17 |
link |
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection |
|
32 |
2023-03-29 |
link |
Fair Federated Medical Image Segmentation via Client Contribution Estimation |
|