We downsample high-resolution images so that the shorter side is 1024 pixels (MetaQuery_Instruct_2.4M) or 512 pixels (MetaQuery_Instruct_2.4M_512res)
Xichen Pan PRO
xcpan
AI & ML interests
None yet
Recent Activity
upvoted a paper about 2 hours ago
Playful Agentic Robot Learning upvoted a paper about 2 hours ago
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer upvoted a paper 4 days ago
RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space