TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

by gmays | View on Hacker News