Wednesday, 10 June 2026
| Time | Session |
| --- | --- |
| 15:45 | Join in |
| 16:00 | Welcome, Motivation & Introduction |
| 16:05 | Introduction to Vision–Language Models |
| | Overview of core concepts behind models like CLIP and how they connect images and text |
| 16:50 | Live Demo: ONiT Explorer in Action |
| | Explore a real-world application and see similarity search with image–text embeddings |
| 17:20 | Break |
| 17:35 | Hands-On Session: Build Your Own Similarity Pipeline |
| | Implement image–text matching and test text queries. Bring your own images or use the Ottoman Nature in Travelogues dataset. |
| 18:20 | Wrap-Up & Discussion |
| 18:30 | End of course |
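The hands-on session builds an image–text similarity pipeline of the kind CLIP enables. As a minimal sketch of its core step, once embeddings are computed a text query is matched to images by cosine similarity; the mock random vectors below are placeholders standing in for real CLIP outputs:

```python
import numpy as np

def rank_images(image_embeddings: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Return image indices sorted by cosine similarity to the text query, best first."""
    # L2-normalize so that a plain dot product equals cosine similarity.
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    txt = text_embedding / np.linalg.norm(text_embedding)
    scores = imgs @ txt
    # argsort is ascending; reverse for best-match-first ordering.
    return np.argsort(scores)[::-1]

# Mock 512-dimensional embeddings (illustrative only, not real model outputs).
rng = np.random.default_rng(0)
images = rng.normal(size=(5, 512))
query = rng.normal(size=512)
print(rank_images(images, query))
```

In the session, the mock vectors would be replaced by embeddings from a vision–language model, with images and the text query projected into the same embedding space.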