Request for Public Eye-Tracking Datasets to Train a Vision Transformer with Human-Like Attention
Hi everybody. I am conducting a research project on human visual attention and Vision Transformer (ViT) networks. In the first phase, I compared human visual attention with ViT attention using images of handcrafted objects. Participants' gaze fixations were recorded with an eye tracker (data such as "gaze_positions_on_surface"), generating human attention heatmaps. These were compared with ViT-generated heatmaps, using metrics like KL divergence to quantify similarities and differences. The results showed convergence in some areas but also significant differences.
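For context, the heatmap comparison from the first phase can be sketched as follows. This is a minimal illustration, assuming both attention maps are 2D NumPy arrays over the same grid (the function name and the 14×14 patch-grid size are my own choices, not fixed by the project):

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence D(p || q) between two attention heatmaps.

    Both maps are flattened and normalized into probability
    distributions; eps avoids log(0) on empty cells.
    """
    p = p.flatten().astype(np.float64) + eps
    q = q.flatten().astype(np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Example: a human gaze heatmap vs. a ViT attention map on a 14x14 patch grid.
human_map = np.random.rand(14, 14)
vit_map = np.random.rand(14, 14)
score = kl_divergence(human_map, vit_map)  # 0 only when the maps coincide
```

Lower values mean the two attention distributions are more similar; note that KL divergence is asymmetric, so the direction (human‖model vs. model‖human) should be fixed and reported consistently.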
In the current phase, my goal is to modify or train the ViT so its attention aligns more closely with human patterns. I plan to use human gaze fixations as guidance during fine-tuning, so the model learns to mimic human attention on new images.
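One common way to implement this kind of gaze guidance is an auxiliary attention-alignment term added to the task loss during fine-tuning. The sketch below is only an illustration of that objective under my own assumptions: NumPy arrays, a precomputed scalar task loss, and a hypothetical weighting hyperparameter `lam`. A real setup would compute this inside a framework such as PyTorch so gradients flow back through the ViT's attention:

```python
import numpy as np

def attention_alignment_loss(model_attn: np.ndarray,
                             human_attn: np.ndarray,
                             task_loss: float,
                             lam: float = 0.5,
                             eps: float = 1e-8) -> float:
    """Combined fine-tuning objective (sketch):
    task_loss + lam * KL(human || model), which penalizes the model
    when its attention map diverges from the human gaze heatmap.
    """
    p = human_attn.flatten().astype(np.float64) + eps  # human gaze distribution
    q = model_attn.flatten().astype(np.float64) + eps  # ViT attention distribution
    p /= p.sum()
    q /= q.sum()
    kl = np.sum(p * np.log(p / q))  # KL >= 0, zero iff the maps match
    return float(task_loss + lam * kl)
```

Since KL divergence is non-negative, the combined loss is minimized (for the alignment term) exactly when the model's attention matches the human heatmap; which attention map to align (e.g. last-layer CLS attention or attention rollout) is itself a design choice.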
For this, I need a large public eye-tracking dataset with human gaze fixations on images, ideally including raw gaze coordinates or preprocessed heatmaps. My dataset from the first phase is too small for training a model capable of generalizing.
Question: Do you know of any large, publicly available datasets with eye-tracking data (human gaze/fixations on images) suitable for training a Vision Transformer to replicate human visual attention? Recommendations of datasets with clear annotations and compatible formats for deep learning would be greatly appreciated.
Thanks!