Hello folks! My lab is trying to evaluate the impact of different pupil segmentation methods on gaze estimation using offline calibration of data recorded in an HMD. This involves first recording a calibration and some gaze behavior to targets at known locations; then we will go to Pupil Player and re-process using offline pupil detection, offline calibration (via reference locations), and offline gaze mapping, with the goal of calculating the angular error between the L/R gaze vectors and the virtual targets. I believe this means that we need to combine the data generated by Pupil Player's offline calibration process with data from Unity about the target's position, either in world space or in the head's local space. Can you please provide any insight into how we might do this? My understanding is that offline calibration will tell me the gaze vector within the (real-world) eye camera space. How is the HMD-Eyes package transforming this gaze vector into the virtual gaze vector within the Unity environment?
Also, for offline processing, should I replace the eye intrinsics file with the vive_200hz_eye1.intrinsics and vive_200hz_eye0.intrinsics files provided with Pupil Core version 2.4?
You only need to replace the intrinsics if the recording was upgraded from a version that did not record any eye intrinsics yet. To be safe, you can simply replace the eye intrinsics everywhere.
Ok, we're using a new version, so I take this as a no, I do not need to update the intrinsics. Thanks.
"My understanding is that offline calibration will tell me the gaze vector within the (real-world) eye camera space." hmd-eyes sends calibration locations in Unity-main-camera coordinates to Capture during calibration. Therefore, Capture's gaze mapping maps pupil data to gaze in Unity-main-camera coordinates. The Unity main camera represents the subject's field of view.
"How is the HMD-Eyes package transforming this gaze vector into the virtual gaze vector within the Unity environment?" See this on how to transform unity-main-camera coordinates to unity-world coordinates: https://github.com/pupil-labs/hmd-eyes/blob/master/docs/Developer.md#map-gaze-data-to-vr-world-space (not necessary for your experiment).
Approach A) I think the simplest solution would be to adjust hmd-eyes calibration marker visualization to use the normal calibration markers. This way you can use the built-in reference location detector in Player.
Approach B) Capture will store the matched reference data in the calibration.data notification which is also recorded. You should be able to write some custom code that extracts these and converts them to reference location format. This is much more work than approach A)
...and, thank you very much for the prompt and thorough reply!
I think what you're implying is that approach A would allow me to map to normalized screen space of the main camera using Pupil Player, and then I could use the code that you shared with me as an example of how to convert this into Unity world coordinates. Is this what you were implying? My only concern / source of confusion is that if I'm defining reference locations post-hoc for an offline calibration, then these reference locations will be in a different format than is necessary for this approach. This concern is motivated by your comment that "hmd-eyes sends calibration locations in Unity-main-camera coordinates to Capture during calibration." If I'm working offline in pupil player and marking reference locations in the screen cast, then the coordinates will not be in main-camera coordinates.
To measure accuracy, you do not need Unity world coordinates. You can measure it in unity-main-camera coordinates with cyclopean 3d gaze directions. The linked code is just a linear transformation of these vectors into unity-world coordinates. Pupil software calculates accuracy by transforming 2d image-plane coordinates into 3d scene-camera coordinates (cyclopean 3d gaze directions). In case of hmd-eyes, the unity-main-camera coordinate system is the scene camera coordinate system. Screencast sends a projection of the unity-main-camera's field of view and the necessary camera projection matrix in order to perform (un)projections.
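For reference, the angular error between a cyclopean gaze direction and a known target position can be computed directly in these camera coordinates (B). A minimal sketch in plain Python; the function name and signature are my own for illustration, not part of any Pupil or hmd-eyes API:

```python
import math

def angular_error_deg(gaze_dir, target_pos):
    """Angle in degrees between a 3D gaze direction and the direction
    to a target, both expressed in unity-main-camera coordinates.
    target_pos is the target position in the same coordinate system,
    so its direction from the camera origin is the vector itself."""
    def normalize(v):
        m = math.sqrt(sum(c * c for c in v))
        return tuple(c / m for c in v)
    g = normalize(gaze_dir)
    t = normalize(target_pos)
    # Clamp to avoid math domain errors from floating-point noise
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(g, t))))
    return math.degrees(math.acos(dot))
```

Because only directions matter, the target's distance from the camera cancels out of the error measure.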
So summarized, there are 3 coordinate systems: (A) Unity world, (B) Unity main camera, (C) projected scene image. Since B does not have distortion, all transformations between these coordinate systems are linear. The only issue for A<->B is that the transformation matrix depends on the user's head pose at a specific time point t, while B<->C is static/fixed.
3d angle differences can only be calculated in A and B. Automatic marker detection in Player can only be done in C. But Player converts C to B by default.
During calibration, hmd-eyes sends reference locations in B to Capture. Capture optimizes gaze estimation in A though. For that, it transforms B to A internally.
This all makes some sense, but I want to make sure that I'm not losing some of the critical information needed to go from C->B when turning off the stored calibration in pupil player so that I can recalibrate in an offline manner. After that, I agree that calculating angular error can happen in head space (B) or world space (A) with equivalent results.
hmd-eyes does not send any information in C to Capture. Only B coordinates.
C is being used exclusively within Unity to visualize e.g. gaze within the unity scene/world
You can imagine C alternatively as an AOI/surface/head-pose-tracker model. The concept is always the same.
C is not relevant for gaze estimation accuracy as ~~Pupil software does not optimize for it~~ as it is independent of Pupil software's gaze estimation. In the worst case, C<->B introduces an additional error source, e.g. due to incorrectly estimating an AOI location. In case of unity, this error is the error of the VR system's head pose tracking.
Ok, thank you for your insights! I'm going to crunch on this for a while / share it with my team and see if I can figure things out.
You are welcome. Let me know if there are any specifics that need clarifications.
Ok, here's the source of confusion. Scene coordinates (e.g. reference location coordinates) are in 2D. You suggest that I convert these to a gaze direction within 3D Unity main cam / scene cam space. I was concerned that to go from 2D image coordinates to 3D scene camera coordinates requires that one account for camera intrinsics, and we had not discussed this. However, I realize that I get this "for free" from your use of the 3D eye model fit.
So, what this means is that I can specify image features in 2D image coordinates, and Pupil Player will automatically convert these into cyclopean gaze directions within camera/head space.
100% correct, using the built-in validation feature in the post-hoc calibration plugin
"account for camera intrinsics": that is the B<->C transformation matrix. You do get it for free, but from Unity itself, not from the 3d eye model fit.
Urhm, .... the inverse projection matrix?
The projection matrix is B->C. What I need is C->B
...and that requires knowledge of camera intrinsics. (Sorry, we're spinning in circles a bit, here!)
The screencast sends the projection matrix (B<->C) with each frame. The backend saves it https://github.com/pupil-labs/pupil/blob/master/pupil_src/shared_modules/video_capture/hmd_streaming.py#L171 and uses it to instantiate a pinhole camera model https://github.com/pupil-labs/pupil/blob/master/pupil_src/shared_modules/video_capture/hmd_streaming.py#L209 which supports transforming coordinates between both coordinate systems, in both directions!
Ahhhhhhh. Ahhhhh.... ahhh. OK! THis is the missing information .
The inverse is calculated by OpenCV (reference: https://github.com/pupil-labs/pupil/blob/master/pupil_src/shared_modules/camera_models.py#L621-L692)
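Conceptually, the C->B unprojection with an undistorted pinhole model reduces to offsetting by the principal point and dividing by the focal lengths. A simplified sketch (assuming fx, fy, cx, cy have been read out of the stored projection matrix; this is an illustration of the math, not Pupil's exact code path):

```python
import math

def unproject_point(px, py, fx, fy, cx, cy):
    """Unproject a 2D image point (C) to a unit 3D viewing direction
    in scene-camera coordinates (B), assuming an undistorted pinhole
    camera with focal lengths fx, fy and principal point (cx, cy)."""
    x = (px - cx) / fx
    y = (py - cy) / fy
    # Direction along (x, y, 1), normalized to unit length
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)
```

Projection (B->C) is the inverse operation: multiply by the focal lengths and add the principal point, after dividing by depth.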
Thank you very much! Now I have all I need to reconsider our approach. I also have one bit of thought to offer you that is related to this, but not really necessary for my own progress; it's more something for your team to think about.
If I'm a neuroscientist who is perhaps interested in something like peak saccadic velocity or saccadic amplitude, then I need to know how far / how fast the biological eye is rotating in the biological head. This requires knowledge of the state of the biological eye and its gaze normal. You guys are not reporting that. You are assuming that the geometry of the biological eye and the HMD's image (after the influence of optics) has the same geometry as the virtual camera and the near clipping plane.
If the real eye is actually closer or further from the screen than this pinhole camera suggests, then this pinhole camera model will be inaccurate, as will the gaze kinematics / dynamics derived from the model.
Does that concern make some sense? I'm not suggesting a solution, just pointing out a potential issue that perhaps should be clarified in the documentation, since many of your users are neuroscientists.
...another way to describe it is that if the true FOV is not matched to the virtual FOV, there will be scaling errors (at the least). One could also imagine the eye being off-center, which would result in mismatches that reflect shearing.
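As a back-of-the-envelope illustration of this concern, the angular error introduced by an eye-to-screen distance mismatch can be modeled like this (a toy model under my own assumptions; this is not something Pupil software computes):

```python
import math

def fov_scale_error_deg(screen_offset, d_assumed, d_true):
    """Angular error (degrees) for a point at lateral offset
    `screen_offset` on the (virtual) screen when the eye-to-screen
    distance assumed by the pinhole model (d_assumed) differs from
    the real one (d_true). All lengths in the same unit. Purely a
    back-of-the-envelope model of the FOV-mismatch scaling error."""
    assumed_angle = math.atan2(screen_offset, d_assumed)
    true_angle = math.atan2(screen_offset, d_true)
    return math.degrees(true_angle - assumed_angle)
```

If the real eye sits closer to the screen than assumed (d_true < d_assumed), every eccentric point subtends a larger true angle than the model reports, so derived amplitudes and velocities would be underestimated.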
"You guys are not reporting that." We are reporting the state of the eye via the 3d eye model. You can calculate rotation velocity based on it quite easily. This can be measured in eye camera coordinates, independent of any assumptions about the scene. Calculating eye rotation from a cyclopean gaze vector within the scene camera is just an approximation (which is what the fixation detector does).
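For example, rotation speed could be approximated from the eye-model normals of two consecutive 3d pupil datums. A sketch under the assumption that the normals and timestamps come from Pupil's 3d pupil data (e.g. the circle_3d normal); this is my own illustration, not the official detector:

```python
import math

def rotation_speed_deg_per_s(normal_a, t_a, normal_b, t_b):
    """Approximate eye rotation speed from two consecutive 3d pupil
    datums. normal_a/normal_b are 3D eye-model normals in eye-camera
    coordinates at timestamps t_a/t_b (seconds)."""
    def normalize(v):
        m = math.sqrt(sum(c * c for c in v))
        return tuple(c / m for c in v)
    a = normalize(normal_a)
    b = normalize(normal_b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot)) / (t_b - t_a)
```

Because everything stays in eye-camera coordinates, the result is independent of the scene-camera/FOV assumptions discussed above.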
Another way to put it: the projection matrix in Unity that you guys are using to calculate gaze normals assumes a pinhole camera model which is a model of the real world projection of the image upon the eye.
This model may not be correct. So, what you're saying is true, assuming that this projection matrix accurately accounts for the real world eye/screen geometry.
...I'm just pointing out that it may not be true. When I put on the HMD, my eye may be closer to the screen than yours, but I don't believe your system would account for that.
...because the projection matrix would not change between users (Unity has no idea where your eye is in the HMD)
Not exactly. We use the scene camera image as an approximation for the subject's field of view (which cannot be recorded perfectly anyway). Gaze is estimated within the scene camera as this is what is being recorded. After 3d calibration, eye positions (eye_center0/1) and gaze normals originating from there (gaze_normal0/1) are reported in the 3d gaze datum in B-coordinates (scene camera coordinates)
gaze_point_3d is composed based on the intersection of the gaze normals and represents the cyclopean gaze in scene camera coordinates, originating in the scene camera.
The further the scene camera is away from the subject's field of view, the bigger the parallax error that is included in the cyclopean gaze estimation.
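Since two 3D rays rarely intersect exactly, "intersection" here is in practice the nearest point between the two gaze rays. A minimal sketch of that idea in plain Python (an illustration of the geometry, not Pupil's actual implementation; names are mine):

```python
def nearest_intersection(o0, d0, o1, d1):
    """Midpoint of the shortest segment between two rays given by
    origin o and direction d, a common way to 'intersect' the two
    gaze normals into a single cyclopean gaze point."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def add(a, b): return tuple(x + y for x, y in zip(a, b))
    def scale(a, k): return tuple(x * k for x in a)
    # Solve for s, t minimizing |(o0 + s*d0) - (o1 + t*d1)|
    w = sub(o0, o1)
    a, b, c = dot(d0, d0), dot(d0, d1), dot(d1, d1)
    d, e = dot(d0, w), dot(d1, w)
    denom = a * c - b * b
    if abs(denom) < 1e-9:
        return None  # parallel rays: no unique nearest point
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p0 = add(o0, scale(d0, s))
    p1 = add(o1, scale(d1, t))
    return scale(add(p0, p1), 0.5)
```

With the two eye centers as origins and the gaze normals as directions (all in scene camera coordinates), the returned midpoint plays the role of a cyclopean gaze point.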
Yes, this is the issue I'm emphasizing - it could lead to an inaccurate description of biological kinematics/dynamics. How much? I'm not sure. We could probably model that error as a function of deviations of the true FOV from the assumed FOV.
"When I put on the HMD, my eye may be closer to the screen than yours, but I don't believe your system would account for that." Actually, this is exactly what the 3d calibration estimates.
Hurmn. Ok! I think this means that I need to reconsider what the information going into the fitting procedure is, and what comes out. Thanks, papr!
It really depends what you want to measure and which data you use for it. If you want to calculate eye kinematics, then you should use 3d pupil data. If you are interested in gaze behavior in a scene context use gaze.
The eye cameras are fixed to the HMD. The 3d calibration estimates the transformation between the eye cameras (D1 and D2) and the scene camera (B), resulting in B<->D1 and B<->D2. Pupil data is measured in D1/2 and transformed into B using these matrices.
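That eye-camera-to-scene mapping is an ordinary rigid transform. A minimal sketch (R and t stand for the estimated rotation and translation; names and signature are mine, not Pupil's API):

```python
def eye_to_scene(R, t, v):
    """Map a 3D point v from eye-camera coordinates (D1/D2) into
    scene-camera coordinates (B) using the rigid transform estimated
    by 3d calibration: a 3x3 rotation R (tuple of rows) followed by
    a translation t (3-tuple)."""
    return tuple(
        R[i][0] * v[0] + R[i][1] * v[1] + R[i][2] * v[2] + t[i]
        for i in range(3)
    )
```

Because the eye cameras are rigidly mounted to the HMD, R and t stay fixed over a recording (unlike the head-pose-dependent B<->A transform).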
Gotcha. Is there a flow-chart somewhere that describes the coordinate transformations of data making its way through the HMD-Eyes pipeline?
Unfortunately, not. But I think that would be great content for a new blog post. I will forward this idea to the team.
Thanks again. You're a king!
No problem.
Does anybody have experience with using the eye tracker with only one eye? Pupil Labs removed the option to calibrate with only one eye in version 2.6, and we don't have the knowledge to write a custom 3d hmd gazer, a custom hmd calibration client, and a custom choreography class (as suggested by Pupil Labs).
Yo! Has anyone ever tried to use the VR/AR lens inserts with the Vive wireless adapter? Is there a way to send the signal wirelessly? There's a spare USB input on the Vive wireless adapter, but when I plug the Pupil Labs inserts into that, nothing happens. Not sure if anyone has ever tried this. Any help would be very much welcome 🙂
Hi, I can see that 2d tracking is no longer supported in the hmd-eyes Unity package. I never achieved comparable results with 3d tracking; is there any recommended way of going back to the 2d version?
Only by using past hmd-eyes releases (< v1.0)
Also, what is the purpose of https://github.com/pupil-labs/hmd-eyes/blob/4d74e79903083c849701f5f34241691ccbff7aa9/plugin/Scripts/CalibrationController.cs#L257 ? It looks like it is normalizing x/y to a square FOV?
Thanks. Are there any pros to doing a calibration in the case where the position of the eye cameras relative to the tracking origin is known? My understanding is that 3d calibration tries to figure out the eye positions in scene coordinates, but coordinates relative to the eye cameras are available from the model without any calibration.
Yes, calibration also compensates for the individual differences between visual and optical axes in subjects.
Hi, I've been setting up the eye tracker in the Unity package, but I keep failing the calibration. I tried some debugging to see what the reason for failure was. The dictionary from Pupil Service outputs, for the key Reason: "Not sufficient data available." How should I interpret this error?
Before you calibrate, please ensure that the participant's pupils are robustly detected and tracked. You can check this by looking at the eye windows in Pupil Capture. There should be a red ellipse overlaying each pupil.
Hey guys, don't know if I'm missing something, but I was wondering if we can generate heatmaps in VR using unity. I know that heatmaps can be generated by defining surfaces using AprilTag markers, however I do not know if this applies to creating heatmaps for VR environments, and if so, how to do this. Any information would be appreciated.
I'm trying to get the Pupil headset to work with the Epson Moverio. Has anyone tried this before? Can anyone help me figure out how to connect the Pupil Labs hardware to the Moverio BT-300?
Hi all, I want to use the Pupil Labs HMD-mounted tracker for research purposes. I got it up and running, but it seems that I cannot get good quality data (especially from the left eye). The camera images are sharp, but the circle estimating where the pupil should be is all over the place. Am I correct to think that the green circle should be approximately the size of your eyeball? In my case it varies a lot (between very small and very large), which seems to influence the pupil dilation estimate (which is an absolute no-go when using pupil dilation for research). Is there any guide on how to get good quality data from this tracker in particular?
In addition to the comments from @user-73b616, if the pupils are robustly detected and tracked, you will see a red ellipse overlaying the pupil, like this.
Did you adjust the Pupil Detector 2D properties and potentially set an ROI? Once you adjust them, it is also good to reset the 3d model: https://docs.pupil-labs.com/core/software/pupil-capture/#fine-tuning-pupil-detection
@user-44af31 other steps you can take to improve pupil detection are to ask the wearer to roll their eyes to stabilise the model before calibrating. You can also try to improve the contrast between the pupil and the remaining eye regions by 1) manually adjusting the exposure time in the Pupil Capture eye process: click Video Source; exposure time is under Sensor Settings; and 2) adjusting gain, brightness, and contrast in the UVC settings.
Thanks, both :)! Will fiddle around a bit more.