In the last few months, in my free time, I have been developing a small application to process RGBD datasets of people. In particular, my goal is to create 3D scans of heads. I do not expect these models to be well-made or even usable without some processing, but I wonder if I can at least turn this data into starter models.
I first started to work with RGBD cameras during my internship at Altair. At the time, I built a small pipeline to reconstruct models based on Kinect Fusion.
Sadly, Kinect Fusion is an online method. The advantage is that it gives immediate feedback if it loses the camera tracking. But the disadvantage is that it needs a pretty powerful GPU. I have one on my desktop, but I took all my datasets using laptops.
Also, in my experience, getting a usable dataset with Kinect Fusion requires several attempts and a lot of time (the acquisition must proceed very slowly), which, generally speaking, is not always compatible with… people 😄️. They might move, lose patience, and so on…
The novelty of Kinect Fusion was how it combined existing techniques from the literature into an online pipeline. In particular, its main building blocks are ICP (Iterative Closest Point) for camera tracking and volumetric (TSDF) reconstruction for merging the frames.
But these algorithms also work offline, and the Open3D open-source library includes them. However, I encountered two major problems when I tried to use them.
First, ICP is a local algorithm: it can refine the alignment between two consecutive frames very well, but only as long as they are already roughly aligned. In practice, this is achieved with slow and limited camera movements. Moreover, Kinect Fusion helps by giving immediate feedback if this local alignment fails.
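To give an idea of why the starting point matters, here is a minimal Open3D sketch that just measures how well two frames overlap before any refinement; the file names, the 2 cm threshold, and the "manual guess" file are all placeholders, not something taken from my tool:

```python
import numpy as np
import open3d as o3d

# Two consecutive frames, already converted to point clouds (placeholder names).
source = o3d.io.read_point_cloud("frame_010.ply")
target = o3d.io.read_point_cloud("frame_011.ply")

dist = 0.02  # 2 cm correspondence threshold (assumed)

# With no prior alignment (identity), the overlap is usually poor unless the
# camera barely moved between the two frames...
before = o3d.pipelines.registration.evaluate_registration(
    source, target, dist, np.eye(4))

# ...while a rough manual alignment gives ICP a basin it can actually converge in.
manual_init = np.loadtxt("manual_guess.txt").reshape(4, 4)  # hypothetical file
after = o3d.pipelines.registration.evaluate_registration(
    source, target, dist, manual_init)

print(f"fitness: identity={before.fitness:.2f}, manual={after.fitness:.2f}")
```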
The second problem is that sensors, firmware, and software are imperfect. Noise aside, sometimes the color and depth streams are not in sync, or images are very blurry. Therefore, some frame pairs should just be discarded. Going slowly during the acquisition also helps in this respect.
My solution is to first cherry-pick frames by visually inspecting the color data overlaid on a depth heatmap. Then, I align them manually before running ICP.
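Just to show what that inspection looks like, here is a rough Python/OpenCV sketch of a color + depth-heatmap overlay. The file names and the 4 m depth range are assumptions about the sensor, and it presumes registered, same-resolution streams; the real viewer is the C++ tool described next.

```python
import cv2

# Placeholder file names for one frame pair.
color = cv2.imread("color_010.png")
depth = cv2.imread("depth_010.png", cv2.IMREAD_UNCHANGED)  # 16-bit depth in mm

# Map depth to 8 bits over an assumed 0-4 m range, then to a heatmap.
depth_8u = cv2.convertScaleAbs(depth, alpha=255.0 / 4000.0)
heatmap = cv2.applyColorMap(depth_8u, cv2.COLORMAP_JET)

# Blend the heatmap over the color image: out-of-sync streams, blur, and
# missing depth regions become obvious at a glance.
overlay = cv2.addWeighted(color, 0.6, heatmap, 0.4, 0)
cv2.imshow("color + depth heatmap", overlay)
cv2.waitKey(0)
```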
To do that manual alignment, I wrote a small program to visualize the point clouds and interactively transform them (rotate and translate). I wrote it in C++, with some basic OpenGL for visualization and Dear ImGui for the controls.
In its first implementation, this program saved the manual alignment to a JSON file, and then I ran ICP with Open3D through a Python script that read and updated the same file.
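That script looked more or less like the sketch below. The JSON layout (a "pairs" list holding file paths and 4×4 transforms) is an assumption for illustration, not the exact format the tool used.

```python
import json
import numpy as np
import open3d as o3d

# Assumed layout: {"pairs": [{"source": ..., "target": ..., "transform": [[...]]}]}
with open("alignment.json") as f:
    data = json.load(f)

for pair in data["pairs"]:
    source = o3d.io.read_point_cloud(pair["source"])
    target = o3d.io.read_point_cloud(pair["target"])
    init = np.asarray(pair["transform"], dtype=np.float64).reshape(4, 4)

    # Refine the manual alignment, using it as ICP's starting point.
    result = o3d.pipelines.registration.registration_icp(
        source, target, 0.02, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    # Write the refined transform back so the C++ viewer can reload it.
    pair["transform"] = result.transformation.tolist()

with open("alignment.json", "w") as f:
    json.dump(data, f, indent=2)
```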
It worked, but switching between the two was not a great UX, and eventually I ended up implementing everything in the C++ application, adding Open3D as a dependency.
I published the source code on GitHub, but it is just a toy project. Therefore, it did not grow following best practices, and some choices were made for simplicity’s sake.
At the moment, it provides everything I described above. It makes it possible to choose the frames, align them pairwise, merge them, and export the result as a point cloud or a triangle mesh.
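For the point cloud output, the merging step boils down to something like this Open3D sketch, assuming the pairwise alignments have already been chained into one pose per selected frame; file names and the voxel size are placeholders.

```python
import numpy as np
import open3d as o3d

frames = ["frame_000.ply", "frame_005.ply", "frame_010.ply"]  # selected frames
poses = [np.eye(4) for _ in frames]  # would come from the alignment step

merged = o3d.geometry.PointCloud()
for path, pose in zip(frames, poses):
    cloud = o3d.io.read_point_cloud(path)
    cloud.transform(pose)  # move the frame into the common reference system
    merged += cloud

# Downsample to thin out the heavy overlap between consecutive frames.
merged = merged.voxel_down_sample(voxel_size=0.002)
o3d.io.write_point_cloud("head_scan.ply", merged)
```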
The triangle mesh is built with marching cubes, so it does not have a good topology and could be simplified a lot.
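For the mesh path, the overall idea in Open3D terms looks roughly like the following; the intrinsics, truncation values, and file names are all assumptions, not the tool's actual settings.

```python
import numpy as np
import open3d as o3d

poses = [np.eye(4)]  # frame-to-reference poses from the alignment step (placeholder)
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.004, sdf_trunc=0.02,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

for i, pose in enumerate(poses):
    color = o3d.io.read_image(f"color_{i:03d}.png")
    depth = o3d.io.read_image(f"depth_{i:03d}.png")
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_trunc=1.5, convert_rgb_to_intensity=False)
    # integrate() expects the world-to-camera extrinsic, hence the inverse.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

# Marching cubes over the TSDF: dense, with the typical poor topology.
mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()

# Quadric decimation is one quick way to reduce the triangle count.
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=50000)
o3d.io.write_triangle_mesh("head_mesh.ply", mesh)
```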
I had some promising results with Instant Meshes, but it loses the color information. However, that is not a big deal, because we still have the RGB images, which we could use as a texture. My current idea is to add another step: reload a processed mesh and add UV coordinates by matching its vertices with the scan data. However, I still have not figured out a nice UX to resolve overlaps. This will likely be the direction of my future developments.
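As a rough illustration of that vertex-matching idea (not something the tool does yet), a nearest-neighbor lookup into the colored scan can already bring per-vertex colors back onto the Instant Meshes output; the same correspondences could later be used to look up texture coordinates. File names are placeholders.

```python
import numpy as np
import open3d as o3d

scan = o3d.io.read_point_cloud("head_scan.ply")            # colored merged scan
remeshed = o3d.io.read_triangle_mesh("head_remeshed.obj")  # Instant Meshes output

kdtree = o3d.geometry.KDTreeFlann(scan)
scan_colors = np.asarray(scan.colors)

colors = []
for v in np.asarray(remeshed.vertices):
    # Closest scan point to this mesh vertex; copy its color.
    _, idx, _ = kdtree.search_knn_vector_3d(v, 1)
    colors.append(scan_colors[idx[0]])

remeshed.vertex_colors = o3d.utility.Vector3dVector(np.asarray(colors))
o3d.io.write_triangle_mesh("head_remeshed_colored.ply", remeshed)
```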