How Pixels to Points Works
The Pixels to Points™ tool in Global Mapper’s Lidar Module uses a process of Automated Aerial Triangulation to reconstruct the 3D scene present in overlapping images. This computationally intensive process may seem like magic, but it relies on basic concepts of vision and photogrammetry. Photogrammetry is the science of taking real-world measurements from photographs. Let’s pull back the curtain to reveal how this process works.
Using photogrammetry techniques, the location, size, and shape of objects can be derived from photographs taken from different angles. By combining views from multiple images, the locations of distinct parts of the image are triangulated in 3D space. This is similar to how depth perception works with two eyes: because each eye views the object in front of you from a slightly different angle, the brain can perceive how far away the object is.
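The triangulation idea can be sketched in a few lines of Python. This is a minimal illustration that assumes we already know each camera's position and the direction of the ray from each camera toward the same feature. The function name triangulate_midpoint is hypothetical, and the midpoint method shown here is one simple textbook technique, not necessarily what Pixels to Points uses internally. Because of measurement noise the two rays rarely meet exactly, so the sketch returns the midpoint of the shortest segment between them.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _along(p: Vec3, d: Vec3, s: float) -> Vec3:
    """Point reached by travelling s units of direction d from p."""
    return (p[0] + s * d[0], p[1] + s * d[1], p[2] + s * d[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point seen from two cameras.

    c1, c2 are the camera centres; d1, d2 are ray directions toward the same feature.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    w0 = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel, so depth cannot be determined")
    s = (b * e - c * d) / denom  # position of the closest point along ray 1
    t = (a * e - b * d) / denom  # position of the closest point along ray 2
    p1, p2 = _along(c1, d1, s), _along(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Two cameras 2 m apart, both looking at a feature 5 m in front of them
print(triangulate_midpoint((0, 0, 0), (1, 0, 5), (2, 0, 0), (-1, 0, 5)))  # (1.0, 0.0, 5.0)
```

When the two rays are nearly parallel (a very short distance between cameras, or a very distant point), the denominator approaches zero and depth becomes poorly determined, which is also why two-eyed depth perception weakens for faraway objects.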
In traditional photogrammetry with stereo image pairs, the two viewing angles allow the photogrammetrist to measure objects in the image and determine their real-world size. With automated techniques using many overlapping images, the full three-dimensional structure of the photographed scene can be reconstructed.
Automated Aerial Triangulation involves a number of steps to get from the original images to 3D point clouds, terrain models, textured 3D models, and orthoimages. The first step is to detect distinct features in each image and then match those features across adjacent images. The challenge is to detect these features automatically, even though the same feature may appear at a different scale and rotation in each image.
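Feature matching is commonly done by comparing descriptor vectors, short lists of numbers that summarize the image patch around each detected feature. The sketch below uses the nearest-neighbour ratio test popularized by David Lowe, a widely used approach; the source does not say which detector or matcher Pixels to Points uses, so treat the function names and the 0.8 threshold as illustrative assumptions. The descriptors here are tiny 2-D vectors for readability. Real descriptors (SIFT, for example, uses 128 values) are designed to stay similar when a feature appears at a different scale or rotation, which is what makes matching across images possible.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(
    desc_a: Sequence[Descriptor],
    desc_b: Sequence[Descriptor],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Match descriptors from image A to image B with a nearest-neighbour ratio test.

    Returns (index_in_a, index_in_b) pairs for the matches that pass the test.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs both a best and a second-best candidate
    for i, da in enumerate(desc_a):
        # Rank every descriptor in image B by its distance to this one
        ranked = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        (best, j_best), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up;
        # ambiguous features (such as repeated roof tiles) are discarded
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


desc_a = [[0.0, 0.0], [5.0, 5.0], [7.5, 7.5]]
desc_b = [[0.1, 0.0], [5.0, 5.2], [10.0, 10.0]]
print(match_features(desc_a, desc_b))  # [(0, 0), (1, 1)]
```

The third feature in image A is almost equally close to two candidates in image B, so it is rejected rather than risk a wrong match. This brute-force comparison scales with the product of the two feature counts; large projects typically use approximate nearest-neighbour search instead.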
After the features are tracked through the images, the initial reconstruction begins with a process called Structure from Motion (SfM). In the context of mapping technology, the structure of the 3D scene is revealed based on the motion of the camera. This process calculates the precise orientation of the cameras relative to each other and to the scene, and builds the basic surface structure of the scene. This is the point where the selected Analysis Method is applied.
The Incremental Analysis Method starts with a set of the best-matching photos and incrementally adds the features from subsequent images to the scene to build the 3D reconstruction. This works well for drone images collected over a large area in a grid pattern. The reconstruction typically starts somewhere near the center of the scene and works outward. The Global Method, by contrast, uses information from all of the images together and builds the scene all at once. This makes for a faster process, but it also requires a higher degree of overlap between adjacent images. It is recommended when the images are collected around an object of interest, such as a building, especially when all of the images focus on that central area or object.
The result of the Structure from Motion analysis is a sparse point cloud that captures the basic structure of the scene, and a set of precisely oriented cameras that show where, and in what direction, the images were taken relative to each other.
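The ordering logic of an incremental approach can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: we only know how many features each pair of images shares, the seed pair is simply the pair with the most matches (real pipelines usually also require enough separation between the two cameras), and all function names are hypothetical. The geometric work done at each step is left as a comment.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def shared_matches(matches: Dict[Pair, int], a: str, b: str) -> int:
    """Number of features matched between two images; pair order does not matter."""
    return matches.get((a, b), matches.get((b, a), 0))


def link_strength(matches: Dict[Pair, int], image: str, registered: Sequence[str]) -> int:
    """Total matches between an unregistered image and every image already in the scene."""
    return sum(shared_matches(matches, image, r) for r in registered)


def incremental_order(matches: Dict[Pair, int]) -> List[str]:
    """Order in which a greedy incremental SfM loop might add images to the scene."""
    images = sorted({name for pair in matches for name in pair})
    # Seed with the pair sharing the most matched features (ties go to the first pair alphabetically)
    seed = max(sorted(matches), key=lambda pair: matches[pair])
    order = list(seed)
    remaining = [name for name in images if name not in order]
    while remaining:
        # "Next best view": the image most strongly linked to the current scene
        best = max(remaining, key=lambda name: link_strength(matches, name, order))
        if link_strength(matches, best, order) == 0:
            break  # the rest share no matches with the scene and cannot be added to it
        # A real pipeline would estimate this camera's pose here, triangulate
        # the newly visible points, and refine everything with bundle adjustment
        order.append(best)
        remaining.remove(best)
    return order


matches = {("A", "B"): 500, ("B", "C"): 300, ("C", "D"): 200, ("A", "C"): 50, ("E", "F"): 10}
print(incremental_order(matches))  # ['A', 'B', 'C', 'D']
```

Images E and F only match each other, so they never join the scene, much as a disconnected group of photos can fail to join a reconstruction. Global SfM methods generally skip this ordering and estimate all camera orientations and positions together from the full set of pairwise relationships, which is one reason they benefit from dense overlap.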
The final step of the Automated Aerial Triangulation process involves filling in additional detail from each image that was calibrated as part of the scene. This process is called Multi-view Stereo. It involves calculating the depth of each part of each image (i.e., how far it is from the camera), and then fusing those depth maps, keeping the points that appear consistently in multiple images.
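Depth map fusion can be illustrated with a simple consistency check: a candidate 3D point is kept only if enough views' depth maps agree with its distance from those cameras. In this hedged sketch, the View class assumes cameras that all face straight along the +Z axis with no rotation, and the 1% tolerance and two-view threshold are illustrative values, not Global Mapper settings. Real Multi-view Stereo implementations use full camera orientations and additional photometric checks.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class View:
    """Simplified camera, assumed to look straight along +Z with no rotation."""
    center: Point
    focal_px: float
    cx: float  # principal point x, pixels
    cy: float  # principal point y, pixels
    depth: List[List[float]]  # depth map indexed depth[row][col]; 0.0 means no estimate

    def project(self, p: Point) -> Optional[Tuple[float, float, float]]:
        """Return (u, v, depth) of point p in this view, or None if it is behind the camera."""
        x, y, z = (p[i] - self.center[i] for i in range(3))
        if z <= 0:
            return None
        return self.focal_px * x / z + self.cx, self.focal_px * y / z + self.cy, z


def is_consistent(point: Point, views: Sequence[View], min_views: int = 2, rel_tol: float = 0.01) -> bool:
    """Keep a point only if at least min_views depth maps agree with its depth."""
    agreeing = 0
    for view in views:
        projected = view.project(point)
        if projected is None:
            continue
        u, v, z = projected
        col, row = round(u), round(v)
        if not (0 <= row < len(view.depth) and 0 <= col < len(view.depth[row])):
            continue  # the point falls outside this image
        stored = view.depth[row][col]
        if stored > 0 and abs(stored - z) <= rel_tol * z:
            agreeing += 1
    return agreeing >= min_views


wall = [[10.0] * 5 for _ in range(5)]  # both views see a flat wall 10 m away
views = [View((0.0, 0.0, 0.0), 100.0, 2.0, 2.0, wall), View((0.1, 0.0, 0.0), 100.0, 2.0, 2.0, wall)]
print(is_consistent((0.0, 0.0, 10.0), views))  # True: both depth maps agree
print(is_consistent((0.0, 0.0, 20.0), views))  # False: no depth map supports it
```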
This process generates the final dense 3D point cloud. Depending on the options selected, there may be further processing to convert the point cloud into a refined mesh surface (3D model) that is photo-textured by projecting the images onto it. This option also produces the highest-quality orthoimage, because relief distortion is removed using the 3D mesh surface.
An important initial step in the Pixels to Points process is removing lens distortion from the images. To the untrained eye a photograph may look like a flat, faithful capture of the target area, but most photographs contain some distortion, particularly toward the edges of the image, where the effect of the curvature of the camera lens is visible. Pixels to Points removes distortion from the images based on the Camera Type setting. Most standard cameras need correction for basic radial lens distortion in order to create an accurate 3D scene. The default Camera Type setting, ‘Pinhole Radial 3’, corrects for radial lens distortion using three coefficients. In some cases it may be beneficial to use the ‘Pinhole Brown 2’ camera model, which accounts for both radial distortion and tangential distortion, the latter occurring when the lens and sensor are not perfectly parallel.
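The distortion models can be written out directly. The sketch below uses the common Brown-Conrady form (the convention OpenCV follows), with radial coefficients k1, k2, k3 and tangential coefficients p1, p2, applied to normalized image coordinates: pixel offsets from the principal point divided by the focal length in pixels. Based on the names, we assume ‘Pinhole Radial 3’ corresponds to k1 through k3 and ‘Pinhole Brown 2’ adds p1 and p2; the source does not spell out Global Mapper's exact parameterization. Removing distortion means inverting this function, which has no closed form, so undistort() iterates toward the answer.

```python
from typing import Tuple


def distort(x: float, y: float, k1: float, k2: float, k3: float,
            p1: float = 0.0, p2: float = 0.0) -> Tuple[float, float]:
    """Map ideal normalized coordinates (x, y) to where the lens actually images them."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3       # radial terms (assumed 'Radial 3')
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)  # p1, p2: tangential terms
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd


def undistort(xd: float, yd: float, k1: float, k2: float, k3: float,
              p1: float = 0.0, p2: float = 0.0, iterations: int = 20) -> Tuple[float, float]:
    """Invert distort() by fixed-point iteration; converges quickly for moderate distortion."""
    x, y = xd, yd  # start from the distorted position
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x, y = (xd - dx) / radial, (yd - dy) / radial
    return x, y


coeffs = dict(k1=-0.12, k2=0.03, k3=-0.004, p1=0.0008, p2=-0.0005)  # illustrative values
xd, yd = distort(0.4, -0.3, **coeffs)  # where an ideal ray lands in the distorted photo
x, y = undistort(xd, yd, **coeffs)     # recovered ideal position, approximately (0.4, -0.3)
```

Points near the image centre (small r2) barely move, while points near the edges shift the most, which matches the edge distortion described above.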
Some cameras can perform a calibration that automatically removes distortion from the image. If the Pixels to Points tool detects from the image metadata that the images have been calibrated, it switches to the ‘Pinhole’ camera model. If you know your images have already had distortion removed, either by the camera or by other software, choose the ‘Pinhole’ camera model, which applies no additional distortion removal. The final two Camera Type options account for the more extreme distortion of Fisheye and Spherical lenses; select them if they match your camera.
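Why fisheye lenses need their own camera type becomes clear when comparing where a ray lands on the sensor under two projection models. This sketch assumes the equidistant fisheye model (image radius proportional to the angle off the optical axis), which is only one of several common fisheye models; the source does not say which model Global Mapper's Fisheye option uses, and the function names are hypothetical.

```python
import math


def radius_pinhole(theta_deg: float, focal_px: float) -> float:
    """Distance from the image centre, in pixels, for a ray theta_deg off the optical axis."""
    if not 0 <= theta_deg < 90:
        raise ValueError("a pinhole camera cannot image rays at or beyond 90 degrees")
    return focal_px * math.tan(math.radians(theta_deg))


def radius_equidistant(theta_deg: float, focal_px: float) -> float:
    """Equidistant fisheye model (assumed): the radius grows linearly with the angle."""
    return focal_px * math.radians(theta_deg)


for angle in (10, 45, 80):
    print(angle, round(radius_pinhole(angle, 1000)), round(radius_equidistant(angle, 1000)))
```

Near the centre the two models agree closely, but at 80 degrees off-axis the pinhole radius is roughly four times the fisheye radius, and at 90 degrees it is infinite. A wide-angle lens that squeezes such rays onto the sensor cannot be described by a pinhole model plus small corrections.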
An important part of converting the information in an image to real-world scale is knowing some basic camera and image information. The focal length and sensor width allow a basic calculation of how large objects appear in the image, and therefore how far away they are from the camera. These values give the ratio between a known real-world size (the sensor width) and the pixel equivalent of that size in the image (the image width in pixels), which allows the focal length to be expressed in pixels. This is a starting point for reconstructing the 3D scene. Focal length is typically stored in the image metadata. Global Mapper includes a database of sensor widths by camera model; however, you may be prompted for this value if your camera is not in the database. You can obtain it from the device manufacturer.
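The arithmetic behind this step is similar-triangle geometry. The sketch below converts the focal length to pixel units, then uses it to estimate the ground size of one pixel and the distance to an object of known size. The camera values are illustrative assumptions rather than figures from any particular project, the function names are hypothetical, and the formulas assume a camera looking straight at a flat surface (nadir imagery over level ground).

```python
def focal_length_px(focal_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Focal length in pixels: sensor width / image width gives the physical size of one pixel."""
    return focal_mm * image_width_px / sensor_width_mm


def ground_sample_distance_m(focal_mm: float, sensor_width_mm: float,
                             image_width_px: int, distance_m: float) -> float:
    """Ground width covered by one pixel, in metres, at the given camera-to-ground distance."""
    return distance_m / focal_length_px(focal_mm, sensor_width_mm, image_width_px)


def distance_to_object_m(object_size_m: float, object_size_px: float, focal_mm: float,
                         sensor_width_mm: float, image_width_px: int) -> float:
    """Distance to an object of known real size that spans object_size_px pixels in the image."""
    return object_size_m * focal_length_px(focal_mm, sensor_width_mm, image_width_px) / object_size_px


# Illustrative camera: 8.8 mm lens, 13.2 mm wide sensor, 5472-pixel-wide images
f_px = focal_length_px(8.8, 13.2, 5472)                            # about 3648 px
gsd = ground_sample_distance_m(8.8, 13.2, 5472, distance_m=100.0)  # about 0.027 m per pixel
dist = distance_to_object_m(2.0, 73, 8.8, 13.2, 5472)              # about 100 m
```

A missing or wrong sensor width scales every one of these numbers proportionally, which is why Global Mapper asks for it when the camera is not in its database.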
The approximate position of each camera is typically stored in the image metadata (EXIF tags). With a standard camera this location is derived from GPS, whose typical horizontal accuracy is within a few meters. There are several ways to improve the accuracy of the resulting data, depending on the accuracy required and the trade-off between cost and time spent.
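EXIF stores GPS latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a reference letter: GPSLatitudeRef is N or S, and GPSLongitudeRef is E or W. Converting them to the signed decimal degrees used by GIS software looks like the sketch below. Reading the raw tags from a file requires an EXIF library (Pillow, for example), which is not shown, and the helper name is hypothetical.

```python
from fractions import Fraction
from typing import Sequence, Union

Rational = Union[Fraction, float, int]


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert an EXIF (degrees, minutes, seconds) triple and its Ref letter to decimal degrees."""
    degrees, minutes, seconds = (float(value) for value in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative
    return -decimal if ref.strip().upper() in ("S", "W") else decimal


lat = dms_to_decimal((Fraction(39), Fraction(44), Fraction(3615, 100)), "N")
lon = dms_to_decimal((Fraction(104), Fraction(59), Fraction(240, 100)), "W")
print(f"{lat:.6f}, {lon:.6f}")
```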
The GPS sensors in most cameras may provide sufficient horizontal accuracy for some applications. However, the corresponding height values are usually less accurate and are based on an ellipsoidal height model. A basic height correction can be performed using the Relative Altitude options. These anchor the output heights to the ground height where the drone took off (the height of the ground in the first image). You can enter a specific value, or Global Mapper can derive the value automatically from loaded terrain data or from online references (USGS NED or SRTM).
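At its core, a relative altitude correction is an offset. This sketch assumes each image records its altitude relative to the take-off point, and that the take-off ground elevation is known (entered manually or sampled from terrain data). The function names are hypothetical, and Global Mapper's exact procedure may differ. For comparison, the second function shows the standard relation between ellipsoidal height h, geoid undulation N, and orthometric height H: H = h - N.

```python
from typing import List, Sequence


def heights_from_relative_altitude(relative_alt_m: Sequence[float],
                                   takeoff_ground_elev_m: float) -> List[float]:
    """Camera heights in the terrain's vertical datum.

    Assumes each relative altitude is measured from the take-off point.
    """
    return [takeoff_ground_elev_m + h for h in relative_alt_m]


def orthometric_height_m(ellipsoidal_height_m: float, geoid_undulation_m: float) -> float:
    """Standard conversion H = h - N from ellipsoidal height to height above the geoid."""
    return ellipsoidal_height_m - geoid_undulation_m


# Take-off ground at 1609.0 m (for example, sampled from a DEM); the first image is at take-off
print(heights_from_relative_altitude([0.0, 80.0, 80.5], 1609.0))  # [1609.0, 1689.0, 1689.5]
```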
One way to correct the position of the output data is to use Ground Control Points. These are surveyed points with known X, Y, Z locations that should be evenly distributed throughout the scene. Each measured control point must be visually identifiable in the corresponding images, so it is common to place crosshairs or targets on the ground throughout the collection area before the images are captured.
Ground Control Points can be loaded into the Pixels to Points tool and their corresponding locations identified in multiple input images. The scene is then aligned to the control points, which take precedence over the camera positions. This procedure is more time-intensive, but it is streamlined by highlighting the images that contain each point. It is also possible to use Ground Control Points after the output files have been generated. Global Mapper provides various tools for this, including 3D rectification and the Lidar QC tool, which can also provide accuracy assessment information.
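Aligning a reconstruction to control points is commonly done with a least-squares similarity transform (one scale factor, a rotation, and a translation), and the remaining residuals give an accuracy figure. The sketch below uses Umeyama's closed-form solution with NumPy and reports the root-mean-square error (RMSE). It is closer in spirit to the post-processing route than to how Pixels to Points weights control points during reconstruction, which the source does not detail, and the function names and coordinates are hypothetical. It needs at least three control points that do not lie in a straight line.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t so that dst ~= s * R @ p + t for each p in src."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_src, dst - mu_dst
    cov = b.T @ a / len(src)                      # 3x3 cross-covariance of the centred points
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid returning a mirror reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = float(np.sum(S * np.diag(D)) / (a ** 2).sum(axis=1).mean())
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_similarity(points, s, R, t):
    return s * np.asarray(points, dtype=float) @ R.T + t


def rmse(estimated, surveyed):
    """Root-mean-square 3D distance between corresponding points."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(surveyed, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Model coordinates of four targets and their surveyed positions (illustrative numbers)
model = [[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 5]]
angle = np.radians(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0], [np.sin(angle), np.cos(angle), 0], [0, 0, 1]])
surveyed = apply_similarity(model, 2.0, R_true, np.array([500.0, 200.0, 30.0]))
s, R, t = fit_similarity(model, surveyed)
print(round(s, 6), rmse(apply_similarity(model, s, R, t), surveyed))
```

In practice it is good to hold back some surveyed points as independent check points and compute the RMSE on those, since residuals on the points used for fitting tend to understate the true error.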
Hardware manufacturers offer options for improving the accuracy of positional information by communicating with a reference base station in addition to satellites, and by applying additional corrections based on information available for the time of image collection. These include Real-Time Kinematic (RTK) and Post-Processed Kinematic (PPK) options. With some systems, the higher-accuracy positions are written into the image metadata and can be used directly in the Pixels to Points tool. Other systems save the higher-accuracy positions in a text file; in that case, load your images into the Pixels to Points tool and use the option to Load Image Positions from External File.
Understanding the variables and data requirements for the Pixels to Points tool and other SfM processes will help you to collect images better suited for processing. In turn, this will create higher quality results for further geospatial analysis.
The latest version of the Global Mapper Lidar Module includes several enhancements, many of which apply to the Pixels to Points tool for generating point clouds and 3D meshes from drone-captured images. If this blog piqued your interest and you’d like to find out if the Lidar Module of Global Mapper is the right application for you, download a 14-day free trial and request a demo today!