This paper presents a 3D pointing interface that lets a human user indicate a target to a UAV in a large-scale environment. The system enables a UAV equipped with a monocular camera to determine which window of a building the user has selected, in large-scale indoor or outdoor environments. The interface consists of three components: YOLO, OpenPose, and ORB-SLAM. YOLO detects the target objects (e.g., windows), OpenPose extracts the user's pose, and ORB-SLAM builds a 3D map, a sparse set of 3D feature points, defined only up to an unknown scale. To recover the metric scale, the system performs a calibration step in which the user stands in front of the UAV at a known distance. We detail how we chose the pointing gesture, how we localize and detect objects, and how we transform between coordinate systems. In real-world experiments, the 3D pointing interface achieved an average F1-score of 0.73, and an F1-score of 0.58 at the maximum tested distance of 25 meters between the UAV and the building.
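To make the calibration step concrete, the following is a minimal sketch of how a metric scale factor could be recovered from a single measurement. It assumes the real distance between camera and user is measured in meters and that the user's position can be estimated in the map frame; the paper's actual procedure may differ. The function names `metric_scale` and `to_metric`, and the example coordinates, are hypothetical and serve only as illustration.

```python
import math


def metric_scale(known_distance_m, camera_pos, user_pos):
    """Return meters per map unit from one calibration measurement.

    known_distance_m: measured camera-to-user distance in meters (assumed known).
    camera_pos, user_pos: 3D positions in the unscaled map frame (hypothetical inputs).
    """
    map_distance = math.dist(camera_pos, user_pos)
    if map_distance == 0:
        raise ValueError("camera and user positions coincide; scale is undefined")
    return known_distance_m / map_distance


def to_metric(point, scale):
    """Convert a 3D point from map units to meters."""
    return tuple(scale * c for c in point)


# Illustrative values only: user measured 3.0 m away, 1.5 map units in the map frame.
scale = metric_scale(3.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.5))
window_m = to_metric((1.0, 2.0, 3.0), scale)
```

With the scale known, every map point (and hence every detected window position) can be expressed in meters, which is what allows distances such as the 25-meter range to be interpreted.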