Augmented Reality is a promising interaction paradigm for learning applications. It has the potential to improve learning outcomes by merging educational content with spatial cues and semantically relevant objects within a learner's everyday environment. The impact of such an interface could be comparable to the method of loci, a well known memory enhancement technique used by memory champions and polyglots. However, using Augmented Reality in this manner is still impractical for a number of reasons. Scalable object recognition and consistent labeling of objects is a significant challenge, and interaction with arbitrary (unmodeled) physical objects in AR scenes has consequently not been well explored. To help address these challenges, we present a framework for in-situ object labeling and selection in Augmented Reality, with a particular focus on language learning applications. Our framework uses a generalized object recognition model to identify objects in the world in real time, integrates eye tracking to facilitate selection and interaction within the interface, and incorporates a personalized learning model that dynamically adapts to student's growth. We show our current progress in the development of this system, including preliminary tests and benchmarks. We explore challenges with using such a system in practice, and discuss our vision for the future of AR language learning applications.