PhD student, Carnegie Mellon University
My work focuses on developing algorithms that learn from natural language interactions with humans.
When humans learn about a new concept or phenomenon, they rely on rich forms of supervision, including explanations, examples, and interactive dialogue. In contrast, modern computers learn through techniques from AI and machine learning that traditionally depend on large databases of information (colloquially called big data). If we wish to make computer learning as efficient as human learning, we need to develop methods that can learn from natural language interactions. In my research, I argue that learning from language is a viable paradigm for automated machine learning systems, one that offers several advantages enabling more efficient learning.
First, language can be used to naturally frame new learning tasks, such as by describing relevant features that convey a human understanding of a domain. For example, in predicting the risk of heart attack, a doctor can say: "Check if the patient's BMI is more than 25". Such a statement can be parsed by a language interpreter into a structured query and answered from a patient's health record, thus defining an important attribute to be considered for each new patient.
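As a toy illustration of this idea (not the actual interpreter used in my work), a narrow class of threshold rules can be mapped to executable predicates over a patient record; the attribute names and phrasings handled here are assumptions for the sketch:

```python
import re

def parse_rule(utterance):
    """Parse a simple threshold rule, e.g. "Check if the patient's BMI is
    more than 25", into an executable predicate over a patient record
    (a dict of attributes). A toy sketch: real language interpreters
    handle far richer phrasings than this one regular expression."""
    m = re.search(r"(\w+) is (more|less) than (\d+(?:\.\d+)?)", utterance)
    if m is None:
        raise ValueError("unsupported rule: " + utterance)
    attr, direction, threshold = m.group(1), m.group(2), float(m.group(3))
    if direction == "more":
        return lambda record: record[attr] > threshold
    return lambda record: record[attr] < threshold

check_bmi = parse_rule("Check if the patient's BMI is more than 25")
print(check_bmi({"BMI": 27.4}))  # True
print(check_bmi({"BMI": 22.0}))  # False
```

Once parsed, the predicate defines a feature that can be evaluated for every new patient without further human effort.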
Second, natural language explanations are often rich in information that can guide machine learning models, minimizing the need for labeled data. For example, everyday language contains quantification expressions (such as 'all', 'some', 'rarely', and 'usually') that are explicit markers of generality. Similarly, natural language often conveys explicit declarative knowledge about a domain that may be difficult to learn from data alone (e.g., "If a female is above 70 years and has a BMI of more than 30, she is definitely at risk of heart disease"). Such explanations can be used to guide machine learning models by constraining them to emulate the teacher's advice.
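One simple way to operationalize such declarative advice (a sketch, not the specific mechanism used in my work) is to encode it as a labeling function that weakly labels unlabeled examples wherever the advice applies; the attribute names below are assumptions for illustration:

```python
def risk_rule(patient):
    """The teacher's declarative advice encoded as a labeling function:
    "If a female is above 70 years and has a BMI of more than 30,
    she is definitely at risk of heart disease."
    Returns 1 (at risk) when the rule fires, None (abstain) otherwise."""
    if patient["sex"] == "female" and patient["age"] > 70 and patient["BMI"] > 30:
        return 1
    return None

# Unlabeled patients receive weak labels wherever the advice applies,
# reducing the amount of manually labeled data the model needs.
patients = [
    {"sex": "female", "age": 74, "BMI": 33},
    {"sex": "male",   "age": 55, "BMI": 28},
]
weak_labels = [risk_rule(p) for p in patients]
print(weak_labels)  # [1, None]
```

The same rule could alternatively be imposed as a soft constraint on the model's predictions rather than as hard labels.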
Finally, language provides a natural medium for proactive dialogue on the part of a computer, such as seeking clarifications about specific examples, validating its predictions, or asking questions to fill its information gaps. Computers could leverage such interactions with humans to simplify and validate their learning.
The goal of my research is to develop computer algorithms that can leverage such information, and hence provide a conceptual interface for guiding machine learning algorithms using natural language advice. Such interfaces, teachable by non-experts, could bring the potential of machine learning to the masses.
Abstract: Characterizing relationships between people is fundamental for the understanding of narratives. In this work, we address the problem of inferring the polarity of relationships between people in narrative summaries. We formulate the problem as a joint structured prediction for each narrative, and present a model that combines evidence from linguistic and semantic features, as well as features based on the structure of the social community in the text. We also provide a clustering-based approach that can exploit regularities in narrative types, e.g., learning an affinity for love triangles in romantic stories. On a dataset of movie summaries from Wikipedia, our structured models provide more than a 30% error reduction over a competitive baseline that considers pairs of characters in isolation.
Pub.: 30 Nov '15, Pinned: 05 Aug '17
Abstract: This paper presents an approach to classify documents in any language into an English topical label space, without any text categorization training data. The approach, Cross-Lingual Dataless Document Classification (CLDDC), relies on mapping the English labels or short category descriptions into a Wikipedia-based semantic representation, and on the use of the target language's Wikipedia. Consequently, performance can suffer when the Wikipedia in the target language is small. In this paper, we focus on languages with small Wikipedias (small-Wikipedia languages, SWLs). We use a word-level dictionary to convert documents in an SWL to a large-Wikipedia language (LWL), and then perform CLDDC based on the LWL's Wikipedia. This approach can be applied to thousands of languages, in contrast to machine translation, which is a supervision-heavy approach available for only about 100 languages. We also develop a ranking algorithm that makes use of language similarity metrics to automatically select a good LWL, and show that this significantly improves classification of SWLs' documents, performing comparably to the best bridge possible.
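The dictionary-based conversion step can be sketched as a word-by-word, bag-of-words translation (a toy illustration; the dictionary entries below are invented for the example):

```python
def translate_words(document, dictionary):
    """Word-by-word translation of a small-Wikipedia-language (SWL)
    document into a large-Wikipedia-language (LWL) using a bilingual
    dictionary; out-of-dictionary words are simply dropped. A bag-of-words
    sketch: word order is irrelevant for topical classification."""
    return [dictionary[w] for w in document.split() if w in dictionary]

# Toy dictionary from a hypothetical SWL into English (an LWL).
dictionary = {"futbal": "football", "hrac": "player", "gol": "goal"}
print(translate_words("futbal hrac strelil gol", dictionary))
# ['football', 'player', 'goal']
```

The translated bag of words can then be classified against the LWL's Wikipedia-based label representations.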
Pub.: 13 Nov '16, Pinned: 05 Aug '17
Abstract: Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
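The MML side of this abstract can be illustrated with a minimal sketch: the objective maximizes the total probability mass assigned to programs that execute to the correct result, which is why spurious programs (consistent by coincidence) are a hazard. The probabilities below are made-up numbers for illustration:

```python
import math

def mml_loss(program_probs, consistent):
    """Maximum marginal likelihood: minimize the negative log of the total
    probability of candidate programs whose execution yields the correct
    denotation. program_probs: model probabilities for each candidate;
    consistent: flags marking programs that output the correct result
    (some of which may be spurious: right answer, wrong program)."""
    marginal = sum(p for p, ok in zip(program_probs, consistent) if ok)
    return -math.log(marginal)

# Three candidate programs; the first two happen to execute correctly.
probs = [0.5, 0.1, 0.4]
flags = [True, True, False]
print(round(mml_loss(probs, flags), 4))  # -log(0.6) = 0.5108
```

The paper's algorithm additionally borrows randomized exploration from RL and spreads probability more evenly across the consistent set, rather than letting one (possibly spurious) program dominate.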
Pub.: 25 Apr '17, Pinned: 05 Aug '17
Abstract: To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%.
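The two key ideas can be sketched in miniature: abstracting a useful logical form into a cached macro, and retrieving relevant macros for a new sentence by similarity. The logical-form syntax and similarity measure below are simplified stand-ins for the paper's actual grammar and holistic triggering:

```python
def abstract_macro(logical_form, entities):
    """Cache the abstract pattern of a useful logical form by replacing
    its concrete entities with numbered placeholders."""
    for i, ent in enumerate(entities):
        logical_form = logical_form.replace(ent, f"${i}")
    return logical_form

def jaccard(s, t):
    """Token-overlap similarity between two questions; a crude stand-in
    for the holistic triggering used to retrieve relevant macros."""
    a, b = set(s.lower().split()), set(t.lower().split())
    return len(a & b) / len(a | b)

# Cache a macro from a logical form found consistent during training.
cache = {}  # macro -> example question that produced it
macro = abstract_macro('(argmax (rows) (column "Year"))', ['"Year"'])
cache[macro] = "which year had the most medals"

# For a new question, rank cached macros by similarity of source questions,
# so search only instantiates patterns likely to be relevant.
new_q = "which year had the fewest medals"
best = max(cache, key=lambda m: jaccard(cache[m], new_q))
print(best)  # (argmax (rows) (column $0))
```

Restricting search to instantiations of retrieved macros is what makes the algorithm faster as training progresses: the cache of useful patterns grows while the per-sentence search space shrinks.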
Pub.: 24 Jul '17, Pinned: 05 Aug '17