Slav Petrov, Google Inc., USA
Title: Towards Universal Syntactic Processing of Natural Language
Abstract: In this talk I will describe some ongoing work towards a universal representation of morphology and syntax that makes it possible to model different languages in a consistent way. I will then describe some approaches for learning accurate and fast syntactic parsers from treebanks, and also how to effectively leverage additional unlabeled data, especially in the context of high-capacity neural network learners. Finally, I will highlight some examples of how we have successfully used syntax at Google to improve downstream applications like question answering and machine translation. This work is carried out at Google by teams in New York, Mountain View and London.
Bio: Slav Petrov is a researcher in Google's New York office, leading a team that works on syntactic parsing and its applications to information extraction, question answering and machine translation. He holds a PhD degree from UC Berkeley, where he worked with Dan Klein. Before that he completed a Master's degree at the Free University of Berlin and was a member of the FU-Fighters team that won the RoboCup world championship in 2004. His work on fast and accurate multilingual syntactic analysis was recognized with best paper awards at ACL 2011 and NAACL 2012. He received the John Atanasoff Award by the President of Bulgaria in 2014. Slav also teaches Statistical Natural Language Processing at New York University.
Mirella Lapata, University of Edinburgh, UK
Title: Large-scale Semantic Parsing as Graph Matching
Abstract: Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Semantic parsing addresses the specific task of learning to map natural language to machine interpretable formal meaning representations. Traditionally, sentences are converted into logical form grounded in the symbols of some fixed ontology or relational database. Approaches for learning semantic parsers have been for the most part supervised, using manually annotated training data consisting of sentences and their corresponding logical forms. More recently, methods which learn from question-answer pairs have been gaining momentum as a means of scaling semantic parsers to large, open-domain problems. In this talk, I will present an approach to semantic parsing that does not require example annotations or question-answer pairs but instead learns from a large knowledge base and web-scale corpora. Our semantic parser exploits Freebase, a large community-authored knowledge base that spans many sub-domains and stores real world facts in graphical format, and parsed sentences from a large corpus. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. We convert the output of an open-domain combinatory categorial grammar (CCG) parser into a graphical representation and subsequently map it onto Freebase, guided by denotations as a form of weak supervision. Evaluation experiments on two benchmark datasets show that our semantic parser improves over state-of-the-art approaches.
Joint work with Siva Reddy and Mark Steedman
Daniel Zeman, Charles University, Prague
Title: From the Jungle to the Park: Harmonizing Annotations across Languages
Abstract: In this talk I will describe my work towards a universal representation of morphology and dependency syntax in treebanks of various languages. Such harmonization is not only advantageous for linguists who use corpora; it is also a prerequisite for cross-language parser adaptation techniques such as delexicalized parsing. I will present Interset, an interlingua-like tool for translating morphosyntactic representations between tagsets, and show how the features from Interset are used in a recent framework called Universal Dependencies. I will also present some experiments with delexicalized parsing on harmonized data. Finally, I will discuss the extent to which various morphological features are important in the context of statistical dependency parsing.