A Study on Cross-Lingual Dependency Parsing

Author(s): Miriam Wanner

Mentor(s): Antonis Anastasopoulos, Computer Science

Abstract

Previous studies showed that dependency parsing models trained on many languages can perform well on languages the model has never seen, and that, monolingually, isomorphic graph structures between the training and testing data affect model performance. We aim to understand why these cross-lingual algorithms work and why graph isomorphism is a predictor of parser performance. We search for new techniques that take advantage of our findings and improve performance on low-resource languages.

Video Transcript

Hello! I am Nate Krasner, a senior at George Mason University studying computer science. And I’m Miriam Wanner, a third year at the University of Virginia studying math and computer science. This project is part of George Mason’s educational data mining REU. Our project focuses on Natural Language Processing, specifically dependency parsing, and our mentor on this project is Antonis Anastasopoulos.

In 2019, a group of researchers trained a dependency parsing model on 75 languages. This model performed surprisingly well on languages it had never seen before. We want to understand why dependency parsers are able to transfer across languages and how we can use that to build tools for low-resource languages. We need dependency parsing tools to analyze language and provide meaningful features, and we want to develop them for all of the world’s languages, including low-resource ones, so that these technologies are available to speakers of any language. Understanding what influences the performance of these models helps us improve them.

A common use for dependency parsing is rule-based information extraction over dependency trees. For example, a dialogue agent for booking flights might ask “Where would you like to fly?” If someone answers “I would like to fly to San Francisco,” the system would know that the desired destination is San Francisco based on the information extracted from a dependency tree. Dependency trees represent a sentence with arcs between words that indicate head-dependent relations. These dependency relations are labeled with tags such as nominal subject and direct object. It is easier to read the sentence when the tree is flattened like this, but we usually draw trees like this because it better shows their shape.

When training a parsing model, there is a training set and a testing set. The model is allowed to see and learn from the training set, but not the testing set, because we want our tests to show how well the model performs on data it has not seen before. We hypothesize that leakage plays a role in the performance of dependency parsing models, not only within a single language but also across multiple languages. We call graphs that have the same shape isomorphic. When we compute leakage, we examine how many trees are isomorphic between the training and testing sets: leakage is the number of trees in the testing set that are isomorphic to some tree in the training set, divided by the total number of trees in the testing set. One consideration when calculating leakage is whether or not to take the labels into account; these labels include the dependency relations as well as the parts of speech. We are also interested in subtree leakage as another possible predictor of parser performance. Subtree leakage is the same as normal leakage, but calculated over smaller subtrees rather than whole trees.

For our research we used data from Universal Dependencies, which covers over 100 languages. Depending on the language, the amount of data ranges from fewer than 1,000 sentences to several million. The amount of data available is the most significant predictor of whether a model performs well, which makes it hard to create language processing tools for languages with little data. We call these low-resource languages.
Our goal is to find other ways to improve parser performance on these low-resource languages without needing more data. Anders Søgaard showed that leakage affects models in a monolingual setting. In our multilingual setting, full-tree leakage appears to have a negative effect, while subtree leakage appears to have a positive one. The lines in this graph are weighted by the leakage between each pair of languages; it is clear from the thickness of the connections between related languages that leakage is also a good indicator of how languages are related. Many dependency parsing models exist, and leakage seems to explain the performance of some, but not all, of them. It typically explains a model’s performance on unseen languages very well.

We plan to continue training and testing in cross-lingual and multilingual settings. We want to train on high-resource languages, see how well the model performs on low-resource languages, and understand why it performs the way it does. We also want to see whether the parser generalizes globally or locally: in other words, whether it can generalize substructures without having seen a specific structure before. We hope our research provides a platform for future work on low-resource languages, toward parsers that can accurately predict dependency trees using only the limited data that exists for those languages.
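To make the transcript’s flight-booking example concrete, here is a minimal sketch of the dependency tree for “I would like to fly to San Francisco” and a toy extraction rule over it. The (token, head, relation) encoding and the find_destination helper are illustrative assumptions, not the project’s actual code.

```python
# "I would like to fly to San Francisco", with 1-indexed tokens;
# head 0 is a virtual ROOT node. This encoding and the rule below are
# illustrative assumptions, not the project's actual representation.
SENT = [
    ("I",         3, "nsubj"),     # 1
    ("would",     3, "aux"),       # 2
    ("like",      0, "root"),      # 3
    ("to",        5, "mark"),      # 4
    ("fly",       3, "xcomp"),     # 5
    ("to",        8, "case"),      # 6
    ("San",       8, "compound"),  # 7
    ("Francisco", 5, "obl"),       # 8
]

def find_destination(sent):
    """Toy extraction rule: the destination is an oblique dependent of
    'fly' marked by the preposition 'to', plus its compound children."""
    for i, (tok, head, rel) in enumerate(sent, start=1):
        if rel != "obl" or sent[head - 1][0] != "fly":
            continue
        kids = [(t, r) for j, (t, h, r) in enumerate(sent, start=1)
                if h == i]
        if any(t == "to" and r == "case" for t, r in kids):
            compounds = [t for t, r in kids if r == "compound"]
            return " ".join(compounds + [tok])
    return None

print(find_destination(SENT))  # -> San Francisco
```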
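The transcript defines leakage as the fraction of test trees that are isomorphic to some training tree. Below is a minimal sketch of that computation, assuming trees arrive as lists of head indices; it uses AHU-style canonical encodings to test rooted-tree isomorphism, and a labeled flag toggles whether dependency relation labels must also match. The helper names are ours.

```python
def tree_signature(heads, rels=None):
    """AHU-style canonical encoding of a rooted dependency tree given
    as a list of 1-indexed head positions (0 = root). Two trees get the
    same signature iff they are isomorphic; pass rels to also require
    matching dependency relation labels."""
    children = {i: [] for i in range(len(heads) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)

    def encode(node):
        # Sorting child encodings makes the string order-independent.
        kids = sorted(encode(c) for c in children[node])
        label = rels[node - 1] if rels is not None and node > 0 else ""
        return "(" + label + "".join(kids) + ")"

    return encode(0)

def leakage(train, test, labeled=False):
    """Fraction of test trees isomorphic to at least one training
    tree, as defined in the transcript. Each tree is (heads, rels)."""
    seen = {tree_signature(h, r if labeled else None) for h, r in train}
    hits = sum(tree_signature(h, r if labeled else None) in seen
               for h, r in test)
    return hits / len(test)

# Two three-word chains have the same shape regardless of the words:
t1 = ([0, 1, 2], ["root", "nsubj", "obj"])
t2 = ([0, 1, 2], ["root", "aux", "obj"])
print(leakage([t1], [t2]))                # 1.0: unlabeled shapes match
print(leakage([t1], [t2], labeled=True))  # 0.0: relation labels differ
```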
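Subtree leakage can be sketched the same way by collecting the canonical form of the subtree rooted at every node. This is one plausible reading of the transcript’s definition; the project’s exact formulation (for example, whether subtrees are capped at a certain size) may differ.

```python
def subtree_signatures(heads, rels=None):
    """Canonical encodings of the subtree rooted at every node of a
    dependency tree (same AHU encoding as the full-tree sketch above;
    heads are 1-indexed head positions, 0 = root)."""
    children = {i: [] for i in range(len(heads) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)
    sigs = []

    def encode(node):
        kids = sorted(encode(c) for c in children[node])
        label = rels[node - 1] if rels is not None else ""
        sig = "(" + label + "".join(kids) + ")"
        sigs.append(sig)  # record every rooted subtree, not just the top
        return sig

    for top in children[0]:
        encode(top)
    return sigs

def subtree_leakage(train, test, labeled=False):
    """Fraction of test subtrees whose canonical form also occurs as a
    subtree of some training tree. Each tree is (heads, rels)."""
    seen = set()
    for h, r in train:
        seen.update(subtree_signatures(h, r if labeled else None))
    hits = total = 0
    for h, r in test:
        for sig in subtree_signatures(h, r if labeled else None):
            total += 1
            hits += sig in seen
    return hits / total

train = [([0, 1, 2], ["root", "nsubj", "obj"])]  # a 3-word chain
test  = [([0, 1, 1], ["root", "nsubj", "obj"])]  # root with 2 children
print(subtree_leakage(train, test))  # 2/3: leaves match, full tree doesn't
```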
