SIGMORPHON - Special Interest Group on Computational Morphology and Phonology

Kenny Smith (Edinburgh)
Title: Simplicity and expressivity in the evolution of linguistic systems
Abstract: Language is a product of learning in individuals, and universal structural features of language presumably reflect properties of the way in which we learn. But language is not necessarily a direct reflection of properties of individual learners: languages are culturally-transmitted systems, which persist in populations via a repeated cycle of learning and use, where learners learn from linguistic data which represents the communicative behaviour of other individuals who learnt their language in the same way. Languages evolve as a result of this cycle of learning and use, and are therefore the product of a potentially complex interplay between the biases of human language learners, the communicative functions which language serves, and the ways in which languages are transmitted in populations. In this talk I will focus on the observation that natural languages appear to achieve a near-optimal tradeoff between simplicity (i.e. of the underlying grammatical system) and expressivity (their communicative utility); in several domains, natural languages seem to be the simplest of the highly expressive systems, and the most highly expressive of the simple systems. I’ll present a series of computational models and experiments with human participants showing how this tradeoff can be explained as a consequence of the cycle of learning and use by which natural languages persist, with learning and use imposing distinct pressures which jointly produce the observed simplicity-expressivity trade-off. I’ll end by discussing some recent work addressing more puzzling cases of apparently non-functional complexity, and whether these can be explained in the same framework.
- Kenny Smith is based in the Centre for Language Evolution in the School of Philosophy, Psychology and Language Sciences at the University of Edinburgh. He uses computational and experimental methods to study the evolution of language and the human capacity for language. He is particularly interested in how languages are shaped by their repeated learning and use, and how this cultural evolutionary process in turn shapes the cognitive capacities underpinning language learning. He has an MA in Linguistics and Artificial Intelligence, an MSc in Cognitive Science, and a PhD in Linguistics, all from the University of Edinburgh. His first faculty position was in Psychology at Northumbria University in 2006. He returned to Edinburgh as a lecturer in 2010, and was promoted to professor in 2017. He is the Chair of the Cognitive Science Society (https://cognitivesciencesociety.org), and serves on the Executive Committee of the Cultural Evolution Society (https://culturalevolutionsociety.org/).
Kristine Yu (Amherst)
Title: Building Phonological Trees
Abstract: Computational perspectives from string grammars have richly informed our understanding of phonological patterns in natural language in the past decade. However, a prevailing theoretical assumption of phonologists since the 1980s has been that phonological patterns and processes are computed on trees built with prosodic constituents such as syllables, feet, and prosodic words. This talk explores how perspectives from tree grammars can provide insight into our understanding of prosodic representations, including different ways in which tones can enter the grammar.
properties of inflectional paradigms as formal indicators (McWhorter, 2011, among others). However, several studies have shown that such approaches do not allow for principled ways of comparing the morphological complexities of languages with different morphological properties (e.g., Bonami and Henri, 2010). A decade ago, information-theoretic approaches were introduced, whose aim it was to take into account system-wide properties of inflectional systems (Blevins 2013). While some approaches chose to assess the overall informational content of morphological descriptions (Sagot and Walther, 2011), others studied the predictability of inflected forms based on other forms (Ackerman et al. 2009). Although these approaches are not incompatible – as we will discuss – we will also illustrate ways in which they are all subject to a number of limitations, some fatal (Sagot, 2018). Building on the newer information-theoretic approaches, more recent work has extended the question of purely formal morphological complexity to issues involving speaker-related complexity. Here, interest has shifted towards complexity in morphological processing and learning (Ackerman and Malouf, 2013).
This work, which focuses on the relation between paradigm entropy and cognitive cost, has also been successfully complemented by experimental approaches that highlight how morphological systems are not learned and processed in isolation. They participate in an intricate linguistic system, where subsystem interactions are learned and drawn upon by speakers (Filipović Đurđević and Milin, 2018). In return, we will also show how these system-wide interactions can be captured formally through observable cross-dependencies between morphological and syntactic patterns that challenge the boundaries traditionally drawn between morphology and other linguistic subfields.
- Kristine is an Associate Professor in the Department of Linguistics at the University of Massachusetts Amherst. Her primary research interests are the production, computation, and processing of prosody, with a focus on tonal and intonational phenomena and interfaces. She integrates experimental and computational approaches and fieldwork, including work on African American English and Samoan.
Reut Tsarfaty (Bar-Ilan)
Title: More Than Morphs: Getting More Out of UniMorph
Abstract: Morphological processes such as inflection and reinflection are studied and evaluated in NLP nowadays with the help of UniMorph (UM), a large collection of labeled inflection tables of over a hundred typologically different languages. In this talk we look closely at the current version of UniMorph and assess its design and content. Specifically, we ask whether UM is a necessary component of morphological reinflection (or whether minimal supervision would be enough), whether the current version of UM is sufficient for morphological reinflection (or whether there are some aspects missing), and, importantly, whether the word forms in UM provide the right level of granularity for annotating morphology (as opposed to, for instance, phrase-level or clause-level). We derive answers to these questions from both theoretical arguments and empirical evidence, and conclude with concrete suggestions on steps that may be taken to push UM to the next level of studying computational morphology, in accord with contextualized embeddings and downstream tasks.
- Reut Tsarfaty is an Associate Professor at Bar-Ilan University, leading the Open Natural Language Processing research lab (The ONLP Lab). Her research focuses on natural language parsing broadly interpreted to cover morphological, syntactic and semantic phenomena, extended for the analysis of typologically different languages. She is a founder and instigator of the SPMRL community and shared tasks, a member of the UD steering committee, and as of recently, also a UniMorph enthusiast. Applications Reut has worked on include (but are not limited to) natural language programming, natural language navigation, automated essay scoring, analysis and generation of social media content, and more. Reut’s research is funded by an ERC Starting Grant #677352 and an ISF grant #1739/26.
Ekaterina Vylomova (Melbourne)
Title: The Secret Life of Words: Exploring Regularity and Systematicity
Abstract: In the 1960s, Hockett proposed a set of essential properties that are unique to human language such as displacement, productivity, duality of patterning, and learnability. Regardless of the language we use, these features allow us to produce new utterances and infer their meanings. Still, languages differ in the way they express meanings, or as Jakobson put it, “Languages differ essentially in what they must convey and not in what they may convey”. From a typological point of view, it is crucial to describe and understand the limits of cross-linguistic variation. In this talk, I will focus on cross-lingual annotation and regularities in inflectional morphology. More specifically, I will discuss the UniMorph project, an attempt to create a universal (cross-lingual) annotation schema, with morphosyntactic features that would occupy an intermediate position between the descriptive categories and comparative concepts. UniMorph allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and a bundle of universal morphological features defined by the schema. Since 2016, the UniMorph database has been gradually developed and updated with new languages, and SIGMORPHON shared tasks served as a platform to compare computational models of inflectional morphology. During 2016–2021, the shared tasks made it possible to explore the data-driven systems’ ability to learn declension and conjugation paradigms as well as to evaluate how well they generalize across typologically diverse languages. This is especially important, since the elaboration of formal techniques for cross-language generalization and the prediction of universal entities across related languages should open up new potential for the modeling of under-resourced and endangered languages. In the second part of the talk, I will outline certain challenges we faced while converting the language-specific features into UniMorph (such as case compounding).
In addition, I will discuss typical errors made by the majority of the systems, e.g. incorrectly predicted instances due to allomorphy, form variation, misspelled words, and looping effects. Finally, I will provide case studies for Russian, Tibetan, and Nen.
- Ekaterina Vylomova is a Lecturer and a Postdoctoral Fellow at the University of Melbourne. Her research is focused on compositionality modelling for morphology, models of inflectional and derivational morphology, linguistic typology, diachronic language models, and neural machine translation. She co-organized the SIGTYP 2019–2021 workshops and shared tasks and the SIGMORPHON 2017–2021 shared tasks on morphological reinflection.