Twenty-first SIGMORPHON Workshop on Computational Phonology, Morphology, and Phonetics

All times are Mexico City time (GMT-6)

The workshop will be held in hybrid format, online and at NAACL, on June 20, 2024

9:25 – 9:30 Opening Statements

9:30 – 10:30 - Invited Talk: Jian Zhu

10:30 – 11:00 - Coffee Break

11:00 – 12:00 - Session 1

  • 11:00 - 11:20: J-UniMorph: Japanese Morphological Annotation through the Universal Feature Schema

    Kosuke Matsuzaki, Masaya Taniguchi, Kentaro Inui and Keisuke Sakaguchi

  • 11:20 - 11:40: Ye Olde French: Effect of Old and Middle French on SIGMORPHON-UniMorph Shared Task Data

    William Kezerian, Lam An Wyner, Sandro Ansari, and Kristine M. Yu

  • 11:40 - 12:00: VeLePa: a Verbal Lexicon of Pame

    Borja Herce

12:00 – 13:00 Lunch

13:00 – 14:00 - Session 2

  • 13:00 - 13:20: Acoustic barycenters as exemplar production targets

    Frederic Mailhot and Cassandra L. Jacobs

  • 13:20 - 13:40: Japanese Rule-based Grapheme-to-phoneme Conversion System and Multilingual Named Entity Dataset with International Phonetic Alphabet

    Yuhi Matogawa, Yusuke Sakai, Taro Watanabe and Chihiro Taguchi

  • 13:40 - 14:00: Decomposing Fusional Morphemes with Vector Embeddings

    Michael Ginn and Alexis Palmer

14:00 – 15:00 Invited Talk: Naomi Feldman

15:00 – 15:30 - Session 3 (ACL Findings)

  • 15:00 - 15:15: Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies

    Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Yuval Pinter and Rahul Gupta

  • 15:15 - 15:30: Low-resource neural machine translation with morphological modeling

    Antoine Nzeyimana

15:30 – 16:00 Coffee Break

16:00 – 17:00 - Session 4

  • 16:00 - 16:20: Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

    Catherine Arnett, Tyler Chang and Sean Trott

  • 16:20 - 16:40: The Effect of Model Capacity and Script Diversity on Subword Tokenization for Sorani Kurdish

    Ali Salehi and Cassandra L. Jacobs

  • 16:40 - 17:00: More than Just Statistical Recurrence: Human and Machine Unsupervised Learning of Māori Word Segmentation across Morphological Processes

    Ashvini Varatharaj and Simon Todd