Twenty-first SIGMORPHON Workshop on Computational Phonology, Morphology, and Phonetics

All times are Mexico City time (GMT-6)

Workshop to be held in hybrid format, online and at NAACL, on June 20, 2024

9:25 – 9:30 Opening Statements

9:30 – 10:30 - Invited Talk: Jian Zhu

10:30 – 11:00 - Coffee Break

11:00 – 12:00 - Session 1

11:00 - 11:20: J-UniMorph: Japanese Morphological Annotation through the Universal Feature Schema

Kosuke Matsuzaki, Masaya Taniguchi, Kentaro Inui and Keisuke Sakaguchi

11:20 - 11:40: Ye Olde French: Effect of Old and Middle French on SIGMORPHON-UniMorph Shared Task Data

William Kezerian, Lam An Wyner, Sandro Ansari, and Kristine M. Yu

11:40 - 12:00: VeLePa: a Verbal Lexicon of Pame

Borja Herce

12:00 – 13:00 Lunch

13:00 – 14:00 - Session 2

13:00 - 13:20: Acoustic barycenters as exemplar production targets

Frederic Mailhot and Cassandra L. Jacobs

13:20 - 13:40: Japanese Rule-based Grapheme-to-phoneme Conversion System and Multilingual Named Entity Dataset with International Phonetic Alphabet

Yuhi Matogawa, Yusuke Sakai, Taro Watanabe and Chihiro Taguchi

13:40 - 14:00: Decomposing Fusional Morphemes with Vector Embeddings

Michael Ginn and Alexis Palmer

14:00 – 15:00 Invited Talk: Naomi Feldman

15:00 – 15:30 - Session 3 (ACL Findings)

15:00 - 15:15 - Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies

Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Yuval Pinter and Rahul Gupta

15:15 - 15:30 - Low-resource neural machine translation with morphological modeling

Antoine Nzeyimana

15:30 – 16:00 Coffee Break

16:00 – 17:00 - Session 4

16:00 - 16:20: Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

Catherine Arnett, Tyler Chang and Sean Trott

16:20 - 16:40: The Effect of Model Capacity and Script Diversity on Subword Tokenization for Sorani Kurdish

Ali Salehi and Cassandra L. Jacobs

16:40 - 17:00: More than Just Statistical Recurrence: Human and Machine Unsupervised Learning of Māori Word Segmentation across Morphological Processes

Ashvini Varatharaj and Simon Todd
