The shared task has concluded! Thanks to all those who participated. All data (including the test sets) will be hosted on this site. Please read here for a detailed analysis of submitted systems and the results.
The shared task consists of two sub-tasks. Sub-task 1 asks participants to inflect word forms based on labeled examples. Sub-task 2 asks participants to complete partially filled paradigms, based on a limited number of full paradigms seen in the training data.
Systems may compete in either or both of these sub-tasks. Training examples and development examples will be provided for each sub-task. For each language, the possible inflections are taken from a finite set of morphological tags.
Data Quantity
For each sub-task, varying amounts of labeled training data (low/medium/high) are given to assess systems’ capability of generalizing in both low and high-resource scenarios. Performance is evaluated independently under each of the three data quantity conditions.
Sub-Task 1 - Inflection
Given a lemma (the dictionary form of a word) with its part-of-speech, generate a target inflected form.
Example
Source form and features: release V;NFIN
Target tag: V;V.PTCP;PRS
Target form: releasing
Sub-Task 2 - Paradigm Cell Filling
Given a lemma, a part-of-speech tag, and a partially filled paradigm, complete the remaining paradigm cells. Note that for languages with smaller paradigms, it is often the case that no additional forms (other than the lemma) are observed.
Example
Source lemma and part of speech: release V
Incomplete (covered) paradigm:
release release V;NFIN
release -- V;3;SG;PRS
release -- V;V.PTCP;PRS
release released V;PST
release -- V;V.PTCP;PST
Target (uncovered) paradigm:
release release V;NFIN
release releases V;3;SG;PRS
release releasing V;V.PTCP;PRS
release released V;PST
release released V;V.PTCP;PST
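One natural in-memory representation of the example above is a mapping from MSD tag to form, with a placeholder for covered cells. This is only an illustrative sketch, not a prescribed format:

```python
# Covered paradigm for the lemma "release": MSD tag -> form (None = cell to predict).
covered = {
    "V;NFIN": "release",
    "V;3;SG;PRS": None,
    "V;V.PTCP;PRS": None,
    "V;PST": "released",
    "V;V.PTCP;PST": None,
}

# The gold ("uncovered") paradigm fills in every cell.
uncovered = {
    **covered,
    "V;3;SG;PRS": "releases",
    "V;V.PTCP;PRS": "releasing",
    "V;V.PTCP;PST": "released",
}

# A system's job is to predict the forms for exactly these tags.
missing = [tag for tag, form in covered.items() if form is None]
```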
Data Format
The training and development data are provided in a simple UTF-8 encoded text format where each line in a file is an example consisting of word forms and a corresponding morphosyntactic description (MSD), provided as a set of features separated by semicolons. The fields on a line are TAB-separated. For sub-task 1, the fields are: lemma, target form, MSD. An example from the English training data:
touch touching V;V.PTCP;PRS
In the training data, all three fields are given. During the test phase, field 2 is omitted. In sub-task 2, entire inflection tables are given in the same format. For example:
touch touch V;NFIN
touch touches V;3;SG;PRS
touch touching V;V.PTCP;PRS
touch touched V;PST
touch touched V;V.PTCP;PST
In the training data, full tables are provided as in the above example; in the test phase, some entries in the second column are omitted and need to be filled in. Thus, we provide a “covered” and an “uncovered” file, as discussed above.
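A minimal parser for this format might look as follows. This is a sketch, not part of the released tooling; it assumes a missing form shows up either as an omitted second field (sub-task 1 test files) or as the "--" placeholder used in the covered paradigm example above:

```python
def parse_line(line: str):
    """Parse one TAB-separated line: lemma, form, MSD (features ';'-separated).

    Returns (lemma, form_or_None, feature_list); a missing form becomes None.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        lemma, form, msd = fields
    else:  # sub-task 1 test files omit the target-form field entirely
        lemma, msd = fields
        form = None
    if form == "--":  # covered cell, as in the sub-task 2 example above
        form = None
    return lemma, form, msd.split(";")
```

For instance, `parse_line("touch\ttouching\tV;V.PTCP;PRS")` yields the lemma, the inflected form, and the feature list `["V", "V.PTCP", "PRS"]`.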
Evaluation
In sub-task 1, systems should predict a single string for each test example. In sub-task 2, systems should predict a single string for each missing paradigm cell.
We will distribute an evaluation script for your use on the development data. The script will report (for both sub-task 1 and sub-task 2):
- Accuracy = fraction of correctly predicted forms
- Average Levenshtein distance between the prediction and the truth across all predictions
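Both metrics are simple to compute. The following sketch (not the official evaluation script) shows one way, using the standard dynamic-programming edit distance:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic DP edit distance with unit-cost insertions, deletions, substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution / match
        prev = cur
    return prev[-1]

def evaluate(predictions, gold):
    # Accuracy = fraction of exact matches;
    # also the Levenshtein distance averaged over all predictions.
    assert len(predictions) == len(gold)
    accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
    avg_dist = sum(levenshtein(p, g) for p, g in zip(predictions, gold)) / len(gold)
    return accuracy, avg_dist
```

For example, predicting "hanged" where the reference is "hung" counts as incorrect for accuracy but contributes an edit distance of 3 to the average.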
The official evaluation script that will be used for our internal evaluation is provided here. You are encouraged to do ablation studies to measure the advantage gained from particular innovations. You should perform these studies on the development data and report the findings in your system description paper.
We will use the same script to evaluate your system’s output on the test data. If multiple correct answers are possible, we will accept any string from the correct set. For example, in sub-task 1, the two senses of the English lemma hang have different past forms, hung and hanged.
We will evaluate on each language separately. An aggregate evaluation will weight all languages equally (by macro-averaging), including the surprise languages released later during the development period.
In an overview paper for the shared task, we will compare the performance of submitted systems in detail. We will evaluate
- which systems are significantly different in performance, especially in low-resource scenarios
- which examples were hard and which types of systems succeeded on them
- which systems would provide complementary benefit in an ensemble system
Rules and External Resources
For the sake of fair competition, no external resources are permitted in the main competition track, e.g., training models on multiple languages at once or using external corpora. However, we do not wish to stymie creativity. Thus, any system that does use data from multiple languages at once, or other external resources, may be submitted to the shared task alongside a restricted submission. All submissions employing external resources will be evaluated in a second, separate track. Importantly, we will not consider these additional submissions when selecting a winning system. However, we will report on their performance and give credit for interesting innovations.
For systems that make use of external monolingual corpora, we have provided a list of approved external corpora (Wikipedia text dumps) for use in semi-supervised approaches. We ask that only these corpora be used for such techniques. The links to the Wikipedia dumps (please use those from March 1st, 2017) are provided in the next section.
Also, we ask that participants only train models on the provided training data and withhold the development data for tuning hyperparameters to allow for apples-to-apples comparisons among all systems.
Submission of Shared Task Results
We will release the test data on May 20th. It will be in the same format as the training and dev data. Please run your system for each language and each task for which you wish to submit an entry into the competition. The output format should be a text file identical to the train and dev files for the given task: you will be adding the missing last column of answers to the test files. To see an example of the submission format, see https://github.com/sigmorphon/conll2017/tree/master/evaluation, where we have run the baseline system and dumped the output. You will have until May 30th to submit your answers. However, please also work on your final write-ups during the testing period, as it is harder for us to extend CoNLL-related deadlines.
Email the resulting text files (as an archive) to conll.sigmorphon.2017@gmail.com with the subject in the format INSTITUTION–XX–Y, where you should replace INSTITUTION with the name of your institution and XX with an integer index (in case of multiple systems from the same institution). In the case of multiple institutions, please place a hyphen between the names. If there are any additional details you would like us to know about your system or the resources you used, please write a short description in the body of the email. Finally, Y specifies which data the system uses: choose Y = 0 for the basic setting (no external data, e.g., external corpora or cross-lingual data) and Y = 1 for the extended setting. As an example, the submission title JOHNSHOPKINS-01-0 would denote a submission without external data from Johns Hopkins University, team 01.
Please name your solution files “LANG-SIZE-out”, for example “finnish-low-out”, and place the output files in task1 and task2, respectively, depending on the task. For example, task1/finnish-low-out should contain the output for the task 1 low-resource Finnish setting, and task2/finnish-low-out should contain the output for the task 2 low-resource Finnish setting. In other words, we are following the directory structure in this folder: https://github.com/sigmorphon/conll2017/tree/master/evaluation/sample-output, which you may use as a model. Please archive the entire directory structure (it will contain only two folders: task1 and task2) and email it to the address above. Each group may submit as many systems as they like (just change the XX value), but please send one email per unique setting of the variables in INSTITUTION–XX–Y.
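The naming convention above can be captured in a small helper. This is purely illustrative; the size names follow the low/medium/high conditions from the Data Quantity section:

```python
def output_path(task: int, lang: str, size: str) -> str:
    # Build the expected submission path, e.g. "task1/finnish-low-out".
    assert task in (1, 2), "tasks are numbered 1 and 2"
    assert size in ("low", "medium", "high"), "data-quantity conditions"
    return f"task{task}/{lang}-{size}-out"
```

For example, `output_path(2, "finnish", "low")` gives `"task2/finnish-low-out"`, matching the sample-output directory structure linked above.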
System Description Paper
Each team that submits a system is invited to submit a system description paper. The paper should follow the CoNLL 2017 guidelines, which may be found at http://www.conll.org/cfp-2017. The system description papers will constitute a separate volume of the CoNLL 2017 proceedings, entitled “Proceedings of the CoNLL–SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection.” Each paper should be no longer than 8 pages, not including references. Shorter papers are certainly welcome! The paper is due May 30 and the submission link may be found at https://www.softconf.com/acl2017/conll-sigmorphon-st2017.
In the paper, the participants are encouraged to provide the details necessary to replicate their system, and to discuss their results. Comparison to previous work is also encouraged, as is insightful error analysis and potential ways problems might be fixed in future work.
Two members of the organizing committee will review each paper. All papers that meet the basic requirements for a CoNLL submission and provide an adequate system description will be accepted. Each accepted paper will be presented as a poster at CoNLL. Furthermore, the organizers will write a summary paper that distills the key ideas from the individual system descriptions and highlights which systems worked best in which scenarios, along with any interesting innovations.
@InProceedings{cotterell-conll-sigmorphon2017,
author = {Cotterell, Ryan and Kirov, Christo and Sylak-Glassman, John and Walther, G{\'e}raldine and Vylomova, Ekaterina and Xia, Patrick and Faruqui, Manaal and K{\"u}bler, Sandra and Yarowsky, David and Eisner, Jason and Hulden, Mans},
title = {The {CoNLL-SIGMORPHON} 2017 Shared Task: Universal Morphological Reinflection in 52 Languages},
booktitle = {Proceedings of the CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection},
month = {August},
year = {2017},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics}
}
Meet The Languages!
Note 2018-02-17: The Wikipedia dumps no longer exist, so the links have been removed.
- Indo-European/Celtic/Scottish Gaelic (gd) [wiki dump]
- Indo-European/Celtic/Welsh (cy) [wiki dump]
- Indo-European/Germanic/Danish (da) [wiki dump]
- Indo-European/Germanic/Dutch (nl) [wiki dump]
- Indo-European/Germanic/English (en) [wiki dump]
- Indo-European/Germanic/Icelandic (is) [wiki dump]
- Indo-European/Germanic/Faroese (fo) [wiki dump]
- Indo-European/Germanic/German (de) [wiki dump]
- Indo-European/Germanic/Swedish (sv) [wiki dump]
- Indo-European/Germanic/Nynorsk (nn) [wiki dump]
- Indo-European/Indo-Aryan/Hindi (hi) [wiki dump]
- Indo-European/Indo-Aryan/Urdu (ur) [wiki dump]
- Indo-European/Albanian (sq) [wiki dump]
- Indo-European/Iranian/Persian (fa) [wiki dump]
- Kartvelian/Georgian (ka) [wiki dump]
- Na-Dené/Athabaskan/Navajo (nv) [wiki dump]
- Quechuan/Quechua (qu) [wiki dump]
- Indo-European/Romance/Catalan (ca) [wiki dump]
- Indo-European/Romance/French (fr) [wiki dump]
- Indo-European/Romance/Italian (it) [wiki dump]
- Indo-European/Romance/Portuguese (pt) [wiki dump]
- Indo-European/Romance/Spanish (es) [wiki dump]
- Afro-Asiatic/Semitic/Arabic (ar) [wiki dump]
- Afro-Asiatic/Semitic/Hebrew (he) [wiki dump]
- Indo-European/Slavic/Bulgarian (bg) [wiki dump]
- Indo-European/Slavic/Czech (cs) [wiki dump]
- Indo-European/Slavic/Lower Sorbian (dsb) [wiki dump]
- Indo-European/Slavic/Macedonian (mk) [wiki dump]
- Indo-European/Slavic/Polish (pl) [wiki dump]
- Indo-European/Slavic/Russian (ru) [wiki dump]
- Indo-European/Slavic/Serbo-Croatian (sr) [wiki dump]
- Indo-European/Slavic/Slovak (sk) [wiki dump]
- Indo-European/Slavic/Slovene (sl) [wiki dump]
- Indo-European/Slavic/Ukrainian (uk) [wiki dump]
- Uralic/Finnic/Finnish (fi) [wiki dump]
- Uralic/Ugric/Hungarian (hu) [wiki dump]
- Uralic/Sami/Northern Sami (se) [wiki dump]
- Indo-European/Baltic/Latvian (lv) [wiki dump]
- Turkic/Turkish (tr) [wiki dump]
- Indo-European/Armenian (hy) [wiki dump]
- Sino-Tibetan/Kiranti/Khaling (kha)
- Isolate/Haida (hai)
- Isolate/Basque (eu) [wiki dump]
- Indo-European/Italic/Latin (la) [wiki dump]
- Indo-European/Indo-Aryan/Bengali (bn) [wiki dump]
- Indo-European/Celtic/Irish (ga) [wiki dump]
- Indo-European/Romance/Romanian (ro) [wiki dump]
- Indo-European/Iranian/Sorani (Central Kurdish) (ckb) [wiki dump]
- Indo-European/Iranian/Kurmanji (Northern Kurdish) (ku) [wiki dump]
- Indo-European/Baltic/Lithuanian (lt) [wiki dump]
- Indo-European/Germanic/Norwegian (Bokmål) (no) [wiki dump]
- Uralic/Finnic/Estonian (et) [wiki dump]
Bibliography
- Malin Ahlberg, Markus Forsberg and Mans Hulden. Paradigm Classification in Supervised Learning of Morphology. In NAACL 2015.
- Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner and Mans Hulden. The SIGMORPHON 2016 shared task—morphological reinflection. In Proceedings of SIGMORPHON 2016.
- Ryan Cotterell, Nanyun Peng and Jason Eisner. Modeling Word Forms Using Latent Underlying Morphs and Phonology. TACL 2015.
- Markus Dreyer and Jason Eisner. Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model. In EMNLP 2011.
- Markus Dreyer and Jason Eisner. Graphical Models over Multiple Strings. In EMNLP 2009.
- Markus Dreyer, Jason Smith and Jason Eisner. Latent-Variable Modeling of String Transductions with Finite-State Methods. In EMNLP 2008.
- Greg Durrett and John DeNero. Supervised Learning of Complete Morphological Paradigms. In NAACL 2013.
- Ramy Eskander, Nizar Habash and Owen Rambow. Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora. In EMNLP 2013.
- Manaal Faruqui, Yulia Tsvetkov, Graham Neubig and Chris Dyer. Morphological Inflection Generation Using Character Sequence to Sequence Learning. In NAACL 2016.
- Mans Hulden, Markus Forsberg and Malin Ahlberg. Semi-supervised Learning of Morphological Paradigms and Lexicons. In EACL 2014.
- Katharina Kann and Hinrich Schütze. The LMU system for the SIGMORPHON 2016 shared task on morphological reinflection. In Proceedings of SIGMORPHON 2016.
- Garret Nicolai, Colin Cherry and Grzegorz Kondrak. Inflection Generation as Discriminative String Transduction. In NAACL 2015.
- John Sylak-Glassman, Christo Kirov, David Yarowsky and Roger Que. A Language-Independent Feature Schema for Inflectional Morphology. In ACL 2015.