Overview

The shared task has concluded! Thanks to all those who participated. All data (including the test sets) will be hosted on this site. Please read Cotterell et al. (2016) for a detailed analysis of submitted systems and the results.

In 2015-2016, SIGMORPHON is hosting a shared task on morphological reinflection. An example of English reinflection is the conversion of ran to its present participle, running.

To participate in the shared task, you will build a system that can learn to solve reinflection problems. All submitted systems will be compared on a held-out test set.

You will be invited to describe your system in a short paper for the SIGMORPHON 2016 workshop. The task organizers will write an overview paper that describes the task and summarizes the different approaches taken and their results.

If you plan to participate, please sign up for the shared task Google group! Just send an email to sigmorphon-2016-shared-task@googlegroups.com and ask to join.

Citation

@InProceedings{cotterell-sigmorphon2016,
  author    = {Cotterell, Ryan and Kirov, Christo and Sylak-Glassman, John and Yarowsky, David and Eisner, Jason and Hulden, Mans},
  title     = {The SIGMORPHON 2016 Shared Task---Morphological Reinflection},
  booktitle = {Proceedings of the 2016 Meeting of SIGMORPHON},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics}
}

Results

Track 1 Task 1: Accuracy
Track 1 Task 2: Accuracy
Track 1 Task 3: Accuracy
Track 2 Task 1: Accuracy
Track 2 Task 2: Accuracy
Track 2 Task 3: Accuracy
Track 3 Task 1: Accuracy
Track 3 Task 2: Accuracy
Track 3 Task 3: Accuracy

Shared Task Paper

Please submit the shared-task description papers at https://www.softconf.com/acl2016/sigmorphon16/ by May 15th.

Submission

We have released the test data! It is in the same format as the training and dev data, except that the last column has been omitted. Please run your system for each language and each task for which you wish to submit an entry into the competition. The output format should be a text file identical to the train and dev files for the given task. Essentially, you will be adding the missing last column of answers to the test files. Note that you may submit multiple predictions for a given row, and we will measure mean reciprocal rank. If you do submit multiple ordered guesses, please output multiple lines that differ only in the last column; the order in the file will be the order in which we rank them.
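As a minimal sketch of the output format described above, the following Python helper appends ranked guesses to one test row. The function name `format_predictions` is hypothetical and not part of any official tooling; the sketch assumes each test row is a TAB-separated line that lacks only the final answer column.

```python
from typing import List, Sequence


def format_predictions(test_line: str, guesses: Sequence[str]) -> List[str]:
    """Return one output line per guess, best guess first.

    Each output line repeats the test row and adds the guess as the
    missing last column, so multiple guesses for the same row differ
    only in that column.
    """
    prefix = test_line.rstrip("\n")
    return [f"{prefix}\t{guess}" for guess in guesses]


if __name__ == "__main__":
    # A Task 3 style row (source form, target MSD) with two ranked guesses.
    row = "hablo\tpos=V,tense=PST,gen=MASC,num=PL\n"
    for line in format_predictions(row, ["hablados", "hablado"]):
        print(line)
```

A system that produces a single prediction simply passes a one-element list.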

Email the resulting text files to sigmorphon.sharedtask.2016@gmail.com with the subject in the format: INSTITUTION--XX--Y, where you should replace INSTITUTION with the name of your institution and XX with an integer index (in case of multiple systems from the same institution). In the case of multiple institutions, please place a hyphen between each name. If there are any additional details you would like us to know about your system or resources you used, please write a short description in the body of the email. The Y should specify either 1, 2, or 3, depending on which data you are using to solve the task. These three categories are:

Please name your solution files "LANG-task#-solution", for example "finnish-task1-solution", etc. We encourage participants to send one email per category, with a single attached archive file containing the solutions for all languages and tasks solved. So, if you are solving all tasks with approach "Standard" (1), all the solutions can be communicated with one email with all your "LANG-task#-solution" files as an archive.

Submissions are due at 11:59 pm (anywhere in the world) on April 28, 2016 (extended).

Dates

Venue

The results of the shared task will be presented at the SIGMORPHON workshop held at ACL 2016 in Berlin.

Downloads

Inflectional Morphology

A word's form reflects syntactic and semantic features that are expressed by the word. For example, each English count noun has both singular and plural forms (robot/robots, process/processes), known as the inflected forms of the noun. A Polish verb may have nearly 100 inflected forms.

NLP systems must be able to analyze and generate all of these inflected forms. Fortunately, inflected forms tend to be systematically related to one another. This is why English speakers can usually predict the singular form from the plural and vice versa, even for words they have never seen before.

The Tasks

There are three similar tasks. Your system may compete on any or all of them. Training examples and development examples will be provided for each task. For each language, the possible inflections are named by a given finite set of morphological tags.

Task 1 – Inflection

Given a lemma (the dictionary form of a word) with its part-of-speech, generate a target inflected form.

English example

Source lemma: run
Target tag: Present participle

Output: running

Recent high-performance systems include Hulden et al. (2014) and Nicolai, Cherry and Kondrak (2015).

Task 2 – Reinflection

Given an inflected form and its current tag, generate a target inflected form.

English example

Source tag: Past
Source form: ran
Target tag: Present participle

Output: running

Task 2 is a harder case of Task 1, since the source tag is no longer guaranteed to be Lemma.

Task 3 – Unlabeled Reinflection

Given an inflected form without its current tag, generate a target inflected form.

English example

Source tag: not given
Source form: ran
Target tag: Present participle

Output: running

Task 3 is a harder case of Task 2, since the source tag is no longer provided.

When solving a task, participants may use training data for lower-numbered tasks without it being considered a bonus resource. That is, when solving task 2, using task 1 data is permitted. Likewise, when solving task 3, both task 2 and task 1 training data can be used. We encourage participants to run various systems where possible, and to report which training data they used for tasks 2 and 3. Knowing how well task 2 (or 3) can be solved using only task 2 (or 3) data, as opposed to also using data from lower-numbered tasks, is valuable extra information.

Some Possible Strategies

All of these are sequence-to-sequence mapping problems. If you have a general supervised method for learning such mappings, you can simply throw it at all of these tasks.

Alternatively, you can solve the tasks in sequence. For instance, reduce Task 2 to Task 1 by recovering an inflected form's lemma given its tag, and then reduce Task 3 to Task 2 by recovering an inflected form's tag.
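The chain of reductions just described can be sketched as function composition. This is only an illustration: the three component types (`Tagger`, `Lemmatizer`, `Inflector`) and both function names are hypothetical, and each component stands in for whatever learned model a participant chooses.

```python
from typing import Callable

# Hypothetical component signatures; each would be a learned model in practice.
Tagger = Callable[[str], str]           # source form -> predicted source tag
Lemmatizer = Callable[[str, str], str]  # (source form, source tag) -> lemma
Inflector = Callable[[str, str], str]   # (lemma, target tag) -> target form


def solve_task2(form: str, source_tag: str, target_tag: str,
                lemmatize: Lemmatizer, inflect: Inflector) -> str:
    """Reduce Task 2 to Task 1: recover the lemma, then inflect it."""
    return inflect(lemmatize(form, source_tag), target_tag)


def solve_task3(form: str, target_tag: str, tag: Tagger,
                lemmatize: Lemmatizer, inflect: Inflector) -> str:
    """Reduce Task 3 to Task 2: predict the missing source tag first."""
    return solve_task2(form, tag(form), target_tag, lemmatize, inflect)
```

One trade-off of such a pipeline is that errors compound: a wrong tag or lemma at an early stage cannot be corrected later, which is one reason to consider the joint approaches discussed next.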

An inflectional paradigm is a table that lists all inflected forms for some lemma. Rather than treating the training examples as independent, you could assemble them into partial paradigms based on shared input or output forms. You could then jointly analyze the partial paradigms to better discover latent structure in the observed forms and to better extrapolate to unobserved forms (Dreyer and Eisner 2009, 2011; Durrett and DeNero 2013; Hulden et al. 2014; Nicolai et al. 2015).
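One simple way to assemble partial paradigms, sketched here under stated assumptions, is to treat every reinflection example as a link between two forms and collect the connected groups with a union-find structure. The function name `partial_paradigms` and the `Cell` type are hypothetical, and the sketch ignores homographs: two unrelated lemmas that happen to share a spelling would be merged into one group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[str, str]  # (tag, word form)


def partial_paradigms(pairs: Iterable[Tuple[Cell, Cell]]) -> List[Set[Cell]]:
    """Group (tag, form) cells that are linked by shared word forms."""
    parent: Dict[str, str] = {}

    def find(form: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(form, form)
        while parent[form] != form:
            parent[form] = parent[parent[form]]
            form = parent[form]
        return form

    cells: List[Cell] = []
    for source, target in pairs:
        # Source and target of one example belong to the same paradigm.
        parent[find(source[1])] = find(target[1])
        cells.extend((source, target))

    groups: Dict[str, Set[Cell]] = defaultdict(set)
    for cell in cells:
        groups[find(cell[1])].add(cell)
    return list(groups.values())
```

For example, the pairs (ran, running) and (running, runs) end up in one partial paradigm because they share the form running, even though ran and runs never co-occur in a single training example.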

A Baseline System

We provide a baseline system that can be used as a starting point for experiments, or simply for comparison. The system implements discriminative string transduction, similar in spirit to other recent approaches such as Durrett and DeNero (2013) and Nicolai et al. (2015). The implementation and a description are available here.

Bonus Resources

When evaluated on a given (task, language) pair, your system is permitted to consult the provided training data for that pair. Your system is also permitted to consult the following additional resources, but no other resources. Participants need to clearly indicate if they are using the unlabeled corpora in their approach. We want to separate participation into two categories: those that only use the example inflection data, and those that take advantage of unlabeled data as well.

Note that, as described above, using lower-numbered task training data is not considered a bonus resource. Task 2 may use task 1 data, and task 3 may use task 1 and 2 training data.

It is not required to use these bonus resources. They are permitted in order to make the task more realistic, to allow more freedom to develop interesting approaches, and because it would be difficult to exclude their use.

We encourage participants to experiment with various approaches and to document clearly which training data and bonus resources were used.

Evaluation

Your system should predict a single string for each test example. Optionally, you may also produce a ranked list of up to 20 predictions for each test example.

We will distribute an evaluation script for your use on the development data. The script will report:

The script will also provide some analysis of errors, e.g., according to whether the correct output appears in the monolingual corpus.

You are encouraged to do ablation studies to measure the advantage that you gained from using bonus resources or from particular innovations. You should perform these studies on the development data and report the findings in your paper.

We will use the same script to evaluate your system's output on the test data. If multiple answers are correct, we will use the answer that gives you the higher score. For example, in Task 1, the two senses of English lemma hang have different Past forms, hung and hanged. In Task 3, the English verb lay could be a Present or Past form, of different verbs whose Past participle forms are respectively laid and lain.
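The treatment of multiple correct answers can be illustrated with a short sketch. This is not the official evaluation script; the function names are hypothetical, and the sketch assumes the gold answers for a test example are available as a set and that guesses are ordered best first.

```python
from typing import Sequence, Set


def top1_correct(guesses: Sequence[str], gold: Set[str]) -> bool:
    """True if the first-ranked guess matches any acceptable answer."""
    return bool(guesses) and guesses[0] in gold


def reciprocal_rank(guesses: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the best-ranked acceptable answer, or 0.0 if none is found.

    Stopping at the first hit means the acceptable answer that yields
    the higher score is the one that counts.
    """
    for rank, guess in enumerate(guesses, start=1):
        if guess in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_guesses: Sequence[Sequence[str]],
                         all_gold: Sequence[Set[str]]) -> float:
    """Average reciprocal rank over all test examples."""
    ranks = [reciprocal_rank(g, a) for g, a in zip(all_guesses, all_gold)]
    return sum(ranks) / len(ranks) if ranks else 0.0
```

With the hang example, a system that outputs either hung or hanged first is credited as correct, since both are in the gold set.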

We will evaluate on each language separately. An aggregate evaluation will weight all languages equally, including the 2 surprise languages.
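Weighting all languages equally amounts to a macro-average: each language contributes one score regardless of how many test examples it has. A minimal sketch follows; the function name is hypothetical and the scores in the usage example are placeholders, not results.

```python
from statistics import mean
from typing import Mapping


def macro_average(per_language_score: Mapping[str, float]) -> float:
    """Average the per-language scores, giving every language equal weight."""
    if not per_language_score:
        raise ValueError("need at least one language score")
    return mean(per_language_score.values())


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(macro_average({"finnish": 0.9, "spanish": 0.7}))
```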

In the overview paper, we will also compare the systems to one another. We will evaluate

The Languages

We have chosen a diverse set of 10 languages, mostly languages with rich inflection. All of the datasets have been scraped from Wiktionary and undergone additional processing at the Center for Language and Speech Processing at Johns Hopkins University. The data are formatted according to the schema described in Sylak-Glassman et al. (2015).

For all languages, the data consist of orthographic strings (written spellings), not phonological strings (pronunciations).

Data format

The training and development data are provided in a simple UTF-8 encoded text format in which each line of a file is one example, consisting of word forms and the corresponding morphosyntactic descriptions (MSDs), given as a set of feature/value pairs. The fields on a line are TAB-separated.

Task 1

For task 1, the fields are: lemma, MSD, target form. An example from the Spanish training data:

hablar  pos=V,mood=IND,polite=FORM,tense=FUT,per=3,num=SG       hablará

Task 2

In task 2, the fields are: source MSD, source form, target MSD, target form. For example:

pos=V,mood=IND,tense=PRS,per=1,num=SG,aspect=IPFV/PFV   hablo   pos=V,tense=PST,gen=MASC,num=PL hablados

Task 3

In task 3, the fields are: source form, target MSD, target form. For example:

hablo   pos=V,tense=PST,gen=MASC,num=PL hablados

When evaluating, the purpose in all tasks is to reconstruct the word form in the last field, given information in the previous fields.
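The three line formats can be read with a small parser such as the sketch below. The names `Example`, `parse_msd`, and `parse_line` are hypothetical, and the sketch assumes that MSD values never contain a comma or a TAB. For Task 1 the lemma is stored as the source form with no source MSD.

```python
from typing import Dict, NamedTuple, Optional


class Example(NamedTuple):
    source_form: str
    source_msd: Optional[Dict[str, str]]  # None when no source MSD is given
    target_msd: Dict[str, str]
    target_form: Optional[str]            # None for test rows without answers


def parse_msd(msd: str) -> Dict[str, str]:
    """Turn 'pos=V,tense=PST' into {'pos': 'V', 'tense': 'PST'}."""
    return dict(pair.split("=", 1) for pair in msd.split(","))


def parse_line(line: str, task: int, has_answer: bool = True) -> Example:
    """Parse one TAB-separated line of task 1, 2, or 3 data."""
    fields = line.rstrip("\n").split("\t")
    answer = fields.pop() if has_answer else None
    if task == 2:
        # source MSD, source form, target MSD
        src_msd, src_form, tgt_msd = fields
        return Example(src_form, parse_msd(src_msd), parse_msd(tgt_msd), answer)
    if task in (1, 3):
        # lemma (task 1) or source form (task 3), then target MSD
        src_form, tgt_msd = fields
        return Example(src_form, None, parse_msd(tgt_msd), answer)
    raise ValueError(f"unknown task: {task}")
```

Passing `has_answer=False` reads test rows, where the last column has been omitted.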

Future Shared Tasks

The 2016 shared task omits some interesting aspects of the problem. In future, it might be fruitful to consider some extensions:

Bibliography

Organizers

Please direct all correspondence regarding the shared task to sigmorphon-shared-task-2016-organizers@googlegroups.com.