News

LREC-2012: SALTMIL-AfLaT Workshop on “Language technology for normalisation of less-resourced languages”

A full-day workshop at LREC 2012
Tuesday, 22 May 2012.
Lütfi Kirdar Istanbul Exhibition and Congress Centre, Istanbul, Turkey

SALTMIL: http://ixa2.si.ehu.es/saltmil/
AfLaT: http://AfLaT.org/
LREC 2012: http://www.lrec-conf.org/lrec2012/

WORKSHOP PROGRAMME

09:15–09:30 Welcome / Opening Session
09:30–10:30 Invited Talk: Sjur Moshagen Nørstebø. How to build language technology resources for the next 100 years
10:30–11:00 Coffee Break
11:00–13:00 Oral papers: Resource Creation

  • Elaine Uí Dhonnchadha, Alessio Frenda and Brian Vaughan, Issues in Designing a Spoken Corpus of Irish.
  • Wondwossen Mulugeta and Michael Gasser, Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
  • Kristín Bjarnadóttir, The Database of Modern Icelandic Inflection
  • Fadoua Ataa Allah and Siham Boulaknadel, Natural Language Processing for Amazigh Language: Challenges and Future Directions

13:00–14:00 Lunch Break
14:00–16:00 Oral papers: Resource Use

  • Tommi A. Pirinen and Francis M. Tyers. Compiling Apertium morphological dictionaries with HFST and using them in HFST applications.
  • Borbála Siklósi, György Orosz, Attila Novák and Gábor Prószéky. Automatic structuring and correction suggestion system for Hungarian clinical records.
  • Linda Wiechetek. Constraint Grammar based Correction of Grammatical Errors for North Sámi.
  • Michael Gasser, Toward a Rule-Based System for English-Amharic Translation.

16:00–16:30 Coffee Break
16:30–17:30 Poster Session

  • Emmanuel Cartier and Paola Carrion Gonzalez, Technological Tools for Dictionary and Corpora Building for Minority Languages: Example of the French-based Creoles.
  • Denys Duchier, Brunelle Magnana Ekoukou, Yannick Parmentier, Simon Petitjean and Emmanuel Schang, Describing Morphologically-rich Languages using Metagrammars: a Look at Verbs in Ikota.
  • Tjerk Hagemeijer, Iris Hendrickx, Abigail Tiny and Haldane Amaro, A Corpus of Santomé.
  • Sigrún Helgadóttir, Ásta Svavarsdóttir, Eiríkur Rögnvaldsson, Kristín Bjarnadóttir and Hrafn Loftsson, The Tagged Icelandic Corpus (MÍM).
  • Laurette Pretorius and Sonja Bosch, Semi-automated extraction of morphological grammars for Nguni with special reference to Southern Ndebele.
  • Björn Gambäck, Tagging and Verifying an Amharic News Corpus.
  • Guy De Pauw, Gilles-Maurice de Schryver and Janneke van de Loo. Resource-Light Bantu Part-of-Speech Tagging.
  • Gulshan Dovudov, Vít Suchomel and Pavel Smerk, POS Annotated 50M Corpus of Tajik Language.


CONTEXT AND FOCUS

The 8th International Workshop of the ISCA Special Interest Group on Speech and Language Technology for Minority Languages (SALTMIL, http://ixa2.si.ehu.es/saltmil) and the 4th Workshop on African Language Technology (AfLaT2012) will be held as a joint effort in Istanbul, in May 2012, as part of the 2012 International Language Resources and Evaluation Conference (LREC 2012).

Entitled "Language technology for normalisation of less-resourced languages", the workshop is intended to continue two series: the SALTMIL/LREC workshops on computational language resources for minority languages, held in Granada (1998), Athens (2000), Las Palmas de Gran Canaria (2002), Lisbon (2004), Genoa (2006), Marrakech (2008) and Malta (2010), and the AfLaT workshops, held in Athens (EACL2009), Malta (LREC2010) and Addis Ababa (AGIS11).

The Istanbul 2012 workshop aims to share information on tools and best practices, so that isolated researchers will not need to start from scratch. An important aspect will be the forming of personal contacts, which can minimize duplication of effort. There will be a balance between presentations of existing language resources, and more general presentations designed to give background information needed by all researchers.

While less-resourced languages and minority languages often struggle to find their place in a digital world dominated by only a handful of commercially interesting languages, a growing number of researchers are working on alleviating this linguistic digital divide, through localisation efforts, the development of BLARKs (basic language resource kits) and practical applications of human language technologies. The joint SALTMIL/AfLaT workshop on "Language technology for normalisation of less-resourced languages" provides a unique opportunity to connect these researchers and set up a common forum to meet and share the latest developments in the field.

ORGANIZERS (SALTMIL and AfLaT)

* Mikel L. Forcada (SALTMIL): Machine Translation Group, School of Computing, Dublin City University, Dublin, Ireland
* Guy De Pauw (AfLaT): CLiPS - Computational Linguistics Group, University of Antwerp, Antwerp, Belgium
* Gilles-Maurice de Schryver (AfLaT): African Languages and Cultures, TshwaneDJe HLT, South Africa & Ghent University, Belgium
* Kepa Sarasola (SALTMIL): Dept. of Computer Languages, University of the Basque Country
* Francis M. Tyers (SALTMIL): Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Spain
* Peter Waiganjo Wagacha(AfLaT): School of Computing & Informatics, University of Nairobi, Nairobi, Kenya

PROGRAMME COMMITTEE

* Iñaki Alegria: University of the Basque Country
* Núria Bel, Universitat Pompeu Fabra, Barcelona, Spain
* Lars Borin, Göteborgs universitet, Sweden
* Sonja Bosch, University of South Africa, South Africa
* Khalid Choukri, ELRA/ELDA, France
* Mikel L. Forcada, Universitat d’Alacant
* Dafydd Gibbon, University of Bielefeld, Germany
* Girish Nath Jha, Jawaharlal Nehru University, India
* Hrafn Loftsson, Reykjavik University
* Guy De Pauw, CLiPS, Universiteit Antwerpen
* Laurette Pretorius, University of South Africa, South Africa
* Lori Levin, Carnegie Mellon University, USA
* Odetunji Odejobi, Obafemi Awolowo University, Nigeria
* Benoît Sagot, INRIA Paris Rocquencourt & Université Paris 7, France
* Felipe Sánchez-Martínez, Universitat d'Alacant
* Kepa Sarasola, University of the Basque Country
* Kevin Scannell, Saint Louis University, USA
* Gilles-Maurice de Schryver, Universiteit Gent
* Trond Trosterud, Universitetet i Tromsø, Norway
* Francis M. Tyers, Universitat d'Alacant
* Peter Waiganjo Wagacha, University of Nairobi


REGISTRATION

See Registration in LREC 2012 site

 

CFP. LREC-2012: SALTMIL-AfLaT Workshop on “Language technology for normalisation of less-resourced languages”

CALL FOR PAPERS  (Deadline for submission extended to 5 March 2012)
Workshop on “Language technology for normalisation of less-resourced languages”
8th SALTMIL Workshop on Minority Languages and the 4th workshop on African Language Technology (AfLaT2012).

A full-day workshop at LREC 2012
Tuesday, 22 May 2012.
Lütfi Kirdar Istanbul Exhibition and Congress Centre, Istanbul, Turkey

SALTMIL: http://ixa2.si.ehu.es/saltmil/
AfLaT: http://AfLaT.org/
LREC 2012: http://www.lrec-conf.org/lrec2012/
Paper submission: https://www.softconf.com/lrec2012/Less-RessourcedLang2012/

Papers are invited for the above full-day workshop, in the format outlined below. Most submitted papers will be presented in poster form, though some authors may be invited to present in lecture format.

TOPICS

The workshop takes an inclusive approach to the word “normalisation”, considering it to include both technologies that help make languages more “normal” in society and everyday life, and technologies that normalise languages, i.e. help create or maintain a written standard or support diversity in standards. We particularly focus on the challenges less-resourced and minority languages face in the digital world. Papers are invited that describe research and development on technologies for language normalisation, including (but not limited to) the following topics:
* Keyboard layouts and entry methods
* Standardisation in machine readable lexicons/dictionaries
* Computer-aided language learning (CALL)
* Dealing with language variants in NLP
* Automatic identification of varieties, dialects
* Corpus construction and annotation
* Terminology development and management
* MT between varieties of the same language
* Spelling correction/normalisation
* Machine translation (MT)
* Morphological analysers
* Part-of-speech taggers and parsers
* Speech recognition and synthesis
* Information extraction/retrieval
* Localisation efforts
* Mobile phones as a platform for HLT

SUBMISSIONS

We expect short papers of at most 6,000 words (up to 6 pages) describing research addressing one of the above topics, to be submitted as PDF documents using the LREC2012 START conference management system (URL: https://www.softconf.com/lrec2012/Less-RessourcedLang2012/).

Submissions should be anonymized. When submitting a paper through the START page, authors will be kindly asked to provide relevant information about the resources that have been used for the work described in their paper or that are the outcome of their research. For further information on this initiative, please refer to http://www.lrec-conf.org/lrec2012/?LRE-Map-2012. Authors will also be asked to contribute to the Language Library, the new initiative of LREC2012.

Submissions of papers should follow the same style as the papers for the main LREC conference (an Author's Kit made of specific guidelines and downloadable templates will be published on the conference web site in due time). All contributions (including invited papers) will be included in the workshop proceedings (CD). They will also be published on the SALTMIL website.

IMPORTANT DATES


* 27 February 2012, extended to 5 March 2012: Deadline for submission
* 14 March 2012, extended to 23 March 2012: Notification
* 28 March 2012, extended to 2 April 2012: Final version
* 22 May 2012: Workshop

REGISTRATION

Registration details will be announced in due course

   

Language Endangerment: Documentation, Pedagogy, and Revitalization (Cambridge, 2011)

Language Endangerment: Documentation, Pedagogy, and Revitalization
University of Cambridge, Friday, 25 March 2011
Location: CRASSH, 17 Mill Lane, Cambridge
http://www.crassh.cam.ac.uk/events/1332/
Call for Papers Deadline: abstracts due 26 November, 2010.

   

Summary of the discussion on "Less resourced languages and Language technology. Short- and medium-term objectives"

Valletta, Malta, 23 May 2010

After a brief presentation by Mikel Forcada (see slides), an interesting discussion took place.

One problem that was underlined is the difficulty of convincing politicians to fund the creation of language resources (LR) for less-resourced languages (LRL). Per Langgård suggested that a scheme should be set up to help developers succeed in that endeavour; Khalid Choukri said that even for large European languages it was difficult to convince European Union politicians to fund R&D in the field, and that we needed to give politicians a larger picture and something they can sell to the media. Along the same lines, Igor Leturia mentioned that we should convince politicians that we do not only do research but also produce products that politicians can see.

Trond Trosterud proposed that the threshold to access LRs or tools should be as low as possible, and that friendly ways to disseminate them should be put in place (one-click, grab-and-go interfaces). Benoît Sagot underlined the importance of finding ways to reduce the costs of developing LRs. Mikel Forcada suggested that the reCAPTCHA idea (a test used in web pages to check that they are being accessed by a human) could be adapted to generate or test lexical language resources collectively. In connection with this, it was suggested that linguists should be given status and credit equal to those that computer scientists or engineers get, and that where this had been tried it worked well.

A participant mentioned that it might be better to create a single, well-annotated and well-evaluated resource; in the case of Arabic, the Qur'an could be collaboratively annotated with different layers of linguistic annotation in the hope that this knowledge could then be extrapolated to the whole language.

Khalid Choukri mentioned that one of the problems with scientific literature dealing with language resources for LRLs is that many research groups created resources that were already available, and that ignorance about other research in the field was indeed a problem. He advised us to always make sure that our research is available to everyone, with an open license, and that the best practices in the field should always be applied when creating new resources. He also mentioned the importance of talking to publishers of printed books to build text resources, by offering them a business model that may be mutually beneficial.

Benoît Sagot insisted that LRs should be made as free as possible so that they are widely used, and mentioned that catalogues of language resources such as ELRA or LDC contained many non-free resources. Khalid Choukri responded that ELRA pricing and conditions just reflect the will of the authors of those resources, and that authors can obviously place free resources in the ELRA catalog. Mikel Forcada mentioned that free/open-source LRs offered an excellent way to ensure reproducibility in LR research, which is crucial to R&D. He also asked participants to be aware that R&D grows in the directions that society, through its funding agencies, decides, and that it is sometimes hard to convince decision-makers about the importance of less-resourced languages.

Kepa Sarasola mentioned the example of Basque: orthographical standardization was adopted in 1968, and spelling checkers developed in the 1990s have been one of the most powerful ways of disseminating and promoting the standard orthography of unified Basque.

At 13.55, Mikel Forcada closed the panel session by thanking the audience for their participation in the panel and in the whole workshop.

   

LREC10. SALTMIL workshop. CFP: "Creation and use of basic lexical resources for less-resourced languages"

7th SaLTMiL Workshop on
"Creation and use of basic lexical resources for less-resourced languages"

A half-day workshop at LREC 2010

Sunday May 23, 2010, 09.00-14.00.

Mediterranean Conference Center, Valletta, Malta

Context and focus

The 7th International Workshop of the ISCA Special Interest Group on Speech and Language Technology for Minority Languages (SaLTMiL: see http://ixa2.si.ehu.es/saltmil) will be held in Malta, on a date between May 17 and May 23, 2010 to be announced, as part of the 2010 International Language Resources and Evaluation Conference (LREC).

Entitled "Creation and use of basic lexical resources for less-resourced languages", the workshop is intended to continue the series of SALTMIL/LREC workshops on computational language resources for minority languages, held in Granada (1998), Athens (2000), Las Palmas de Gran Canaria (2002), Lisbon (2004), Genoa (2006) and Marrakech (2008).

The Malta 2010 workshop aims to share information on tools and best practice, so that isolated researchers will not need to start from scratch. An important aspect will be the forming of personal contacts, which can minimize duplication of effort. There will be a balance between presentations of existing language resources, and more general presentations designed to give background information needed by all researchers.

Programme

Proceedings (pdf)

09.00 Registration
09.30 Opening
09.45 Invited talk: Marc Kemps-Snijders. LAT team at the Max Planck Institute at Nijmegen. "ELAN and RELISH project"
10.30 Coffee break
11.00 Invited talk: Antton Gurrutxaga and Igor Leturia: Elhuyar Foundation. "Exploiting Internet to build language resources for less resourced languages"
11.45 Oral papers (20+5 min.):

Tommi A. Pirinen and Krister Lindén: "Finite-State Spell-Checking with Weighted Language and Error Models – Building and Evaluating Spell-Checkers with Wikipedia as Corpus"

Aric Bills, Lori S. Levin, Lawrence D. Kaplan, and Edna Agheak MacLean: "Finite-State Morphology for Iñupiaq"
12.35 Poster session

Marco Passarotti: "Leaving Behind the Less-Resourced Status. The Case of Latin through the Experience of the Index Thomisticus Treebank"

Anna Björk Nikulásdóttir and Matthew Whelpton: "Extraction of Semantic Relations as a Basis for a Future Semantic Database for Icelandic"

Gábor Prószéky, Attila Novák, István Endrédy, Beatrix Oszkó, László Fejes, Sándor Szeverényi, Zsuzsa Várnai and Beáta Wagner-Nagy: "Nganasan – Computational Resources of a Language on the Verge of Extinction"

Géraldine Walther and Benoît Sagot: "Developing a large-scale lexicon for a less-resourced language: Sorani Kurdish"

Hrafn Loftsson, Jökull Yngvason, Sigrún Helgadóttir and Eiríkur Rögnvaldsson: "Developing a PoS-tagged corpus using existing tools"
13.20 Panel: Less resourced languages and Language technology. Short- and medium-term objectives (SaLTMiL)

14.00 Closing

Organisers

  • Mikel L. Forcada: Machine Translation Group, School of Computing, Dublin City University, Dublin, Ireland
  • Kepa Sarasola: Dept. of Computer Languages, University of the Basque Country
  • Francis M. Tyers, Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Spain

Programme committee

  • Mikel L. Forcada: Dublin City University, Ireland
  • Kepa Sarasola: University of the Basque Country
  • Francis M. Tyers: Universitat d'Alacant, Spain
  • Trond Trosterud, Universitetet i Tromsø, Norway
  • Núria Bel, Universitat Pompeu Fabra, Barcelona, Spain
  • Kevin Scannell, Saint Louis University, USA
  • Hrafn Loftsson, Reykjavik University
  • Felipe Sánchez-Martínez, Universitat d'Alacant
  • Iñaki Alegria: University of the Basque Country
  • Lars Borin, Göteborgs universitet

Additional referees

  • Per Langgård: Oqaasileriffik (Language Secretariat, Nuuk, Greenland)
  • Paul Meurer: Universitetet i Bergen
  • Sjur Moshagen: Divvun (Norwegian Sámi Parliament)
  • Eva Navas: University of the Basque Country

Important dates

4 March 2010 (extended from 28 February 2010): Deadline for submission
22 March 2010: Notification
29 March 2010: Final version
23 May 2010: Workshop

Registration

Registration form available in LREC 2010 site

   
