
MATMT2008 workshop:

"Mixing Approaches to Machine Translation"

Korta Research Center
Avda. de Tolosa 72
University of the Basque Country
Donostia-San Sebastian, Thursday, February 14th, 2008

CALL FOR PARTICIPATION

Proceedings

Iñaki Alegria, Lluís Màrquez, Kepa Sarasola (eds.) 2008
Mixing Approaches to Machine Translation. MATMT2008. Proceedings.
Euskal Herriko Unibertsitatea. ISBN 978-612-2224-7

Background

This workshop is organized within the framework of the OpenMT research project (TIN2006-15307-C03-01, Spanish Ministry of Education and Science).

The aim of the workshop "Mixing Approaches To Machine Translation" is to promote practical hybrid approaches to MT, combining resources and algorithms coming from rule-based, example-based or statistical approaches.

The boundaries between the three principal approaches to MT (rule-based, example-based, statistical) are becoming narrower:

  • Phrase-based SMT models are incorporating morphology, syntax and semantics.
  • Rule-based systems are using parallel corpora to enrich their lexicons and grammars and to create new methods for disambiguation.
  • Previous ASR/ALT projects have shown that an MT system can benefit from a simple combination of different MT approaches in a ROVER architecture.

Data-driven Machine Translation (example-based or statistical) is nowadays the most prevalent trend in Machine Translation research.

Translation results obtained with this approach have now reached a high level of accuracy, especially when the target language is English. However, data-driven MT systems draw their knowledge from aligned bilingual corpora, so the accuracy of their output depends heavily on the quality and size of these corpora. Large and reliable bilingual corpora are unavailable for many language pairs.

Workshop topics

We are particularly interested in papers describing research and development in the following areas:

  • Comparing different approaches for developing MT
  • Methods to compare and integrate translation outputs obtained with different MT approaches
  • MT evaluation methods, especially those suitable for languages with rich morphology
  • Morphology-, syntax- or semantics-augmented SMT models
  • Research using open-source language resources to develop hybrid MT

Program

University of the Basque Country, Faculty of Computer Science
Lardizabal 1, Donostia

February 14th

Keynote speakers:

9.00-9.30: Registration

9.30-10.15: Invited talk - P. Koehn, "Moses: Moving Open Source MT towards Linguistically Richer Models"

10.15-11.05: Regular talks - Evaluation (20 minutes per talk plus 5 minutes for questions)

  • A Method of Automatically Evaluating Machine Translations Using a Word-alignment-based Classifier. K. Kotani, T. Yoshimi, H. Isahara (National Institute of Information and Communications Technology); T. Kutsumi, I. Sata (Sharp Corporation)
  • Diagnosing Human Judgments in MT Evaluation: an Example based on the Spanish Language. Olivier Hamon, Djamel Mostefa, Victoria Arranz (ELDA)

11.05-11.40: Coffee break

11.40-13.00: Regular talks - Mixed methods 1 (20 minutes per talk plus 5 minutes for questions)

  • Mixing Approaches to MT for Basque: Selecting the best output from RBMT, EBMT and SMT. I. Alegria, A. Casillas, A. Diaz de Ilarraza, J. Igartua, G. Labaka, M. Lersundi, A. Mayor, K. Sarasola (Univ. of the Basque Country); X. Saralegi, G. Aranburu (Elhuyar); B. Laskurain (Eleka)
  • Statistical Post-Editing: A Valuable Method in Domain Adaptation of RBMT Systems for Less-Resourced Languages. Arantza Diaz de Ilarraza, Gorka Labaka, Kepa Sarasola (Univ. of the Basque Country)
  • From free shallow monolingual resources to machine translation systems: easing the task. Helena M. Caseli, Maria das Graças V. Nunes (University of Sao Paulo), Mikel L. Forcada (Universitat d'Alacant)

13.00-14.00: Lunch

14.00-14.45: Invited talk - M. Federico, "Recent Advances in Spoken Language Translation"

14.45-16.00: Regular talks - Mixed methods 2 (20 minutes per talk plus 5 minutes for questions)

  • Exploring Spanish-morphology effects on Chinese-Spanish SMT. Rafael B. Banchs (Barcelona Media Innovation Centre); Haizhou Li (Institute for Infocomm Research, Singapore)
  • Linguistic Categorisation in Machine Translation using Stochastic Finite State Transducers. Jorge González and Francisco Casacuberta (Universidad Politécnica de Valencia)
  • Vocabulary Extension via POS Information for SMT. Germán Sanchis, Joan A. Sánchez (Universidad Politécnica de Valencia)

16.00-16.30: Coffee break

16.30-18.00:

  • Invited talk - A. Way, "Combining Approaches to Machine Translation: the DCU Experience"
  • Conclusions. Moderator: David Farwell

Registration

The registration fee is 50 € for registrations made before February 4th.

Late registration is still possible, at a fee of 60 €.

Online registration is open.

The fee includes the proceedings, lunch, and coffee and cookies.

The steps for on-line registration are the following:

  • create an account in the system (REGISTER), including your email
  • you will receive your identification by e-mail
  • enter the system (ENTER)
  • acknowledge the fee (ACCEPT), twice
  • choose electronic payment (ACCEPT)
  • select your credit card company
  • enter information about your credit card (secure connection)

If you have any problems, please contact i.alegria [at] ehu.es.

Optional Dinner

An optional dinner will be organized at a cider house (Sagardotegi).

We'll go to an authentic cider house, where we'll taste the best cider and eat the traditional menu: codfish omelette, fried codfish with peppers, grilled beef T-bone, and local cheese with quince and walnuts. An experience you will never forget! The price will be about 35 €, including transport, to be paid at the registration desk.

Venue and Travel

Important Dates

  • Paper submission deadline: Nov 26, 2007
  • Notification of acceptance: Jan 9, 2008
  • Camera-ready papers: Jan 20, 2008
  • Workshop: Feb 14, 2008

Paper submission

Papers should be written in English and no longer than 8 pages.

Papers should use the same template as the TMI-07 conference.

Papers should be sent via e-mail to i.alegria@ehu.es

All contributions will be published in the workshop proceedings.

Programme committee

  • Iñaki Alegria (University of the Basque Country, Donostia)
  • Kutz Arrieta (Vicomtech, Donostia)
  • Núria Castell (Technical University of Catalonia, TALP, Barcelona)
  • Arantza Diaz de Ilarraza (University of the Basque Country, Donostia)
  • David Farwell (Technical University of Catalonia, TALP, Barcelona)
  • Mikel Forcada (University of Alacant, Alicante)
  • Philipp Koehn (University of Edinburgh, UK)
  • Lluís Màrquez (Technical University of Catalonia, Barcelona) (Co-chair)
  • Hermann Ney (Rheinisch-Westfälische Technische Hochschule, Aachen)
  • Kepa Sarasola (University of the Basque Country, Donostia) (Co-chair)

Local organization

IXA Group, University of the Basque Country

  • Alegria I., Casillas A., Díaz de Ilarraza A., Igartua J., Labaka G., Lersundi M., and Sarasola K.

Elhuyar Fundazioa

  • Gurrutxaga A., Leturia I., and Saralegi X.

About the OpenMT project

The main goal of the OpenMT project is the development of open-source machine translation architectures based on hybrid models and advanced semantic processors. These architectures will be open-source systems combining the three main machine translation frameworks, namely Rule-Based MT (RBMT), Statistical MT (SMT) and Example-Based MT (EBMT), into hybrid systems. The architectures and results of the project will be released as open source, enabling rapid development and adaptation of advanced machine translation systems for other languages. We will test the functionality of these systems with four languages: English, Spanish, Catalan and Basque. Corpora are readily available for English and Spanish, but not for Catalan and Basque. While some of these languages are structurally very similar (Catalan and Spanish), others are very different (English and Basque): Basque is an agglutinative language with a very rich morphology, unlike English, Catalan and Spanish.

The main innovative points of the OpenMT project are:

  • The design of hybrid systems combining traditional linguistic rules, example-based methods and statistical methods
  • Its nature as an open-source initiative
  • The use of advanced syntactic and semantic processing in MT

We gratefully acknowledge the financial support from the OpenMT project and from the Government of the Basque Country. The OpenMT project is funded by the Spanish Ministry of Education and Science (OpenMT: Open Source Machine Translation using hybrid methods, TIN2006-15307-C03-01) and the Local Government of the Basque Country (AnHITZ 2006: Language Technologies for Multilingual Interaction in Intelligent Environments, IE06-185).
