EusEduSeg: syntax-based text segmentation tool for Basque    

Contact: mikel.iruskieta at ehu.eus

Within the framework of Rhetorical Structure Theory (RST; Mann and Thompson, 1987), this segmenter was developed as a first step towards automatic rhetorical analysis of Basque. The segmenter relies on the MALTIXA parser (Diaz de Ilarraza et al., 2005), and its purpose is to automatically detect Elementary Discourse Units (EDUs), that is, discourse segments (propositions). EDU segmentation is defined in Iruskieta (2014). In the future, this segmentation will serve as the basis for automatically building the corresponding RST tree and for many other NLP applications.

RST Treebank:
Basque
Multilingual
Portuguese
Chinese



NOTE: To preserve paragraphs, this tool treats every line break in the input as a paragraph boundary.
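The effect of the rule in the note above can be illustrated with a minimal Python sketch. This is not EusEduSeg's own code: the function name `split_paragraphs` is hypothetical, and dropping empty lines is an assumption made only for this illustration.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Hypothetical helper: treat every line break as a paragraph boundary.

    Assumption for this sketch: lines that are empty after trimming
    whitespace are dropped rather than kept as empty paragraphs.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# A hard-wrapped sentence (first two lines) followed by a new paragraph.
sample = (
    "First sentence of a paragraph,\n"
    "continued on a second line.\n"
    "A new paragraph."
)

paragraphs = split_paragraphs(sample)
for number, paragraph in enumerate(paragraphs, start=1):
    print(number, paragraph)
```

Under this rule the hard-wrapped sentence ends up as two separate paragraphs, so it is probably safest to put each paragraph on a single line before submitting text.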



The proper way to cite EusEduSeg is the following:
Iruskieta M., Zapirain B. 2015. EusEduSeg: a Dependency-Based EDU Segmentation for Basque. In Actas del XXXI Congreso de la Sociedad Española del Procesamiento del Lenguaje Natural (SEPLN 2015), pp. 41-48. Alicante (España).

References:
Mann, W.C., Thompson, S.A. 1987. Rhetorical Structure Theory: A Theory of Text Organization. Text 8.243-281.
Diaz de Ilarraza, A., Gojenola, K., Oronoz, M. 2005. Design and Development of a System for the Detection of Agreement Errors in Basque. In Computational Linguistics and Intelligent Text Processing, 793-802. Springer.
Iruskieta, M. 2014. Pragmatikako erlaziozko diskurtso-egitura: deskribapena eta bere ebaluazioa hizkuntzalaritza konputazionalean. PhD thesis. EHU, Faculty of Informatics.