
ixa-pipe-dep-eu

ixa-pipe-dep-eu is a dependency parser for documents written in Basque. Two versions of the tool are currently distributed. The first (v1.0.0), which is simpler but faster, is based on the graph-based version of the Mate parser. The second (v2.0.0) combines the analyses produced by different parsers: Mate and MaltParser produce the analyses, and the MaltBlender tool chooses the best combination of them. Both versions are implemented in Java.

The tool takes a document in NAF format as input. The document must contain lemmas, PoS tags and morphological annotations. A NAF document with this linguistic information can be obtained from the output of ixa-pipe-pos-eu.

Download

You can download the package containing the executable file for each of the two stable versions from the following links:
    [v2.0.0] ixa-pipe-dep-eu-v2.0.0.tar.gz
    [v1.0.0] ixa-pipe-dep-eu-v1.0.0.tar.gz

Linguistic resources

This tool needs some additional linguistic tools and resources, which you can download from the following links (each version needs its own resources):
    [v2.0.0] dep-eu-resources-v2.0.0.tgz
    [v1.0.0] dep-eu-resources-v1.0.0.tgz

Source code

The source code of the latest development version can be downloaded or cloned from this GitHub page.

License

All the original code produced for ixa-pipe-dep-eu is licensed under the GPL v3 free license.

This software uses some external tools, which are distributed with the source code and the resources. Each of these tools has its own copyright owner and license:

[v2.0.0]

[v1.0.0]

These tools also use some other libraries. See the NOTICE file of these tools.

How to cite

If you use the ixa-pipe-dep-eu tool, please cite one of the following papers (depending on the version you use) in your academic work:

[v2.0.0]

Iakes Goenaga, Koldo Gojenola, Nerea Ezeiza. Combining Clustering Approaches for Semi-Supervised Parsing: the BASQUE TEAM system in the SPRML 2014 Shared Task. Workshop on Statistical Parsing of Morphologically Rich Languages SPRML 2014 Shared Task, Dublin, COLING Workshop. 2014

[v1.0.0]

Arantxa Otegi, Nerea Ezeiza, Iakes Goenaga, Gorka Labaka. A Modular Chain of NLP Tools for Basque. In Proceedings of the 19th International Conference on Text, Speech and Dialogue - TSD 2016, Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pp. 93-100. 2016

Installation

Once you have downloaded the package that contains the executable file, decompress it. The executable is ready to use without any installation, but you have to follow the steps below to make the required resources usable:

In addition, Java must be installed on your computer. Perl is also needed in order to use MaltBlender (v2.0.0 only).
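
The prerequisites above can be checked before the first run. The following Python sketch is only an illustration: the function name missing_prerequisites is hypothetical, and it assumes that the java and perl executables are expected to be found on the PATH.

```python
import shutil


def missing_prerequisites(version, which=shutil.which):
    """Return the names of required programs that are not on the PATH.

    Java is needed by both versions; Perl only by v2.0.0 (for MaltBlender).
    The `which` parameter exists only to make the function easy to test.
    """
    required = ["java"]
    if version.startswith("2"):
        required.append("perl")
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites("2.0.0")
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```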

How to use

The ixa-pipe-dep-eu-X.X.X.jar executable runs the ixa-pipe-dep-eu tool. The only required argument (-b) is the path of the resources directory available in the download section. The full command syntax of ixa-pipe-dep-eu-X.X.X.jar is:

> java -jar ixa-pipe-dep-eu-X.X.X.jar [-h] -b RESOURCES_DIR [-c CONLL_FILE]

arguments:
   -h     show this help message and exit
   -b RESOURCES_DIR     [Required] Specify the path of the downloaded resource directory.
   -c CONLL_FILE     [Optional] If you also want to save the output in CoNLL format, specify the path of the output file.
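
If you prefer to call the tool from a program rather than from the shell, the command syntax above could be wrapped as in the following Python sketch. It uses only the documented -b and -c arguments. The function names, the jar file name and the directory names are placeholders chosen for illustration, and the sketch assumes the NAF document is passed on standard input.

```python
import subprocess


def build_command(jar_path, resources_dir, conll_file=None):
    """Build the argument list for the documented command-line syntax."""
    command = ["java", "-jar", jar_path, "-b", resources_dir]
    if conll_file is not None:
        # Optional: also save the output in CoNLL format.
        command += ["-c", conll_file]
    return command


def parse_naf(naf_text, jar_path, resources_dir, conll_file=None):
    """Send a NAF document to the parser on stdin and return its stdout."""
    result = subprocess.run(
        build_command(jar_path, resources_dir, conll_file),
        input=naf_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the parser fails
    )
    return result.stdout.decode("utf-8")


# Placeholder paths; replace them with your own locations.
print(build_command("ixa-pipe-dep-eu-X.X.X.jar", "path/to/resources", "out.conll"))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.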

An executable script, run.sh, is provided to run the tool (it calls the ixa-pipe-dep-eu-X.X.X.jar executable with all the arguments explained above). Before running it, update the rootDir and baliabideak variables in the script as specified in the installation section.

This tool reads from standard input. The input must be a UTF-8 encoded NAF document containing lemmas, PoS tags and morphological annotations (the text and terms elements of NAF). Such a document can be obtained from the output of ixa-pipe-pos-eu.

Therefore, you can obtain the syntactic dependencies of a plain text file with the following command (on a single command line):
> cat test.txt | sh ixa-pipe-pos-eu/ixa-pipe-pos-eu.sh | sh ixa-pipe-dep-eu/run.sh

The output is written to standard output in NAF format, UTF-8 encoded. In the output NAF document, the syntactic dependencies are marked by deps elements, as shown in the example below (the input sentence of the example is "Donostiako Zinemaldiko sail ofizialean lehiatuko da Handia filma."):
<deps>
   <!--ncmod(Zinemaldiko, Donostiako)-->
   <dep from="t2" to="t1" rfunc="ncmod" />
   <!--ncsubj(da, Zinemaldiko)-->
   <dep from="t6" to="t2" rfunc="ncsubj" />
   <!--ncmod(lehiatuko, sail)-->
   <dep from="t5" to="t3" rfunc="ncmod" />
   <!--ncmod(sail, ofizialean)-->
   <dep from="t3" to="t4" rfunc="ncmod" />
   <!--xpred(da, lehiatuko)-->
   <dep from="t6" to="t5" rfunc="xpred" />
   <!--ncpred(da, Handia)-->
   <dep from="t6" to="t7" rfunc="ncpred" />
   <!--ncmod(da, filma)-->
   <dep from="t6" to="t8" rfunc="ncmod" />
   <!--PUNC(filma, .)-->
   <dep from="t8" to="t9" rfunc="PUNC" />
</deps>
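
As a rough illustration of how this output could be consumed, the Python sketch below reads the dep elements of the example with the standard xml.etree.ElementTree module. It assumes that the from attribute holds the head and the to attribute the dependent, which matches the comments in the example. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <deps> element from the example above (comments omitted).
DEPS_XML = """
<deps>
   <dep from="t2" to="t1" rfunc="ncmod" />
   <dep from="t6" to="t2" rfunc="ncsubj" />
   <dep from="t5" to="t3" rfunc="ncmod" />
   <dep from="t3" to="t4" rfunc="ncmod" />
   <dep from="t6" to="t5" rfunc="xpred" />
   <dep from="t6" to="t7" rfunc="ncpred" />
   <dep from="t6" to="t8" rfunc="ncmod" />
   <dep from="t8" to="t9" rfunc="PUNC" />
</deps>
"""


def read_dependencies(element):
    """Return (head_id, dependent_id, relation) triples for every <dep>."""
    return [
        (dep.get("from"), dep.get("to"), dep.get("rfunc"))
        for dep in element.iter("dep")
    ]


def find_roots(triples):
    """Return the term ids that govern other terms but have no head."""
    heads = {head for head, _, _ in triples}
    dependents = {dependent for _, dependent, _ in triples}
    return sorted(heads - dependents)


triples = read_dependencies(ET.fromstring(DEPS_XML.strip()))
for head, dependent, relation in triples:
    print(f"{relation}({head}, {dependent})")
print("root:", find_roots(triples))
```

For a complete NAF document, the same function can be applied to the parsed document root, since iter("dep") searches the whole tree.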

Contact

Arantxa Otegi, arantza.otegi@ehu.eus
Iakes Goenaga, iakes.goenaga@ehu.eus