IXA pipes: ready to use NLP tools

IXA pipes is a modular set of Natural Language Processing tools (or pipes) that provides easy access to NLP technology for several languages. It offers robust and efficient linguistic annotation to both researchers and non-NLP experts, with the aim of lowering the barriers to using NLP technology for research purposes as well as for small industrial developers and SMEs. The ixa pipes can be used as they are, or their modularity can be exploited to pick and change individual components. The tools are developed by the IXA NLP Group of the University of the Basque Country.


If you use the ixa pipes tools or the models, please cite this paper:

Rodrigo Agerri, Josu Bermudez and German Rigau (2014): "IXA pipeline: Efficient and Ready to Use Multilingual NLP tools", in: Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014), 26-31 May, 2014, Reykjavik, Iceland. PDF paper

ixa-pipe-tok: Tokenizer and Segmenter for several languages.

ixa-pipe-pos: Statistical POS tagger and lemmatizer for Basque, Dutch, English, French, Galician, German, Italian and Spanish.

ixa-pipe-nerc: Named Entity Recognition tagger for Basque, Spanish, English, German, Dutch and Italian; Opinion Target Extraction (OTE) for English.

ixa-pipe-chunk: Probabilistic chunker for Basque and English.

ixa-pipe-parse: Probabilistic constituent parser for Spanish and English.

Every ixa pipe can be up and running after two simple steps. The tools require Java 1.7+ to run and are designed to come with all batteries included, which means that no system configuration or installation of third-party dependencies is required. The modules will run on any platform as long as a JVM 1.7+ is available.

IXA pipes are just a set of processes chained by their standard streams, so that the output of each process feeds directly into the next one as input. The Unix pipes metaphor has been applied to NLP tools by adopting a very simple and well-known data-centric architecture, in which every module/pipe is interchangeable with any other tool as long as it reads and writes the required data format via the standard streams.
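The chaining idea can be illustrated with a minimal Python sketch. It is not the IXA pipes code: the two stages (TOKENIZE and ANNOTATE) are hypothetical stand-ins written as one-line Python programs, and they exchange plain lines of text rather than the real data format. The sketch only shows how independent processes can be connected through their standard streams so that any stage can be swapped for another one that reads and writes the same format.

```python
import subprocess
import sys

# Hypothetical stand-in stages (NOT the real ixa-pipe modules). Each one is a
# separate program that reads standard input and writes standard output.
TOKENIZE = [sys.executable, "-c",
            "import sys; [print(tok) for line in sys.stdin for tok in line.split()]"]
ANNOTATE = [sys.executable, "-c",
            "import sys; [print(line.strip() + '/' + str(len(line.strip()))) "
            "for line in sys.stdin]"]


def run_pipeline(stages, text):
    """Start one process per stage and connect stdout of each to stdin of the next."""
    procs = []
    upstream = None
    for cmd in stages:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE if upstream is None else upstream.stdout,
            stdout=subprocess.PIPE,
            text=True,
        )
        if upstream is not None:
            # The child now owns this end of the pipe; close our copy.
            upstream.stdout.close()
        procs.append(proc)
        upstream = proc

    # Feed the first stage and collect the output of the last one.
    # (Fine for small inputs; large inputs would need concurrent read/write.)
    procs[0].stdin.write(text)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline([TOKENIZE, ANNOTATE], "IXA pipes\n"), end="")
    # IXA/3
    # pipes/5
```

Because the stages only agree on what flows through the streams, replacing ANNOTATE with any other program that reads one token per line would require no change to run_pipeline.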

Both the input and the output of the modules must be formatted in NAF, the data format used to represent and pipe linguistic annotations. Our Java modules all use the kaflib library for easy NAF integration.

Licensing

ixa-pipes are distributed under the Apache License 2.0 (APL 2.0).

Third party tools

The ixa pipes are extended with third party tools for other linguistic annotations, such as word sense disambiguation, semantic role labelling, named entity disambiguation and wikification against DBpedia, and coreference resolution. Go to the third party tools page for information about how to download and use each tool.

Notice!!

If you are still using release 1.0.0, please update (there is a claim that 1.0.0 might contain GPL code). Please also update ixa-pipe-tok to version 1.8.+, which is non-controversially APL 2.0.

Troubleshooting

Please read the documentation provided for each tool carefully. If you still have problems, check out the FAQ section.
If the problem persists, send an email to the users list:
ixa-pipes-users@googlegroups.com. Note that you must join the ixa-pipes-users Google group before posting (anyone can join).