UKB: Graph Based Word Sense Disambiguation and Similarity

UKB is a collection of programs for performing graph-based Word Sense Disambiguation and lexical similarity/relatedness using a pre-existing knowledge base.

UKB has been developed by the IXA group at the University of the Basque Country.

UKB applies the so-called Personalized PageRank algorithm to a Lexical Knowledge Base (LKB) to rank the vertices of the LKB and thus perform disambiguation. The details of the method and its application to WordNet are described in [1,8]. It has also been applied to WSD in specific domains [2]. The algorithm can also be used to calculate the lexical similarity/relatedness of words and sentences [3,4] and to improve Information Retrieval algorithms [6].
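As a rough illustration of the idea, the following Python sketch runs Personalized PageRank by power iteration on a tiny hand-made graph. It is not UKB's implementation: the function name, the toy graph, the sense identifiers, the damping factor of 0.85 and the fixed number of iterations are all assumptions made for the example. Context words are added as vertices that point to their candidate senses, the teleport mass is concentrated on those word vertices, and the highest-ranked sense of the target word is picked.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform teleport vector.

    graph: dict mapping every vertex to the list of vertices it links to
           (each linked vertex must also appear as a key).
    seeds: dict mapping vertices to non-negative teleport weights.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    # Teleport vector: probability mass concentrated on the seed vertices.
    teleport = {v: seeds.get(v, 0.0) / total for v in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            neighbours = graph[v]
            if not neighbours:
                # Dangling vertex: hand its mass back to the teleport vector.
                for u in nodes:
                    new_rank[u] += damping * rank[v] * teleport[u]
                continue
            share = damping * rank[v] / len(neighbours)
            for u in neighbours:
                new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical toy knowledge base (not taken from WordNet).
toy_graph = {
    # Context words point to their candidate senses (directed edges).
    "w:bank": ["bank#finance", "bank#river"],
    "w:money": ["money#1"],
    # Concept-to-concept relations, listed in both directions.
    "bank#finance": ["money#1", "deposit#1"],
    "bank#river": ["river#1"],
    "money#1": ["bank#finance", "deposit#1"],
    "deposit#1": ["bank#finance", "money#1"],
    "river#1": ["bank#river"],
}

# The context "bank ... money" personalizes the random walk.
seeds = {"w:bank": 1.0, "w:money": 1.0}
ranks = personalized_pagerank(toy_graph, seeds)

# Disambiguate "bank" by choosing its highest-ranked candidate sense.
best = max(["bank#finance", "bank#river"], key=ranks.get)
print(best, round(ranks["bank#finance"], 4), round(ranks["bank#river"], 4))
```

In this toy setting the financial sense of "bank" receives more rank than the river sense, because the context word "money" feeds probability mass into the cluster of concepts connected to it.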

UKB has also been used in the medical domain, using the UMLS Metathesaurus [5,7].

Mailing List

Please send any questions or problems you may have to the UKB mailing list.

Source code repository

The source code repository is hosted on GitHub. Using git, you can get the whole repository by running:

Selected graphs

Click here to get graph relations of some versions of the English WordNet.
Click here to get graph relations of some versions of the Spanish WordNet.
Click here to get graph relations for English Wikipedia (04 April 2013 dump).

References

[1] Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (EACL-2009). Athens, Greece.

[2] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2009. Knowledge-based WSD and specific domains: performing over supervised WSD. Proceedings of IJCAI. Pasadena, USA.

[3] Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca and Aitor Soroa. 2009. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Proceedings of NAACL-HLT 09. Boulder, USA.

[4] Eneko Agirre, Montse Cuadros, German Rigau and Aitor Soroa. 2010. Exploring Knowledge Bases for Similarity. Proceedings of LREC 2010. Valletta, Malta.

[5] Eneko Agirre, Aitor Soroa and Mark Stevenson. 2010. Graph-based Word Sense Disambiguation of Biomedical Documents. Bioinformatics, Vol. 26(22), pp. 2889-2896. Oxford University Press.

[6] Arantxa Otegi, Xabier Arregi and Eneko Agirre. 2011. Query Expansion for IR using Knowledge-Based Relatedness. Proceedings of the 5th International Joint Conference on Natural Language Processing, pp. 1467-1471. Thailand. ISBN 978-974-466-564-5.

[7] Mark Stevenson, Eneko Agirre and Aitor Soroa. 2012. Exploiting Domain Information for Word Sense Disambiguation of Medical Documents. Journal of the American Medical Informatics Association. Vol. 19, Issue 2. doi:10.1136/amiajnl-2011-000415

[8] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2013. Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics. 40:1. ISSN 0891-2017. doi:10.1162/COLI_a_00164

[9] Eneko Agirre, Ander Barrena and Aitor Soroa. 2015. Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation. http://arxiv.org/abs/1503.01655 (See README for instructions to replicate results)


Acknowledgments

This work has been partially funded by the European Community in the framework of ERA-NET CHIST-ERA Commission (project READERS) and the European Commission (QTLEAP FP7-ICT-2013.4.1-610516).
