UKB: Graph Based Word Sense Disambiguation and Similarity

UKB is a collection of programs for performing graph-based Word Sense Disambiguation and lexical similarity/relatedness using a pre-existing knowledge base.

UKB has been developed by the IXA group at the University of the Basque Country.

UKB applies Personalized PageRank to a Lexical Knowledge Base (LKB): it ranks the vertices of the LKB according to the input context and uses that ranking to perform disambiguation. The details of the method and its application to WordNet are described in [1,8]. It has also been applied to WSD in specific domains [2]. The same algorithm can be used to calculate the lexical similarity/relatedness of words and sentences [3,4] and to improve Information Retrieval algorithms [6].
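The idea can be illustrated with a minimal Python sketch. This is not UKB's own code or interface: the function names, the toy graph, the `w:` prefix for context-word nodes and the parameter values are all assumptions made for illustration, and every relation is treated as undirected for simplicity. Context words are added to the graph as extra nodes linked to their candidate senses, the random walk restarts only at those word nodes, and each word is assigned its highest-ranked sense.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (u, v) pairs, treated as undirected relations.
    seeds: dict mapping vertex -> restart weight (need not sum to 1).
    Returns a dict mapping every vertex to its rank (ranks sum to 1).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    for s in seeds:
        neighbors[s]  # make sure every seed exists as a vertex

    total = sum(seeds.values())
    teleport = {v: seeds.get(v, 0.0) / total for v in neighbors}

    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes only to the seed vertices.
        new_rank = {v: (1.0 - damping) * teleport[v] for v in neighbors}
        for u, nbrs in neighbors.items():
            if not nbrs:
                # Isolated vertex: hand its mass back to the seeds.
                for v in teleport:
                    new_rank[v] += damping * rank[u] * teleport[v]
                continue
            share = damping * rank[u] / len(nbrs)
            for v in nbrs:
                new_rank[v] += share
        rank = new_rank
    return rank


def disambiguate(word, senses, rank):
    """Pick the candidate sense of `word` with the highest rank."""
    return max(senses[word], key=lambda sense: rank[sense])


# Hypothetical toy knowledge base: relations between concepts.
kb_edges = [
    ("bank#finance", "money#currency"),
    ("bank#finance", "deposit#n"),
    ("money#currency", "deposit#n"),
    ("bank#river", "water#n"),
]

# Hypothetical dictionary: candidate senses of each context word.
senses = {
    "bank": ["bank#finance", "bank#river"],
    "money": ["money#currency"],
}

# Link each context word to its senses and restart the walk at the words.
word_edges = [("w:" + w, s) for w, cands in senses.items() for s in cands]
seeds = {"w:" + w: 1.0 for w in senses}

ranks = personalized_pagerank(kb_edges + word_edges, seeds)
best = disambiguate("bank", senses, ranks)
print(best)
```

In this toy graph the financial sense of "bank" outranks the river sense, because the restart mass placed on "money" flows into the neighbourhood of `bank#finance`.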

UKB has also been used in the medical domain, using the UMLS Metathesaurus [5,7].

Mailing List

Please post any questions or problems you may have to the UKB mailing list.

Source code repository

The source code is kept in a git repository hosted on GitHub, from which the whole repository can be cloned.

Wordnet dumps

Graph relations for some versions of the English WordNet are available for download.
Graph relations for some versions of the Spanish WordNet are available for download.

References

[1] Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2009). Athens, Greece.

[2] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2009. Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD. Proceedings of IJCAI. Pasadena, USA.

[3] Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca and Aitor Soroa. 2009. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Proceedings of NAACL-HLT 09. Boulder, USA.

[4] Eneko Agirre, Montse Cuadros, German Rigau and Aitor Soroa. 2010. Exploring Knowledge Bases for Similarity. Proceedings of LREC 2010. Valletta, Malta.

[5] Eneko Agirre, Aitor Soroa and Mark Stevenson. 2010. Graph-based Word Sense Disambiguation of Biomedical Documents. Bioinformatics, Vol. 26(22), pp. 2889-2896. Oxford University Press.

[6] Arantxa Otegi, Xabier Arregi and Eneko Agirre. 2011. Query Expansion for IR using Knowledge-Based Relatedness. Proceedings of the 5th International Joint Conference on Natural Language Processing, pp. 1467-1471. Thailand. ISBN 978-974-466-564-5.

[7] Mark Stevenson, Eneko Agirre and Aitor Soroa. 2012. Exploiting Domain Information for Word Sense Disambiguation of Medical Documents. Journal of the American Medical Informatics Association, Vol. 19, Issue 2. doi:10.1136/amiajnl-2011-000415

[8] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2013. Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics, Vol. 40(1). ISSN 0891-2017. doi:10.1162/COLI_a_00164


Acknowledgments

This work has been partially funded by the European Community in the framework of the ERA-NET CHIST-ERA Commission (project READERS) and by the Spanish Research Department (KNOW2 TIN2009-14715-C04-01).
