UKB: Graph Based Word Sense Disambiguation and Similarity
UKB is a collection of programs for performing graph-based Word Sense
Disambiguation and lexical similarity/relatedness using a pre-existing knowledge base.
UKB has been developed by the IXA
group at the University of the Basque Country.
UKB applies the
so-called Personalized PageRank on a Lexical Knowledge Base (LKB) to
rank the vertices of the LKB and thus perform disambiguation. The
details of the method and its application to WordNet are described in [1,8]. It has also been applied
to WSD in specific domains. The algorithm can be used to
calculate lexical similarity/relatedness of words and sentences [3,4]
and to improve information retrieval algorithms.
In addition, UKB has been applied to the medical domain using the UMLS Metathesaurus [5,7].
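To make the idea of ranking LKB vertices with Personalized PageRank concrete, here is a minimal sketch in Python. It is not UKB's code: the toy graph, the sense identifiers such as `bank%fin`, the damping factor of 0.85 and the 30 iterations are all illustrative assumptions. Probability mass is placed on the senses of the surrounding context words, a random walk with restarts spreads it through the graph, and the sense of the target word with the highest score is chosen.

```python
# Minimal Personalized PageRank (PPR) WSD sketch on a hypothetical toy graph.
# Not UKB's implementation; graph, sense ids and parameters are illustrative.


def personalized_pagerank(graph, teleport, damping=0.85, iterations=30):
    """Power iteration for PPR.

    graph: dict mapping each vertex to its neighbour list (every vertex is
    assumed to have at least one neighbour, so no probability mass is lost).
    teleport: dict vertex -> non-negative weight, normalised below.
    """
    total = sum(teleport.values())
    if total <= 0:
        raise ValueError("teleport vector must have positive mass")
    v = {n: teleport.get(n, 0.0) / total for n in graph}
    rank = dict(v)
    for _ in range(iterations):
        # Jump back to the context vertices with probability 1 - damping.
        new = {n: (1.0 - damping) * v[n] for n in graph}
        # Otherwise follow a uniformly chosen edge out of the current vertex.
        for n, neighbours in graph.items():
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new[m] += share
        rank = new
    return rank


def disambiguate(target, context, lexicon, graph):
    """Return the sense of `target` with the highest PPR score.

    Mass is placed on the senses of the other context words only (a
    word-to-word style variant assumed here), split evenly per word.
    """
    seeds = {}
    for word in context:
        if word == target or word not in lexicon:
            continue
        senses = lexicon[word]
        for s in senses:
            seeds[s] = seeds.get(s, 0.0) + 1.0 / len(senses)
    rank = personalized_pagerank(graph, seeds)
    return max(lexicon[target], key=lambda s: rank[s])


# Hypothetical LKB: two "bank" senses, each linked to related concepts.
TOY_GRAPH = {
    "bank%fin": ["money%1", "deposit%1"],
    "money%1": ["bank%fin", "deposit%1"],
    "deposit%1": ["bank%fin", "money%1"],
    "bank%river": ["river%1", "water%1"],
    "river%1": ["bank%river", "water%1"],
    "water%1": ["bank%river", "river%1"],
}
LEXICON = {
    "bank": ["bank%fin", "bank%river"],
    "money": ["money%1"],
    "deposit": ["deposit%1"],
    "river": ["river%1"],
    "water": ["water%1"],
}

if __name__ == "__main__":
    print(disambiguate("bank", ["bank", "money", "deposit"], LEXICON, TOY_GRAPH))  # bank%fin
    print(disambiguate("bank", ["bank", "river", "water"], LEXICON, TOY_GRAPH))    # bank%river
```

Excluding the target word's own senses from the teleport vector keeps it from simply boosting all of its senses equally; in a real LKB such as WordNet the graph is far larger and densely connected, so the scores differ by degree rather than dropping to zero as they do in this toy example.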
03.06.2015: UKB version 2.1 is released and
available for download (Unix/Linux version only).
05.18.2010: Precompiled Personalized PageRank vectors
for all WordNet lemmas (around 1.2 GB) are available for download; they are useful to speed up similarity calculations.
05.18.2010: UKB version 0.1.5 is released, including scripts for
similarity calculations, and is available for download (Unix/Linux version only).
Older versions of UKB are also available for download.
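Precompiled Personalized PageRank vectors for every WordNet lemma speed up similarity because, in this kind of approach, the relatedness of two words can be estimated by comparing their PPR vectors: computing a vector is the expensive step, while comparing two of them is cheap. The Python sketch below assumes cosine similarity as the comparison and reuses a hypothetical toy graph and lexicon; it is an illustration, not the scripts distributed with UKB.

```python
import math

# Hypothetical sketch: word relatedness as cosine similarity of PPR vectors.
# Graph, lexicon and the choice of cosine are illustrative assumptions.


def personalized_pagerank(graph, teleport, damping=0.85, iterations=30):
    """Power iteration for PPR (every vertex assumed to have a neighbour)."""
    total = sum(teleport.values())
    v = {n: teleport.get(n, 0.0) / total for n in graph}
    rank = dict(v)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * v[n] for n in graph}
        for n, neighbours in graph.items():
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new[m] += share
        rank = new
    return rank


def cosine(u, v):
    """Cosine similarity of two vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


TOY_GRAPH = {
    "bank%fin": ["money%1", "deposit%1"],
    "money%1": ["bank%fin", "deposit%1"],
    "deposit%1": ["bank%fin", "money%1"],
    "bank%river": ["river%1", "water%1"],
    "river%1": ["bank%river", "water%1"],
    "water%1": ["bank%river", "river%1"],
}
LEXICON = {
    "bank": ["bank%fin", "bank%river"],
    "money": ["money%1"],
    "deposit": ["deposit%1"],
    "river": ["river%1"],
    "water": ["water%1"],
}

# Precompute one PPR vector per lemma (seeded on all of its senses) so that
# each later similarity query is only a cheap vector comparison.
VECTORS = {
    word: personalized_pagerank(TOY_GRAPH, {s: 1.0 for s in senses})
    for word, senses in LEXICON.items()
}


def relatedness(w1, w2):
    return cosine(VECTORS[w1], VECTORS[w2])


if __name__ == "__main__":
    print(relatedness("money", "deposit"))  # positive: same neighbourhood
    print(relatedness("money", "river"))    # 0.0: no shared vertices here
```

Storing the vectors once and reusing them for every query is the same trade-off the precompiled download makes at WordNet scale: a large one-off computation and file in exchange for fast repeated similarity calculations.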
Please send any questions or problems you have with UKB
to the project's mailing list.
Source code repository
The git source code repository is hosted on GitHub.
Using git, you can get the whole repository by running:
- git clone https://github.com/asoroa/ukb.git
Graph relations for some versions of the English WordNet are available for download.
Graph relations for some versions of the Spanish WordNet are available for download.
Graph relations for English Wikipedia (04 April 2013 dump) are also
available.
- English WordNet 3.0 plus gloss relations: here
- English WordNet 1.7 plus eXtended WordNet relations: here
- WordNet 3.0 ILI version with dictionaries in English, Spanish
and Basque: here
- English Wikipedia: here
- Basque Wikipedia: here
[1] Eneko Agirre and Aitor Soroa. 2009.
Personalizing PageRank for Word Sense Disambiguation. Proceedings of the
12th Conference of the European Chapter of the Association for
Computational Linguistics (EACL-2009). Athens, Greece.
[2] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2009.
Knowledge-based WSD and specific domains: performing over supervised
WSD. Proceedings of IJCAI. Pasadena, USA.
[3] Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius
Pasca and Aitor Soroa. 2009. A Study on Similarity and Relatedness Using
Distributional and WordNet-based Approaches. Proceedings of NAACL-HLT
2009. Boulder, USA.
[4] Eneko Agirre, Montse Cuadros, German Rigau and Aitor Soroa.
2010. Exploring Knowledge Bases for Similarity. Proceedings of
LREC 2010. Valletta, Malta.
[5] Eneko Agirre, Aitor Soroa and Mark Stevenson. 2010. Graph-based Word
Sense Disambiguation of Biomedical Documents. Bioinformatics,
26(22):2889-2896. Oxford University Press.
[6] Arantxa Otegi, Xabier Arregi and Eneko Agirre. 2011. Query Expansion
for IR using Knowledge-Based Relatedness. Proceedings of the 5th
International Joint Conference on Natural Language Processing,
pp. 1467-1471. Thailand. ISBN 978-974-466-564-5.
[7] Mark Stevenson, Eneko Agirre and Aitor Soroa. 2012. Exploiting
Domain Information for Word Sense Disambiguation of Medical Documents.
Journal of the American Medical Informatics Association, Vol. 19.
[8] Eneko Agirre, Oier Lopez de Lacalle and Aitor Soroa. 2013. Random
Walks for Knowledge-Based Word Sense Disambiguation. Computational
Linguistics, 40(1).
[9] Eneko Agirre, Ander Barrena and Aitor Soroa. 2015. Studying the
Wikipedia Hyperlink Graph for Relatedness and Disambiguation.
http://arxiv.org/abs/1503.01655 (see README for instructions to replicate results).
This work has been partially funded by the European Community in the
framework of ERA-NET CHIST-ERA Commission (project READERS) and
by the European Commission (QTLEAP FP7-ICT-2013.4.1-610516).