Erasmus Mundus Master in Language
and Communication Technologies (LCT)


Language & communication technologies

University of the Basque Country

Corpus linguistics

In this course we will study the use of corpora in computational linguistics. We will start with a general introduction to corpus linguistics and corpus-based linguistics, including linguistic annotation and annotation schemas. We will then analyze different ways of extracting information from corpora, such as collocation and keyword extraction, using both statistical and linguistically based approaches. At the end of the course we will study XML as a language for corpus annotation. Throughout the course, students will work with corpora in several languages.
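As a rough illustration of the statistical side of collocation extraction, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI). It is not part of the course material: the function name `pmi_collocations`, the `min_count` threshold and the toy corpus are all assumptions made for this example, and PMI is only one of several association measures that could be used.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only. Pairs seen fewer than
    `min_count` times are skipped, because PMI overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI compares the observed pair probability with the one
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus invented for this example; tokenised by whitespace.
text = (
    "the basque country is in europe . "
    "the basque country has its own language . "
    "the language is old ."
)
tokens = text.split()
ranked = pmi_collocations(tokens)

for pair, score in ranked:
    print(pair, round(score, 2))
```

On real corpora a linguistic filter (for example, keeping only adjective-noun or noun-noun pairs) is usually combined with a score like this, which corresponds to the linguistically based approaches mentioned above.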

Syllabus

  1. Introduction to Corpus Linguistics
  2. Corpus characteristics and types
  3. Corpus examples
  4. Corpus annotation
    1. Common annotation marks and levels of analysis
    2. Standards for linguistic representation (TEI, NAF, AWA)
  5. XML

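To give an idea of what XML-based corpus annotation (syllabus items 4 and 5) can look like in practice, the sketch below reads word-level annotations from a small XML fragment with Python's standard library. The fragment is invented for this example and only loosely modelled on TEI-style word elements: the element names `text`, `s` and `w`, the attributes `lemma` and `pos`, and the helper `read_annotations` are assumptions, not a validated TEI, NAF or AWA document.

```python
import xml.etree.ElementTree as ET

# Invented sample: one sentence whose words carry lemma and
# part-of-speech annotations as XML attributes (assumed schema).
SAMPLE = """
<text>
  <s n="1">
    <w lemma="corpus" pos="NOUN">Corpora</w>
    <w lemma="be" pos="VERB">are</w>
    <w lemma="useful" pos="ADJ">useful</w>
  </s>
</text>
"""


def read_annotations(xml_string):
    """Return (word form, lemma, part of speech) for every <w> element."""
    root = ET.fromstring(xml_string)
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


annotations = read_annotations(SAMPLE)
for form, lemma, pos in annotations:
    print(form, lemma, pos)
```

Keeping the annotation in attributes leaves the running text readable inside the elements, which is one common design choice. Stand-off formats instead store annotations separately from the text.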