Program

All courses are offered every year with essentially stable content. The 1st semester runs from September to the end of January; the 2nd semester runs from February to late June.
The course is divided into two parts: i) Introduction to fundamental statistics for natural language processing. Concepts of descriptive and inferential statistics.
ii) Survey of the field of Corpus-based Natural Language Processing and Corpus Linguistics. Ways of representing and exploiting linguistic information. Annotation at different linguistic levels: morphology, syntax, semantics, etc. Main approaches to corpus-based analysis, including distributional and pattern-based techniques. As examples, this section will use leading projects for Basque, Catalan, Spanish and English.
Theme 1: Fundamentals of statistics.
Theme 2: Machine learning in language processing
Theme 3: Paradigms and applications.
Theme 4: Corpora. What are corpora? Use of corpora in language research. Introduction to corpus linguistics.
Theme 5: Markup languages and standards of representation. Examples from representative corpora in various languages.
Theme 6: Retrieving the information contained in a corpus. Distributional and rule-based methods.
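As a minimal sketch of Theme 6 (a pure-Python toy; the function name and example corpus are illustrative assumptions, not part of any corpus toolkit), a keyword-in-context (KWIC) concordance of the kind used in corpus linguistics can be written as:

```python
def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines: `window` tokens of left and
    right context around every occurrence of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

corpus = "the cat sat on the mat while the dog watched the cat".split()
for left, kw, right in kwic(corpus, "cat"):
    print(f"{left:>20} | {kw} | {right}")
```

Real concordancers work over indexed corpora and support lemma- and POS-based queries, but the sliding-window idea is the same.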
The course presents an introduction to the fundamentals of language in phonology, morphology, syntax, semantics, and pragmatics. It addresses these Themes from the perspective of traditional linguistics but in the context of Language Technologies.
Theme 1: Human language: The languages versus The Language.
The course has a twofold goal:
i) To present the basic concepts and computational models for the treatment of morphology (regular expressions, finite automata, computational morphology). During the course the student will have the opportunity to practice with foma, a free and open source finite-state toolkit.
ii) To present the computational formalisms for the treatment of syntax: N-grams, basic context-free grammars, probabilistic context-free grammars, and dependency syntax. The course will also examine the implementation of formal grammars providing an overview of frameworks, such as Categorical Grammars (CG), Lexical Functional Grammars (LFG), and Head-driven Phrase Structure Grammar (HPSG). The Constraint Grammar Formalism will be presented in detail and leading projects for Basque and Finnish will be used for illustration. In addition, the course will overview tagging, chunking and parsing processes. Special attention will be paid to the treatment of Basque, given that it is a morphologically rich language.
Theme 1: Dependency
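The N-gram models mentioned above can be made concrete with a short sketch (the toy corpus and function name are assumptions for illustration): a maximum-likelihood bigram model simply divides bigram counts by unigram counts.

```python
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood bigram probabilities P(w2 | w1)."""
    unigrams = Counter(tokens[:-1])          # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

tokens = "<s> the dog barks </s> <s> the cat sleeps </s>".split()
probs = bigram_probs(tokens)
print(probs[("<s>", "the")])  # both sentences start with "the" -> 1.0
```

In practice such estimates are smoothed (e.g. with add-one or back-off methods) to avoid zero probabilities for unseen bigrams.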
The aim of the course is to familiarize students with basic software tools used in natural language processing. The course includes a brief introduction to the Perl programming language, a review of the standard techniques of representing linguistic information, and an overview of practical issues regarding distributed processing.
Theme 1: Basic
CSM2_6 Automata, Computability and Complexity Theory. [9 ECTS]    (LAP6) (Not offered this year)
In this course students will study the theoretical foundations of computer science, i.e., the theory of computation. The course has two parts:
i) Automata Theory. Abstract models called automata are at the core of all computers. This part of the course provides a close examination of automata, formal languages and grammars, and classifies them according to the Chomsky Hierarchy.
ii) Computability and Complexity Theory. This part of the course examines what kinds of problems can be solved using algorithms (computability) and, in the case of computable problems, how inherently difficult they are to solve (complexity).
Theme 1: Mathematical concepts and basic formal reasoning. Sets, relations, functions, character strings, and languages. Proofs.
Theme 2: Regular languages: Finite automata, regular grammars, regular expressions, and the applications of these. How to prove that a language is not regular.
Theme 3: Context-free languages: push-down automata, context-free grammars, and the applications of these.
Theme 4: Recursive languages and recursively enumerable languages: Turing machines and unrestricted grammars.
Theme 5: Turing machines, deterministic and non-deterministic.
Theme 6: Computable languages. Diagonalization and reduction for proving the non-computability of a language.
Theme 7: Recursively enumerable languages. Relationship with computable languages. Rice's Theorem.
Theme 8: Complexity theory. The classes P and NP. The study of NP-complete problems.
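The finite automata of Theme 2 can be simulated in a few lines of code. The sketch below (a toy DFA chosen for illustration, accepting binary strings with an even number of 1s) runs a deterministic finite automaton given as a transition table:

```python
def run_dfa(transitions, start, accepting, string):
    """Simulate a deterministic finite automaton over an input string."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accepting

# DFA over {0, 1} accepting strings with an even number of 1s
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd",  ("odd", "1"): "even"}
print(run_dfa(delta, "even", {"even"}, "10110"))  # three 1s -> False
```

The empty string is accepted here because the start state is also accepting, matching the formal definition of acceptance.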
CSM4_13 Artificial Intelligence and Advanced User Interaction. [4.5 ECTS]    (LAP13) (Not offered this year)
An introductory course on artificial intelligence that focuses on the basic concepts and methods, covering issues related to knowledge representation, search algorithms, agents, and multimodal interfaces.
CSM3_8 Machine Learning. [4.5 ECTS]    (LAP8)
Theme 1: Introduction to Artificial Intelligence: History, challenges, applications.
This course focuses on a range of techniques inspired by artificial intelligence and classical statistics. In the last decade, these fields have experienced a boom, particularly around problems involving large volumes of data for which classical mathematical, statistical, or operations-research approaches have been unable to offer effective or efficient solutions. The applications of machine learning cover fields as diverse as bioinformatics, finance, and natural language. The student will study the major techniques most commonly used for data mining, and will acquire skills in the use of free software packages that implement these techniques. This will be linked to the study and demonstration of real applications of these techniques.
- Theme 1. A short introduction to the world of Data Science: business and big data, the open data concept, big data and humanitarian projects, data visualization, software resources, methodologies for project management, and applications...
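One of the simplest techniques covered in such a course can be sketched directly (the toy data and function name below are illustrative assumptions): a 1-nearest-neighbour classifier, which labels a new point with the class of its closest training example.

```python
def nearest_neighbour(train, point):
    """Classify `point` with the label of its closest training example
    (1-NN under squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda ex: dist2(ex[0], point))
    return label

# toy training set: (feature vector, class label)
train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbour(train, (4.5, 4.8)))  # closest to (5, 5) -> B
```

Free packages such as scikit-learn and Weka provide production-quality implementations of this and the other standard techniques.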
The course covers the following topics:
i) Computational semantics. The goal of this part of the course is to present the basic concepts in semantics, covering the issues related to syntax-semantics interface: formal representation of the meaning of a sentence, computational approaches for formal semantics, essential resources needed for the computational treatment of semantics, and fundamental statistical approaches to word sense disambiguation.
ii) Computational pragmatics and discourse. This part covers subjects such as: a) theories that formalize the rhetorical structure of a text (e.g. RST), b) the problem of coreference and the identification of coreference chains, and c) the construction of models of speech acts in dialogue.
Theme 1: Automated reasoning in propositional, first-order, temporal and
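The approaches to word sense disambiguation mentioned in part i) can be illustrated with a simplified Lesk-style sketch: pick the sense whose dictionary gloss overlaps most with the word's context. The glosses below are invented toy entries, not taken from a real lexical resource.

```python
def simplified_lesk(context_words, sense_glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    def overlap(gloss):
        return len(set(gloss.lower().split()) & set(context_words))
    return max(sense_glosses, key=lambda s: overlap(sense_glosses[s]))

glosses = {  # toy glosses for "bank", for illustration only
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}
context = "he sat on the grassy land beside the river".split()
print(simplified_lesk(set(context), glosses))  # -> bank/river
```

Statistical WSD systems replace the raw overlap count with supervised classifiers or sense-annotated corpus statistics, but the core idea of matching context against sense evidence is the same.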
The course consists of three parts:
i) Machine translation: Existing MT paradigms will be presented in detail. The student will have the opportunity to study real cases of an SMT system from English to Basque and to carry out practice projects. In addition, the course will review the need for, and practical ways of, combining the classical paradigms.
ii) Education and NLP: This part of the course is dedicated to the study of techniques for reusing NLP tools and resources in the teaching/learning task, whether applied to the process of learning a second language or to the process of learning about a general subject.
iii) Web searching and text mining: In addition to issues in web searching and information retrieval, this part explores other text mining methods for information extraction, and discusses many other ways of processing and analyzing free-text data and their use in various scientific and business applications. We start with the basic notions of information retrieval, explain some of the fundamental algorithms for text-based information systems, and eventually touch on the research frontiers.
Theme 1: Language learning, NLP and intelligent tutors. Study of the technologies of instruction and learning; intelligent computerized learning environments.
Theme 2: Searching and extracting monolingual and multilingual information. Information searches. Information retrieval (IR): models, techniques, evaluation, examples, NLP and IR.
Theme 3: Machine translation. Techniques and paradigms in machine translation.
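The retrieval models of Theme 2 can be sketched in a few lines (the documents, query, and function names below are illustrative assumptions): a vector-space model weights each term by TF-IDF and ranks documents by the weight of the query terms they contain.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Vector-space model: TF-IDF weight for every term of every document."""
    df = Counter(term for doc in docs for term in set(doc))  # document freq.
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def rank(query_terms, vectors):
    """Rank document indices by the summed TF-IDF weight of query terms."""
    score = lambda vec: sum(vec.get(t, 0.0) for t in query_terms)
    return sorted(range(len(vectors)), key=lambda i: score(vectors[i]),
                  reverse=True)

docs = [d.split() for d in ("machine translation of basque",
                            "information retrieval models",
                            "statistical machine translation")]
vecs = tfidf_vectors(docs)
print(rank(["statistical", "translation"], vecs)[0])  # -> 2
```

Terms that occur in every document get weight log(n/n) = 0, so the ranking is driven by the more discriminative terms, which is the point of the IDF factor.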
This course has the following objectives:
i) To review the main resources used in semantic analysis. The resources include general lexical repositories such as WordNet, the Multilingual Central Repository, VerbNet, FrameNet, OntoNotes and DBpedia, as well as some domain-specific repositories. The course also reviews related corpora annotated (manually and automatically) with semantic information. We will cover resources in English, Basque and other languages.
ii) To present advanced techniques for dealing with the meaning of words, including disambiguation and similarity. We will closely examine disambiguation algorithms for content words (Word Sense Disambiguation) and proper nouns (Named Entity Disambiguation), as well as investigate algorithms for computing the semantic similarity between words and text (Semantic Textual Similarity).
Theme 1: Semantic resources.
Theme 2: Disambiguation methods.
Theme 3: Distributional semantics.
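The distributional semantics of Theme 3 rests on the idea that words occurring in similar contexts have similar meanings. A minimal sketch (toy corpus and function names are assumptions for illustration) builds co-occurrence count vectors and compares words by cosine similarity:

```python
import math
from collections import defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Distributional vectors: counts of context words within `window`."""
    vecs = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vecs[w][tokens[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    norm = lambda w: math.sqrt(sum(c * c for c in w.values())) or 1.0
    return dot / (norm(u) * norm(v))

tokens = ("the cat drinks milk the dog drinks water "
          "the cat likes milk the dog likes water").split()
v = cooccurrence_vectors(tokens)
# "cat" and "dog" share contexts, so they come out more similar
print(cosine(v["cat"], v["dog"]) > cosine(v["cat"], v["water"]))  # -> True
```

The same cosine measure over much richer vectors (TF-IDF weighted, PMI weighted, or learned embeddings) underlies Semantic Textual Similarity systems.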
The main objective of this course is to analyze the fundamental methods and techniques for the advanced treatment of large volumes of textual data (e.g. text on the Internet). We will review three main approaches: statistical methods, machine learning methods, and knowledge-based methods.
Theme 1: Introduction to corpus analysis.
Theme 2: Statistical methods.
Theme 3: Machine learning methods.
Theme 4: Knowledge-based methods.
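A classic statistical method of the kind covered in Theme 2 is collocation extraction by pointwise mutual information (PMI). The sketch below (toy corpus and function name are illustrative assumptions) scores adjacent word pairs by how much more often they co-occur than chance predicts:

```python
import math
from collections import Counter

def pmi_bigrams(tokens):
    """Pointwise mutual information for adjacent word pairs:
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    return {pair: math.log2((c / n_bi) /
                            ((unigram[pair[0]] / n_uni) *
                             (unigram[pair[1]] / n_uni)))
            for pair, c in bigram.items()}

tokens = "new york is big and new ideas emerge in new york every day".split()
scores = pmi_bigrams(tokens)
print(round(scores[("new", "york")], 2))  # -> 2.23
```

Raw PMI is known to overrate very rare pairs, so corpus work usually combines it with frequency thresholds or significance tests.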
The course presents the fundamentals of speech processing techniques, as well as introducing students to state-of-the-art methodologies, software toolkits, and resources used in speech technology. The course also reviews the different fields of speech processing, including speech synthesis and speech and speaker recognition.
Theme 1: Speech signal: production and perception
CSM_18 Seminars on Language Technologies. Deep Learning. [4.5 ECTS]    (officially shown as: CSM3_7 Compiler Design) (LAP18)
Deep learning neural network models have been applied successfully to natural language processing. These models infer continuous representations for words and sentences, instead of relying on hand-engineered features as other machine learning approaches do. The seminar will introduce the main deep learning models used in natural language processing, allowing students to gain a hands-on understanding of them by implementing them in TensorFlow.
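The seminar itself uses TensorFlow; as a framework-free illustration of the underlying principle (parameters learned by gradient descent on a differentiable model), here is a single logistic neuron trained on invented toy data:

```python
import math
import random

def train_neuron(data, epochs=500, lr=0.5):
    """Train a logistic neuron by per-example gradient descent
    on (x1, x2, label) triples."""
    random.seed(0)
    w1, w2, b = (random.uniform(-0.5, 0.5) for _ in range(3))
    for _ in range(epochs):
        for x1, x2, y in data:
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            err = p - y                      # gradient of cross-entropy loss
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

# toy data: label is 1 when x1 + x2 >= 1 (logical OR)
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w1, w2, b = train_neuron(data)
predict = lambda x1, x2: 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
print(predict(1, 1) > 0.5, predict(0, 0) < 0.5)  # -> True True
```

Deep models stack many such units and learn the word and sentence representations jointly with the task; frameworks like TensorFlow automate exactly the gradient computation written out by hand above.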