Introduction

The success of the text analytics and big data industry depends on the quality of text-processing tools provided by Natural Language Processing (NLP) research groups. While industry and public administration need to process text from many genres and domains, the best NLP tools have been trained on a limited amount of text laboriously tagged by human annotators, typically news, which causes dramatic drops in performance on other genres and domains. For the text analytics industry in general, and SMEs in particular, this means that significant resources must be devoted to manually annotating training data whenever new text genres or application domains are targeted. This challenge limits the expansion of companies into emerging business opportunities. Furthermore, it impedes the adoption of NLP technology by the public administration for important applications, such as Clinical Decision Support in the Medical Domain and the automatic monitoring of tourist activities. In the Medical Domain, this means helping health professionals to make clinical decisions and to deal with medical data about patients or with the medical knowledge necessary to interpret such data. In the Tourist Domain, professionals need unambiguous and structured knowledge to understand the market, make accurate analyses, and take decisions to identify and recommend destinations, tourism content, activities and business policies based on the habits, behaviours and recommendations that users express in social media. Thus, in such domains NLP technology is crucial for extracting accurate, complete, relevant, interoperable and timely structured knowledge from large amounts of unstructured multilingual text in order to make informed decisions. TUNER will address these needs by researching and developing domain adaptation techniques and applying them to the NLP technology that will be developed within the project.
In particular, general, automatic and domain-specific techniques will be explored and leveraged to induce the information required to build NLP tools for different tasks when training data for a particular domain is scarce or unavailable. TUNER will therefore develop domain-oriented, cross-lingual, content-enabling systems that provide deep semantic capabilities for processing large quantities of multilingual data. TUNER will process documents in English, Spanish, Catalan, Basque and Galician using Big Data processing techniques to provide the required information in a timely manner.