RST Treebank

TERM31_A1.rs3 (51)

Left unit

Sense

Right unit

Relation type

Relation name

Tagger

rhetdb

Notes

In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model.

-->

The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined.

background

N-S

In recent years work has begun to develop instruments in several languages for automatic terminology extraction in technical texts, though human intervention is still required to make the final selection from the terms automatically chosen. As an example we can cite the following instruments: LEXTER (Bourigault, 92), AT & Tko Terminght (Church & Dagan, 94), TERMS by IBM (Justeson & Katz, 95) and NPtool (Arpper, 95). Their areas of application can be divided into two main groups: information indexing and the making-up of terminological glossaries. In areas where terminology is developing dynamically, such as computer science, it is almost impossible to carry out effective terminological work without an instrument of this type.

-->

If a similar instrument is to be developed for Basque we shall come up against more major drawbacks, because the unifying process of the language has not been completed, research carried out is limited and Basque is an agglutinative language.

background

N-S

Morpho-syntactic models are usually used,

-->

so it is advisable to have the text already analysed or at least labelled.

cause

N-S

In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with:

-->

lemmatisation will be necessary.

cause

N-S

a discrimination between terms must be made,

<--

because some of them may form part of longer units.

cause

N-S

The methods applied vary widely from project to project,

-->

so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95),

cause

N-S

We do not yet have any results, but we believe that the model will be wider than the noun phrase.

<--

In the choice of technical terms, the case of internal declension may prove decisive.

cause

N-S

we shall come up against more major drawbacks,

<--

because the unifying process of the language has not been completed, research carried out is limited and Basque is an agglutinative language.

cause

N-S

The results obtained are not yet those required for absolutely automatic extraction.

-->

A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

cause

N-S

While these tools are being prepared,

-->

we must work on the modelling of technical terms, i.e. we must reduce their characteristics.

circumstance

N-S

In recent years work has begun to develop instruments in several languages for automatic terminology extraction in technical texts,

<--

though human intervention is still required to make the final selection from the terms automatically chosen.

concession

N-S

It is a hard task to obtain a formal, complete definition of a term,

-->

but that is precisely what a major part of this work consists of: defining the characteristics of terms.

concession

N-S

The results are conditioned heavily by the quality of the linguistic tool used.

<--

In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96).

concession

N-S

The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95),

<--

though several probabilistic formulae are generally combined.

concession

N-S

We do not yet have any results,

-->

but we believe that the model will be wider than the noun phrase.

concession

N-S

Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled.

<--

The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96).

concession

N-S

If a similar instrument is to be developed for Basque

-->

we shall come up against more major drawbacks, because the unifying process of the language has not been completed, research carried out is limited and Basque is an agglutinative language.

condition

N-S

As an example we can cite the following instruments: LEXTER (Bourigault, 92), AT & Tko Terminght (Church & Dagan, 94), TERMS by IBM (Justeson & Katz, 95) and NPtool (Arpper, 95).

<--

Their areas of application can be divided into two main groups: information indexing and the making-up of terminological glossaries.

elaboration

N-S

<--

As an example we can cite the following instruments: LEXTER (Bourigault, 92), AT & Tko Terminght (Church & Dagan, 94), TERMS by IBM (Justeson & Katz, 95) and NPtool (Arpper, 95). Their areas of application can be divided into two main groups: information indexing and the making-up of terminological glossaries.

elaboration

N-S

Lemmatisation is linked to morphological analysis and the removal of ambiguities.

<--

In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary.

elaboration

N-S

Linguistic knowledge is also of prime importance in the standardisation of terminology:

<--

a discrimination between terms must be made, because some of them may form part of longer units.

elaboration

N-S

Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary.

<--

Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units.

elaboration

N-S

Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units.

<--

Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary.

elaboration

N-S

Linguistic techniques are used basically to make the initial selection of terms. Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units.

<--

Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96).

elaboration

N-S

In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction.

<--

To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

elaboration

N-S

A balance must be found between recall and precision.

<--

In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

elaboration

N-S

<--

2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units. 2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined. 2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

elaboration

N-S

The IXA Group intends to develop a tool of this type for Basque.

<--

The morphological analyser is already being prepared (Alegria et al, 96), the lemmatizer/labeller is almost completed (Aduriz et al, 96) and work has been done on surface level syntax.

elaboration

N-S

To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained.

<--

We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

elaboration

N-S

<--

While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics. To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

elaboration

N-S

-->

In areas where terminology is developing dynamically, such as computer science, it is almost impossible to carry out effective terminological work without an instrument of this type.

evidence

N-S

While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics.

<--

To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

means

N-S

Automatic terminology extraction and its application to Basque

-->

1. Introduction In recent years work has begun to develop instruments in several languages for automatic terminology extraction in technical texts, though human intervention is still required to make the final selection from the terms automatically chosen. As an example we can cite the following instruments: LEXTER (Bourigault, 92), AT & Tko Terminght (Church & Dagan, 94), TERMS by IBM (Justeson & Katz, 95) and NPtool (Arpper, 95). Their areas of application can be divided into two main groups: information indexing and the making-up of terminological glossaries. In areas where terminology is developing dynamically, such as computer science, it is almost impossible to carry out effective terminological work without an instrument of this type. If a similar instrument is to be developed for Basque we shall come up against more major drawbacks, because the unifying process of the language has not been completed, research carried out is limited and Basque is an agglutinative language. 2. Terminology extraction It is a hard task to obtain a formal, complete definition of a term, but that is precisely what a major part of this work consists of: defining the characteristics of terms. To obtain technical terms from the corpus a combination of NLP techniques (based on linguistic knowledge) and statistical techniques is usually used. 2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units. 2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined. 2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%. 3. Application to Basque The IXA Group intends to develop a tool of this type for Basque. The morphological analyser is already being prepared (Alegria et al, 96), the lemmatizer/labeller is almost completed (Aduriz et al, 96) and work has been done on surface level syntax. While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics. To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

preparation

N-S

1. Introduction

-->

preparation

N-S

2.1. Linguistic Techniques

-->

preparation

N-S

2.2. Statistical Techniques

-->

In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined.

preparation

N-S

2.3. Results

-->

The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

preparation

N-S

3. Application to Basque

-->

The IXA Group intends to develop a tool of this type for Basque. The morphological analyser is already being prepared (Alegria et al, 96), the lemmatizer/labeller is almost completed (Aduriz et al, 96) and work has been done on surface level syntax. While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics. To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

preparation

N-S

2. Terminology extraction

-->

It is a hard task to obtain a formal, complete definition of a term, but that is precisely what a major part of this work consists of: defining the characteristics of terms. To obtain technical terms from the corpus a combination of NLP techniques (based on linguistic knowledge) and statistical techniques is usually used. 2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units. 2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined. 2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

preparation

N-S

It is a hard task to obtain a formal, complete definition of a term, but that is precisely what a major part of this work consists of: defining the characteristics of terms.

-->

To obtain technical terms from the corpus a combination of NLP techniques (based on linguistic knowledge) and statistical techniques is usually used.

purpose

N-S

we must work on the modelling of technical terms,

<--

i.e. we must reduce their characteristics.

restatement

N-S

In this balance preference is given to recall,

<--

provided there is a person who can carry out the terminology reduction.

unless

N-S

Segments

Relation type

Relation name

Tagger

rhetdb

Notes

To obtain a recall of 95% precision is usually reduced to 50%,

and for a precision of 85% cover is not reduced even to 35%.

contrast

N-N

because the unifying process of the language has not been completed,

research carried out is limited

and Basque is an agglutinative language.

list

N-N

2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined.

2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

list

N-N

The morphological analyser is already being prepared (Alegria et al, 96),

the lemmatizer/labeller is almost completed (Aduriz et al, 96)

and work has been done on surface level syntax.

list

N-N

2. Terminology extraction It is a hard task to obtain a formal, complete definition of a term, but that is precisely what a major part of this work consists of: defining the characteristics of terms. To obtain technical terms from the corpus a combination of NLP techniques (based on linguistic knowledge) and statistical techniques is usually used. 2.1. Linguistic Techniques Linguistic techniques are used basically to make the initial selection of terms. Morpho-syntactic models are usually used, so it is advisable to have the text already analysed or at least labelled. The results are conditioned heavily by the quality of the linguistic tool used. In any event in some projects neither morphological nor syntactic analysis is carried out (Su et al., 96). Lemmatisation is linked to morphological analysis and the removal of ambiguities. In complex inflected languages poor results will ensue if only the formal aspect of words is dealt with: lemmatisation will be necessary. Linguistic knowledge is also of prime importance in the standardisation of terminology: a discrimination between terms must be made, because some of them may form part of longer units. 2.2. Statistical Techniques In most projects statistical methods have been used to reduce the assumed terms which follow the linguistic model. The methods applied vary widely from project to project, so the simplest idea is to require a minimum absolute frequency (Justeson & Katz, 95), though several probabilistic formulae are generally combined. 2.3. Results The results obtained are not yet those required for absolutely automatic extraction. A balance must be found between recall and precision. In this balance preference is given to recall, provided there is a person who can carry out the terminology reduction. To obtain a recall of 95% precision is usually reduced to 50%, and for a precision of 85% cover is not reduced even to 35%.

3. Application to Basque The IXA Group intends to develop a tool of this type for Basque. The morphological analyser is already being prepared (Alegria et al, 96), the lemmatizer/labeller is almost completed (Aduriz et al, 96) and work has been done on surface level syntax. While these tools are being prepared, we must work on the modelling of technical terms, i.e. we must reduce their characteristics. To that end, basing work on existing technical dictionaries and using statistical techniques, principal models must be obtained. We do not yet have any results, but we believe that the model will be wider than the noun phrase. In the choice of technical terms, the case of internal declension may prove decisive.

list

N-N

Multilingual RST Treebank