Welcome to Cross Language Evaluation Forum

Guidelines

Also see: http://ixa2.si.ehu.es/clirwsd/

Guidelines for Participation in the CLEF 2009 Ad-Hoc Track: Robust WSD Task

In these Guidelines, we provide information on the test collections, the tasks, data manipulation, query construction and results submission for the Robust WSD task of the CLEF 2009 Ad-Hoc track. Guidelines for the other CLEF tracks can be found on the dedicated webpages for these tracks.

MAIN TEST COLLECTION

In CLEF 2009 the Robust WSD monolingual and Robust WSD bilingual tasks use only the English (LA Times '94 and Glasgow Herald '95) collections.

Topics are released using the correct diacritics (according to the language) but may contain occasional spelling errors, inconsistencies or minor formatting deficiencies. We aim to keep these to a minimum.

Ad-hoc collections which were available at CLEF 2001

- LA Times 94 (with WSD data)
- Glasgow Herald 95 (with WSD data)

TASKS

- monolingual IR (English)
- bilingual (Spanish -> English)

Much of the evaluation methodology adopted for these two tasks in CLEF is an adaptation of the strategy studied for the TREC ad-hoc task. The instructions given below have been derived from those distributed by TREC. We hope that they are clear and comprehensive. However, please do not hesitate to ask for clarification or further information if you need it. Send queries to the organizers.

TOPICS

The test and train topics are distributed as follows:

        - CLEF years 2001-2002, 2004: for Training
        - CLEF years 2003, 2005-2006: for Testing
        - TREC year 2004: for Training and Testing

The CLEF topics are available in Spanish and English.

The TREC test topics (301-450) are available in English and Spanish, but the TREC train topics (601-700) will be available only in English. Note that we deleted the TREC topics that had no relevant documents in the LA Times 94 collection, leaving 143 topics in the 301-450 range (test) and 84 topics in the 601-700 range (train).

The following table summarizes the test/train topics and the corresponding target collections:

Set	Year	Topics	English Documents (x = collection not used)
Train	CLEF 2001	41-90	LA Times 94	x
Train	CLEF 2002	91-140	LA Times 94	x
Train	CLEF 2004	201-250	x	Glasgow Herald 95
Train	TREC 2004	601-700	LA Times 94	x
Test	CLEF 2003	141-200	LA Times 94	Glasgow Herald 95
Test	CLEF 2005	251-300	LA Times 94	Glasgow Herald 95
Test	CLEF 2006	301-350	LA Times 94	Glasgow Herald 95
Test	TREC 2004	301-450	LA Times 94	x

Test and Training Data for Robust 2009

CONSTRUCTING AND MANIPULATING THE SYSTEM DATA STRUCTURES FOR AD-HOC TRACKS

The system data structures are defined to consist of the original documents, any new structures built automatically from the documents (such as inverted files, thesauri, conceptual networks, etc.), and any new structures built manually from the documents (such as thesauri, synonym lists, knowledge bases, rules, etc.).

1. The system data structures may not be modified in response to the test topics. For example, you cannot add topic words that are not in your dictionary. The CLEF tasks represent the real-world problem of an ordinary user posing a question to a system. In the case of the cross-language tasks, the question is posed in one language and relevant documents must be retrieved whatever the language in which they have been written. If an ordinary user could not make the change to the system, you should not make it after receiving the topics.

2. Several parts of the CLEF data collections contain manually-assigned, controlled or uncontrolled index terms. These fields are delimited by SGML (XML-compatible) tags. Since the primary focus of CLEF is on retrieval of naturally occurring text over language boundaries, these manually-indexed terms should not be used indiscriminately as if they were a normal part of the text.

3. Only the following fields may be used for automatic retrieval:

LA TIMES 1994: HEADLINE, TEXT only
Glasgow Herald: HEADLINE, TEXT only

Learning from (e.g. building translation sources from) such fields is permissible.
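
The field restriction above can be illustrated with a minimal sketch. It assumes each document is wrapped in <DOC>...</DOC> with a <DOCNO> tag and that a simple regular expression is enough to isolate the fields; the WSD-annotated version of the collections may use different markup. The function name, the SUBJECT tag and the sample document are hypothetical and serve only as illustration.

```python
import re

# Fields permitted for automatic retrieval in both collections.
ALLOWED_FIELDS = ("HEADLINE", "TEXT")


def extract_indexable_text(doc_sgml):
    """Return (docno, text) built only from the HEADLINE and TEXT fields."""
    docno_match = re.search(r"<DOCNO>\s*(.*?)\s*</DOCNO>", doc_sgml, re.S)
    docno = docno_match.group(1) if docno_match else None
    parts = []
    for field in ALLOWED_FIELDS:
        pattern = r"<{0}>(.*?)</{0}>".format(field)
        for body in re.findall(pattern, doc_sgml, re.S):
            # Drop nested markup such as <P> and collapse whitespace.
            cleaned = re.sub(r"<[^>]+>", " ", body)
            parts.append(" ".join(cleaned.split()))
    return docno, " ".join(parts)


# Hypothetical document; the SUBJECT field stands in for a manually indexed field.
sample = """<DOC>
<DOCNO> LA010194-0001 </DOCNO>
<HEADLINE><P>Sample headline</P></HEADLINE>
<SUBJECT>manually assigned term</SUBJECT>
<TEXT><P>Body of the article.</P></TEXT>
</DOC>"""

docno, text = extract_indexable_text(sample)
print(docno, "|", text)
```

Text from any other field, such as the manually assigned term in the sample, never reaches the index under this scheme.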

GUIDELINES FOR CONSTRUCTING THE QUERIES

The queries are constructed from the topics. Each topic consists of three fields: a brief title statement, a one-sentence description, and a more complex narrative specifying the relevance assessment criteria. Queries can consist of one or more of these fields, subject to the restriction for this task stated at the end of this section.

There are many possible methods for converting the supplied topics into queries that your system can execute. We have broadly defined two generic methods, "automatic" and "manual", based on whether manual intervention is used or not. When more than one set of results is submitted, the different sets may correspond to different query construction methods or, if desired, can be variants within the same method. Only automatic runs are allowed in this task.

The manual query construction method includes BOTH runs in which the queries are constructed manually and then run without looking at the results AND runs in which the results are used to alter the queries using some manual operation. The distinction is being made here between runs in which there is no human involvement (automatic query construction) and runs in which there is some type of human involvement (manual query construction). It is clear that manual runs should be appropriately motivated in a CLIR context, e.g. a run where a proficient human simply translates the topic into the document language(s) is not what most people think of as cross-language retrieval.

To further clarify this, here are some example query construction methodologies, and their correct query construction classification. Note that these are only examples; many other methods may be used for automatic or manual query construction.

1. queries constructed automatically from the topics, the retrieval results of these queries sent to the CLEF results server --> automatic query construction
2. queries constructed automatically from the topics, then expanded by a method that takes terms automatically from the top 30 documents (no human involved) --> automatic query construction
3. queries constructed manually from the topics, results of these queries sent to the CLEF results server --> manual query construction
4. queries constructed automatically from the topics, then modified by human selection of terms suggested from the top 30 documents --> manual query construction

Note that by including all types of human-involved runs in the manual query construction method we make it harder to do comparisons of work within this query construction method. We thus only allow automatic runs.

Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit four further baseline runs without WSD and four runs using WSD in various ways. Only Title and Description of topics can be used to construct the queries.
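
As a rough sketch of automatic query construction under the Title and Description restriction, the snippet below concatenates the two permitted fields. The dictionary layout, the key names ("title", "desc", "narr") and the sample topic text are assumptions made for illustration; the distributed topic files use their own markup.

```python
# Topic fields that may be used in this task; the narrative is excluded.
PERMITTED_FIELDS = {"title", "desc"}


def build_query(topic, fields=("title", "desc")):
    """Concatenate the requested topic fields into one query string."""
    illegal = set(fields) - PERMITTED_FIELDS
    if illegal:
        raise ValueError("fields not allowed in this task: %s" % sorted(illegal))
    return " ".join(topic[f].strip() for f in fields if topic.get(f))


# Hypothetical topic, already parsed into a dictionary.
topic = {
    "id": "10.2452/451-AH",
    "title": "Example title",
    "desc": "Find documents about the example subject.",
    "narr": "Relevant documents discuss the example subject in detail.",
}

print(build_query(topic))
print(build_query(topic, fields=("title",)))
```

No human step sits between the topic and the query here, so a run produced this way would count as automatic.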

WHAT TO DO WITH YOUR RESULTS

Your results must be sent to the CLEF results server (address to be communicated), respecting the submission deadlines (see below).
Results have to be submitted in ASCII format, with one line per document retrieved.
The lines have to be formatted as follows:

10.2452/451-AH	Q0	document.00072	0	0.017416	runidex1
1	2	3	4	5	6

The fields must be separated by ONE blank and have the following meanings:

1) Query identifier. Please use the complete DOI identifier of the topic (e.g. 10.2452/451-AH, not only 451)
INPUT MUST BE SORTED NUMERICALLY BY QUERY NUMBER.

2) Query iteration (will be ignored. Please choose "Q0" for all experiments).

3) Document number (content of the <DOCNO> tag).

4) Rank 0..n (0 is best matching document. If you retrieve 1000 documents per query, rank will be 0..999, with 0 best and 999 worst). Note that rank starts at 0 (zero) and not 1 (one).
MUST BE SORTED IN INCREASING ORDER PER QUERY.

5) RSV value (system specific value that expresses how relevant your system deems a document to be. This is a floating point value. High relevance should be expressed with a high value). If a document D1 is considered more relevant than a document D2, this must be reflected in the fact that RSV1 > RSV2. If RSV1 = RSV2, the documents may be randomly reordered during calculation of the evaluation measures. Please use a decimal point ".", not a comma. Do not use any form of separators for thousands. The only legal characters for the RSV values are 0-9 and the decimal point.
MUST BE SORTED IN DECREASING ORDER PER QUERY.

6) Run identifier (please choose a unique ID for each experiment you submit). Only use a-z, A-Z and 0-9. No special characters, accents, etc.

The fields are separated by a single space.
The file contains nothing but lines formatted in the way described above.
You are expected to retrieve 1000 documents per query. An experiment that retrieves a maximum of 1000 documents each for 20 queries therefore produces a file that contains a maximum of 20000 lines.
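
The format described above can be produced with a short routine such as the following sketch. It assumes the scores for one query are held as (document number, RSV) pairs and that all scores are non-negative, since only digits and the decimal point are legal in the RSV field. The function name and the second document number are illustrative.

```python
MAX_DOCS_PER_QUERY = 1000


def format_run_lines(query_id, scored_docs, run_id):
    """Return the result lines for one query, best document first."""
    # Sort by decreasing RSV so that rank 0 is the best match.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (docno, rsv) in enumerate(ranked[:MAX_DOCS_PER_QUERY]):
        # Fixed-point output avoids exponent notation such as 1e-05.
        lines.append("%s Q0 %s %d %.6f %s" % (query_id, docno, rank, rsv, run_id))
    return lines


scores = [("document.00101", 0.0042), ("document.00072", 0.017416)]
for line in format_run_lines("10.2452/451-AH", scores, "runidex1"):
    print(line)
```

The caller remains responsible for emitting the queries in numerical order and for using one run identifier per experiment.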

The effectiveness measures used in CLEF evaluate the performance of systems at various points of recall. Participants may return at most 1000 documents per query in their results. Note that, by its nature, the average precision measure does not penalize systems that return extra irrelevant documents at the bottom of their result lists. You will therefore usually want to use the maximum number of allowable documents in your official submissions. If you knowingly retrieved fewer than 1000 documents for a topic, please take note of that and check your numbers against those reported by the system during submission.
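
The remark about average precision can be checked with a small worked example. The sketch below uses the usual uninterpolated definition (mean of the precision values at the ranks of the relevant documents, divided by the total number of relevant documents); it is an illustration, not the evaluation program used by CLEF, and the document identifiers are made up.

```python
def average_precision(ranked_docnos, relevant):
    """Uninterpolated average precision for a single query."""
    hits = 0
    precision_sum = 0.0
    for position, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


relevant = {"d1", "d3"}
short_run = ["d1", "d2", "d3"]
# Same run padded with irrelevant documents at the bottom of the list.
padded_run = short_run + ["d4", "d5", "d6"]

print(average_precision(short_run, relevant))
print(average_precision(padded_run, relevant))
```

Both calls give the same value, (1/1 + 2/3) / 2, because irrelevant documents placed after the last relevant one add nothing to the sum. A longer list can only help if it happens to contain further relevant documents.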

You will have to submit each run through the DIRECT system. An E-mail will be sent to you explaining how to submit your results.

N.B. Please read the following very carefully

In all of the above tasks, in order to facilitate comparison between results, there must be two mandatory runs: Title + Description (task) with and without using WSD annotations. In addition, you can submit four further baseline runs without WSD and four runs using WSD in various ways.

The deadline for submission of results for the Robust-WSD task is midnight (24.00) Central European Time, 1st of June. Detailed information on how and where to submit your results will be communicated in due time.

An input checker program, used by TREC and modified to meet the requirements of CLEF, can be accessed here.
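
Before using the official checker, a run can be sanity-checked locally. The following is a minimal sketch and not the TREC/CLEF checker program: it only tests the rules stated in these guidelines, and it additionally assumes that ranks within a query are consecutive starting from 0. All names are illustrative.

```python
import re

RANK_RE = re.compile(r"[0-9]+")
RSV_RE = re.compile(r"[0-9]+(\.[0-9]+)?")  # digits and a decimal point only
RUN_ID_RE = re.compile(r"[A-Za-z0-9]+")


def check_run(lines):
    """Return a list of (line_number, message) problems found in a run."""
    problems = []
    last = {}  # query id -> (rank, rsv) of the previous line for that query
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(" ")
        if len(fields) != 6:
            problems.append((number, "expected 6 fields separated by one space"))
            continue
        query_id, _iteration, _docno, rank, rsv, run_id = fields
        if not RANK_RE.fullmatch(rank) or not RSV_RE.fullmatch(rsv):
            problems.append((number, "malformed rank or RSV"))
            continue
        if not RUN_ID_RE.fullmatch(run_id):
            problems.append((number, "run identifier may use a-z, A-Z, 0-9 only"))
        rank, rsv = int(rank), float(rsv)
        expected_rank = last[query_id][0] + 1 if query_id in last else 0
        if rank != expected_rank:
            problems.append((number, "rank should be %d" % expected_rank))
        # Equal RSVs are tolerated; an increase breaks the required ordering.
        if query_id in last and rsv > last[query_id][1]:
            problems.append((number, "RSV increases within a query"))
        last[query_id] = (rank, rsv)
    return problems


run = [
    "10.2452/451-AH Q0 document.00072 0 0.017416 runidex1",
    "10.2452/451-AH Q0 document.00101 1 0.004200 runidex1",
]
print(check_run(run))
```

An empty list means that none of these particular checks failed; it does not guarantee that the submission system will accept the file.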

WORKING NOTES

A clear description of the strategy adopted and the resources you used for each run MUST be given in your paper for the Working Notes. The deadline for receipt of these papers is 30 August 2009. The Working Notes will be distributed to all participants on registration at the Corfu Workshop (30 September - 2 October 2009). This information is considered of great importance; the point of the CLEF activity is to give participants the opportunity to compare system performance with respect to variations in approaches and resources. Groups that do not provide such information risk being excluded from future CLEF experiments.

-----------------------------------------------------------------------

Current update is 12 May 2009.