Guidelines
Important dates
TBD.
Registration
Organizations wishing to participate in TREC 2024 should submit an application. Participants in previous TRECs who wish to participate in TREC 2024 must submit a new application.
To apply, use the new Evalbase web app. First, create an account and profile; you can then register a participating organization from the main Evalbase page.
Any questions about conference participation should be sent to the general TREC email address, trec (at) nist.gov.
Task definition
In terms of input and output, the movie identification task is relatively straightforward: given an input tip-of-the-tongue (TOT) request, output a ranked list of movies. Each movie must be identified by its Wikipedia page id, and the correct movie should be ranked as high as possible. For each query, runs should return a ranked list of 1000 Wikipedia page ids. Runs will be evaluated using IR metrics that are appropriate for tasks with one relevant document, such as discounted cumulative gain, reciprocal rank, and success@k.
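As a rough illustration of the expected output, the sketch below turns per-movie scores for one request into a ranked list of at most 1000 Wikipedia page ids. The function name `top_k_page_ids`, the example ids and scores, and the tie-breaking rule are assumptions made for illustration; they are not part of the track definition.

```python
def top_k_page_ids(scores, k=1000):
    """Return up to k page ids, highest score first.

    `scores` maps a Wikipedia page id (string) to a retrieval score.
    Ties are broken by page id so the output is deterministic
    (an assumption; the guidelines do not prescribe a tie-break).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked[:k]]


# Hypothetical scores for a single TOT request.
example_scores = {"330": 2.73, "12": 2.05, "857": 2.71}
print(top_k_page_ids(example_scores))  # ['330', '857', '12']
```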
Datasets
TBD.
Submission and evaluation
Submission form: https://ir.nist.gov/trecsubmit/tot.html.
We will follow a format similar to the one used by most TREC submissions, which is repeated below. Columns are separated by white space. The width of the columns is not important, but each line must have exactly six columns, with at least one space between columns.
1 Q0 pid1 1 2.73 runid1
1 Q0 pid2 2 2.71 runid1
1 Q0 pid3 3 2.61 runid1
1 Q0 pid4 4 2.05 runid1
1 Q0 pid5 5 1.89 runid1
where:
- The first column is the query (topic) ID.
- The second column is currently unused and should always be “Q0”.
- The third column is the official identifier of the retrieved document.
- The fourth column is the rank at which the document is retrieved.
- The fifth column is the score (integer or floating point) that generated the ranking. This score must be in descending (non-increasing) order.
- The sixth column is the ID of the run you are submitting.
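The six-column format above can be produced and sanity-checked with a few lines of Python. This is a minimal sketch, not an official validator: the function names `format_run_lines` and `check_run_lines` are hypothetical, and the check that ranks strictly increase within a query is an assumption based on the example lines rather than a stated rule.

```python
def format_run_lines(query_id, ranked, run_id):
    """Format one query's results as six-column run lines.

    `ranked` is a list of (doc_id, score) pairs already sorted best first.
    """
    return [
        f"{query_id} Q0 {doc_id} {rank} {score} {run_id}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]


def check_run_lines(lines):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    previous = {}  # query id -> (rank, score) of that query's previous line
    for n, line in enumerate(lines, start=1):
        cols = line.split()
        if len(cols) != 6:
            problems.append(f"line {n}: expected 6 columns, found {len(cols)}")
            continue
        qid, q0, _doc_id, rank, score, _run_id = cols
        if q0 != "Q0":
            problems.append(f"line {n}: second column should be Q0")
        try:
            rank, score = int(rank), float(score)
        except ValueError:
            problems.append(f"line {n}: rank must be an integer and score a number")
            continue
        if qid in previous:
            prev_rank, prev_score = previous[qid]
            if rank <= prev_rank:  # assumption: ranks increase within a query
                problems.append(f"line {n}: rank does not increase")
            if score > prev_score:  # scores must be non-increasing
                problems.append(f"line {n}: score increases")
        previous[qid] = (rank, score)
    return problems


lines = format_run_lines("1", [("pid1", 2.73), ("pid2", 2.71)], "runid1")
print("\n".join(lines))
print(check_run_lines(lines))
```

Running a check like this before uploading catches the most common formatting slips, such as a missing column or scores that are not sorted.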
The main type of TREC submission is automatic, which means there was no manual intervention in running the test queries. This means you should not adjust your runs, rewrite the query, retrain your model, or make any other sort of manual adjustment after you see the test queries. Ideally, you only look at the test queries to check that they ran properly (i.e., no bugs) and then submit your automatic runs. However, if you want to have a human in the loop for your run, or do anything else that uses the test queries to adjust your model or ranking, you can mark your run as manual. Manual runs are interesting, and we may learn a lot from them, but they are distinct from our main scenario, which is a system that responds to unseen queries automatically.
Runs will be evaluated using metrics appropriate for retrieval scenarios with one relevant document. In particular, our primary evaluation metric for this year’s track will be discounted cumulative gain (DCG), but we may also compute other metrics such as reciprocal rank (RR) and success@k.
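With a single relevant document per query, all three metrics depend only on the rank at which the correct movie appears. The sketch below assumes a binary gain of 1 and the common log2 rank discount for DCG; the official evaluation may use a different formulation, and the function names are hypothetical.

```python
import math


def rank_of_correct(ranked_ids, correct_id):
    """Return the 1-based rank of the correct page id, or None if it is absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == correct_id:
            return rank
    return None


def dcg(rank):
    """DCG with one relevant document: 1 / log2(rank + 1), or 0 if not retrieved."""
    return 0.0 if rank is None else 1.0 / math.log2(rank + 1)


def reciprocal_rank(rank):
    """Reciprocal rank: 1 / rank, or 0 if not retrieved."""
    return 0.0 if rank is None else 1.0 / rank


def success_at_k(rank, k):
    """1 if the correct document appears in the top k, else 0."""
    return 1 if rank is not None and rank <= k else 0


# Hypothetical ranking in which the correct movie is third.
r = rank_of_correct(["pid7", "pid2", "pid9"], "pid9")
print(r, dcg(r), reciprocal_rank(r), success_at_k(r, 1), success_at_k(r, 10))
```

Per-query values would then be averaged over all test queries to obtain a score for the run.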