Vespa submission to round 1 of TREC-COVID
Reproducing the submission with the Vespa Python API
This piece reproduces a possible round 1 TREC-COVID submission generated with the cord19.vespa.ai application.
Connect to the CORD-19 Vespa API.
from vespa.application import Vespa
app = Vespa(url = "https://api.cord19.vespa.ai")
Define the query model used for the submission.
from vespa.query import Query, OR, RankProfile
query_model = Query(
    match_phase = OR(),
    rank_profile = RankProfile(name="bm25t5")
)
Load the topics provided by the organizers.
import requests
import json
topics = json.loads(requests.get("https://thigm85.github.io/data/covid19/topics-annotated.json").text)
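Before querying, it can help to confirm that every topic carries the fields used later in this piece (id, query, question and narrative). The following is a minimal sketch of such a check; the helper name `missing_fields` and the sample topics are hypothetical and are not taken from the organizers' file.

```python
# Field names used when building the query text later in this piece.
REQUIRED_FIELDS = ("id", "query", "question", "narrative")

def missing_fields(topic):
    """Return the required field names that are absent or empty in a topic."""
    return [f for f in REQUIRED_FIELDS if not topic.get(f)]

# Hypothetical topics with made-up text, for illustration only.
sample_topics = [
    {"id": "1", "query": "virus origin", "question": "Where did it start?",
     "narrative": "Looking for origin studies."},
    {"id": "2", "query": "weather effects", "question": "Does weather matter?"},
]

# Map each incomplete topic id to the fields it lacks.
problems = {}
for t in sample_topics:
    missing = missing_fields(t)
    if missing:
        problems[t.get("id")] = missing

print(problems)  # {'2': ['narrative']}
```

With the real data, replacing `sample_topics` with `topics` should yield an empty dictionary if every topic is complete.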
Generate the submission by querying the Vespa application and organizing the results according to the TREC output format. We only return 2 hits for each request as an example. Feel free to change that to 1000 when generating your own submission.
from pandas import DataFrame
submission = []
for t in topics:
    topic_id = t['id']  # avoid shadowing the built-in id()
    # Combine the question, query and narrative fields into one query string.
    query = t['question'] + ' ' + t['query'] + ' ' + t['narrative']
    result = app.query(
        query=query,
        query_model=query_model,
        hits = 2,
        model = {'defaultIndex': 'allt5'},
        summary = 'default',
        timeout = '15s',
        collapsefield = 'cord_uid',
        bolding = 'false'
    )
    # Ranks start at 1 within each topic.
    for rank, h in enumerate(result['root']['children'], start=1):
        submission.append(
            {"topicid": topic_id,
             "Q0": "Q0",
             "docid": h["fields"].get('cord_uid'),
             "rank": rank,
             "score": h['relevance'],
             "run-tag": query_model.rank_profile.name
            })
submission = DataFrame.from_records(submission)
submission
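To hand the run to an evaluation tool, the records usually need to be written as one whitespace-separated line per hit, in the column order topicid, Q0, docid, rank, score, run-tag. Below is a minimal sketch, assuming records shaped like the dictionaries appended above. The helper name `write_trec_run` and the sample records are illustrative and not part of the original notebook.

```python
import io

# Column order of a TREC run line.
TREC_COLUMNS = ["topicid", "Q0", "docid", "rank", "score", "run-tag"]

def write_trec_run(records, handle):
    """Write one space-separated TREC run line per record to a file-like object."""
    for r in records:
        handle.write(" ".join(str(r[c]) for c in TREC_COLUMNS) + "\n")

# Hypothetical records with made-up document ids and scores, for illustration only.
sample = [
    {"topicid": 1, "Q0": "Q0", "docid": "abc123", "rank": 1,
     "score": 12.5, "run-tag": "bm25t5"},
    {"topicid": 1, "Q0": "Q0", "docid": "def456", "rank": 2,
     "score": 11.0, "run-tag": "bm25t5"},
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_trec_run(sample, buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])  # 1 Q0 abc123 1 12.5 bm25t5
```

With the DataFrame built above, the same helper could be called with `submission.to_dict("records")` and a handle from `open(...)` in write mode; the output file name is left to the reader.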