Faster document operations with pyvespa
Document operations such as feed, get, update and delete now use an asynchronous implementation by default.
We will demonstrate that the new default implementation of document operations in pyvespa is much faster because it runs in async mode.
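To see why an asynchronous client can be faster for I/O-bound work such as document operations, here is a minimal sketch that does not use pyvespa or Vespa at all. It simulates each request with `asyncio.sleep` and compares awaiting requests one at a time with issuing them concurrently through `asyncio.gather`. The names `fake_request`, `run_sequential`, `run_concurrent` and `timed`, and the 0.05 second latency, are illustrative assumptions and say nothing about how pyvespa is implemented internally.

```python
import asyncio
import time

LATENCY = 0.05  # assumed per-request latency in seconds (illustrative only)


async def fake_request(doc_id):
    """Stand-in for one network round trip; not a pyvespa call."""
    await asyncio.sleep(LATENCY)
    return doc_id


async def run_sequential(n):
    # Await each request before starting the next one.
    return [await fake_request(i) for i in range(n)]


async def run_concurrent(n):
    # Start all requests, then wait for them together.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_fn, n):
    # Run the coroutine to completion and measure the elapsed wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


sequential_result, sequential_seconds = timed(run_sequential, 10)
concurrent_result, concurrent_seconds = timed(run_concurrent, 10)
print("sequential: {:.2f} seconds".format(sequential_seconds))
print("concurrent: {:.2f} seconds".format(concurrent_seconds))
```

Run this as a plain script. Inside a notebook, where an event loop is already running, `asyncio.run` raises an error, so the coroutines would need to be awaited directly instead.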
from vespa.gallery import QuestionAnswering
app_package = QuestionAnswering()
import os
from vespa.package import VespaCloud
# define your own WORK_DIR and VESPA_CLOUD_USER_KEY env variables
disk_folder = os.path.join(os.getenv("WORK_DIR"), "sample_application")
vespa_cloud = VespaCloud(
    tenant="vespa-team",
    application="pyvespa-integration",
    key_content=os.getenv("VESPA_CLOUD_USER_KEY").replace(r"\n", "\n"),
    application_package=app_package,
)
app = vespa_cloud.deploy(
    instance="msmarco",
    disk_folder=disk_folder,
)
import json
import requests
sentence_data = json.loads(
    requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data_100.json").text
)
batch_feed = [
    {
        "id": idx,
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
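The batch built above is a list of dictionaries, each with an `id` and a `fields` entry. As a small offline sketch of the same construction, the snippet below wraps two made-up records. The helper name `to_batch` and the field names `text` and `dataset` are hypothetical and are not taken from the sample data.

```python
def to_batch(records):
    """Wrap each record as {"id": <position>, "fields": <record>}."""
    return [{"id": idx, "fields": record} for idx, record in enumerate(records)]


# Made-up records for illustration; the real sample data has its own fields.
example_records = [
    {"text": "first sentence", "dataset": "example"},
    {"text": "second sentence", "dataset": "example"},
]
example_batch = to_batch(example_records)
print(example_batch[0])
```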
import time
# Default mode: the batch is fed asynchronously
start = time.time()
response = app.feed_batch(schema="sentence", batch=batch_feed)
print("{} seconds".format(time.time() - start))
# Synchronous mode, for comparison
start = time.time()
response = app.feed_batch(schema="sentence", batch=batch_feed, asynchronous=False)
print("{} seconds".format(time.time() - start))
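Each of the two timings above comes from a single run, so the numbers can vary with network conditions. A possible way to make such measurements reusable is sketched below. The `timer` context manager is a hypothetical helper, not part of pyvespa, and it uses `time.perf_counter`, which is generally better suited to measuring intervals than `time.time`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Store the elapsed wall-clock seconds of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timer("sleep", timings):
    time.sleep(0.01)  # placeholder for a call such as app.feed_batch(...)
print("{:.3f} seconds".format(timings["sleep"]))
```

Wrapping each `feed_batch` call in its own `with timer(...)` block would collect both measurements in one dictionary for a side-by-side comparison.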
Basic document operations in pyvespa (feed, get, update and delete) are now much faster because they use the new async implementation by default.