This post demonstrates that document operations in pyvespa are now much faster, because the new default implementation runs them asynchronously.

Define Question Answering app

from vespa.gallery import QuestionAnswering

app_package = QuestionAnswering()

Deploy to Vespa Cloud

import os
from vespa.package import VespaCloud

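# Set the WORK_DIR and VESPA_CLOUD_USER_KEY environment variables before running this.
# WORK_DIR is the folder where the application files are written;
# VESPA_CLOUD_USER_KEY holds the key content, with newlines escaped as a literal \n.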

disk_folder = os.path.join(os.getenv("WORK_DIR"), "sample_application")
vespa_cloud = VespaCloud(
    tenant="vespa-team",
    application="pyvespa-integration",
    key_content=os.getenv("VESPA_CLOUD_USER_KEY").replace(r"\n", "\n"),
    application_package=app_package,
)
app = vespa_cloud.deploy(
    instance="msmarco", 
    disk_folder=disk_folder,
)
Deployment started in run 243 of dev-aws-us-east-1c for vespa-team.pyvespa-integration.msmarco. This may take about 15 minutes the first time.
INFO    [20:02:15]  Deploying platform version 7.452.11 and application version unknown ...
INFO    [20:02:17]  Deployment successful.
INFO    [20:02:17]  Session 43999 for tenant 'vespa-team' prepared and activated.
INFO    [20:02:17]  ######## Details for all nodes ########
INFO    [20:02:17]  h5591e.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:02:17]  --- platform vespa/cloud-tenant-rhel8:7.452.11
INFO    [20:02:17]  --- container-clustercontroller on port 19050 has config generation 43999, wanted is 43999
INFO    [20:02:17]  h5580f.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:02:17]  --- platform vespa/cloud-tenant-rhel8:7.452.11
INFO    [20:02:17]  --- distributor on port 19111 has config generation 43997, wanted is 43999
INFO    [20:02:17]  --- searchnode on port 19107 has config generation 43999, wanted is 43999
INFO    [20:02:17]  --- storagenode on port 19102 has config generation 43997, wanted is 43999
INFO    [20:02:17]  h5577a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:02:17]  --- platform vespa/cloud-tenant-rhel8:7.452.11
INFO    [20:02:17]  --- logserver-container on port 4080 has config generation 43999, wanted is 43999
INFO    [20:02:17]  h5592d.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP
INFO    [20:02:17]  --- platform vespa/cloud-tenant-rhel8:7.452.11
INFO    [20:02:17]  --- container on port 4080 has config generation 43999, wanted is 43999
INFO    [20:02:31]  Found endpoints:
INFO    [20:02:31]  - dev.aws-us-east-1c
INFO    [20:02:31]   |-- https://qa-container.msmarco.pyvespa-integration.vespa-team.aws-us-east-1c.dev.z.vespa-app.cloud/ (cluster 'qa_container')
INFO    [20:02:32]  Installation succeeded!
Finished deployment.
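
The deployment code above calls os.getenv directly, which returns None when a variable is missing and then fails with an unhelpful AttributeError or TypeError. A minimal sketch of a stricter way to read the two variables follows; require_env and load_key_content are hypothetical helper names, not part of pyvespa.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_key_content(name: str = "VESPA_CLOUD_USER_KEY") -> str:
    """Read the key and turn literal backslash-n sequences into real newlines."""
    return require_env(name).replace(r"\n", "\n")
```

With these helpers, key_content=load_key_content() and os.path.join(require_env("WORK_DIR"), "sample_application") could replace the direct os.getenv calls.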

Load sample data

import json, requests

sentence_data = json.loads(
    requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data_100.json").text
)
batch_feed = [
    {
        "id": idx, 
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
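
To make the shape of batch_feed concrete, here is a small offline sketch of the same transformation. The records and their field names ("text", "dataset") are made up for illustration and are not taken from the downloaded file; to_batch is a hypothetical helper name.

```python
# Hypothetical records standing in for the downloaded sentence data.
sample_sentences = [
    {"text": "first sentence", "dataset": "demo"},
    {"text": "second sentence", "dataset": "demo"},
]


def to_batch(records):
    """Wrap each record as {"id": ..., "fields": ...}, using its position as the id."""
    return [{"id": idx, "fields": record} for idx, record in enumerate(records)]


batch = to_batch(sample_sentences)
print(batch[0])
```

Each entry pairs a document id with the dictionary of fields to store, which is the structure passed to feed_batch in the next section.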

Feed data

Asynchronous feeding - new default

import time

start = time.time()
response = app.feed_batch(schema="sentence", batch=batch_feed)
print("{} seconds".format(time.time() - start))
1.146148920059204 seconds

Synchronous feeding - old default

start = time.time()
response = app.feed_batch(schema="sentence", batch=batch_feed, asynchronous=False)
print("{} seconds".format(time.time() - start))
68.66188383102417 seconds
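
The gap between the two timings is what one would expect when each request spends most of its time waiting on the network. The sketch below illustrates the general effect with simulated requests only; it does not call Vespa and is not pyvespa's actual implementation. The latency value, the concurrency limit, and all function names are assumptions chosen for the illustration.

```python
import asyncio
import time

LATENCY = 0.05  # simulated round-trip time per request, in seconds (assumed)


async def fake_feed(doc_id: int) -> int:
    """Stand-in for one feed request; it only sleeps to mimic network latency."""
    await asyncio.sleep(LATENCY)
    return doc_id


async def feed_sequential(doc_ids):
    """Send one request at a time, waiting for each before starting the next."""
    return [await fake_feed(doc_id) for doc_id in doc_ids]


async def feed_concurrent(doc_ids, max_in_flight: int = 10):
    """Keep up to max_in_flight requests open at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(doc_id):
        async with semaphore:
            return await fake_feed(doc_id)

    return await asyncio.gather(*(bounded(doc_id) for doc_id in doc_ids))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


doc_ids = list(range(20))
sequential_result, sequential_seconds = timed(feed_sequential(doc_ids))
concurrent_result, concurrent_seconds = timed(feed_concurrent(doc_ids))
print(f"sequential: {sequential_seconds:.2f} s, concurrent: {concurrent_seconds:.2f} s")
```

In the sequential version the waits add up, while in the concurrent version they overlap, so the total time depends on the number of requests divided by the number allowed in flight.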

Conclusion

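Basic document operations are now much faster in pyvespa due to the new async implementation. In the run above, feeding the same batch took about 1.1 seconds asynchronously versus about 68.7 seconds synchronously, roughly a 60-fold difference. The async implementation covers feed, get, update and delete operations.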