Run search engine experiments in Vespa from python
Three ways to get started with pyvespa
- Connect to a running Vespa application
- Build and deploy with pyvespa API
- Deploy from Vespa config files
- Final thoughts
pyvespa provides a Python API to Vespa. The library's primary goal is to allow for faster prototyping and to facilitate machine learning experiments for Vespa applications.
There are three ways you can get value out of pyvespa:
- You can connect to a running Vespa application.
- You can build and deploy a Vespa application using the pyvespa API.
- You can deploy an application from Vespa config files stored on disk.
We will review each of those methods.
In case you already have a Vespa application running somewhere, you can directly instantiate the Vespa class with the appropriate endpoint. The example below connects to the cord19.vespa.ai application:
from vespa.application import Vespa
app = Vespa(url = "https://api.cord19.vespa.ai")
We are then ready to interact with the application through pyvespa, e.g., to query:
app.query(body = {
    'yql': 'select title from sources * where userQuery();',
    'hits': 1,
    'summary': 'short',
    'timeout': '1.0s',
    'query': 'coronavirus temperature sensitivity',
    'type': 'all',
    'ranking': 'default'
}).hits
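When running several queries in an experiment, it can help to build the request body in one place and to pull the titles out of the returned hits. The following is a minimal sketch, not part of pyvespa: `build_query_body` and `hit_titles` are hypothetical helper names, and the hit layout (a `fields` dictionary inside each hit) is assumed from Vespa's default JSON result format. The example hit is a placeholder made up to show the assumed shape, not real output from the application.

```python
def build_query_body(query, hits=1, ranking="default"):
    """Return a request body like the one passed to app.query above."""
    return {
        "yql": "select title from sources * where userQuery();",
        "hits": hits,
        "summary": "short",
        "timeout": "1.0s",
        "query": query,
        "type": "all",
        "ranking": ranking,
    }


def hit_titles(hits):
    """Collect the title field from a list of hit dictionaries.

    Assumes each hit looks like {"fields": {"title": ...}}; hits
    without a title yield None instead of raising.
    """
    return [hit.get("fields", {}).get("title") for hit in hits]


body = build_query_body("coronavirus temperature sensitivity")

# Placeholder hit, made up only to illustrate the assumed shape.
example_hits = [
    {"id": "placeholder-id", "relevance": 0.0, "fields": {"title": "placeholder title"}}
]
titles = hit_titles(example_hits)
```

With a live connection, the same helpers would be used as `hit_titles(app.query(body=body).hits)`.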
You can also build your Vespa application from scratch using the pyvespa API. Here is a simple example:
from vespa.package import ApplicationPackage, Field, RankProfile
app_package = ApplicationPackage(name = "sampleapp")
app_package.schema.add_fields(
    Field(
        name="title",
        type="string",
        indexing=["index", "summary"],
        index="enable-bm25"
    )
)
app_package.schema.add_rank_profile(
    RankProfile(
        name="bm25",
        inherits="default",
        first_phase="bm25(title)"
    )
)
We can then deploy app_package to a Docker container (or directly to Vespa Cloud):
from vespa.package import VespaDocker
vespa_docker = VespaDocker(
    disk_folder="/Users/username/sample_app",  # choose your own absolute folder
    container_memory="8G",
    port=8080
)
app = vespa_docker.deploy(application_package=app_package)
app holds an instance of the Vespa class, just like in our first example, and we can use it to feed and query the newly deployed application. We can also go to the Vespa configuration files stored in disk_folder, modify them, and deploy them directly from disk using the method discussed in the next section. This is useful when we want to fine-tune the application with Vespa features not available through the pyvespa API.
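As an illustration of feeding through the returned app object, here is a minimal sketch. It assumes the Vespa class exposes a `feed_data_point(schema, data_id, fields)` method, which may differ between pyvespa versions, and that the schema created by ApplicationPackage carries the application name ("sampleapp"). `feed_titles` is a hypothetical helper, and `RecordingApp` is a stand-in used only so the sketch runs without a Vespa instance.

```python
def feed_titles(app, titles, schema="sampleapp"):
    """Feed one document per title and return the responses in order.

    `app` is expected to offer feed_data_point(schema, data_id, fields);
    this signature is an assumption about the installed pyvespa version.
    """
    responses = []
    for data_id, title in enumerate(titles):
        responses.append(
            app.feed_data_point(
                schema=schema,
                data_id=str(data_id),
                fields={"title": title},
            )
        )
    return responses


class RecordingApp:
    """Hypothetical stand-in for a deployed app; it only records calls."""

    def __init__(self):
        self.calls = []

    def feed_data_point(self, schema, data_id, fields):
        self.calls.append((schema, data_id, fields))
        return "ok"


stub = RecordingApp()
results = feed_titles(stub, ["first title", "second title"])
```

Passing the app returned by vespa_docker.deploy instead of the stub would send the same documents to the running container, provided the assumed method is available.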
It is also possible to explicitly export app_package to Vespa configuration files (without deploying them) through the export_application_package method:
vespa_docker.export_application_package(application_package=app_package)
The pyvespa API provides a subset of the functionality available in Vespa. The reason is that pyvespa is meant to be used as an experimentation tool for Information Retrieval (IR), not for building production-ready applications. The Python API therefore grows as we need to replicate common use cases that often require IR experimentation.
If your application requires functionality or fine-tuning not available in pyvespa, you can build it directly through Vespa configuration files, as shown in many examples in the Vespa docs. Even in this case, you can still get value out of pyvespa by deploying the application from Python based on the Vespa configuration files stored on disk. To show that, we can clone and deploy the news search app covered in this Vespa tutorial:
!git clone https://github.com/vespa-engine/sample-apps.git
The Vespa configuration files of the news search app are stored in the sample-apps/news/app-3-searching/ folder:
!tree sample-apps/news/app-3-searching/
We can then deploy to a Docker container from disk:
from vespa.package import VespaDocker
vespa_docker_news = VespaDocker(
    disk_folder="/Users/username/sample-apps/news/app-3-searching/",
    container_memory="8G",
    port=8081
)
app = vespa_docker_news.deploy_from_disk(application_name="news")
Again, app holds an instance of the Vespa class, just like in our first example, and we can use it to feed and query the newly deployed application.
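Since disk_folder is given as an absolute path, a small check before deploying from disk can catch a wrong folder early. The sketch below is not part of pyvespa: `resolve_app_folder` is a hypothetical helper, and it assumes the application folder has a services.xml file at its top level, as Vespa application packages do. The temporary directory only stands in for a real application folder so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def resolve_app_folder(folder):
    """Return the absolute path of an application folder as a string.

    Raises FileNotFoundError when the folder has no top-level
    services.xml, which suggests it is not an application package.
    """
    path = Path(folder).expanduser().resolve()
    if not (path / "services.xml").is_file():
        raise FileNotFoundError(f"No services.xml found in {path}")
    return str(path)


# Demonstration with a throwaway folder standing in for a real app folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "services.xml").write_text("<services/>")
    resolved = resolve_app_folder(tmp)
    is_absolute = Path(resolved).is_absolute()
```

The returned string could then be passed as disk_folder, for example `resolve_app_folder("sample-apps/news/app-3-searching/")` after cloning the repository.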
We covered three different ways to connect to a Vespa application from Python using the pyvespa library. Together, these methods make the workflow flexible: you can get started quickly with pyvespa experimentation, and you can still modify Vespa config files to include features not available in the pyvespa API without losing the ability to experiment with the added features.