pylivy


Livy is an open source REST interface for interacting with Spark. pylivy is a Python client for Livy, enabling easy remote code execution on a Spark cluster.

Installation

$ pip install -U livy

Note that pylivy requires Python 3.6 or later.

Basic Usage

The LivySession class is the main interface provided by pylivy:

from livy import LivySession

LIVY_URL = "http://spark.example.com:8998"

with LivySession.create(LIVY_URL) as session:
    # Run some code on the remote cluster
    # (`df` must already be defined in the remote session)
    session.run("filtered = df.filter(df.name == 'Bob')")
    # Retrieve the result
    local_df = session.download("filtered")
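
The example above uses a DataFrame named df without defining it. A minimal end-to-end sketch follows, assuming a PySpark session in which Livy exposes a SparkSession as spark (usual for Spark 2 and later). The sample rows and the helper name download_bobs are hypothetical and chosen only for illustration:

```python
# Code executed on the cluster: build a small DataFrame, then filter it.
# Assumes the remote PySpark session provides a `spark` SparkSession.
CREATE_DF = (
    "df = spark.createDataFrame("
    "[('Alice', 34), ('Bob', 45)], ['name', 'age'])"
)
FILTER_DF = "filtered = df.filter(df.name == 'Bob')"


def download_bobs(livy_url):
    """Run both snippets remotely and return the filtered rows locally."""
    # Imported here so this module can be loaded without pylivy installed.
    from livy import LivySession

    with LivySession.create(livy_url) as session:
        session.run(CREATE_DF)
        session.run(FILTER_DF)
        return session.download("filtered")
```

Calling download_bobs(LIVY_URL) requires a reachable Livy server. Keeping the remote code in named strings makes it easier to reuse and to check for syntax errors before it is sent.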

Similarly, batch sessions in Livy can be created and managed with the LivyBatch class:

from livy import LivyBatch

LIVY_URL = "http://spark.example.com:8998"

batch = LivyBatch.create(
    LIVY_URL,
    file=(
        "https://repo.typesafe.com/typesafe/maven-releases/org/"
        "apache/spark/spark-examples_2.11/1.6.0-typesafe-001/"
        "spark-examples_2.11-1.6.0-typesafe-001.jar"
    ),
    class_name="org.apache.spark.examples.SparkPi",
)
batch.wait()

See LivySession.create or LivyBatch.create for the full range of options that can be specified when creating sessions or batches.

Authentication

Authenticate requests sent to Livy by passing any requests Auth object to LivySession.create. For example, to use HTTP basic authentication:

from livy import LivySession
from requests.auth import HTTPBasicAuth

LIVY_URL = "http://spark.example.com:8998"

auth = HTTPBasicAuth("username", "password")

with LivySession.create(LIVY_URL, auth) as session:
    session.run("filtered = df.filter(df.name == 'Bob')")
    local_df = session.download("filtered")
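
In practice, credentials are usually not hard-coded. The sketch below builds the same HTTPBasicAuth object from environment variables. The variable names LIVY_USERNAME and LIVY_PASSWORD and the helper auth_from_env are assumptions made for this example; pylivy itself does not read them.

```python
import os

from requests.auth import HTTPBasicAuth


def auth_from_env(environ=os.environ):
    """Build basic-auth credentials from environment variables.

    LIVY_USERNAME and LIVY_PASSWORD are hypothetical names chosen for
    this sketch; pylivy itself does not read them.
    """
    try:
        return HTTPBasicAuth(
            environ["LIVY_USERNAME"], environ["LIVY_PASSWORD"]
        )
    except KeyError as missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(
            f"environment variable {missing} is not set"
        ) from None
```

The returned object can be passed in place of auth in the example above, for instance LivySession.create(LIVY_URL, auth_from_env()).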

Custom requests session

pylivy uses requests to make HTTP requests to your Livy server. You can specify your own requests session in order to customise how requests are made to the server.

For example, to add a custom header to all requests made to Livy:

from livy import LivySession
import requests

LIVY_URL = "http://spark.example.com:8998"

requests_session = requests.Session()
requests_session.headers.update(
    {"X-Auth-Token": "MY-SECURITY-TOKEN"}
)

with LivySession.create(
    LIVY_URL,
    requests_session=requests_session
) as session:
    session.run("filtered = df.filter(df.name == 'Bob')")
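
A custom requests session can also carry transport settings. As a sketch of that idea, the helper below (build_requests_session, a hypothetical name) combines the custom header with a retry policy from urllib3. The retry count, backoff factor and status codes are illustrative values, not recommendations from pylivy.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_requests_session(token, retries=3):
    """Return a requests session with a custom header and a retry policy."""
    session = requests.Session()
    session.headers.update({"X-Auth-Token": token})

    # Retries failed connections and, for idempotent methods such as GET,
    # responses with a 502, 503 or 504 status, backing off between attempts.
    retry = Retry(
        total=retries,
        backoff_factor=0.5,
        status_forcelist=[502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

The result is passed through the same requests_session argument shown above, for instance LivySession.create(LIVY_URL, requests_session=build_requests_session("MY-SECURITY-TOKEN")).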