pylivy


Livy is an open source REST interface for interacting with Spark. pylivy is a Python client for Livy, enabling easy remote code execution on a Spark cluster.

Installation

$ pip install -U livy

Note that pylivy requires Python 3.6 or later.

Usage

The LivySession class is the main interface provided by pylivy:

from livy import LivySession

LIVY_URL = 'http://spark.example.com:8998'

with LivySession(LIVY_URL) as session:
    # Run some code on the remote cluster
    session.run("filtered = df.filter(df.name == 'Bob')")
    # Retrieve the result
    local_df = session.read('filtered')
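
For orientation, the calls above ultimately correspond to HTTP requests against Livy's REST interface. The sketch below builds (but does not send) the kind of requests a client would issue to create a PySpark session and to submit a statement, using only the standard library. The helper names `build_json_post`, `build_create_session_request` and `build_statement_request` are hypothetical and not part of pylivy, and the session id `0` is a placeholder.

```python
import json
from urllib.request import Request

LIVY_URL = "http://spark.example.com:8998"


def build_json_post(url: str, payload: dict) -> Request:
    """Build (without sending) a POST request with a JSON body."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_create_session_request(livy_url: str, kind: str = "pyspark") -> Request:
    # POST /sessions asks Livy to start a new interactive session.
    return build_json_post(f"{livy_url}/sessions", {"kind": kind})


def build_statement_request(livy_url: str, session_id: int, code: str) -> Request:
    # POST /sessions/{id}/statements submits code to run in that session.
    url = f"{livy_url}/sessions/{session_id}/statements"
    return build_json_post(url, {"code": code})


create_req = build_create_session_request(LIVY_URL)
stmt_req = build_statement_request(
    LIVY_URL, 0, "filtered = df.filter(df.name == 'Bob')"
)

# Inspect the requests instead of sending them.
print(create_req.get_method(), create_req.full_url)
print(stmt_req.get_method(), stmt_req.full_url)
```

LivySession handles these details, along with polling for results and closing the session when the with block exits, so in normal use there should be no need to construct requests by hand.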

To authenticate requests sent to Livy, pass any requests auth object to LivySession. For example, to use HTTP basic auth:

from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('username', 'password')

with LivySession(LIVY_URL, auth=auth) as session:
    session.run("filtered = df.filter(df.name == 'Bob')")
    local_df = session.read('filtered')
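
To make the effect of basic auth concrete, the following sketch shows the Authorization header value that HTTP basic auth produces for ASCII credentials. The helper `basic_auth_header` is hypothetical and written only for illustration; it is not part of pylivy or requests.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP basic auth."""
    # The credentials are joined with a colon and base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("username", "password")
print(header)
```

Because the credentials are only base64-encoded, not encrypted, basic auth is best used with an https Livy URL.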

Similarly, batch sessions in Livy can be created and managed with the LivyBatch class:

from livy import LivyBatch

LIVY_URL = 'http://spark.example.com:8998'

batch = LivyBatch(
    LIVY_URL,
    file=(
        "https://repo.typesafe.com/typesafe/maven-releases/org/"
        "apache/spark/spark-examples_2.11/1.6.0-typesafe-001/"
        "spark-examples_2.11-1.6.0-typesafe-001.jar"
    ),
    class_name="org.apache.spark.examples.SparkPi",
)
# Block until the batch has finished
batch.wait()
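
Conceptually, waiting on a batch amounts to polling its state until it stops running. The following is a minimal sketch of such a loop, not pylivy's actual implementation: `wait_for_batch`, `fetch_state` and the set of terminal states are assumptions made for illustration, and the state source is faked so that the example runs without a cluster.

```python
import time
from typing import Callable

# Assumption: the states this sketch treats as terminal.
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def wait_for_batch(fetch_state: Callable[[], str], interval: float = 1.0) -> str:
    """Poll fetch_state until it reports a terminal state, then return it."""
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Pause between polls to avoid hammering the server.
        time.sleep(interval)


# Stand-in for a real status request, so the sketch runs offline.
fake_states = iter(["starting", "running", "running", "success"])
final_state = wait_for_batch(lambda: next(fake_states), interval=0.0)
print(final_state)
```

Passing the state source in as a callable keeps the loop independent of any particular HTTP client, which also makes it straightforward to test.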