livy.client
- class livy.client.LivyClient(url, auth=None, verify=True, requests_session=None)
A client for sending requests to a Livy server.
- Parameters
url (str) – The URL of the Livy server.
auth (Union[AuthBase, Tuple[str, str], None]) – A requests-compatible auth object to use when making requests.
verify (Union[bool, str]) – Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to True.
requests_session (Optional[Session]) – A specific requests.Session to use, allowing advanced customisation. The caller is responsible for closing the session.
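As a rough illustration of how the constructor arguments fit together, the sketch below collects them from simple settings. The helper names `client_kwargs` and `make_client` are hypothetical and not part of the library, and the import path is taken from the `livy.client.LivyClient` name documented above.

```python
from typing import Any, Dict, Optional, Tuple


def client_kwargs(
    url: str,
    credentials: Optional[Tuple[str, str]] = None,
    ca_bundle: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect LivyClient keyword arguments (hypothetical helper)."""
    return {
        "url": url,
        # A (username, password) tuple is one of the documented auth forms.
        "auth": credentials,
        # A string is treated as a path to a CA bundle; True keeps the
        # default verification of the server's TLS certificate.
        "verify": ca_bundle if ca_bundle is not None else True,
    }


def make_client(url, credentials=None, ca_bundle=None):
    """Build a client; the import is deferred until a client is needed."""
    from livy.client import LivyClient  # requires the package to be installed

    return LivyClient(**client_kwargs(url, credentials, ca_bundle))
```

For instance, `make_client("http://localhost:8998", credentials=("user", "secret"))` would build a client that authenticates with a username and password; the address and credentials here are placeholders.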
- legacy_server()
Determine if the server is running a legacy version.
Legacy versions support different session kinds than newer versions of Livy.
- Return type
bool
- create_session(kind, proxy_user=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, spark_conf=None, heartbeat_timeout=None)
Create a new session in Livy.
The py_files, files, jars and archives arguments are lists of URLs, e.g. [“s3://bucket/object”, “hdfs://path/to/file”, …] and must be reachable by the Spark driver process. If the provided URL has no scheme, it’s considered to be relative to the default file system configured in the Livy server.
URLs in the py_files argument are copied to a temporary staging area and inserted into Python’s sys.path ahead of the standard library paths. This allows you to import .py, .zip and .egg files in Python.
The URLs given in the jars, py_files, files and archives arguments are all copied to the same working directory on the Spark cluster.
The driver_memory and executor_memory arguments use the JVM memory string format: a number followed by a size unit suffix (“k”, “m”, “g” or “t”), e.g. 512m or 2g.
See https://spark.apache.org/docs/latest/configuration.html for more information on Spark configuration properties.
- Parameters
kind (SessionKind) – The kind of session to create.
proxy_user (Optional[str]) – User to impersonate when starting the session.
jars (Optional[List[str]]) – URLs of jars to be used in this session.
py_files (Optional[List[str]]) – URLs of Python files to be used in this session.
files (Optional[List[str]]) – URLs of files to be used in this session.
driver_memory (Optional[str]) – Amount of memory to use for the driver process (e.g. ‘512m’).
driver_cores (Optional[int]) – Number of cores to use for the driver process.
executor_memory (Optional[str]) – Amount of memory to use per executor process (e.g. ‘512m’).
executor_cores (Optional[int]) – Number of cores to use for each executor.
num_executors (Optional[int]) – Number of executors to launch for this session.
archives (Optional[List[str]]) – URLs of archives to be used in this session.
queue (Optional[str]) – The name of the YARN queue to which the session is submitted.
name (Optional[str]) – The name of this session.
spark_conf (Optional[Dict[str, Any]]) – Spark configuration properties.
heartbeat_timeout (Optional[int]) – Timeout in seconds after which the session is automatically orphaned if no heartbeat is received.
- Return type
Session
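The formatting rules described for create_session can be checked before a request is sent. The following is a minimal sketch: the helper names are hypothetical, and accepting only lower-case unit suffixes is a strict reading of the description above rather than confirmed server behaviour.

```python
import re
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

# Digits followed by exactly one of the unit suffixes named above.
# Rejecting upper-case suffixes is an assumption of this sketch.
_MEMORY = re.compile(r"[0-9]+[kmgt]")


def is_memory_string(value: str) -> bool:
    """Return True for strings such as '512m' or '2g'."""
    return _MEMORY.fullmatch(value) is not None


def has_scheme(url: str) -> bool:
    """Return True if the URL names a scheme such as s3 or hdfs."""
    return urlparse(url).scheme != ""


def session_options(
    driver_memory: Optional[str] = None,
    executor_memory: Optional[str] = None,
    py_files: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate memory strings and return only the options that were set."""
    for name, value in (
        ("driver_memory", driver_memory),
        ("executor_memory", executor_memory),
    ):
        if value is not None and not is_memory_string(value):
            raise ValueError(f"{name} is not a memory string: {value!r}")
    options = {
        "driver_memory": driver_memory,
        "executor_memory": executor_memory,
        "py_files": py_files,
    }
    # Unset options are dropped so the documented defaults (None) apply.
    return {key: value for key, value in options.items() if value is not None}
```

The result can be passed on as keyword arguments, for example `client.create_session(kind, **session_options(driver_memory="2g"))`. A URL for which `has_scheme` returns False would be resolved against the default file system configured in the Livy server, as described above.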
- get_session(session_id)
Get information about a session.
- Parameters
session_id (int) – The ID of the session.
- Return type
Optional[Session]
- delete_session(session_id)
Kill a session.
- Parameters
session_id (int) – The ID of the session.
- Return type
None
- list_statements(session_id)
Get all the statements in a session.
- Parameters
session_id (int) – The ID of the session.
- Return type
List[Statement]
- create_statement(session_id, code, kind=None)
Run a statement in a session.
- Parameters
session_id (int) – The ID of the session.
code (str) – The code to execute.
kind (Optional[StatementKind]) – The kind of code to execute.
- Return type
Statement
- get_statement(session_id, statement_id)
Get information about a statement in a session.
- Parameters
session_id (int) – The ID of the session.
statement_id (int) – The ID of the statement.
- Return type
Statement
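create_statement and get_statement are typically used together: submit the code, then fetch the statement again until it has finished. The sketch below shows one way this might look. It assumes the returned Statement exposes a `statement_id` attribute, which this section does not confirm, and it leaves the definition of "finished" to a caller-supplied `is_done` predicate because the Statement fields are not described here. The function name `run_statement` is hypothetical.

```python
import time
from typing import Any, Callable


def run_statement(
    client: Any,
    session_id: int,
    code: str,
    is_done: Callable[[Any], bool],
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> Any:
    """Submit code and poll until is_done(statement) returns True."""
    statement = client.create_statement(session_id, code)
    polls = 0
    while not is_done(statement):
        if polls >= max_polls:
            raise TimeoutError(f"statement not done after {max_polls} polls")
        time.sleep(poll_interval)
        # Assumption: the Statement object carries its ID as statement_id.
        statement = client.get_statement(session_id, statement.statement_id)
        polls += 1
    return statement
```

Bounding the number of polls keeps a stuck statement from blocking the caller indefinitely; the default interval and limit are arbitrary choices.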
- create_batch(file, class_name=None, args=None, proxy_user=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, spark_conf=None)
Create a new batch in Livy.
The py_files, files, jars and archives arguments are lists of URLs, e.g. [“s3://bucket/object”, “hdfs://path/to/file”, …] and must be reachable by the Spark driver process. If the provided URL has no scheme, it’s considered to be relative to the default file system configured in the Livy server.
URLs in the py_files argument are copied to a temporary staging area and inserted into Python’s sys.path ahead of the standard library paths. This allows you to import .py, .zip and .egg files in Python.
The URLs given in the jars, py_files, files and archives arguments are all copied to the same working directory on the Spark cluster.
The driver_memory and executor_memory arguments use the JVM memory string format: a number followed by a size unit suffix (“k”, “m”, “g” or “t”), e.g. 512m or 2g.
See https://spark.apache.org/docs/latest/configuration.html for more information on Spark configuration properties.
- Parameters
file (str) – File containing the application to execute.
class_name (Optional[str]) – Application Java/Spark main class.
args (Optional[List[str]]) – An array of strings to be passed to the Spark app.
proxy_user (Optional[str]) – User to impersonate when starting the session.
jars (Optional[List[str]]) – URLs of jars to be used in this session.
py_files (Optional[List[str]]) – URLs of Python files to be used in this session.
files (Optional[List[str]]) – URLs of files to be used in this session.
driver_memory (Optional[str]) – Amount of memory to use for the driver process (e.g. ‘512m’).
driver_cores (Optional[int]) – Number of cores to use for the driver process.
executor_memory (Optional[str]) – Amount of memory to use per executor process (e.g. ‘512m’).
executor_cores (Optional[int]) – Number of cores to use for each executor.
num_executors (Optional[int]) – Number of executors to launch for this session.
archives (Optional[List[str]]) – URLs of archives to be used in this session.
queue (Optional[str]) – The name of the YARN queue to which the batch is submitted.
name (Optional[str]) – The name of this session.
spark_conf (Optional[Dict[str, Any]]) – Spark configuration properties.
- Return type
Batch
- delete_batch(batch_id)
Kill a batch session.
- Parameters
batch_id (int) – The ID of the batch.
- Return type
None
- get_batch(batch_id)
Get information about a batch.
- Parameters
batch_id (int) – The ID of the batch.
- Return type
Optional[Batch]
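Because get_batch is typed as returning Optional[Batch], callers need to handle a None result before acting on a batch. The sketch below combines get_batch and delete_batch into a guarded cleanup step. It assumes that None means the server does not know the batch, which this section does not state explicitly, and the function name `kill_batch_if_present` is hypothetical.

```python
from typing import Any


def kill_batch_if_present(client: Any, batch_id: int) -> bool:
    """Kill the batch if the server reports it; return True if it was killed."""
    # Assumption: get_batch returns None when the batch is not known.
    if client.get_batch(batch_id) is None:
        return False
    client.delete_batch(batch_id)
    return True
```

The same pattern would apply to get_session and delete_session, since get_session is also typed as returning an optional value.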