Cache, Accumulators, Broadcast variables

Spark computes the content of an RDD each time an action is invoked on it

If the same RDD is used multiple times in an application, Spark recomputes its content every time an action is invoked on the RDD, or on one of its "descendants"

This is expensive, especially for iterative applications

We can ask Spark to persist/cache RDDs

When you ask Spark to persist/cache an RDD, each node stores the content of its partitions in memory and reuses them in other actions on that RDD/dataset (or RDDs derived from it)

The first time the content of a persistent/cached RDD is computed in an action, it will be kept in the main memory of the nodes

The next actions on the same RDD will read its content from memory

i.e., Spark persists/caches the content of the RDD across operations

This allows future actions to be much faster (often by more than 10x)
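The difference between recomputing and caching can be illustrated with a small toy model. This is a sketch in plain Python, not Spark code: `ToyRDD`, `computations`, `total` and `squares` are hypothetical names introduced only for this illustration. In PySpark the corresponding request would be `rdd.cache()` or `rdd.persist()`.

```python
class ToyRDD:
    """Toy stand-in for an RDD: lazily computed content with optional caching."""

    def __init__(self, compute):
        self._compute = compute       # function that produces the full content
        self._cached = None           # holds the content once it has been cached
        self._cache_enabled = False
        self.computations = 0         # how many times the content was computed

    def cache(self):
        # Only marks the dataset as cacheable; nothing is computed yet.
        self._cache_enabled = True
        return self

    def _content(self):
        if self._cached is not None:
            return self._cached       # reuse the in-memory copy
        self.computations += 1
        data = self._compute()
        if self._cache_enabled:
            self._cached = data       # the first action fills the cache
        return data

    # Two "actions": each one needs the content of the dataset.
    def count(self):
        return len(self._content())

    def total(self):
        return sum(self._content())


def squares():
    return [x * x for x in range(1000)]


uncached = ToyRDD(squares)
uncached.count()
uncached.total()
print(uncached.computations)   # 2: the content is recomputed for every action

cached = ToyRDD(squares).cache()
cached.count()
cached.total()
print(cached.computations)     # 1: computed once, then read from memory
```

As in Spark, calling `cache()` in this model does not compute anything by itself; the content is stored the first time an action needs it.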

Spark supports several storage levels

The storage level is used to specify if the content of the RDD is stored:

In the main memory of the nodes

On the local disks of the nodes

Partially in the main memory and partially on disk

Storage Level: Meaning

MEMORY_ONLY: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.

MEMORY_AND_DISK: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on (local) disk, and read them from there when they're needed.

MEMORY_ONLY_SER: Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.

MEMORY_AND_DISK_SER: Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.
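The practical difference between MEMORY_ONLY and MEMORY_AND_DISK can be sketched with another toy model. It is plain Python under simplifying assumptions, not Spark's implementation: `ToyStore`, `memory_slots`, `compute_partition` and `run` are hypothetical names, and memory is modeled as a fixed number of partition slots. In PySpark a level is requested with, for example, `rdd.persist(StorageLevel.MEMORY_AND_DISK)`.

```python
class ToyStore:
    """Toy model of a node caching partitions under a memory budget."""

    def __init__(self, level, memory_slots):
        self.level = level                 # "MEMORY_ONLY" or "MEMORY_AND_DISK"
        self.memory_slots = memory_slots   # how many partitions fit in memory
        self.memory = {}
        self.disk = {}
        self.recomputations = 0

    def put(self, pid, data):
        if len(self.memory) < self.memory_slots:
            self.memory[pid] = data
        elif self.level == "MEMORY_AND_DISK":
            self.disk[pid] = data          # spill what does not fit in memory
        # MEMORY_ONLY: a partition that does not fit is simply not cached

    def get(self, pid, compute):
        if pid in self.memory:
            return self.memory[pid]
        if pid in self.disk:
            return self.disk[pid]
        self.recomputations += 1           # not cached: recompute on the fly
        return compute(pid)


def compute_partition(pid):
    return [pid * 10 + i for i in range(3)]


def run(level):
    store = ToyStore(level, memory_slots=2)
    for pid in range(4):                   # first action materializes 4 partitions
        store.put(pid, compute_partition(pid))
    for _ in range(3):                     # three later actions read every partition
        for pid in range(4):
            store.get(pid, compute_partition)
    return store


memory_only = run("MEMORY_ONLY")
memory_and_disk = run("MEMORY_AND_DISK")
print(memory_only.recomputations)       # 6: two uncached partitions x three actions
print(memory_and_disk.recomputations)   # 0: the two extra partitions are read from disk
```

The model ignores the cost of disk reads and of serialization, which is what separates the `_SER` levels from the deserialized ones.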


