Cache, Accumulators, Broadcast variables
Spark computes the content of an RDD each time an action is invoked on it
If the same RDD is used multiple times in an application, Spark recomputes its content every time an action is invoked on the RDD, or on one of its "descendants"
This is expensive, especially for iterative applications
We can ask Spark to persist/cache RDDs
When you ask Spark to persist/cache an RDD, each node stores the content of its partitions in memory and reuses them in other actions on that RDD/dataset (or RDDs derived from it)
The first time the content of a persistent/cached RDD is computed in an action, it will be kept in the main memory of the nodes
The next actions on the same RDD will read its content from memory
i.e., Spark persists/caches the content of the RDD across operations
This allows future actions to be much faster (often by more than 10x)
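The difference between recomputing and caching can be illustrated with a small toy model. This is a sketch in plain Python, not Spark code: the class `LazyDataset` and all of its methods are hypothetical names that only mimic the behavior described above (lazy computation, actions that trigger it, and a descendant that depends on its parent). In Spark itself the request is made by calling `cache()` or `persist()` on the RDD.

```python
# Toy model in plain Python (not the Spark API): LazyDataset and its
# methods are hypothetical names used only for illustration.

class LazyDataset:
    """A dataset defined by a function that produces its content on demand."""

    def __init__(self, compute):
        self._compute = compute      # how to (re)build the content
        self._use_cache = False      # set by cache()
        self._cached = None          # content kept in memory once computed
        self.computations = 0        # how many times the content was built

    def cache(self):
        """Mark the dataset as cached; nothing is computed yet."""
        self._use_cache = True
        return self

    def _content(self):
        if self._use_cache and self._cached is not None:
            return self._cached      # later actions read from memory
        self.computations += 1
        data = list(self._compute())
        if self._use_cache:
            self._cached = data      # the first action populates the cache
        return data

    def map(self, f):
        """Derive a new lazy dataset (a "descendant") from this one."""
        return LazyDataset(lambda: (f(x) for x in self._content()))

    # Actions: they trigger the computation.
    def count(self):
        return len(self._content())

    def sum(self):
        return sum(self._content())


def run_actions(parent):
    """Invoke two actions on the parent and one on a descendant."""
    squares = parent.map(lambda x: x * x)
    return parent.count(), parent.sum(), squares.sum()


uncached = LazyDataset(lambda: range(1, 6))
cached = LazyDataset(lambda: range(1, 6)).cache()

results_uncached = run_actions(uncached)
results_cached = run_actions(cached)

print(results_uncached, uncached.computations)  # (5, 15, 55) 3
print(results_cached, cached.computations)      # (5, 15, 55) 1
```

Both runs return the same results, but the uncached parent is rebuilt once per action (including the action on its descendant), while the cached parent is built only by the first action.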
Spark supports several storage levels
The storage level is used to specify if the content of the RDD is stored
In the main memory of the nodes
On the local disks of the nodes
Partially in the main memory and partially on disk
| Storage Level | Meaning |
|---|---|
| MEMORY_ONLY | Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level. |
| MEMORY_AND_DISK | Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on (local) disk, and read them from there when they're needed. |
| MEMORY_ONLY_SER | Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read. |
| MEMORY_AND_DISK_SER | Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed. |
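The practical difference between MEMORY_ONLY and MEMORY_AND_DISK can be sketched with another toy model. Again this is plain Python rather than Spark code, and it rests on loud simplifications: `PartitionStore`, `build_partition`, and `run_two_actions` are hypothetical names, memory capacity is counted in whole partitions rather than bytes, eviction is not modeled, and the serialized levels are left out.

```python
# Toy model in plain Python (not the Spark API). PartitionStore and the
# capacity measured in "number of partitions" are hypothetical simplifications.

class PartitionStore:
    def __init__(self, level, memory_capacity):
        assert level in ("MEMORY_ONLY", "MEMORY_AND_DISK")
        self.level = level
        self.memory_capacity = memory_capacity  # partitions that fit in memory
        self.memory = {}       # partition id -> content held in memory
        self.disk = {}         # stands in for the node's local disk
        self.computations = 0  # times a partition was (re)computed
        self.disk_reads = 0    # times a partition was read back from disk

    def get(self, pid, compute):
        """Return partition `pid`, computing it with `compute` if needed."""
        if pid in self.memory:
            return self.memory[pid]
        if pid in self.disk:
            self.disk_reads += 1
            return self.disk[pid]
        data = compute(pid)
        self.computations += 1
        if len(self.memory) < self.memory_capacity:
            self.memory[pid] = data
        elif self.level == "MEMORY_AND_DISK":
            self.disk[pid] = data  # spill what does not fit in memory
        # MEMORY_ONLY: a partition that does not fit is simply not stored
        return data


def build_partition(pid):
    """Pretend computation of one partition's content."""
    return [pid * 10 + i for i in range(3)]


def run_two_actions(store, num_partitions=4):
    """Run the same 'sum' action twice over all partitions."""
    total = 0
    for _ in range(2):
        total = sum(sum(store.get(p, build_partition))
                    for p in range(num_partitions))
    return total


memory_only = PartitionStore("MEMORY_ONLY", memory_capacity=2)
memory_and_disk = PartitionStore("MEMORY_AND_DISK", memory_capacity=2)

total_a = run_two_actions(memory_only)
total_b = run_two_actions(memory_and_disk)

print(total_a, memory_only.computations, memory_only.disk_reads)          # 192 6 0
print(total_b, memory_and_disk.computations, memory_and_disk.disk_reads)  # 192 4 2
```

With room for only two of the four partitions, the MEMORY_ONLY store recomputes the two that did not fit during the second action, while the MEMORY_AND_DISK store reads them back from its simulated disk instead.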