Spark – Shell


What is Spark Shell?
To learn Spark you don't need a Spark application deployed on a cluster. Spark provides a built-in shell prompt with which you can learn Spark concepts and try them out interactively. If you are familiar with the Scala shell, you know it is a REPL, i.e. a Read-Evaluate-Print Loop. The Spark shell is nothing but a Scala shell with additional Spark flavour: when the shell starts, it automatically creates a SparkContext object for you, named sc. You do not have to create a SparkContext explicitly.

How to run the spark shell?

Download the Spark pre-built binaries and extract them to a local directory in your OS. This directory is known as SPARK_HOME. The spark-shell script is present inside the %SPARK_HOME%\bin directory.


To run the Spark shell, open the command prompt in the same directory and type spark-shell. If everything goes well, the shell prints the Spark welcome banner and leaves you at the scala> prompt.
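For example, on Windows (using the SPARK_HOME directory from the previous step):

    cd %SPARK_HOME%\bin
    spark-shell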


You can test whether the shell has started properly by typing sc at the prompt; the REPL will display the SparkContext object reference.
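A sketch of what this looks like in the shell (the object address after the @ will differ on your machine):

    scala> sc
    res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1a2b3c4d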

When you start spark-shell it prints a lot of log messages. If you want to change the log level, you can do so by editing the log4j.properties configuration file inside the %SPARK_HOME%\conf directory (if only log4j.properties.template is present there, copy it to log4j.properties first). Change the log level from INFO to WARN or ERROR.

log4j.rootCategory=INFO, console
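After the change, the same line would read, for example:

    log4j.rootCategory=WARN, console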

REPL commands:

  • :help – lists all available REPL commands
  • :quit – exits the shell
  • :paste – enters paste mode, so you can type a multi-line block and evaluate it as a whole (see the sketch below)
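For instance, :paste lets you enter several lines at once and press Ctrl+D to evaluate them together:

    scala> :paste
    // Entering paste mode (ctrl-D to finish)

    val x = 10
    val y = x * 2

    // Exiting paste mode, now interpreting.

    x: Int = 10
    y: Int = 20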

Example:

Let's use the same log file that we used for our application earlier, and create an RDD out of it. After that we will extract the lines that contain the words "EC_COUNT" and "2000".
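A minimal sketch of this in the shell; the file path below is a hypothetical placeholder, so substitute the actual location of your log file:

    scala> val logFile = sc.textFile("D:/data/app.log")   // hypothetical path to the log file
    scala> val matches = logFile.filter(line => line.contains("EC_COUNT") && line.contains("2000"))
    scala> matches.collect().foreach(println)             // print every matching line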

If you want to fetch data from HDFS instead, you can pass an hdfs:// URI to sc.textFile:
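The NameNode host, port, and path below are hypothetical placeholders; substitute your cluster's values:

    scala> val hdfsFile = sc.textFile("hdfs://namenode:9000/user/logs/app.log")   // hypothetical HDFS URI
    scala> hdfsFile.count()                                                       // e.g. count the lines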

 
