Creating a new project with Spark, Scala, Eclipse, and the Maven build tool
A) Install the Scala plugin in Eclipse with the following steps:
1) In Eclipse, go to Help > Install New Software, paste this link, and complete the normal installation process.
2) Restart Eclipse.
B) Create a new Maven project and give it a name such as 'Spark Sample Project'.
NOTE: If you are creating a Maven project for the first time, it will take some time to download all the Maven packages.
2) Right-click the project > Configure > Add Scala Nature.
After adding the Scala nature you'll be able to see the above project structure. Now you'll have to remove the Scala Library container and select the JRE version installed on your system. In my case it's JDK 1.8.
3) Right-click the project > Build Path > Configure Build Path > Libraries tab. Under Libraries, remove the Scala library, then click Add Library and select the JRE library that is installed on your system. Refer to the screenshot below:
After adding the JRE library, only the libraries shown below should be listed.
- Now click the Source tab (the first tab in the above screenshot) > Add Folder > select main > click Create New Folder > name it "scala".
- Select the Scala Compiler section in the same Properties window. By default it should look like the screenshot below:
- Tick the 'Use Project Settings' checkbox and, from the Scala Installation dropdown, select the Scala version installed on your system. In my case it's 2.11.
- Click Apply and Close; if a warning message pops up, just click Yes and proceed. You should now see the project structure below without any warnings or error messages.
The Scala project is ready, hurray! Next we'll look at a word count example and build a JAR file from it.
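Insert the code below into your project's pom.xml file, inside the dependencies tag. The _2.11 suffix in the artifactId is the Scala version the Spark artifact was built for, so it should match the Scala version of your project: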
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
</dependency>
NOTE: If you want, you can delete the junit dependency tag that is created by default, since we won't be dealing with JUnit for now. If you do, you'll also have to delete the AppTest.java file under src/test/java, because otherwise it may cause a project error. I recommend deleting that file and removing the junit dependency tag.
- Right-click the src/main/scala folder > New > Other > search for "Scala Object" and select it > Next > type "com.sample.WordCount" in the Name field and click Finish. You should see the project structure below:
In WordCount.scala, write the code below:
package com.sample

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Check the arguments first, so that args(0) is never read when it is missing
    if (args.length < 1) {
      println("Missing File Path")
    } else {
      val conf = new SparkConf()
      conf.set("spark.master", "local")
      conf.set("spark.app.name", "Word Count Example")
      val sc = new SparkContext(conf)
      // Load the text file as an RDD of lines
      val textFile = sc.textFile(args(0))
      // Split each line into words on a single space, i.e. " "
      val words = textFile.flatMap(line => line.split(" "))
      // Pair each word with a count of 1
      val counts = words.map(word => (word, 1))
      // Sum the counts for each word (the key)
      val wordCount = counts.reduceByKey(_ + _)
      // Sort by word and print the result on the driver
      val sortedWords = wordCount.sortByKey()
      sortedWords.collect().foreach(println)
      // Release Spark resources once the job is done
      sc.stop()
    }
  }
}
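To see what each step of the pipeline does without starting Spark, here is a minimal sketch of the same flatMap > map > sum-by-key > sort chain on a plain Scala Seq. The object name LocalWordCountSketch, the helper countWords, and the sample lines are made up for illustration; groupBy plus sum only stands in for what reduceByKey does on an RDD, it is not how Spark implements it.

```scala
object LocalWordCountSketch {
  // Hypothetical helper: the same steps as WordCount, on an in-memory Seq instead of an RDD
  def countWords(lines: Seq[String]): Seq[(String, Int)] = {
    // Split every line on a single space and flatten into one sequence of words
    val words = lines.flatMap(line => line.split(" "))
    // Pair each word with a count of 1
    val pairs = words.map(word => (word, 1))
    // Group the pairs by word and sum the 1s (stands in for reduceByKey(_ + _))
    val counts = pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
    // Sort by word (stands in for sortByKey)
    counts.toSeq.sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input
    val sample = Seq("This is my file", "This file")
    countWords(sample).foreach(println)
  }
}
```

Note that the sort is by plain string ordering, so capitalised words such as "This" come before lowercase ones such as "file".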
- Now right-click the project and, under Run As, select the second Maven build option from the list. You'll see the window below.
- Type "package" in the Goals field and click Run. You should see the output below in the console window. If the build fails, check your configuration again.
- Refresh your project; under the target folder you'll see the generated JAR file.
- Copy the JAR file's path and run the command below in your terminal (Linux or Windows), replacing the JAR path with the one you copied:
spark-submit \
  --class com.sample.WordCount \
  file:///usr/local/eclipse-workspace/SparkSampleProject/target/SparkSampleProject-0.0.1-SNAPSHOT.jar \
  /home/debuggerrr/Desktop/sampleDoc
where '/home/debuggerrr/Desktop/sampleDoc' is my text file.
- Execute the above command and you'll see a result similar to the one below in your terminal:
(This ,2)
(is,1)
(my,1)
(file,2)
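Because the job splits on a single space, tabs or repeated spaces in the input file can lead to empty or odd-looking keys in the result. A minimal sketch of a more forgiving tokenizer follows; the object name TokenizeSketch, the helper tokenize, and the sample line are assumptions for illustration and are not part of the project above.

```scala
object TokenizeSketch {
  // Hypothetical helper: split on runs of whitespace and drop empty tokens
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Made-up line with a leading space, a double space, a tab and a trailing space
    val line = " This  is\tmy file "
    // Splitting on a single space keeps empty strings and leaves the tab inside a token
    println(line.split(" ").toList)
    // Splitting on whitespace runs and filtering yields only the four words
    println(tokenize(line).toList)
  }
}
```

In the Spark job the same idea would be to replace the split step with textFile.flatMap(line => line.split("\\s+")).filter(_.nonEmpty), if that matches how you want words to be counted.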