The exception below occurs because the job's output directory already exists in the target file system (HDFS, or the local file system when running locally, as in this trace). Hadoop refuses to overwrite an existing output directory so that the results of a previous run are not silently destroyed; the check is performed by FileOutputFormat.checkOutputSpecs() at job-submission time, before any tasks run.
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/HadoopWS/outfile already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
You have to delete the output directory before re-running the job. From the command line, this can be done with:
$ hdfs dfs -rm -r /pathToDirectory
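If the HDFS trash feature is enabled on the cluster, the directory is moved to the user's trash rather than removed outright; adding the -skipTrash option deletes it immediately.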
If you would rather do this from Java, the snippet below can be used. It deletes the output directory, if present, every time before the job runs.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Delete the output directory if it is left over from a previous run
Path output = new Path(outPath);
FileSystem hdfs = FileSystem.get(conf);
if (hdfs.exists(output)) {
    hdfs.delete(output, true); // true = delete recursively
}
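Place this check in the driver before the job is submitted. The second argument to delete() requests recursive deletion, which is needed because the output directory contains the part files and the _SUCCESS marker from the previous run.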
Another workaround is to pass the output directory to the job as a command-line argument:
$ yarn jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {input_path_in_hdfs} {output_directory_path}
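For this to work, the driver must read both paths from its arguments rather than hard-coding them. Below is a minimal sketch, assuming a driver class named MyJobDriver and the usual convention that args[0] is the input path and args[1] is the output path (mapper and reducer configuration elided); it also folds in the existence check from above so reruns never fail.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");
        job.setJarByClass(MyJobDriver.class);
        // ... set mapper, reducer, and key/value classes here ...

        // Input and output paths come from the command line
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);

        // Delete stale output before submission, as shown above
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }

        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}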
If you would like a new output directory to be created on every run, the following code can be used.
// Requires: java.text.SimpleDateFormat, java.util.Locale, java.sql.Timestamp
// Append a timestamp so that each run writes to a fresh output directory
String timeStamp = new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss", Locale.US).format(new Timestamp(System.currentTimeMillis()));
FileOutputFormat.setOutputPath(job, new Path("/MyDir/" + timeStamp));
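Note that these timestamped directories accumulate across runs and eventually have to be cleaned up by hand or by a scheduled job. The yyyy.MM.dd.HH.mm.ss pattern sorts lexicographically in chronological order, which makes it easy to find the latest output.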