Feb 7 2012
Hadoop On Azure: FileNotFoundException in Hadoop Streaming
The example description for Hadoop Streaming on Azure contains a path typo, so you may run into the following error:
Exception in thread "main" java.io.FileNotFoundException: File hdfs://xxx.xxx.xxx.xxx:9000/example/apps/wc.exe does not exist.
    at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:390)
    at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:287)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:413)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:164)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:147)
The problem lies in the “hdfs://…” arguments. The sample gives the path as follows:
hdfs://xxx.xxx.xxx.xxx:9000/example/apps/wc.exe
This path means the application resides in /example/apps, directly under the root directory. A path inside an HDFS URL is resolved from the root directory, so Hadoop looks for an “example” directory in the root and does not find one. However, your actual directory is your home directory under user/, not the root.
Hence, the HDFS URL should be:
hdfs://xxx.xxx.xxx.xxx:9000/user/user_name/example/apps/wc.exe
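If you script your jobs, the corrected URL can be assembled instead of typed by hand. The sketch below is a minimal illustration, assuming the usual HDFS home directory layout /user/<user_name>; the function name home_qualified_url is hypothetical, and the host and user values are only the placeholders used above.

```python
from posixpath import join, normpath


def home_qualified_url(namenode: str, port: int, user: str, relative_path: str) -> str:
    """Build a fully qualified HDFS URL for a file stored under /user/<user>.

    Hypothetical helper; assumes the conventional /user/<user_name> home directory.
    """
    if relative_path.startswith("/"):
        # A leading slash would point at the root directory again, which is the bug.
        raise ValueError("expected a path relative to the home directory")
    absolute = normpath(join("/user", user, relative_path))
    return f"hdfs://{namenode}:{port}{absolute}"


print(home_qualified_url("xxx.xxx.xxx.xxx", 9000, "user_name", "example/apps/wc.exe"))
# hdfs://xxx.xxx.xxx.xxx:9000/user/user_name/example/apps/wc.exe
```

Rejecting a leading slash is a deliberate choice here: it catches exactly the typo from the sample before the job is submitted.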
The -input “/example/data/davinci.txt” and -output “/example/data/StreamingOutput/wc.txt” parameters, which specify the input data and the output directory, have the same problem: the leading slash makes the “example” directory start from the root directory. Drop the slash and use “example/data/...” instead; a relative path like this resolves to “/user/user_name/example/data/...”.
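The difference between “/example/data” and “example/data” comes down to how a path argument is qualified. The following is a simplified model, not Hadoop's own path code: it assumes the working directory is the home directory /user/<user_name>, and resolve_against_home is a hypothetical helper.

```python
import posixpath


def resolve_against_home(path: str, user: str) -> str:
    """Simplified model of path qualification; assumes working dir = /user/<user>."""
    home = f"/user/{user}"
    if path.startswith("/"):
        # A leading slash anchors the path at the HDFS root directory.
        return posixpath.normpath(path)
    # Otherwise the path is taken relative to the working (home) directory.
    return posixpath.normpath(posixpath.join(home, path))


for arg in ["/example/data/davinci.txt", "example/data/davinci.txt"]:
    print(arg, "->", resolve_against_home(arg, "user_name"))
```

The first argument keeps pointing at the root directory, which is why the original -input value fails; the second lands in /user/user_name/example/data, where the sample data actually lives.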