This is a very simple Movie Recommender in Hadoop.
The whole job is broken into 4 MapReduce jobs, which must be run sequentially as shown below.
The steps are:
<1> Normalisation,
<2> Finding Distances,
<3> Contribution of Ratings, and
<4> Adding up the Ratings.
In the Normalisation phase, each rating is normalised with respect to the average rating given by that user.
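A minimal sketch of the normalisation idea in plain Java (no Hadoop APIs), assuming "normalised w.r.t. the average rating" means subtracting the user's mean rating from each of that user's ratings. The class `NormalizeSketch` and its `movieId -> rating` map are hypothetical; the attached MRNormalize job may use a different formula or record layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: mean-centres one user's ratings (movieId -> rating).
class NormalizeSketch {
    static Map<String, Double> normalize(Map<String, Double> ratings) {
        // Average rating given by this user across all movies they rated.
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r;
        }
        double mean = ratings.isEmpty() ? 0.0 : sum / ratings.size();

        // Each normalised rating is its offset from the user's average.
        Map<String, Double> normalized = new HashMap<>();
        for (Map.Entry<String, Double> e : ratings.entrySet()) {
            normalized.put(e.getKey(), e.getValue() - mean);
        }
        return normalized;
    }
}
```

For example, a user who rated movie "1" as 5.0 and movie "2" as 3.0 has an average of 4.0, so the normalised ratings are 1.0 and -1.0. Centring removes the difference between generous and strict raters before users are compared.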
Next, the distance between the ratings of each pair of users is calculated.
Cosine distance is used as the distance metric.
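A sketch of cosine distance between two users' normalised rating maps, assuming the comparison is restricted to movies both users rated and that distance is defined as 1 - cosine similarity (so it ranges from 0 to 2). The class `CosineDistanceSketch` and the choice to return 1.0 when there is no usable overlap are assumptions for illustration, not taken from MRDistance.

```java
import java.util.Map;

// Hypothetical helper: cosine distance over co-rated movies (movieId -> rating).
class CosineDistanceSketch {
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb == null) {
                continue; // only movies rated by both users are compared
            }
            double ra = e.getValue();
            dot += ra * rb;
            normA += ra * ra;
            normB += rb * rb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // assumption: no usable overlap is treated as similarity 0
        }
        double similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
        return 1.0 - similarity; // 0 = same taste, 2 = opposite taste
    }
}
```

Two users whose normalised ratings point the same way, such as {1: 1.0, 2: -1.0} and {1: 2.0, 2: -2.0}, get distance 0; flipping the signs of one user gives distance 2.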
Next, each user contributes a part of his/her ratings to other users based on their distance.
The smaller their distance, the larger the contribution.
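A sketch of one neighbour's contribution to a target user. The source only says that a smaller distance means a larger contribution, so the weight used here, max(0, 1 - distance), is an assumption: it turns the cosine distance back into a similarity and drops neighbours with negative similarity. `ContributionSketch` is a hypothetical name; MRContrib may weight ratings differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: scales a neighbour's normalised ratings by an assumed weight.
class ContributionSketch {
    static Map<String, Double> contribute(Map<String, Double> neighbourRatings, double distance) {
        // Assumed weighting: similarity = 1 - distance, clipped at 0 so that
        // neighbours with opposite taste contribute nothing.
        double weight = Math.max(0.0, 1.0 - distance);
        Map<String, Double> contributions = new HashMap<>();
        for (Map.Entry<String, Double> e : neighbourRatings.entrySet()) {
            contributions.put(e.getKey(), weight * e.getValue());
        }
        return contributions;
    }
}
```

A neighbour at distance 0.25 who rated movie "10" at 2.0 passes on 1.5 for that movie, while a neighbour at distance 1.5 passes on 0.0. Because several neighbours can rate the same movie, this step naturally produces multiple values per movie.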
The Contribution phase may emit more than one rating for the same movie.
The final Addition phase combines them into a single value.
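A sketch of the addition step as a plain sum per movie, which is what a reducer keyed by movie would compute for one user. In the real job the key presumably also identifies the target user; here the list of `(movieId, value)` pairs and the class `AdditionSketch` are hypothetical simplifications.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums every partial rating emitted for the same movie.
class AdditionSketch {
    static Map<String, Double> addUp(List<Map.Entry<String, Double>> contributions) {
        Map<String, Double> totals = new TreeMap<>(); // sorted by movieId for stable output
        for (Map.Entry<String, Double> c : contributions) {
            totals.merge(c.getKey(), c.getValue(), Double::sum);
        }
        return totals;
    }
}
```

Given the contributions ("10", 1.5), ("11", -0.75) and ("10", 0.5), the result has one entry per movie: 2.0 for "10" and -0.75 for "11".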
Run the attached Java code using the steps below:
|hdfs dfs -rm -r /user/hadoopgyaan/input/.csv|
|hdfs dfs -put ratings.csv /user/hadoopgyaan/input/|
|hdfs dfs -rm -r /user/hadoopgyaan/output|
|hadoop jar mr.jar MRNormalize /user/hadoopgyaan/input/ratings.csv output|
|hadoop jar mr.jar MRDistance /user/hadoopgyaan/output/p output1 20 151000|
|hadoop fs -get /user/hadoopgyaan/output1/p out.csv|
|hdfs dfs -put out.csv /user/hadoopgyaan/input/|
|hadoop jar mr.jar MRContrib /user/hadoopgyaan/input/ .csv output2 20|
|hadoop jar mr.jar MRAdd /user/hadoopgyaan/output2/p output3|
|hadoop fs -get /user/hadoopgyaan/output3/p result.txt|