Movie Recommender MapReduce Case Study

This is a very simple Movie Recommender in Hadoop.

The whole job is broken into 4 MapReduce jobs, which are to be run sequentially as shown below.
  
    The steps are:

    <1> Normalization
    <2> Finding Distances
    <3> Contribution of Rating
    <4> Adding up the Ratings

    In the Normalization phase, ratings are normalized with respect to the average rating given by the user.
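The attached MRNormalize code is not reproduced here, so the following is only a minimal sketch of the idea in plain Java, without Hadoop. It assumes that "normalized with respect to the average" means subtracting the user's mean rating from each of that user's ratings; the class name `NormalizeSketch` and the movie ids are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NormalizeSketch {
    // Assumption: normalization = rating minus the user's own average rating.
    static Map<String, Double> normalize(Map<String, Double> ratingsByMovie) {
        double sum = 0.0;
        for (double rating : ratingsByMovie.values()) {
            sum += rating;
        }
        double mean = ratingsByMovie.isEmpty() ? 0.0 : sum / ratingsByMovie.size();

        Map<String, Double> centered = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : ratingsByMovie.entrySet()) {
            centered.put(e.getKey(), e.getValue() - mean);
        }
        return centered;
    }

    public static void main(String[] args) {
        // One user's ratings, keyed by a hypothetical movie id.
        Map<String, Double> ratings = new LinkedHashMap<>();
        ratings.put("m1", 5.0);
        ratings.put("m2", 3.0);
        ratings.put("m3", 1.0);
        System.out.println(normalize(ratings)); // prints {m1=2.0, m2=0.0, m3=-2.0}
    }
}
```

In a MapReduce setting, the same computation would typically run in a reducer that receives all ratings of one user under a single key.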

    Next, the distance between the ratings of each pair of users is calculated.
    Cosine distance is used as the distance metric.
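As an illustration of the metric, here is a small plain-Java sketch of cosine distance between two users. It assumes distance is defined as 1 minus cosine similarity and that only movies rated by both users are compared; the attached MRDistance job may handle these details differently. The class name `CosineDistanceSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CosineDistanceSketch {
    // Cosine distance = 1 - cosine similarity, over movies both users rated (assumption).
    static double cosineDistance(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // movie not rated by the second user
            }
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // no usable overlap: treat the users as unrelated (assumption)
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Normalized ratings of two hypothetical users.
        Map<String, Double> u1 = new HashMap<>();
        u1.put("m1", 1.0);
        u1.put("m2", -1.0);
        Map<String, Double> u2 = new HashMap<>();
        u2.put("m1", -1.0);
        u2.put("m2", 1.0);
        System.out.println(cosineDistance(u1, u1)); // close to 0: identical taste
        System.out.println(cosineDistance(u1, u2)); // close to 2: opposite taste
    }
}
```

With mean-centered ratings the distance ranges from 0 (same direction) to 2 (opposite direction), with 1 meaning no correlation.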

    Next, each user contributes a part of his/her rating to other users based on their distance.
    The smaller the distance between two users, the larger the contribution.
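The exact weighting used by the attached MRContrib job is not stated above. The sketch below only illustrates the principle, using a hypothetical weight of `1 - distance` (clamped at zero), so that a nearer user passes on a larger share of a rating. The names `ContributionSketch`, `weight` and `contribution` are made up for this example.

```java
class ContributionSketch {
    // Hypothetical weighting: distance 0 gives weight 1, distance >= 1 gives weight 0.
    static double weight(double distance) {
        return Math.max(0.0, 1.0 - distance);
    }

    // Share of a neighbour's (normalized) rating that is passed on to another user.
    static double contribution(double rating, double distance) {
        return weight(distance) * rating;
    }

    public static void main(String[] args) {
        double rating = 2.0; // a neighbour's normalized rating for some movie
        System.out.println(contribution(rating, 0.25)); // prints 1.5 (near neighbour)
        System.out.println(contribution(rating, 0.75)); // prints 0.5 (far neighbour)
    }
}
```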

    The Contribution phase may emit more than one rating for the same movie.
    The final Addition phase combines them.
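The combining step can be pictured as a plain sum per key. The sketch below assumes each emitted record is keyed by a user and movie pair (written here as a hypothetical `"user:movie"` string) and that contributions are simply added; the attached MRAdd job may format keys and values differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AddSketch {
    // Sum all contributions that share the same "user:movie" key.
    static Map<String, Double> addUp(List<Map.Entry<String, Double>> emitted) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two contributions for the same user and movie, one for another user.
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        emitted.add(new SimpleEntry<String, Double>("u1:m1", 1.5));
        emitted.add(new SimpleEntry<String, Double>("u1:m1", 0.5));
        emitted.add(new SimpleEntry<String, Double>("u2:m1", 0.5));
        System.out.println(addUp(emitted)); // prints {u1:m1=2.0, u2:m1=0.5}
    }
}
```

This is the same shape as a classic sum reducer: the framework groups values by key, and the reducer adds them up.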

Please run the attached Java code using the steps below:

hdfs dfs -rm -r /user/hadoopgyaan/input/*.csv
hdfs dfs -put ratings.csv /user/hadoopgyaan/input/
hdfs dfs -rm -r /user/hadoopgyaan/output*
#javac -classpath ../hadoop-core-1.2.1.jar MRNormalize.java
#javac -classpath ../hadoop-core-1.2.1.jar MRDistance.java
#javac -classpath ../hadoop-core-1.2.1.jar MRContrib.java
#javac -classpath ../hadoop-core-1.2.1.jar MRAdd.java
#jar cf mr.jar MR*.class
hadoop jar mr.jar MRNormalize /user/hadoopgyaan/input/ratings.csv output
hadoop jar mr.jar MRDistance /user/hadoopgyaan/output/p* output1 20 151000
hadoop fs -get /user/hadoopgyaan/output1/p* out.csv
hdfs dfs -put out.csv /user/hadoopgyaan/input/
hadoop jar mr.jar MRContrib /user/hadoopgyaan/input/*.csv output2 20
hadoop jar mr.jar MRAdd /user/hadoopgyaan/output2/p* output3
hadoop fs -get /user/hadoopgyaan/output3/p* result.txt
#hdfs dfs -cat /user/hadoopgyaan/output3/p*

Downloads :

Movie Recommender Java Codes and Ratings Dataset

I hope this tutorial helps you. If you have any questions or problems, please let me know.
Happy Hadooping with Patrick.
