Face Detection in Hadoop using HIPI and OpenCV

This project addresses the problem of processing large image datasets on Apache Hadoop. It uses the Hadoop Image Processing Interface (HIPI) for storage and efficient distributed processing, combined with OpenCV, an open source library of image processing algorithms. A program that counts the number of faces in a collection of images is demonstrated.


Processing a large set of images on a single machine can be very time consuming and costly. HIPI is an image processing library designed to be used with Apache Hadoop MapReduce, a software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. HIPI facilitates efficient, high-throughput image processing with MapReduce-style parallel programs, which are typically executed on a cluster. It provides a way to store a large collection of images on the Hadoop Distributed File System (HDFS) and make them available for efficient distributed processing.


OpenCV (Open Source Computer Vision) is an open source library of image processing algorithms, mainly aimed at real-time computer vision. Starting with version 2.4.4, OpenCV supports Java development, so it can be used with Apache Hadoop.

Problem Statement:

To demonstrate how HIPI and OpenCV can be used together to count the total number of faces in a big image dataset.
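The counting task stated above follows the usual map and reduce pattern: each image yields a face count, and the counts are summed. Here is a minimal plain-Java sketch of that idea. It does not use the Hadoop, HIPI or OpenCV APIs. The class name FaceCountSketch, the key "faces" and the detector function are hypothetical stand-ins chosen for illustration; in the real job the detector would be an OpenCV face detector running inside a mapper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Plain-Java illustration of the map and reduce steps; not the Hadoop or HIPI API.
class FaceCountSketch {

    // Map step: emit one ("faces", count) pair per image.
    // The detector is a hypothetical stand-in for an OpenCV face detector.
    static List<Map.Entry<String, Integer>> map(List<String> imageIds,
                                                ToIntFunction<String> detector) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String id : imageIds) {
            pairs.add(new AbstractMap.SimpleEntry<>("faces", detector.applyAsInt(id)));
        }
        return pairs;
    }

    // Reduce step: sum every count emitted under the shared key.
    static int reduce(List<Map.Entry<String, Integer>> pairs) {
        int total = 0;
        for (Map.Entry<String, Integer> pair : pairs) {
            total += pair.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up per-image face counts that play the role of detection results.
        Map<String, Integer> fakeCounts = new HashMap<>();
        fakeCounts.put("a.jpg", 2);
        fakeCounts.put("b.jpg", 0);
        fakeCounts.put("c.jpg", 3);

        List<String> images = Arrays.asList("a.jpg", "b.jpg", "c.jpg");
        int total = reduce(map(images, fakeCounts::get));
        System.out.println("Total faces: " + total); // prints "Total faces: 5"
    }
}
```

Emitting every count under a single key means one reducer sees all the values, which is enough for a global total.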


  • ant
  • Hadoop ecosystem


We use the HIPI example programs (downloader and dumphib) together with the OpenCV jar to solve the face detection problem.

Build the two MapReduce programs manually with the following commands:

  • Run ‘ant downloader’ and ‘ant dumphib’

These commands generate two jar files.

Move the list.txt file to HDFS at “/user/hduser/hipi-hadoop/list.txt”,

or download the dataset of images from the given link.

  • Run runDownloader.sh with the <%nodes%> parameter

If you are running on a single-node Hadoop system, %nodes% = 1.

First, the downloader downloads all the images in the list file and merges them into a single HIPI image bundle.
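To give a feel for what the %nodes% parameter means for the download step, here is a small plain-Java sketch that divides a list of image URLs into one chunk per node. This is only an assumption about how the work could be shared; the way HIPI actually splits the list is not shown in this post. The class name UrlSplitSketch, the split method and the example URLs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: shares a URL list among download tasks. Not HIPI code.
class UrlSplitSketch {

    // Split the list into `nodes` contiguous chunks whose sizes differ by at most one.
    static List<List<String>> split(List<String> urls, int nodes) {
        if (nodes < 1) {
            throw new IllegalArgumentException("nodes must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        int base = urls.size() / nodes;   // minimum chunk size
        int extra = urls.size() % nodes;  // the first `extra` chunks get one more URL
        int start = 0;
        for (int i = 0; i < nodes; i++) {
            int size = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(urls.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Placeholder URLs standing in for the lines of list.txt.
        List<String> urls = Arrays.asList(
                "http://example.com/1.jpg",
                "http://example.com/2.jpg",
                "http://example.com/3.jpg",
                "http://example.com/4.jpg",
                "http://example.com/5.jpg");
        List<List<String>> chunks = split(urls, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("node " + i + ": " + chunks.get(i).size() + " URLs");
        }
    }
}
```

With a single node the whole list ends up in one chunk, which matches the %nodes% = 1 case for a single Hadoop system.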

Second, dumpHIB uses the OpenCV jar to detect the faces in the images and stores the images on the local file system or a file server.


Face Detection in Hadoop using HIPI and OpenCV Steps and Codes

Face Detection Dataset

I hope this tutorial helps you. If you have any questions or problems, please let me know.

Happy Hadooping with Patrick..
