This example uses seismic data that records earthquake magnitudes around the world. Thousands of sensors are deployed worldwide, each recording earthquake data in log files.
The dataset:
The input is raw data files listing earthquakes by region, magnitude and other information.
nc,71920701,1,"Saturday, January 12, 2013 19:43:18 UTC",38.7865,-122.7630,1.5,1.10,27,"Northern California"
Each entry contains many fields. The two that matter here are the magnitude of the earthquake (1.5 in the sample above) and the name of the region where the reading was taken ("Northern California").
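To make the record layout concrete, here is a minimal Python sketch that pulls the two fields out of one line. The field positions (magnitude at index 6, region at index 9) are an assumption based only on the single sample record shown above, and parse_record is a hypothetical helper name. A CSV parser is used rather than a plain split on commas because the date field itself contains commas.

```python
import csv

# Assumed field positions, inferred from the one sample record above.
MAGNITUDE_INDEX = 6
REGION_INDEX = 9


def parse_record(line):
    """Return (region, magnitude) for one well-formed log line."""
    # csv.reader honours the double quotes, so the commas inside the
    # date field do not split it into extra columns.
    fields = next(csv.reader([line]))
    return fields[REGION_INDEX], float(fields[MAGNITUDE_INDEX])


sample = ('nc,71920701,1,"Saturday, January 12, 2013 19:43:18 UTC",'
          '38.7865,-122.7630,1.5,1.10,27,"Northern California"')
print(parse_record(sample))
```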
There are millions of such log files available. The logs also contain erroneous entries, for example when a sensor became faulty and went into an infinite loop, dumping thousands of lines a second.
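Because bad entries are expected, any job over this data needs to skip lines it cannot parse instead of failing. The sketch below shows one possible check in Python. What a faulty line actually looks like is not described here, so the rules used (exactly 10 fields, a numeric magnitude, a non-empty region) are assumptions, and safe_parse is a hypothetical helper name.

```python
import csv

EXPECTED_FIELDS = 10  # assumption: every valid record matches the sample layout


def safe_parse(line):
    """Return (region, magnitude), or None if the line looks malformed."""
    try:
        fields = next(csv.reader([line]))
    except (csv.Error, StopIteration):
        return None
    if len(fields) != EXPECTED_FIELDS:
        return None
    try:
        magnitude = float(fields[6])
    except ValueError:
        # Magnitude column is not a number, so treat the line as garbage.
        return None
    region = fields[9].strip()
    if not region:
        return None
    return region, magnitude


good = ('nc,71920701,1,"Saturday, January 12, 2013 19:43:18 UTC",'
        '38.7865,-122.7630,1.5,1.10,27,"Northern California"')
print(safe_parse(good))
print(safe_parse("sensor fault sensor fault"))
```

Note that this only catches malformed lines. A faulty sensor that repeats a well-formed line thousands of times would pass the check, which is harmless for a maximum but would matter for counts or averages.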
Problem Statement:
Process all input files and find the maximum magnitude reading for every region listed.
Output Result:
"region_name" <maximum magnitude of earthquake recorded>
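The problem maps naturally onto the two MapReduce phases: the map step emits a (region, magnitude) pair for each usable line, and the reduce step keeps the largest magnitude per region. The following is a minimal sketch in plain Python that simulates both phases in memory. It does not use the Hadoop API, and the function names, the field positions, and the sample lines other than the first are assumptions made for illustration.

```python
import csv
from collections import defaultdict


def mapper(line):
    """Map step: yield (region, magnitude) for a usable line, else nothing."""
    fields = next(csv.reader([line]), [])
    if len(fields) != 10:  # assumed record width, taken from the sample
        return
    try:
        magnitude = float(fields[6])
    except ValueError:
        return
    yield fields[9], magnitude


def reducer(region, magnitudes):
    """Reduce step: keep the largest magnitude seen for one region."""
    return region, max(magnitudes)


def run_job(lines):
    """Simulate the shuffle by grouping mapper output by region."""
    grouped = defaultdict(list)
    for line in lines:
        for region, magnitude in mapper(line):
            grouped[region].append(magnitude)
    return dict(reducer(r, m) for r, m in sorted(grouped.items()))


# The first line is the sample record; the rest are made-up test lines.
lines = [
    'nc,71920701,1,"Saturday, January 12, 2013 19:43:18 UTC",'
    '38.7865,-122.7630,1.5,1.10,27,"Northern California"',
    'nc,0,1,"made-up date",0.0,0.0,2.4,1.00,10,"Northern California"',
    'xx,0,1,"made-up date",0.0,0.0,3.1,1.00,10,"Hypothetical Region"',
    'faulty sensor output',
]
result = run_job(lines)
for region, magnitude in result.items():
    print(f'"{region}" {magnitude}')
```

In a real Hadoop job the framework performs the grouping between the two steps, so only the mapper and reducer logic would carry over.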
I hope this tutorial helps you. If you have any questions or problems, please let me know.
Happy Hadooping with Patrick.