One challenge of modeling retail data is that decisions must be made from limited historical data. Christmas comes only once a year, and so does the opportunity to see how strategic decisions affect the bottom line. In this recruiting competition, job seekers are given historical sales data for 45 Walmart stores located in different regions. Each store contains a number of departments, and participants must predict the sales of each department in each store. To add to the challenge, holiday markdowns are included in the data set. These markdowns are known to affect sales, but it is hard to predict which departments are affected and to what degree.
The Dataset:
The historical training data covers 2010-02-05 to 2012-11-01. The file includes the following fields:
- Store: store id
- Dept: department id
- Date: the week
- Weekly_Sales: sales for the given department in the given store
- IsHoliday: whether the week is a special holiday week
test.csv has the same format, except that each row lacks the sales figure, which participants must predict.
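To make the layout concrete, here is a minimal sketch of reading rows in this format with Python's standard library. The sample rows are made up for illustration, the `SalesRow` type and `parse_rows` helper are hypothetical names, and the `TRUE`/`FALSE` spelling of the holiday flag is an assumption about how the CSV encodes it.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Made-up sample rows in the layout described above (not real competition data).
SAMPLE = """Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,1000.50,FALSE
1,1,2010-02-12,1200.00,TRUE
"""


@dataclass
class SalesRow:
    store: int
    dept: int
    week: date
    weekly_sales: Optional[float]  # None for test rows, where sales are withheld
    is_holiday: bool


def parse_rows(text: str) -> List[SalesRow]:
    """Parse CSV text in the training or test layout into typed rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        sales = rec.get("Weekly_Sales")  # column is absent in the test layout
        rows.append(SalesRow(
            store=int(rec["Store"]),
            dept=int(rec["Dept"]),
            week=date.fromisoformat(rec["Date"]),
            weekly_sales=float(sales) if sales not in (None, "") else None,
            # Assumption: the flag is spelled TRUE/FALSE in the file.
            is_holiday=rec["IsHoliday"].strip().upper() == "TRUE",
        ))
    return rows


rows = parse_rows(SAMPLE)
```

The same function accepts the test layout: when the `Weekly_Sales` column is absent, `weekly_sales` is simply `None`.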
A separate file contains additional information about the store, department, and regional activity for the given dates. It includes the following fields:
- Store: store id
- Date: the week
- Temperature: average temperature in the region
- Fuel_Price: fuel price in the region
- MarkDown1-5: anonymized data on promotional markdowns that Walmart is running. Markdown data is only available after November 2011, and not for all stores at all times. All missing values are marked NA.
- CPI: consumer price index
- Unemployment: unemployment rate
- IsHoliday: whether the week is a special holiday week
For convenience, the four major holidays fall in the following weeks of the data set (the data does not include every holiday):
- Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
- Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
- Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
- Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
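The holiday weeks and the NA convention can be turned into two small helpers. This is only a sketch: the dates are copied from the list above, while the names `HOLIDAY_WEEKS`, `holiday_name`, and `parse_markdown` are hypothetical and not part of the scripts in this tutorial.

```python
from datetime import date
from typing import Optional

# Holiday weeks copied from the list above.
HOLIDAY_WEEKS = {
    "Super Bowl": [date(2010, 2, 12), date(2011, 2, 11),
                   date(2012, 2, 10), date(2013, 2, 8)],
    "Labor Day": [date(2010, 9, 10), date(2011, 9, 9),
                  date(2012, 9, 7), date(2013, 9, 6)],
    "Thanksgiving": [date(2010, 11, 26), date(2011, 11, 25),
                     date(2012, 11, 23), date(2013, 11, 29)],
    "Christmas": [date(2010, 12, 31), date(2011, 12, 30),
                  date(2012, 12, 28), date(2013, 12, 27)],
}

# Reverse index: week date -> holiday name.
HOLIDAY_BY_DATE = {d: name for name, days in HOLIDAY_WEEKS.items() for d in days}


def holiday_name(week: date) -> Optional[str]:
    """Return the holiday that falls in the given week, or None."""
    return HOLIDAY_BY_DATE.get(week)


def parse_markdown(raw: str) -> Optional[float]:
    """Turn a MarkDown field into a float, mapping the NA marker to None."""
    raw = raw.strip()
    return None if raw in ("", "NA") else float(raw)
```

Keeping the holiday name, rather than only the IsHoliday flag, lets a model treat Thanksgiving and the Super Bowl differently. Whether that helps is something to check on the data.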
./build_db.py data/ # output: sales.db
./build_full_csv.py sales.db # output: sales.csv
./build_full_csv.py sales.db --test # output: test.csv
./gen_ids test.csv # output: test.ids
./extract_features.py sales.csv test.csv train.num.csv test.num.csv # output: train.num.csv, test.num.csv
./train_sgdr.py train.num.csv sgdr.model # output: sgdr.model
./predict.py sgdr.model test.num.csv test.ids predictions # output: predictions
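Each step consumes files produced by an earlier one, so the order matters. If you prefer to drive the steps from one place, a small wrapper along these lines could work. It is a sketch only: the commands are the ones listed above, while `PIPELINE` and `run_pipeline` are hypothetical names, and the scripts are assumed to be executable from the current directory.

```python
import subprocess
from typing import List

# The commands listed above, in dependency order.
PIPELINE: List[List[str]] = [
    ["./build_db.py", "data/"],
    ["./build_full_csv.py", "sales.db"],
    ["./build_full_csv.py", "sales.db", "--test"],
    ["./gen_ids", "test.csv"],
    ["./extract_features.py", "sales.csv", "test.csv",
     "train.num.csv", "test.num.csv"],
    ["./train_sgdr.py", "train.num.csv", "sgdr.model"],
    ["./predict.py", "sgdr.model", "test.num.csv", "test.ids", "predictions"],
]


def run_pipeline(dry_run: bool = True) -> List[str]:
    """Run each step in order, stopping at the first failure.

    With dry_run=True the commands are only collected, not executed.
    """
    planned = []
    for cmd in PIPELINE:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step does not feed stale files to the next one.
            subprocess.run(cmd, check=True)
        planned.append(" ".join(cmd))
    return planned


if __name__ == "__main__":
    for line in run_pipeline(dry_run=True):
        print(line)
```

Call `run_pipeline(dry_run=False)` to actually execute the steps.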
Evaluating a Model
./preprocess.sh
./train_sgdr.py train.num.csv sgdr.model
./evaluate.py sgdr.model train.num.csv
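This tutorial does not say which metric evaluate.py reports. The Kaggle competition itself scored submissions with the weighted mean absolute error (WMAE), in which holiday weeks count five times as much as other weeks. A minimal sketch of that metric follows; the function name `wmae` and its signature are hypothetical and it is not necessarily what evaluate.py computes.

```python
from typing import Sequence


def wmae(actual: Sequence[float],
         predicted: Sequence[float],
         is_holiday: Sequence[bool],
         holiday_weight: float = 5.0) -> float:
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w).

    Holiday weeks get weight `holiday_weight`, all other weeks weight 1.
    """
    if not (len(actual) == len(predicted) == len(is_holiday)):
        raise ValueError("inputs must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    total = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return total / sum(weights)


# One ordinary week (error 10) and one holiday week (error 20):
# (1 * 10 + 5 * 20) / (1 + 5)
score = wmae([100.0, 200.0], [110.0, 180.0], [False, True])
```

Because of the weighting, an error in a holiday week hurts the score far more than the same error in an ordinary week, which is why the markdown and holiday fields deserve attention.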
- The goal is to forecast the sales of each department in each store from 2012-11-02 to 2013-07-26.
- The training set contains the sales of each department in each store from 2010-02-05 to 2012-10-26.
- Data from 2010-02-05 to 2011-11-04 contains no markdown information; data from 2011-11-11 to 2013-07-26 does.
- Data from 2013-05-03 to 2013-07-26 is missing the consumer price index and unemployment information.
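These date ranges decide which features can be used for a given week. A short sketch that encodes them is shown here. The boundary dates come from the notes above, while `available_fields` and the dictionary keys are hypothetical names chosen for illustration.

```python
from datetime import date

DATA_START = date(2010, 2, 5)            # first week of historical data
DATA_END = date(2013, 7, 26)             # last week to forecast
MARKDOWN_START = date(2011, 11, 11)      # first week with markdown information
MACRO_MISSING_START = date(2013, 5, 3)   # CPI and unemployment missing from here


def available_fields(week: date) -> dict:
    """Report which optional feature groups exist for the given week.

    Markdown values can still be NA for individual stores even when
    this reports them as available.
    """
    if not (DATA_START <= week <= DATA_END):
        raise ValueError("week is outside the range covered by the data")
    return {
        "markdown": week >= MARKDOWN_START,
        "cpi_unemployment": week < MACRO_MISSING_START,
    }
```

A feature extractor could use this to decide when to fall back to a default value or an indicator column instead of reading a field that is not there.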
Walmart Recruiting Store Sales Forecasting Dataset
I hope this tutorial helps you. If you have any questions or problems, please let me know.
Happy Hadooping with Patrick..