Using Hadoop for historical sales data analysis
Use this dataset to analyze the following aspects of the transactions:
1) Average unit_price by country for a given item type in a certain year
2) Total units_sold by year for a given country and a given item type
3) Max and min units_sold in any single order, for each year and country, for a given item type. Use a custom partitioner class instead of the default hash-based partitioner.
4) Top 10 order IDs for a given year, ranked by total_profit
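To make the four questions concrete, here is a rough sketch of the map and reduce logic for each one, run through a small in-memory stand-in for the shuffle phase. It is not Hadoop code. The record layout (a dict with the keys order_id, country, item_type, year, unit_price, units_sold, total_profit) is an assumption, since the real column names depend on the dataset, and run_job, year_partitioner, and the mapper/reducer names are hypothetical helpers made up for illustration.

```python
from collections import defaultdict
import heapq
import zlib

# Assumed record layout (hypothetical): a dict with keys order_id, country,
# item_type, year, unit_price, units_sold, total_profit.

def run_job(records, mapper, reducer, partitioner=None, num_reducers=3):
    """Simulate map -> partition -> shuffle -> reduce in memory."""
    if partitioner is None:
        # Stand-in for default hash partitioning (crc32 is stable across runs).
        partitioner = lambda key, n: zlib.crc32(repr(key).encode()) % n
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in mapper(record):
            partitions[partitioner(key, num_reducers)][key].append(value)
    output = {}
    for part in partitions:
        for key in sorted(part):  # each reducer sees its keys in sorted order
            for out_key, out_value in reducer(key, part[key]):
                output[out_key] = out_value
    return output

# 1) Average unit_price by country for one item type and year.
def avg_price_mapper(item_type, year):
    def mapper(r):
        if r["item_type"] == item_type and r["year"] == year:
            yield r["country"], r["unit_price"]
    return mapper

def avg_reducer(key, values):
    yield key, sum(values) / len(values)

# 2) Total units_sold by year for one country and item type.
def units_by_year_mapper(country, item_type):
    def mapper(r):
        if r["country"] == country and r["item_type"] == item_type:
            yield r["year"], r["units_sold"]
    return mapper

def sum_reducer(key, values):
    yield key, sum(values)

# 3) Max and min units_sold per (year, country) for one item type.
def minmax_mapper(item_type):
    def mapper(r):
        if r["item_type"] == item_type:
            yield (r["year"], r["country"]), r["units_sold"]
    return mapper

def minmax_reducer(key, values):
    yield key, (max(values), min(values))

def year_partitioner(key, num_reducers):
    """Custom partitioner: all countries of one year go to the same reducer."""
    year, _country = key
    return year % num_reducers

# 4) Top 10 order IDs for one year by total_profit.
def top_orders_mapper(year):
    def mapper(r):
        if r["year"] == year:
            yield year, (r["total_profit"], r["order_id"])
    return mapper

def top10_reducer(key, values):
    # nlargest sorts the (profit, order_id) pairs by profit, highest first.
    yield key, [order_id for _profit, order_id in heapq.nlargest(10, values)]
```

For example, run_job(records, minmax_mapper("Snacks"), minmax_reducer, partitioner=year_partitioner) would answer question 3 for a hypothetical item type "Snacks". On a real cluster the partitioner would be a Java class extending org.apache.hadoop.mapreduce.Partitioner and overriding getPartition; with Hadoop Streaming and Python scripts, the -partitioner option also expects a Java class, so year_partitioner above only illustrates the routing rule.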
Please help with the above analysis on a Hadoop system using MapReduce code written in either Java or Python. Any data preparation steps needed to answer the questions above can be done before running the MapReduce jobs.
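As an illustration of what a data preparation step might look like, here is a minimal sketch that turns raw CSV lines into typed records with a derived year field. The header names ("Order ID", "Country", "Item Type", "Order Date", "Unit Price", "Units Sold", "Total Profit") and the date format are assumptions, not taken from the dataset, and prepare is a hypothetical helper; both would need adjusting to the actual file.

```python
import csv
from datetime import datetime

# Assumed date format (month/day/year); change to match the real file.
DATE_FORMAT = "%m/%d/%Y"

def prepare(lines):
    """Turn raw CSV lines (header first) into typed records with a year field."""
    records = []
    for row in csv.DictReader(lines):
        try:
            order_date = datetime.strptime(row["Order Date"].strip(), DATE_FORMAT)
            records.append({
                "order_id": row["Order ID"].strip(),
                "country": row["Country"].strip(),
                "item_type": row["Item Type"].strip(),
                "year": order_date.year,
                "unit_price": float(row["Unit Price"]),
                "units_sold": int(row["Units Sold"]),
                "total_profit": float(row["Total Profit"]),
            })
        except (ValueError, TypeError, AttributeError):
            # Skip rows with missing or unparsable fields. A wrong header
            # name raises KeyError on purpose so the mismatch is visible.
            continue
    return records
```

Deriving the year once during preparation keeps the mappers simple, since every question filters or groups by year rather than by full order date.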
Dataset: