Big Data

  

The deliverable should contain the following components:

(1) Overall Goals/Research Hypothesis (20 %)

State 1-3 research questions that guide and direct your entire project.

You may delay this section until (1) you have studied all previous work and (2) you have done some analysis and understand the dataset/project.

(2) Previous/Related Contributions (40 %)

Because most of the selected projects use public datasets, other attempts to analyze those datasets almost certainly exist. 30 % of this deliverable is based on your overall assessment of previous data analysis efforts. This effort should include:

Evaluating existing source code available for the dataset (e.g. in the Kernels and discussion sections) or in any other reference. Make sure you run that code and show its results.

In addition to the code, summarize the most relevant literature or other efforts to analyze the dataset you have picked.

For the few who picked their own datasets, you are still expected to do a literature survey in this section on what is most relevant to your data/idea/area and to summarize the most relevant contributions.

(3) A comparison study (40 %)

Compare the results of your own work/project with results from previous or other contributions (a comparison of data and analysis, not a literature review).

The difference between section 3 and section 2 is that section 2 focuses on code/data analysis found in sources such as Kaggle, GitHub, etc., while section 3 focuses on research papers that did not necessarily study the same dataset but address the same focus area.