
March 23rd - March 29th

  • ee09119
  • Mar 30, 2015
  • 1 min read

This week, we concluded that using one virtual machine (VM), i.e. one Hadoop worker node, per physical machine yields better performance than running more than one VM per machine. Overall job completion times were a few seconds shorter, and the completion times of concurrently running jobs were closer to what we would expect. Nevertheless, we still believe that the analytical models are not suitable for accurate, or even approximate, prediction of completion times.

We also concluded that the larger the input file relative to the size of the cluster, the more likely a job is to achieve good data locality. Jobs showed greater data locality, occasionally reaching 100%, when their input files required more containers than the total number of containers in the cluster.
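To make this relationship concrete, here is a rough back-of-envelope sketch in Python. The figures below (128 MB HDFS block size, 32 containers in total) are illustrative assumptions, not our cluster's actual configuration. A job gets one map container per HDFS block of its input, so an input spanning more blocks than the cluster has containers runs in several scheduling waves:

```python
import math

# Assumed values for illustration only, not measured on our cluster.
BLOCK_SIZE_MB = 128        # assumed HDFS block size
CLUSTER_CONTAINERS = 32    # assumed total containers across all worker nodes

def map_containers_needed(input_size_mb: float) -> int:
    """One map task (container) per input split, i.e. per HDFS block."""
    return math.ceil(input_size_mb / BLOCK_SIZE_MB)

for input_mb in (1024, 4096, 16384):
    needed = map_containers_needed(input_mb)
    waves = math.ceil(needed / CLUSTER_CONTAINERS)
    print(f"{input_mb:>6} MB input -> {needed:>4} map containers, "
          f"{waves} wave(s) on a {CLUSTER_CONTAINERS}-container cluster")
```

With several waves, a map task that cannot be placed on a node holding its block in one wave can often be placed locally in a later one, which would explain the higher (and occasionally 100%) locality we observed on large inputs.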

In a meeting with Professor Ricardo Morla, we discussed the fact that jobs are still very unstable and short-lived, which makes it difficult to predict their behaviour and can lead to incorrect decisions about whether or not to burst a job. The Professor changed the code of the PCAP MapReduce jobs to make them more stable, which is a great help. We also decided that the focus for this week would be to start developing the Packet Gatherer, and its first version is now complete.

NEXT OBJECTIVES: Test the Packet Gatherer and fix any issues that arise. Check how the Load Balancer interacts with the Packet Gatherer.

