February, 23rd - March, 1st

Mar 1, 2015

During this first week I started working on this website and got to know OpenStack and Sahara, its data processing service. I was able to configure and launch Hadoop YARN clusters and scale them when needed, and I became acquainted with the basic Hadoop commands. The next step was to create a directory in HDFS where all the PCAP files will be stored for processing. PCAP files of different sizes were uploaded in order to test how the jobs perform as the file size changes.
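The HDFS preparation step can be sketched as a small Python helper that builds the standard `hdfs dfs -mkdir` and `hdfs dfs -put` command lines. The directory `/user/hadoop/pcap` and the local file names are hypothetical, since the post does not name them. By default the helper only prints the commands; it executes them only when `dry_run=False` and a Hadoop client is on the PATH.

```python
import subprocess
from typing import List

# Hypothetical HDFS location for the capture files; the post does not name it.
PCAP_DIR = "/user/hadoop/pcap"


def mkdir_cmd(hdfs_dir: str) -> List[str]:
    # "-p" creates parent directories and does not fail if the directory exists.
    return ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]


def upload_cmd(local_files: List[str], hdfs_dir: str) -> List[str]:
    # "-put" copies one or more local files into the given HDFS directory.
    return ["hdfs", "dfs", "-put", *local_files, hdfs_dir]


def run(cmd: List[str], dry_run: bool = True) -> None:
    # Always print the command; only execute it when dry_run is False.
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical local capture files of different sizes.
    files = ["small.pcap", "medium.pcap", "large.pcap"]
    run(mkdir_cmd(PCAP_DIR))
    run(upload_cmd(files, PCAP_DIR))
```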

The next logical step was to test the jobs that will be used to process the PCAP files. I was able to test the full set of IP Analysis jobs, which ran as expected. Unfortunately, the TCP and HTTP job sets do not seem to be available in the jar file. Having tested these jobs, I started writing a detailed description of what the IP Analysis jobs do in the dissertation document. I could not write about the TCP and HTTP jobs yet, since they are not available right now.

Moreover, scalability tests were performed to determine whether the PCAP jobs run faster when more resources are added to the cluster, and the results show that they do. In particular, the initial setup used 3 worker nodes to run PcapTotalStats on a 6 GB PCAP file, and the job completed in approximately 362 seconds. After adding 6 workers, the same job ran in 221 seconds. That is a speedup of about 1.64 times (362 / 221), i.e. a 64% speed increase, or roughly 39% less execution time, achieved simply by adding more workers to the existing cluster. It is, however, important to note that after adding worker nodes, it is crucial to run the hdfs balancer command to redistribute the data across the newly added nodes. If this step is skipped, the job will perform worse with more nodes: the new nodes hold none of the file's blocks, so their tasks must read the data remotely, the advantage of data locality is lost, and the network bisection bandwidth clearly becomes the bottleneck.
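The 64% figure above is a speedup ratio (old time divided by new time), which is not the same as the fraction of time saved. The short Python sketch below computes both from the measured PcapTotalStats times and also builds the rebalancing command to run after scaling out. The 10% threshold is HDFS's default for `hdfs balancer`, used here only as an assumption, not a value from these tests.

```python
from typing import List


def speedup(t_before: float, t_after: float) -> float:
    # Ratio of old to new execution time; a value > 1 means the job got faster.
    return t_before / t_after


def time_reduction(t_before: float, t_after: float) -> float:
    # Fraction of the original execution time that was saved.
    return (t_before - t_after) / t_before


def balancer_cmd(threshold_pct: int = 10) -> List[str]:
    # "hdfs balancer" moves blocks until each DataNode's disk utilisation is
    # within threshold_pct percent of the cluster-wide average.
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


if __name__ == "__main__":
    before, after = 362.0, 221.0  # PcapTotalStats on the 6 GB file, in seconds
    print(f"speedup: {speedup(before, after):.2f}x")
    print(f"time saved: {time_reduction(before, after):.0%}")
    print(" ".join(balancer_cmd()))
```

Running it prints a speedup of 1.64x and a time saving of 39%, matching the numbers reported above.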

Furthermore, an initial sketch of the logic behind the inter-cluster Load Balancer has been drawn. This sketch will serve as a guide to the steps needed to develop the Load Balancer. I also started reading Python tutorials, since the Load Balancer will be developed in Python.
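The post does not describe the Load Balancer's logic, so the following is only a hypothetical Python sketch of one simple cloud-bursting policy, not the design drawn this week. It assumes each cluster exposes a fixed job capacity (a made-up `capacity` field) and a count of running jobs. Jobs go to a local cluster while it has free slots and "burst" to the remote cluster with the most free slots once the local one is full.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    capacity: int        # hypothetical: max concurrent jobs the cluster accepts
    running: int = 0     # jobs currently running on this cluster
    is_local: bool = False

    @property
    def free(self) -> int:
        return self.capacity - self.running


def pick_cluster(clusters: List[Cluster]) -> Optional[Cluster]:
    # Prefer a local cluster with spare capacity; otherwise burst to the
    # remote cluster with the most free slots. Return None if all are full.
    local = [c for c in clusters if c.is_local and c.free > 0]
    if local:
        return max(local, key=lambda c: c.free)
    remote = [c for c in clusters if not c.is_local and c.free > 0]
    if remote:
        return max(remote, key=lambda c: c.free)
    return None


def submit(job: str, clusters: List[Cluster]) -> Optional[str]:
    # Record the job on the chosen cluster and return that cluster's name.
    target = pick_cluster(clusters)
    if target is None:
        return None
    target.running += 1
    return target.name


if __name__ == "__main__":
    pool = [Cluster("local", capacity=2, is_local=True), Cluster("public", capacity=4)]
    for job in ["job1", "job2", "job3"]:
        print(job, "->", submit(job, pool))
```

With a local capacity of 2, the first two jobs stay local and the third bursts to the public cluster. A real balancer would replace the static capacity with live metrics (for example, YARN's available resources).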

NEXT OBJECTIVES: After a meeting with Prof. Ricardo Morla, we agreed that the main objective for now is to develop a simple Proof of Concept demonstrating the Cloud Bursting idea. I have already started working on the Proof of Concept, and it will be the main focus for the following days.

#OpenStack #Sahara #PCAP #Hadoop #Scalability #Performance

Pedro Rocha Gonçalves