April 6th - April 12th
- ee09119
- Apr 13, 2015
This week was all about defining use cases to test our Proof-of-Concept. The main idea was to understand which situations our solution handles well and which ones it does not. We decided to use iPerf, the TCP/UDP measurement tool, to generate UDP traffic (similar to a voice call) and capture it using our integrated solution.
We started out by using the whole network bandwidth, which is theoretically 100 Mbps on our Fast Ethernet interfaces. We managed to squeeze out 94.5 Mbps, which is pretty close to the theoretical maximum. However, we quickly concluded that it would be impractical to capture so much traffic: a 10-minute capture generated a 7.7GB file, which took approximately 30 minutes to upload to HDFS. So we limited the available bandwidth to 12 Mbps, which generated a 934.6MB file after a 10-minute capture. That file takes about 3 minutes to upload to HDFS, a far more reasonable value. To simulate peak data streams, we also captured network traffic at higher available bandwidths, 20 Mbps and 25 Mbps, which generated 1.6GB and 1.9GB files respectively.
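As a rough sanity check on these numbers, the expected capture size can be estimated from the stream rate and the capture duration alone. The sketch below is only an illustration: the helper name `capture_payload_mb` is made up for this example, it assumes decimal units (1 MB = 10^6 bytes), it assumes the 20 Mbps and 25 Mbps captures also lasted 10 minutes, and it ignores header and capture-file overhead, so the reported files should come out somewhat larger than the estimate.

```python
def capture_payload_mb(rate_mbps: float, duration_s: float) -> float:
    """Estimate the traffic volume in MB (10^6 bytes) of a constant-rate stream.

    Assumption: rate_mbps is in megabits per second, so dividing by 8
    converts megabits to megabytes. Packet headers and capture-file
    overhead are not modelled.
    """
    return rate_mbps * duration_s / 8.0


# Assumed capture length: 10 minutes for every run.
DURATION_S = 10 * 60

# Rates (Mbps) and the capture sizes reported in the post, converted to MB.
reported_mb = {94.5: 7700.0, 12.0: 934.6, 20.0: 1600.0, 25.0: 1900.0}

for rate, observed in reported_mb.items():
    estimate = capture_payload_mb(rate, DURATION_S)
    # Relative gap between the reported file size and the raw estimate.
    gap = observed / estimate - 1.0
    print(f"{rate:5.1f} Mbps: estimate {estimate:7.1f} MB, "
          f"reported {observed:7.1f} MB, gap {gap:+.1%}")
```

The estimates land within a few percent of the reported sizes, which suggests that file size grows roughly linearly with the configured bandwidth.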
We then defined a pair of use cases to test our solution and proceeded with the tests. Overall, the solution performed as expected. However, one specific real use case made it obvious that we need predictors for job completion time and HDFS upload time. Work on both predictors has already begun, and a simple working version of each already exists. These predictors effectively solved the problem for that specific use case, but the simple version will not suit other use cases.
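To give an idea of what a very simple upload time predictor could look like, here is a minimal sketch. It is not our actual predictor: it assumes upload time is proportional to file size, fits a single throughput value through the two approximate measurements mentioned above (7.7GB in about 30 minutes, 934.6MB in about 3 minutes), and uses hypothetical function names chosen for this example.

```python
def fit_throughput_mb_per_s(samples):
    """Fit time = size / throughput by least squares through the origin.

    samples: list of (size_mb, upload_seconds) pairs.
    For the model t = k * s, the least-squares slope is
    k = sum(s * t) / sum(s * s); the throughput is its inverse.
    """
    sum_ss = sum(size * size for size, _ in samples)
    sum_st = sum(size * seconds for size, seconds in samples)
    return sum_ss / sum_st


def predict_upload_seconds(size_mb, throughput_mb_per_s):
    """Predict HDFS upload time for a file, assuming constant throughput."""
    return size_mb / throughput_mb_per_s


# Approximate measurements from the post: (file size in MB, upload time in s).
observations = [(7700.0, 30 * 60), (934.6, 3 * 60)]

throughput = fit_throughput_mb_per_s(observations)
print(f"fitted throughput: {throughput:.2f} MB/s")

# Predict upload times for the peak-traffic captures (1.6GB and 1.9GB).
for size_mb in (1600.0, 1900.0):
    minutes = predict_upload_seconds(size_mb, throughput) / 60
    print(f"{size_mb:.0f} MB -> about {minutes:.1f} min")
```

A model like this only holds while the cluster and the network behave as they did during the measurements, which is one reason a simple version will not cover every use case.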
NEXT OBJECTIVES: After a meeting with Professor Ricardo Morla, we decided to focus on improving the features of our solution so that it suits as many use cases as possible. For the moment, our solution assumes that only one job runs in our local data center at a time, so simultaneous jobs are not supported. We are currently studying a way to run two simultaneous jobs in the local cluster.