Boey, Calvin Mun Lek (2019) Cloud-to-cloud data transfer parallelization framework via spawning intermediate instances for scalable data migration. Master dissertation/thesis, UTAR.
Abstract
As enterprises increasingly embrace the practice of federating multiple clouds, scalable data transfer between cloud datacenters is important from the standpoint of cloud consumers. Most existing work takes the service provider's perspective, requiring insight into datacenter operations that is not available to the cloud consumer. In this dissertation, a data transfer framework is proposed that allows cloud consumers to circumvent bandwidth limitations by spawning intermediate nodes and performing parallel transfers through many-to-many node connections. However, the effectiveness of such an approach depends on many factors, such as the time required to spawn new nodes and the bandwidth between nodes. The objective of this work is to investigate the limitations and potential of cloud-to-cloud parallel transfer (CPT). Firstly, all the components needed in the parallel data transfer are identified and modelled. Based on the transfer time and cost models, the circumstances under which parallel transfer is worthwhile are identified. Then, a few optimizations are proposed, namely pipelining and network data piping, to increase the data transfer throughput. Pipelining enables each stage of the parallel transfer to work concurrently, while network data piping reduces the time spent on dividing files into chunks. Secondly, selected cloud Virtual Machines (VMs) are benchmarked. Based on the observed behavior, pre-testing and VM-type selection techniques are proposed. Pre-testing selects the top-performing nodes, while VM-type selection chooses a suitable VM type and size. Thirdly, CPT is implemented and tested on Amazon EC2. An adapted CPT for transfer between Hadoop clusters is also tested. The results show that the transfer time of CPT is not only lower than that of DistCp, but that CPT also has a lower cost, up to 8x lower in certain scenarios.
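The core idea of the abstract, splitting a payload into chunks and relaying them in parallel through intermediate nodes before reassembly at the destination, can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's implementation: `transfer_via_node`, `split_into_chunks`, and the in-process thread pool are stand-ins for real network transfers through spawned cloud instances.

```python
import concurrent.futures

CHUNK_SIZE = 4  # bytes per chunk; real transfers would use MB-scale chunks


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Divide the source payload into fixed-size chunks for parallel transfer."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def transfer_via_node(node_id, index, chunk):
    """Stand-in for an intermediate node relaying one chunk to the destination.

    A real implementation would stream the chunk over the network
    (e.g. via SSH or HTTP) through a spawned cloud instance.
    """
    return index, chunk


def parallel_transfer(data, num_nodes=4):
    """Fan chunks out across `num_nodes` intermediate nodes in parallel,
    then reassemble them in order at the destination."""
    chunks = split_into_chunks(data)
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = [pool.submit(transfer_via_node, i % num_nodes, i, chunk)
                   for i, chunk in enumerate(chunks)]
        # Chunks may complete out of order; index them so order can be restored.
        received = dict(f.result() for f in concurrent.futures.as_completed(futures))
    return b"".join(received[i] for i in range(len(chunks)))
```

In the actual framework, the worthiness of this approach would be weighed against node spawn time and inter-node bandwidth, as the abstract's transfer time and cost models describe.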