Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
- Awaysheh, Feras Mahmoud Naji
- José Carlos Cabaleiro Domínguez Supervisor
- Tomás F. Pena Supervisor
Defense university: Universidade de Santiago de Compostela
Defense date: 28 February 2020
- Arturo González Escribano Chair
- Patricia González Secretary
- Blesson Varghese Committee member
Type: Thesis
Abstract
The rapid growth in the use of computers and the Internet at the beginning of this century has made vast amounts of information available online, creating large datasets of both structured and unstructured information that need to be processed, analyzed, and linked by businesses, government organizations, and other industries. As a result, Big Data has drawn considerable attention from researchers in information sciences and from policy and decision makers. Meanwhile, as information grows ever faster, many challenges arise, such as difficulties in data capture, storage, analysis, and processing [1]. In 2012, Gartner (an information technology research and advisory firm) defined Big Data as high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. This study aims to improve the performance of Data Intensive Computing (DIC) by using the MapReduce service within a hybrid resource architecture (dedicated and non-dedicated) that supports heterogeneous environments through a Hadoop-based cluster implementation (for example, organizations' data centers often use multiple generations of hardware, or typical desktop computers in a university or research lab). Moreover, the study aims to propose a combination of technologies that produces a reliable, low-cost hybrid environment extending Hadoop's task scheduling, while keeping in mind the Quality of Service requirements of MapReduce task scheduling. Finally, it aims to extensively evaluate the proposed system architecture design against the current Hadoop system by conducting a comprehensive assessment of its performance and characterizing the improvements. One of the significant technological shifts of next-generation computing systems is in Big Data (BD) platforms. This trend has led to a wide range of revolutionary, state-of-the-art enhancements within data science and to the novel concept of Big Data as a Service (BDaaS).
Apache Hadoop, the landmark BD framework, has evolved into a widely deployed large-scale data operating system. Its new features give Hadoop 3.x the maturity and applicability to serve different markets. However, the performance and security of such systems remain the main concerns among practitioners. This trend has led Big Data to draw considerable attention from researchers in information sciences and from policy and decision makers. In this doctoral proposal, we aim to address two critical open challenges in current Big Data deployment architectures, namely system scalability and security, by employing containerization technology and robust access control within a hybrid resource architecture (dedicated and non-dedicated) that supports the latest Hadoop 3.x platform. The study also aims to propose a combination of technologies that produces a reliable, low-cost hybrid environment extending Hadoop's task scheduling, while keeping in mind Quality of Service and security requirements. Finally, it aims to extensively evaluate the proposed system architecture design against the current Hadoop system by conducting a comprehensive assessment of its performance and trustworthiness in both centralized and edge deployment models.