Critically evaluate, select and employ appropriate tools


COMP1702 Big Data

Learning Outcome 1: Explain the concept of Big Data and its importance in a modern economy
Learning Outcome 2: Explain the core architecture and algorithms underpinning big data processing
Learning Outcome 3: Analyse and visualize large data sets using a range of statistical and big data technologies
Learning Outcome 4: Critically evaluate, select and employ appropriate tools and technologies for the development of big data applications

Part A

o Task A.1 Explain the main characteristics of Big Data. (Word count: 200 words ±10%)

o Task A.2 Compare Hadoop and Relational Database Systems. Give an application scenario that is well suited to Hadoop and explain your reason. (Word count: 300 words ±10%)

Part B MapReduce Programming

Suppose that you have a large student file that cannot be stored on a single machine. Each record of this file contains the following fields: (Student_ID, Student_Name, Sex, Age, Module, Grade, Department).

o Task B.1 Please design a MapReduce algorithm (pseudocode or Java code) to output the average grade for each module. The algorithm is expected to be as efficient as possible.

o Task B.2 Describe the algorithm designed. You should explain how the input is mapped into (key, value) pairs by the map stage, i.e., specify what the key is and what the associated value is in each pair, and, if needed, how the key(s) and value(s) are computed. Then you should explain how the output (key, value) pairs of the map stage are processed by the reduce stage to get the final answer(s). You should also analyse the efficiency of the MapReduce algorithm designed. (Word count: 300 words ±10%)
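To make the (key, value) flow asked about in Part B concrete, the following is a minimal in-memory sketch in Python, not Hadoop code and not a model answer. It assumes one possible design: the map stage emits (Module, (Grade, 1)), a combiner sums these into partial (sum, count) pairs per input split, and the reduce stage divides the total sum by the total count. The toy records, the split boundaries, and every function name are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration only:
# (Student_ID, Student_Name, Sex, Age, Module, Grade, Department)
RECORDS = [
    ("S1", "Ann", "F", 20, "COMP1702", 70.0, "Computing"),
    ("S2", "Bob", "M", 21, "COMP1702", 50.0, "Computing"),
    ("S3", "Cat", "F", 22, "MATH1001", 80.0, "Maths"),
    ("S1", "Ann", "F", 20, "MATH1001", 60.0, "Maths"),
    ("S2", "Bob", "M", 21, "MATH1001", 70.0, "Maths"),
]


def map_record(record):
    """Map stage: emit (Module, (Grade, 1)) for a single record."""
    module, grade = record[4], record[5]
    yield module, (grade, 1)


def combine(pairs):
    """Combiner: add up (sum, count) pairs per key within one split."""
    totals = defaultdict(lambda: (0.0, 0))
    for key, (grade_sum, count) in pairs:
        running_sum, running_count = totals[key]
        totals[key] = (running_sum + grade_sum, running_count + count)
    return totals.items()


def reduce_module(module, partials):
    """Reduce stage: merge partial (sum, count) pairs, then divide once."""
    total = sum(grade_sum for grade_sum, _ in partials)
    count = sum(n for _, n in partials)
    return module, total / count


def run_job(splits):
    """Simulate map, combine, shuffle and reduce over a list of splits."""
    shuffled = defaultdict(list)
    for split in splits:
        mapped = (pair for record in split for pair in map_record(record))
        for key, partial in combine(mapped):
            shuffled[key].append(partial)  # shuffle: group partials by key
    return dict(reduce_module(key, parts) for key, parts in shuffled.items())


# Two splits stand in for blocks of the file stored on different machines.
AVERAGES = run_job([RECORDS[:3], RECORDS[3:]])
print(AVERAGES)
```

Carrying (sum, count) rather than a per-split average is the design choice that matters here: an average of averages would weight splits equally regardless of how many records each holds, whereas sums and counts can be merged in any order, so the combiner can shrink the data sent across the network without changing the result.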

Part C: Big Data Project Analysis
The CropY company is a leading provider of precision agriculture services. Precision agriculture is the science of gathering, processing, and analysing temporal, spatial and individual data. It combines this data with other information to support management decisions according to estimated variability, for improved resource use efficiency, productivity, quality, and profitability.

The CropY company now plans to develop a big data project to meet the following requirements: helping users worldwide better understand the implications of the weather and make contingency plans; buying supplies, such as fertilizer and seeds; maintaining and monitoring the quality of yield, whether livestock or crops; knowing the variety of cultivated plants, their growth conditions, and their seed requirements; choosing the type of fertilizer and pesticides, and understanding their conditions of use and their impact on the climate-soil-plant system; recognizing the daily water needs of each kind of plant; calculating the median and mean values of yield; studying the conditions of the natural environment; and estimating financial revenue and managing potential risks.

o Task C.1: The volume of big data is expected to exceed 500 petabytes. The data will come from various sensors, satellites, drones, social media, market data, online news feeds, etc. Figure 1 below shows some example data from the CropY company. Some IT technicians plan to build a data warehouse to store the data for further analysis tasks, but others believe a data lake is a better choice. Which choice do you prefer? Please justify your choice. (Word count: 300 words ±10%)

o Task C.2: The data of the CropY company includes a large collection of plants, crops, diseases, symptoms, pests, and the relationships between them. The CropY company needs to build an analytical data store that can facilitate queries like: "find all diseases which are directly or indirectly caused by nitrogen deficiency". Please recommend a data store and justify your choice. (Word count: 300 words ±10%)

o Task C.3: Some prediction and analytics services provided by the CropY company must respond within a few seconds of the arrival of new data. That is, they are real-time or near-real-time prediction and analytics tasks. Some IT managers have suggested using MapReduce, a popular distributed processing framework, to implement these tasks. Do you agree with that? Please justify your choice. (Word count: 300 words ±10%)

o Task C.4: The CropY company has decided to move most of its applications and services to the cloud. These applications and services need to be highly available, scalable, and accessible worldwide. Note that some data, such as price and customer data, are confidential. Please design a cloud hosting strategy for this big data project and explain how your design will meet the security, scalability, and high-availability requirements. (Word count: 300 words ±10%)
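The example query in Task C.2 ("directly or indirectly caused by") is a reachability question over cause-and-effect relationships, which may be easier to see in code. The sketch below is a plain Python breadth-first traversal over an in-memory adjacency list, not the API of any particular data store. The `CAUSES` edges and the `caused_by` function are hypothetical and invented purely for illustration; they are not CropY data.

```python
from collections import deque

# Hypothetical "causes" relationships, invented for illustration only.
CAUSES = {
    "nitrogen deficiency": ["leaf chlorosis", "stunted growth"],
    "leaf chlorosis": ["reduced photosynthesis"],
    "reduced photosynthesis": ["low yield"],
    "stunted growth": ["low yield"],
    "aphid infestation": ["leaf curl"],
}


def caused_by(graph, source):
    """Return every node reachable from `source` via one or more edges."""
    seen = set()
    queue = deque(graph.get(source, []))  # direct effects first
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))  # follow indirect effects
    return seen


RESULT = caused_by(CAUSES, "nitrogen deficiency")
print(sorted(RESULT))
```

The number of hops between cause and effect is not known in advance, so the traversal keeps following edges until no new nodes appear. Any recommended data store would need to support this kind of variable-depth lookup efficiently.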

Attachment:- Big Data.rar
