How do we get data to the workers?
(Diagram: compute nodes connected to NAS and SAN storage)

Distributed File System
- Don't move data to the workers; move the workers to the data!
  - Store data on the local disks of nodes in the cluster
  - Start up the workers on the node that holds the data locally
- Why?
  - Not enough RAM to hold all the data in memory
  - Disk access is slow, but disk throughput is reasonable
- A distributed file system is the answer
  - GFS (Google File System) for Google's MapReduce
  - HDFS (Hadoop Distributed File System) for Hadoop

Features
- Highly fault-tolerant
  - Failure is the norm rather than the exception
- High throughput
  - May consist of thousands of server machines, each storing part of the file system's data
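
The "move workers to the data" idea can be illustrated with a small toy model. This is only a sketch, not how GFS or HDFS is actually implemented: the function names (`place_blocks`, `assign_workers`), the node names, and the round-robin placement are all hypothetical. It also assumes that each block is stored on more than one node, which the slides do not state but which is one common way to tolerate node failures.

```python
from collections import defaultdict


def place_blocks(num_blocks, nodes, replication=2):
    """Store each block on `replication` distinct nodes (toy round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def assign_workers(placement, failed=frozenset()):
    """Start each block's worker on the least-loaded live node that stores the block."""
    load = defaultdict(int)  # number of workers already started on each node
    assignment = {}
    for block, replicas in placement.items():
        live = [node for node in replicas if node not in failed]
        if not live:
            raise RuntimeError(f"block {block} has no live replica")
        node = min(live, key=lambda n: load[n])  # ties go to the first replica listed
        load[node] += 1
        assignment[block] = node
    return assignment


# Hypothetical four-node cluster holding six blocks, with one node down.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(6, nodes, replication=2)
assignment = assign_workers(placement, failed={"node-b"})

for block, node in sorted(assignment.items()):
    print(f"block {block}: worker on {node} (replicas on {placement[block]})")
```

Every worker reads its block from a local disk, so no block crosses the network, and the failed node is simply skipped because another node still holds a copy of each of its blocks.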