Monday, October 19, 2015

GFS Reading Notes

== Paper Structure ==
Abstract
1. Introduction
2. Design Overview
   2.1 Assumptions
   2.2 Interface
   2.3 Architecture
   2.4 Single Master
   2.5 Chunk Size
   2.6 Metadata
      2.6.1 In Memory Data Structures
      2.6.2 Chunk Locations
      2.6.3 Operation Log
   2.7 Consistency Model
      2.7.1 Guarantees by GFS
      2.7.2 Implications for Applications
3. System Interactions
   3.1 Leases and Mutation Order
   3.2 Data Flow
   3.3 Atomic Record Appends
   3.4 Snapshot
4. Master Operation
   4.1 Namespace Management and Locking
   4.2 Replica Placement
   4.3 Creation, Re-replication, Rebalancing
   4.4 Garbage Collection
      4.4.1 Mechanism
      4.4.2 Discussion
   4.5 Stale Replica Detection
5. Fault Tolerance and Diagnosis
   5.1 High Availability
      5.1.1 Fast Recovery
      5.1.2 Chunk Replication
      5.1.3 Master Replication
   5.2 Data Integrity
   5.3 Diagnostic Tools
6. Measurements
   6.1 Micro-benchmarks
      6.1.1 Reads
      6.1.2 Writes
      6.1.3 Record Appends
   6.2 Real World Clusters
      6.2.1 Storage
      6.2.2 Metadata
      6.2.3 Read and Write Rates
      6.2.4 Master Load
      6.2.5 Recovery Time
   6.3 Workload Breakdown
      6.3.1 Methodology and Caveats
      6.3.2 Chunkserver Workload
      6.3.3 Appends versus Writes
      6.3.4 Master Workload
7. Experience
8. Related Work
9. Conclusions

== 1. Background ==

1. Component failures are the norm rather than the exception.
2. Files are huge by traditional standards.
3. Most files are mutated by appending new data rather than overwriting existing data.
4. Co-designing the applications and the file system API benefits the overall system by increasing flexibility.

== 2. Design Overview ==

1. Assumptions
   a) The system is built from many inexpensive commodity components that often fail.
   b) The system stores a modest number of large files.
   c) The workloads primarily consist of two kinds of reads: large streaming reads and small random reads.
   d) The workloads also have many large, sequential writes that append data to files.
   e) The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
   f) High sustained bandwidth is more important than low latency.

2. Interface
   a) GFS provides a familiar file system interface which supports the usual operations to create, delete, open, close, read, and write files.
   b) Snapshot creates a copy of a file or a directory tree at low cost.
   c) Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append.
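The record append idea in (c) can be pictured with a toy model in which the file, not the caller, chooses the offset. This is only a single-process sketch: `AppendOnlyFile` and `record_append` are names made up for these notes, and a lock stands in for whatever the real system does across replicas. Real GFS semantics (retries, padding, possible duplicate records) are not modelled here.

```python
import threading


class AppendOnlyFile:
    """Toy file: the 'file system' picks the offset, not the caller."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()  # stand-in for the real coordination

    def record_append(self, record: bytes) -> int:
        """Append the record as one unit and return the offset it landed at."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


shared = AppendOnlyFile()
results = {}


def worker(name: str) -> None:
    record = (name * 8).encode()
    results[name] = (shared.record_append(record), record)


threads = [threading.Thread(target=worker, args=(n,)) for n in "abcd"]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every client's record is intact at the offset it was told about.
for name, (offset, record) in sorted(results.items()):
    assert shared.read(offset, len(record)) == record
```

The point of the sketch is the interface shape: callers pass only the data, and learn the offset from the return value, so concurrent appenders never need to agree on offsets among themselves.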

3. Architecture





   a) A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients.
   b) Each of these is typically a commodity Linux machine running a user-level server process.
   c) Files are divided into fixed-size chunks.
       Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by the master at the time of chunk creation.
       By default, each chunk is stored as three replicas on different chunkservers.
   d) The master maintains all file system metadata including the namespace, access control information, the mapping from files to chunks,
       and the current locations of chunks.
   e) It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between
       chunkservers.
   f) The master periodically communicates with each chunkserver in HeartBeat messages to give it instructions and collect its state.
   g) Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers.
   h) Neither the client nor the chunkserver caches file data because most applications stream through huge files or have working sets too large to be cached.
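The metadata described in (c), (d) and (f) can be pictured with a small in-memory model. This is an illustrative sketch only: the names `Master`, `create_chunk`, `heartbeat` and `under_replicated` are invented for these notes and are not GFS's actual API, and chunk locations are filled in from simulated HeartBeat reports.

```python
import itertools
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # default number of replicas per chunk (from the notes)


@dataclass
class Master:
    """Toy model of the master's metadata; all names are hypothetical."""
    # file path -> ordered list of chunk handles
    file_chunks: dict = field(default_factory=dict)
    # chunk handle -> set of chunkserver names currently holding a replica
    chunk_locations: dict = field(default_factory=dict)
    _next_handle: itertools.count = field(
        default_factory=lambda: itertools.count(1))

    def create_chunk(self, path):
        """Assign a new, never-reused 64-bit handle and attach it to the file."""
        handle = next(self._next_handle) & 0xFFFFFFFFFFFFFFFF
        self.file_chunks.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = set()
        return handle

    def heartbeat(self, chunkserver, held_handles):
        """Record which chunks a chunkserver reports holding."""
        for handle, servers in self.chunk_locations.items():
            if handle in held_handles:
                servers.add(chunkserver)
            else:
                servers.discard(chunkserver)

    def under_replicated(self):
        """Return handles with fewer known replicas than the default."""
        return [h for h, s in self.chunk_locations.items()
                if len(s) < REPLICATION_FACTOR]


master = Master()
handle = master.create_chunk("/logs/web.0")
for cs in ("cs1", "cs2", "cs3"):
    master.heartbeat(cs, {handle})
print(master.under_replicated())  # no chunk is short of replicas here
master.heartbeat("cs3", set())    # cs3 stops reporting the chunk
print(master.under_replicated())  # the chunk now has only two known replicas
```

Note that file data never appears in this model: the master tracks only names, handles and locations, which matches point (g).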

4. Single Master

   a) Clients never read or write file data through the master.
   b) Instead, a client asks the master which chunkservers it should contact.
   c) The client caches this information until it expires.
   d) So for later accesses to the same chunk, the client already knows which chunkservers to contact.
   e) On the first access, the client sends a request to the master, and the master replies with the chunk handle and the locations of the replicas.
   f) The client then sends its request to one of the replicas, normally the closest one.
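The lookup flow above can be sketched as follows. Everything here is a hypothetical stand-in for illustration: `FakeMaster`, `Client.locate`, the `cache_ttl` value and the numeric "distance" attached to each replica are assumptions, not part of GFS. The sketch only shows how a byte offset becomes a chunk index, how the master's reply is cached, and how the closest replica is picked.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in these notes


class FakeMaster:
    """Stand-in master: maps (path, chunk index) to (handle, replicas)."""

    def __init__(self, table):
        self.table = table
        self.lookups = 0  # counts how often clients had to ask

    def lookup(self, path, chunk_index):
        self.lookups += 1
        return self.table[(path, chunk_index)]


class Client:
    def __init__(self, master, cache_ttl=60.0):
        self.master = master
        self.cache_ttl = cache_ttl  # hypothetical expiry, in seconds
        self.cache = {}  # (path, chunk index) -> (expiry, handle, replicas)

    def locate(self, path, offset):
        """Return (chunk handle, closest replica, offset within the chunk)."""
        chunk_index = offset // CHUNK_SIZE
        key = (path, chunk_index)
        entry = self.cache.get(key)
        if entry is None or entry[0] <= time.monotonic():
            # Cache miss or expired entry: ask the master for metadata only.
            handle, replicas = self.master.lookup(path, chunk_index)
            entry = (time.monotonic() + self.cache_ttl, handle, replicas)
            self.cache[key] = entry
        _, handle, replicas = entry
        # Replicas are (name, distance) pairs; "distance" is a made-up metric.
        closest = min(replicas, key=lambda r: r[1])[0]
        return handle, closest, offset % CHUNK_SIZE


master = FakeMaster(
    {("/data/a", 1): (0xBEEF, [("cs1", 3), ("cs2", 1), ("cs3", 2)])})
client = Client(master)
print(client.locate("/data/a", CHUNK_SIZE + 10))  # first call asks the master
print(client.locate("/data/a", CHUNK_SIZE + 99))  # served from the cache
print(master.lookups)
```

The actual data read would then go straight to the chosen chunkserver, so the master stays out of the data path.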

5. Chunk Size (64 MB)

Advantages:
   a) Reduces the client's need to interact with the master.
   b) Reduces network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time.
   c) Reduces the size of the metadata stored on the master.

Disadvantage: A small file that occupies only one chunk can become a hot spot if many clients access the same file.
Short-term fix: more replicas; the default of 3 replicas can be raised to, for example, 100.
Long-term fix: allow clients to transfer data to one another.
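A quick back-of-the-envelope calculation shows why a large chunk size shrinks the master's metadata. This is a rough sketch under stated assumptions: the 64 bytes of metadata per chunk is used as an assumed upper bound (the GFS paper quotes a figure of this order), and the 1 MB chunk size is a hypothetical alternative chosen purely for comparison.

```python
CHUNK_SIZE_64MB = 64 * 2**20
BYTES_PER_CHUNK_METADATA = 64  # assumed upper bound per chunk


def chunks_needed(file_size, chunk_size):
    """Number of fixed-size chunks required to hold file_size bytes."""
    return -(-file_size // chunk_size)  # ceiling division


def metadata_bytes(file_size, chunk_size):
    """Rough master-side metadata footprint for one file."""
    return chunks_needed(file_size, chunk_size) * BYTES_PER_CHUNK_METADATA


one_tib = 2**40
for size_mb in (1, 64):
    chunk_size = size_mb * 2**20
    print(size_mb,
          chunks_needed(one_tib, chunk_size),
          metadata_bytes(one_tib, chunk_size))
```

For a 1 TiB file, 64 MB chunks give 16,384 chunks (about 1 MiB of metadata under the assumption above), while 1 MB chunks would give 1,048,576 chunks (about 64 MiB). The same ratio applies to the number of master lookups a client needs when streaming through the file.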
 
