librsync
2.0.2
# Buffer internals {#buffer_internals}

## Input scoop

A module called the *scoop* is used for buffering data going into
librsync. It accumulates data when the application does not supply it
in large enough chunks for librsync to make use of it.

The scoop object is a set of fields in the rs_job_t object:

    char   *scoop_buf;    /* the allocation pointer */
    size_t  scoop_alloc;  /* the allocation size */
    size_t  scoop_avail;  /* the data size */

Data from the read callback always goes into the scoop buffer.

The state functions call rs__scoop_read when they need some input
data. If the read callback blocks, it might take multiple attempts
before the scoop can be filled. Each time, the state function will
also need to block, and then be reawakened by the library.

Once the scoop has been sufficiently filled, it must be completely
consumed by the state function. This is easy if the state function
always requests one unit of work at a time: a block, a file header
element, etc.

All this means that the valid data is always located at the start of
the scoop, continuing for scoop_avail bytes. The library is never
allowed to consume only part of the data.

Once the state function has consumed the data, it should call
rs__scoop_reset(), which resets scoop_avail to 0.


## Output queue

The library can set up data to be written out by putting a
pointer/length for it in the output queue:

    char   *outq_ptr;
    size_t  outq_bytes;

The job infrastructure will make sure this is written out before the
next call into the state machine.

There is only one outq_ptr, so any given state function can only
produce one contiguous block of output.


## Buffer sharing

The scoop buffer may be used by the output queue.
This means that data can traverse the library with no extra copies:
one copy into the scoop buffer, and one copy out. In this case
outq_ptr points into scoop_buf, and outq_bytes tells how much data
needs to be written.

The state function calls rs__scoop_reset before returning when it is
finished with the data in the scoop. However, the outq may still
point into the scoop buffer if that data has not yet been copied out.
This means that there is data in the scoop beyond scoop_avail that
must still be retained.

This is safe because neither the scoop nor the state function will
get to run before the output queue has completely drained.


## Readahead

How much readahead is required?

At the moment (??) our rollsum and MD4 routines require a full
contiguous block to calculate a checksum. This could be relaxed, at a
possible loss of efficiency.

So calculating block checksums requires one full block to be in
memory.

When applying a patch, we only need enough readahead to unpack the
command header.

When calculating a delta, we need a full block to calculate its
checksum, plus space for the missed data. We can accumulate any
amount of missed data before emitting it as a literal; the more we
can accumulate, the more compact the encoding will be.
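
The interplay between the scoop and the output queue described under
"Buffer sharing" can be illustrated with a small toy model in C. This
is only a sketch, not librsync's actual code: the field names
scoop_avail, outq_ptr and outq_bytes come from the text above, while
toy_job_t, TOY_SCOOP_SIZE, toy_scoop_fill, toy_state_copy and
toy_outq_drain are hypothetical names invented for the example, and
the fixed-size array stands in for the real allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: everything prefixed toy_/TOY_ is hypothetical. */
enum { TOY_SCOOP_SIZE = 16 };

typedef struct toy_job {
    char        scoop_buf[TOY_SCOOP_SIZE]; /* stands in for the allocation */
    size_t      scoop_avail;  /* valid input bytes at the start of scoop_buf */
    const char *outq_ptr;     /* pending output; may point into scoop_buf */
    size_t      outq_bytes;   /* bytes still waiting to be written out */
} toy_job_t;

/* Append input to the scoop. Returns 1 once `want` bytes are available,
 * 0 if more input is needed (the caller would block and retry later). */
int toy_scoop_fill(toy_job_t *job, const char **in, size_t *in_len,
                   size_t want)
{
    size_t need, take;

    assert(want <= TOY_SCOOP_SIZE);
    /* The scoop must not be refilled while the outq still points into it. */
    assert(job->outq_bytes == 0);

    need = want > job->scoop_avail ? want - job->scoop_avail : 0;
    take = need < *in_len ? need : *in_len;
    if (take > 0)
        memcpy(job->scoop_buf + job->scoop_avail, *in, take);
    job->scoop_avail += take;
    *in += take;
    *in_len -= take;
    return job->scoop_avail >= want;
}

/* A state function that consumes the whole scoop and queues the same
 * bytes as output without copying them. */
void toy_state_copy(toy_job_t *job)
{
    job->outq_ptr = job->scoop_buf;
    job->outq_bytes = job->scoop_avail;
    /* The reset: bytes beyond scoop_avail are still referenced by outq. */
    job->scoop_avail = 0;
}

/* Copy out as much queued output as fits; returns the bytes written. */
size_t toy_outq_drain(toy_job_t *job, char *out, size_t out_len)
{
    size_t n = job->outq_bytes < out_len ? job->outq_bytes : out_len;

    if (n == 0)
        return 0;
    memcpy(out, job->outq_ptr, n);
    job->outq_ptr += n;
    job->outq_bytes -= n;
    return n;
}
```

In this sketch a caller would invoke toy_scoop_fill repeatedly until
it returns 1, run toy_state_copy once, and then call toy_outq_drain
until outq_bytes reaches 0 before filling the scoop again. The
assertion in toy_scoop_fill encodes the rule that the scoop does not
run before the output queue has completely drained.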