Basically, we're multi-tenant memcached
MemCachier is latency sensitive. We need to answer queries...
Our servers are heavily loaded:
type User struct { entries map[string]*CacheValue ...metadata... } type Cache map[username]*User
At 1.6GB
The garbage collector has two phases:
mark(toplevelPointers) func mark(ps []Pointer) { for _,p := range(ps) { setLiveBit(p) mark(p.Children()) } }
cur := firstPointerInHeap while cur != nil { next := cur.next if !cur.Live() { free(cur) } cur = next }
unsafe
packagetype Pointer *ArbitraryTypepre 1) A pointer value of any type can be converted to a Pointer. 2) A Pointer can be converted to a pointer value of any type. 3) A uintptr can be converted to a Pointer. 4) A Pointer can be converted to a uintptr.
We can use unsafe.Pointer
to do out own
memory management!
type CacheValue { ...bunch of metadata... } var greatBigSlice []byte = make([]byte, 4GB) func Get(offset int) (*CacheValue) { return (*CacheValue)(unsafe.Pointer(&greatBigSlice[offset])) }
The problem is that CacheValue
actually looks like this:
type CacheValue struct { Key []byte Value []byte Flags []byte Expiration time.Time Version uint64 }
[]byte
A slab memory-allocator stores memory objects in fix sized bins.
Why?
A Chunk represents an application-level item in the allocator
A Page is a 1MB region containing Chunks of one size.
Slabs manage pages for the same chunk sizes.
type Chunk []byte type Page []byte type Slab struct { Pages []Page FreePage uint16 // Next free page FreeItem uint16 // Next free item slot ChunkSize ByteSize }
Structure of the top level Slab-based cache.
type Cache struct { // 15 pointers Slabs [MAX_FACTOR - MIN_FACTOR + 1]Slab // Protects access to each slab meta data Slabtex [MAX_FACTOR - MIN_FACTOR + 1]sync.Mutex // Stores the memory location of the beginning of a chain // for hashed keys. Keys [HASH_SLOTS]MemRef // Protects linked chains in the hash-map. Keytex [HASH_SLOTS]sync.Mutex }
To get data in and out of the allocator:
func (ptr MemRef) toChunk(c *Cache) Chunk { slab := c.GetSlab(ptr.Slab) page := slab.Pages[ptr.Page] chunkNum := ByteSize(ptr.Chunk) return Chunk(page[slab.ChunkSize * chunkNum:][:slab.ChunkSize]) } func (ptr MemRef) toHeader(c *Cache) (*ItemHeader) { chunk := ptr.toChunk(c) return (*ItemHeader)(unsafe.Pointer(&chunk[0])) }
Use native operations instead of log functions on Floats
TEXT ·log2Floor(SB),7,$0 BSRQ 8(SP), AX // 8(SP) is the first argument MOVQ AX, 16(SP) // store result RET
About an order of magnitude faster, used many times per Get/Set
Instead of 64-bit pointers, we use a struct with three 16-bit indexes.
Doesn't matter alone, but helps save 4 bytes here and there when reused in other structs.
type MemRef struct { Slab uint16 Page uint16 Chunk uint16 }
The GC with slab based version:
Compare this to ~100ms in steady-state and 1 secondmaximum for 1GB
Mmap
to directly allocate memory
When using make
to allocate []byte
, GC pauses still
grew to 20ms steady state and 500ms max.
page, err := syscall.Mmap(-1, 0, int(PAGE_SIZE), syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC, syscall.MAP_ANON | syscall.MAP_PRIVATE)
This makes the GC totally ignore this part of the heap.
Be excellent to each other!
/
#