LevelDB: Slice

Published 2023-08-31, updated 2023-09-01

Preface

Slice is the object that appears most often in LevelDB. It can be thought of as a lightweight string object with only two member variables:

const char* data_;
size_t size_;

Slice also uses the C++ operator keyword, which overloads an operator. For example, the + operator between two objects can be overloaded like this:
class MyClass {
public:
    int value;
    MyClass(int val) : value(val) {}
    // Overloading + operator
    MyClass operator+(const MyClass& other) {
        MyClass result(value + other.value);
        return result;
    }
};
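An expression of the form MyClass + MyClass calls the overloaded function. The = overload shown further below redefines assignment in the same way; writing = default after it asks the compiler to generate the default copy assignment.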

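Some member functions also carry a trailing const qualifier. A member function declared const promises not to modify the state of the object it is called on, that is, it does not change any of that object's member variables. Inside such a call the object is effectively read-only.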
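To make the effect of a trailing const concrete, here is a minimal sketch. Counter and ReadThenBump are hypothetical names invented for this illustration; they are not part of LevelDB.

```cpp
#include <cassert>

// Hypothetical toy class, used only to illustrate const member functions.
class Counter {
 public:
  explicit Counter(int start) : value_(start) {}

  // const member: may read members but cannot modify them.
  int value() const { return value_; }

  // non-const member: modifies the object, so it cannot be called
  // on a const object or through a const reference.
  void increment() { ++value_; }

 private:
  int value_;
};

// Reads through a const reference, then mutates through the non-const object.
inline int ReadThenBump() {
  Counter c(41);
  const Counter& view = c;  // read-only access path to the same object
  assert(view.value() == 41);
  // view.increment();      // would not compile: increment() is not const
  c.increment();            // allowed: c itself is not const
  return view.value();      // the const reference sees the updated value
}
```

The same split applies to Slice: accessors such as data() and size() are const, while clear() and remove_prefix() are not.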

Its constructors are:

 Slice() : data_(""), size_(0) {}  // empty slice
 // Create a slice that refers to d[0,n-1].
 Slice(const char* d, size_t n) : data_(d), size_(n) {}  // the n bytes d[0] through d[n-1]
 // Create a slice that refers to the contents of "s"
 Slice(const std::string& s) : data_(s.data()), size_(s.size()) {}  // points into s's buffer, no copy
 // Create a slice that refers to s[0,strlen(s)-1]
 Slice(const char* s) : data_(s), size_(strlen(s)) {}  // from a C string; length comes from strlen
 // Intentionally copyable.
 Slice(const Slice&) = default;
 Slice& operator=(const Slice&) = default;  // defaulted copy assignment operator

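As the constructors show, every form ends up as a char pointer plus the length of the data stored at that pointer; reading the data means reading directly from that char array. Slice itself never allocates or frees memory. It only identifies a region of an existing char array and owns no memory or data of its own.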
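The non-owning behaviour can be sketched without LevelDB itself. MiniSlice, View, SharesStorage, and SeesOwnerWrite below are hypothetical stand-ins written for this post; the sketch only assumes that a slice is a pointer plus a length, as the constructors above show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for leveldb::Slice: a pointer plus a length, nothing else.
struct MiniSlice {
  const char* data;
  std::size_t size;
};

// Builds a view over the string's own buffer; no bytes are copied.
inline MiniSlice View(const std::string& s) {
  return MiniSlice{s.data(), s.size()};
}

// True when the view aliases the string's buffer rather than a copy of it.
inline bool SharesStorage(const std::string& s) {
  MiniSlice v = View(s);
  return v.data == s.data() && v.size == s.size();
}

// A write made through the owner is visible through the view,
// because both refer to the same bytes.
inline char SeesOwnerWrite() {
  std::string owner = "hello";
  MiniSlice v = View(owner);
  owner[0] = 'j';  // in-place write; the buffer is not reallocated
  assert(v.size == 5);
  return v.data[0];
}
```

The flip side is lifetime: a view like this must not outlive the buffer it points into. Building one from a temporary std::string, or keeping it after the owning string is destroyed or reallocated, leaves a dangling pointer.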

Its member functions include:

// Return a pointer to the beginning of the referenced data
 const char* data() const { return data_; }  // pointer to the first byte
 // Return the length (in bytes) of the referenced data
 size_t size() const { return size_; }  // number of bytes this slice covers
 // Return true iff the length of the referenced data is zero
 bool empty() const { return size_ == 0; }  // emptiness check
 // Return the ith byte in the referenced data.
 // REQUIRES: n < size()
 char operator[](size_t n) const {  // overloads [] so a Slice can be indexed like an array (read-only)
   assert(n < size());
   return data_[n];
 }
 // Change this slice to refer to an empty array
 void clear() {  // reset to the empty slice; the referenced bytes are untouched
   data_ = "";
   size_ = 0;
 }
 // Drop the first "n" bytes from this slice.
 // An array is one contiguous block of memory and a char* points at its start,
 // so dropping a prefix only needs to advance the pointer and shrink the length.
 void remove_prefix(size_t n) {
   assert(n <= size());
   data_ += n;
   size_ -= n;
 }
 // Return a string that contains the copy of the referenced data.
 std::string ToString() const { return std::string(data_, size_); }  // copies the bytes into a std::string
 // Three-way comparison.  Returns value:
 //   <  0 iff "*this" <  "b",
 //   == 0 iff "*this" == "b",
 //   >  0 iff "*this" >  "b"
 int compare(const Slice& b) const;  // compare with b
 // Return true iff "x" is a prefix of "*this"
 bool starts_with(const Slice& x) const {  // prefix test
   return ((size_ >= x.size_) && (memcmp(data_, x.data_, x.size_) == 0));
 }
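As a usage sketch for starts_with and remove_prefix, the following re-implements just those members in a hypothetical MiniSlice class so that it runs without LevelDB. StripPrefix and the "user:" key format are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal view that mirrors the Slice members quoted above.
class MiniSlice {
 public:
  MiniSlice(const char* d, std::size_t n) : data_(d), size_(n) {}

  std::size_t size() const { return size_; }

  // True iff x is a prefix of *this.
  bool starts_with(const MiniSlice& x) const {
    return size_ >= x.size_ && std::memcmp(data_, x.data_, x.size_) == 0;
  }

  // Drops the first n bytes by moving the pointer; nothing is copied.
  void remove_prefix(std::size_t n) {
    assert(n <= size_);
    data_ += n;
    size_ -= n;
  }

  // Copies the referenced bytes into a std::string.
  std::string ToString() const { return std::string(data_, size_); }

 private:
  const char* data_;
  std::size_t size_;
};

// Strips prefix from key when present and returns the remainder as a copy.
inline std::string StripPrefix(const std::string& key, const std::string& prefix) {
  MiniSlice k(key.data(), key.size());
  MiniSlice p(prefix.data(), prefix.size());
  if (k.starts_with(p)) k.remove_prefix(p.size());
  return k.ToString();
}
```

With these definitions, StripPrefix("user:42", "user:") yields "42", while a key that lacks the prefix comes back unchanged. The only copy happens in ToString at the very end.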

Apart from remove_prefix and clear, all of the methods listed above are const, meaning they do not modify the Slice itself. That is why the author writes in slice.h:

// Multiple threads can invoke const methods on a Slice without
// external synchronization, but if any of the threads may call a
// non-const method, all threads accessing the same Slice must use
// external synchronization.

In other words, as long as no thread calls a non-const method, a Slice can be shared between threads without external synchronization.

Slice also comes with three inline functions.

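An inline function is one the compiler may expand directly at the call site instead of emitting a function call. The keyword is only a hint for that purpose, and it also allows the function to be defined in a header.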

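memcmp is a C/C++ standard library function that compares two blocks of memory byte by byte. It returns 0 when the first num bytes are equal, and a negative or positive value according to the first byte that differs. Its declaration is: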

int memcmp(const void* ptr1, const void* ptr2, size_t num);

inline bool operator==(const Slice& x, const Slice& y) {  // overloads ==
  return ((x.size() == y.size()) &&
          (memcmp(x.data(), y.data(), x.size()) == 0));
}

inline bool operator!=(const Slice& x, const Slice& y) { return !(x == y); }  // overloads !=

// Compare with b: memcmp over the shared length first, then the shorter slice sorts first.
inline int Slice::compare(const Slice& b) const {
  const size_t min_len = (size_ < b.size_) ? size_ : b.size_; 
  int r = memcmp(data_, b.data_, min_len);
  if (r == 0) {
    if (size_ < b.size_)
      r = -1;
    else if (size_ > b.size_)
      r = +1;
  }
  return r;
}
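The ordering this produces is easiest to see with a few inputs. The sketch below repeats the same logic as a free function over std::string so it can run on its own; CompareBytes and CheckOrdering are hypothetical names, not LevelDB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Same three-way logic as compare() above: memcmp over the common length,
// then break ties by length. Hypothetical helper for illustration only.
inline int CompareBytes(const std::string& a, const std::string& b) {
  const std::size_t min_len = a.size() < b.size() ? a.size() : b.size();
  int r = std::memcmp(a.data(), b.data(), min_len);
  if (r == 0) {
    if (a.size() < b.size()) {
      r = -1;
    } else if (a.size() > b.size()) {
      r = +1;
    }
  }
  return r;
}

// A few cases that show the resulting order.
inline void CheckOrdering() {
  assert(CompareBytes("abc", "abd") < 0);   // first differing byte decides
  assert(CompareBytes("ab", "abc") < 0);    // equal prefix: shorter sorts first
  assert(CompareBytes("b", "abc") > 0);     // length is only a tie-breaker
  assert(CompareBytes("abc", "abc") == 0);  // same bytes, same length
}
```

Note that memcmp compares bytes as unsigned char, and that only the sign of its result is specified, which is why the checks test for less than or greater than zero rather than for exact values.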

Why

Why use Slice rather than a string object to represent char arrays? doc/index.md in the source tree explains the reason:

Returning a Slice is a cheaper alternative to returning a std::string since we do not need to copy potentially large keys and values. In addition, leveldb methods do not return null-terminated C-style strings since leveldb keys and values are allowed to contain '\0' bytes.

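Slice does no memory management or allocation of its own; it refers directly to an external char array, which avoids copying data. This matters because entries are stored compactly in a length-then-value layout: returning a std::string would mean cutting the key or value out of the original buffer and copying it into the string, and that extra copy grows with the size of the key or value.
A related benefit, mentioned throughout this post, is that Slice does not manage memory. It only refers to existing data, which keeps memory use down.
It handles arbitrary bytes. A C-style string ends at the first '\0', but LevelDB data can contain zero bytes, for example when the high-order bytes of a sequence number are zero, so null-terminated strings are a poor fit. (std::string itself can hold embedded '\0' bytes, but anything that treats the data as a C string cuts it short.) PS: while debugging, especially when stepping into the skiplist insert, CLion may show the key but not the value, because the displayed string is truncated at a zero byte.
Converting to and from std::string is easy: the constructor and ToString both simply translate between a (data, size) pair and a string, so switching is straightforward whenever a copy is actually needed.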
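The point in the quoted documentation about '\0' bytes can be checked with a small sketch. MakeKeyWithZero, CStringLength, and CheckLengths are hypothetical helpers, and the three-byte key is made up; the sketch only shows that an explicit length preserves bytes that null-terminated handling loses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 3-byte value whose middle byte is '\0'.
// The (pointer, length) constructor keeps all three bytes.
inline std::string MakeKeyWithZero() {
  const char raw[] = {'k', '\0', 'v'};
  return std::string(raw, sizeof(raw));
}

// Length as seen by C-string functions: counting stops at the first '\0'.
inline std::size_t CStringLength(const char* p) { return std::strlen(p); }

// Tracking the length explicitly sees every byte; strlen sees only the first.
inline void CheckLengths() {
  std::string key = MakeKeyWithZero();
  assert(key.size() == 3);
  assert(CStringLength(key.data()) == 1);
}
```

A Slice carries its length in size_, so it behaves like the first check. A debugger view that renders the data as a C string behaves like the second, which would be consistent with the truncated value described in the PS above.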

Afterword

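Slice is essentially a data type that records only a pointer to the start of a char array and a length; it stores no data itself. I think of it as a start/end structure that marks a range within a char array, plus the comparison, assignment, and accessor methods needed to work with that range. It owns no memory yet allows flexible access to the data in the array. Most of its operations, such as comparisons and reads, are const and therefore thread-safe; only a few operations modify the Slice.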