WEEK 4 Memory

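1. Hexadecimal

The advantage of hexadecimal is that one digit covers exactly the 16 states of 4 bits, so two digits cover exactly the 256 states of one byte.

To distinguish hexadecimal from decimal, hexadecimal numbers are usually prefixed with 0x. For example, 0xFF is 255 in decimal.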

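2. Memory Addresses and Pointers

C has two kinds of variables:

Basic data types: store a value directly.
Pointers: store an address, which takes 8 bytes on a typical 64-bit system. This is a virtual address inside the process, not a physical one.

An array decays to a pointer to its first element in most expressions. A struct does not decay; it is handled as a value.

Through a pointer variable, you can access the data actually stored at that address.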

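2.1 `*` and `&`

Memory is normally addressed in units of one byte. Suppose there is 1 GB of memory: 1 GB = $2^{10}$ MB = $2^{20}$ KB = $2^{30}$ bytes. Numbering every byte then takes the numbers 0 through $2^{30} - 1$. One hexadecimal digit can represent $2^4$ values, so 8 hexadecimal digits are enough to number every byte (0x40000000 = $2^{30}$, and the largest number needed is 0x3FFFFFFF).

A pointer is the number (address) of a byte of the memory behind a variable. A variable's size is a whole number of bytes, so the pointer is the number of its first byte.

&: returns the address of a variable (a pointer), that is, the number of the first byte of the variable in virtual memory. For example: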
```
#include <stdio.h>

int main(void){
    int n = 100;
    printf("%p\n", &n);
}
```
Output:
```
ubuntu@c-test-node:~/C/w4$ ./pointer1 
0x7ffca3963bec
```
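0x7ffca3963bec is the address of the first byte that stores n. It is a virtual address, written here with 12 hexadecimal digits (48 bits).
On a 64-bit system a pointer normally occupies 8 bytes.
How many of those bits are actually used depends on the CPU architecture. On Intel x86-64 the virtual address space is typically 48 bits wide (not 64), which allows at most $2^{48}$ distinct bytes, or 256 TB.
Addresses are 64 bits wide, but current implementations (and any chips known to be in planning) do not allow the whole 16 EB virtual address space to be used. Most operating systems and applications will not need such a large address space in the foreseeable future (for example, Windows on AMD64 uses only 256 TB, i.e. 48 bits), so implementing the full width would only add system complexity and address-translation cost without any benefit. AMD therefore decided that in the first implementation of the architecture only the lowest 48 bits of a virtual address are used in address translation (page-table lookup).
* (Dereference Operator) has two meanings:
- In a declaration, it says that the variable stores a pointer (name is a pointer to var_type).
- In an expression, it is the inverse of &: it yields the data the pointer points to, i.e. the data that starts at that byte. The pointed-to type of every pointer is well-defined.
For example: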
```
#include <stdio.h>

int main(void){
    int n = 100;
    int * p = &n;
    printf("%p\n", p);
    printf("%d\n", *p);
    printf("%p\n", &p);
}
```
Output:
```
ubuntu@c-test-node:~/C/w4$ ./pointer1
0x7ffc25070b5c
100
0x7ffc25070b50
```
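Here p is already a pointer, so using p directly gives the address, while *p gives the data that p points to. Because p was declared as pointing to an int, *p interprets the 4 bytes starting at p as an int.
Note: although a pointer is just an 8-byte address, the type it points to must still be declared. That is how the compiler knows how many bytes to read after jumping to the first byte.
In practice p behaves much like an unsigned long variable, and it is itself stored somewhere in memory. Note that & applies to objects such as variables; a literal like 100 has no address.

Since a pointer is itself stored in memory (8 bytes, of which 48 bits are significant on x86-64), a pointer to a pointer is also meaningful.

Every variable (basic variable, struct variable, array variable or pointer variable) has its own distinct address.

Unlike Java, where arrays and objects are always references, a C array or struct variable is not a pointer variable: it occupies the space of the actual data.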

#include <stdio.h>

int main(void){
    int n1 = 100;
    int n2 = n1;
    int * p1 = &n1;
    int * p2 = p1;
    printf("%p\n", &n1);
    printf("%p\n", &n2);
    printf("%p\n", &p1);
    printf("%p\n", &p2);
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer1
0x7ffc0f31423c
0x7ffc0f314238
0x7ffc0f314230
0x7ffc0f314228

For a basic data type, changing the value does not change the variable's address:

#include <stdio.h>

int main(void){
    int a = 100;
    printf("%p\n", &a);
    a = 200;
    printf("%p\n", &a);
}

Output:

0x7ffe84e946fc
0x7ffe84e946fc

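The most powerful and most dangerous feature of C is that it can manipulate pointers, and therefore memory. A C program can try to read or modify the byte at any address in its address space.

With C, we can also go to specific addresses in memory, which might cause segmentation faults, where we’ve tried to read or write to memory we don’t have permission to.
Segmentation Fault: an attempt to access a memory address the program is not allowed to access.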

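2.2 Pointers and Arrays

Pointers and arrays are closely related: in most expressions an array name decays to a pointer to the first byte of the array's first element: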

#include <stdio.h>

int main(void){
    char s[] = "apple";
    char *s2 = s; // s decays to a pointer to its first element
    // error: char s3[] = s;
    printf("%p\n", s);
    printf("%p\n", &s);
    printf("%p\n", *&s);
    printf("%p\n", s2);
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer2
0x7ffca7ec65fa
0x7ffca7ec65fa
0x7ffca7ec65fa
0x7ffca7ec65fa

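In fact, any pointer can be indexed like an array, even if it does not point into an array. The index is not range-checked, so reading past the object is undefined behavior and may return garbage or crash. Only when the variable was explicitly declared as an array does the compiler know the length, and it may then reject a constant index that is out of range.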

int main(void){
    char s[] = "apple";
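    char *s2 = s; // s decays to a pointer to its first element
    int n = 100;
    int *p = &n;
    printf("%c\n", *s2); // s2 points to the first element of the array
    printf("%c\n", s2[100]); // s2[100] is the byte 100 bytes after 'a' (out of bounds)
    // rejected by the compiler here: printf("%p\n", s[100]); because the length of s is known
    printf("%i\n", p[100]); // p[100] is the 4 bytes at offsets 400-403 from n (out of bounds)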
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer3
a
�
32764

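Even so, an array must be declared with [] and not with *; the only exception is a string, which can also be declared as a char * pointing at a string literal. For function parameters, the two forms are interchangeable.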

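2.3 Size of Pointers

The difference between an array variable and an ordinary pointer variable is that the array variable still carries the array's length: sizeof returns the size of the whole array in bytes, whereas for an ordinary pointer it returns 8 bytes (on a 64-bit system).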

#include <stdio.h>
#include <string.h>

int main(void){
    int nums[] = {1,2,3,4,5,6};
    int *nums2 = nums;
    printf("%zu\n", sizeof(nums));
    printf("%zu\n", sizeof(nums2));
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer3
24
8

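Two conclusions follow:

In the block where the array is declared, sizeof(a)/sizeof(a[0]) gives the length of the array (the number of elements).
Once the array is assigned to another pointer variable or passed to another function, sizeof(a) always returns the size of a pointer, and the element count is lost. When passing an array to a function, it is therefore best to pass the number of elements as well.

Note: in a parameter list, a[] and *a are equivalent.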

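2.4 Command-Line Arguments

As mentioned earlier, command-line arguments can be declared as char* argv[]. The reasoning is:

string argv[] is easy to understand first.
string is just an alias for char*. A string is an array of char, which decays to a pointer to its first element.
char* argv[] means that argv is an array whose elements are pointers to char. Each of those pointers points at the first character of a char array, so argv effectively holds a list of strings.
argv itself also decays to a pointer, so as a parameter argv[] and *argv are equivalent, and char* argv[] is equivalent to char ** argv. char ** argv reads as: argv is a pointer whose target is itself a pointer (and so are the following elements in memory), and each of those pointers points to a char.

How many pointers does argv hold when treated as an array? argc tells us.

How many chars does each argv[i] hold when treated as an array? The string ends at the terminating '\0'.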

#include <stdio.h>
#include <string.h>

int main(void){
    char s[] = "apple";
    char s1[] = {'a', 'b', 'c', 'd'};
    char *s2 = s;
    char *s4 = s1;
    printf("%lu\n", sizeof(s));
    printf("%lu\n", sizeof(s1));
    printf("%lu\n", sizeof(s2));
    printf("%s\n", s);
    printf("%s\n", s1);
    printf("%s\n", s2);
    for (int i = 0; i < 9; i++){
        printf("%c", s4[i]);
    }
    printf("\n");
}
Output:
6
4
8
apple
abcdapple
apple
abcdapple
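This shows how %s works: it starts at the given pointer and keeps printing until it meets \0 (a byte of eight 0 bits). Printing s1 does not stop after abcd but continues into the contents of s, because the two arrays happen to be adjacent in memory; the final loop confirms this. Strictly speaking, passing s1 to %s is undefined behavior, because s1 has no terminator.
Sometimes s1 still prints correctly, because unused memory is often filled with 0 bits, so it is quite likely that a \0 happens to follow s1.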

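2.5 Null Pointers

In C the null pointer is NULL, a special pointer value that points to nothing. On common platforms all of its bits are 0.

Some functions may return a null pointer, so the result must be checked (== NULL).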

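2.6 Default Value of Pointers

Any variable x already has an address (&x) before it is initialized; only its value (x) has not been set yet.

Some compilers reject the use of an uninitialized variable, others allow it.

An uninitialized local pointer variable holds an indeterminate (garbage) value, which is not guaranteed to be NULL. Only global and static pointers are initialized to NULL.

Pointer variables are usually assigned through malloc, calloc or realloc.

Note: an array variable already has its storage even before it is initialized, so its address is valid; for a local array that storage is on the stack.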

#include <stdio.h>

int main(void){
    char x[10];
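    char *y; // uninitialized: indeterminate value, not necessarily NULL
    printf("%p\n", x); // not initialized, but the address already exists (on the stack)
    // printf("%p", y); error: y is used uninitialized
    printf("%p\n", &y); // every variable has an address, even when uninitialized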
}

Output:

0x7ffe4a684616
0x7ffe4a684608

By default, the address of every local variable is on the stack.

3. Strings

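3.1 Defining a String

The correct ways to define a string in C are:

char *s1 = "apple";
char s2[] = "apple";

A string is essentially an array of char. s1 is a pointer to the first byte of a string literal, and s2 is an array that decays to a pointer to its first byte.

In the block where s2 is defined, the number of elements is still known. Once the string is passed to another function or assigned to another pointer variable, that information is lost and only a pointer to char remains (the program cannot tell whether it points into an array or string).

Normally a pointer variable must be assigned an address. A string literal works as an initializer because it is an array of char, which decays to the address of its first element.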

#include <stdio.h>
 int main(void){
    char *s = "Apple";
    char c = s[0];
    char *p1 = &c;
    char *p2 = &s[0];
    printf("s:-----------\n");
    printf("%p\n", s);
    printf("p1:----------\n");
    printf("%p\n", p1);
    printf("%c\n", *p1);
    printf("%s\n", p1);
    printf("p2:----------\n");
    printf("%p\n", p2);
    printf("%c\n", *p2);
    printf("%s\n", p2);
 }

Output:

ubuntu@c-test-node:~/C/w4$ ./string 
s:-----------
0x55aae261d857
p1:----------
0x7ffd9261981f
A
AW�a�U
p2:----------
0x55aae261d857
A
Apple

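After s[0] is copied into c, c has its own storage holding A, so &c differs from s. Printing p1 with %s then runs past that single char, which is why garbage follows the A.

&s[0] and s are the same address.

Note: %s expects an argument of type char * that points to a null-terminated string.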

3.2 Contiguous Addresses

The following shows that the characters of a string are stored at consecutive addresses:

#include <stdio.h>
#include <string.h>

int main(void){
    printf("string:\n");
    char *s = "Apple";
    for (int i = 0, l = strlen(s); i < l; i++){
        printf("%p\n", &s[i]);
    }
    printf("int:\n");
    int n[] = {1, 2, 3, 4, 5};
    for (int i = 0; i < 5; i++){
        printf("%p\n", &n[i]);
    }
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer4
string:
0x560e74c42860
0x560e74c42861
0x560e74c42862
0x560e74c42863
0x560e74c42864
int:
0x7fff7875f890
0x7fff7875f894
0x7fff7875f898
0x7fff7875f89c
0x7fff7875f8a0

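Because a char occupies one byte, the addresses of neighboring characters in the string differ by 1. An int occupies 4 bytes, so the addresses of neighboring elements of the int array differ by 4.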

3.3 String Literal Pool

C has something similar to a string constant pool.

#include <stdio.h>
 int main(void){
    char *s1 = "Apple";
    char *s2 = "Apple";
    printf("%p\n", s1);
    printf("%p\n", s2);
    printf("%i\n", s1 == s2);
 }

Output:

0x55e292b85857
0x55e292b85857
1

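As we can see, the two "Apple" literals were placed in memory at compile time and the compiler merged them, so both pointers hold the same address. This merging is an optimization and is not guaranteed by the C standard.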

3.4 Mutable

Strings that are not literals are mutable in C, because they are just char arrays.

#include <stdio.h>
 int main(void){
    char *s1 = "Apple";
    char s2[] = {'a', 'b', 'c', '\0'};
    char *s3 = s2;
    // error at run time (undefined behavior): s1[0] = 'a';
    s2[0] = 'A';
    printf("%s\n", s2);
    printf("%s\n", s3);
 }

Output:

ubuntu@c-test-node:~/C/w4$ ./string3
Abc
Abc

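The error occurs because "Apple" is a literal: its location and contents are fixed at compile time, typically in read-only memory, and must not be modified.

The change is also visible through s3, because s2 and s3 refer to the same address.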

4. Pointer Arithmetic

A pointer behaves much like an integer, so addition and subtraction can be used to reach neighboring memory addresses.

#include <stdio.h>
#include <string.h>

int main(void){
    char *s = "Apple";
    for (int i = 0, l = strlen(s); i < l; i++){
        printf("%c\n", *(s + i));
    }
    printf("%s\n", s - 100);
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer5
A
p
p
l
e
/../sanitizer_common/sanitizer_signal_interceptors.incd

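The useful part of pointer arithmetic is that the offset is automatically multiplied by the size of the pointed-to type. For example, adding 1 to an int pointer moves it forward by 4 bytes.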

#include <stdio.h>
#include <string.h>

int main(void){
    int n[] = {1, 2, 3};
    for (int i = 0; i < 3; i++){
        printf("%i\n", *(n + i));
        printf("%p\n", n + i);
    }
}

Output:

ubuntu@c-test-node:~/C/w4$ ./pointer6
1
0x7ffe34956ab0
2
0x7ffe34956ab4
3
0x7ffe34956ab8

Note: for a pointer p, *(p + i) is equivalent to p[i], which is syntactic sugar for it.

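5. Copying Strings

Assigning one char * to another char * variable makes both variables store the same address, so modifying the characters through one of them modifies them for both.

Copying a string therefore cannot be a plain assignment. It needs a destination buffer, into which every character is copied. When the buffer is obtained at run time with malloc, this is called Dynamic Memory Allocation.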

#include <string.h>
#include <stdio.h>

int main(void){
   char *s1 = "Apple";
   int l = strlen(s1);
   char s2[l + 1];
   for (int i = 0; i <= l; i++){
      s2[i] = s1[i];
   }
   printf("%s\n", s1);
   printf("%s\n", s2);
   s2[0] = 'a';
   printf("%s\n", s1);
   printf("%s\n", s2);
 }

Output:

ubuntu@c-test-node:~/C/w4$ ./string3
Apple
Apple
Apple
apple

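The <stdlib.h> library provides the following functions:

void *malloc(size_t size): allocates the requested memory and returns a pointer to it. Returns NULL if the request fails.
The result is a void * pointer. It converts implicitly to any object pointer type in C, although it is often cast explicitly: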
```
pointer_name = (cast-type*) malloc(size);
```
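void *realloc(void *ptr, size_t size): tries to resize the memory block pointed to by ptr that was previously allocated with malloc or calloc.
- ptr: pointer to the memory block to resize, previously allocated with malloc, calloc or realloc. If it is a null pointer, a new block is allocated and a pointer to it is returned.
- size: the new size of the block in bytes. If size is 0 and ptr points to an existing block, the behavior is implementation-defined; some implementations free the block and return a null pointer.
void free(void *ptr): releases memory previously allocated with calloc, malloc or realloc. ptr points to the block to release; if a null pointer is passed, nothing happens.
It does not change the value of the pointer which means it still points to the same memory location.

The string.h library provides strcpy and strcat:

char *strcpy(char *dest, const char *src): copies the string pointed to by src into dest, including the terminating \0. If the destination array dest is not large enough for the source string, a buffer overflow can occur. The function returns the pointer dest (usually not used).
char *strcat(char *dest, const char *src): appends the string pointed to by src to the end of the string pointed to by dest.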

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
 
int main()
{
   char *str;
 
   /* Initial memory allocation */
   str = (char *) malloc(15);
   if (str == NULL){
       return 1; 
   }
   strcpy(str, "runoob");
   printf("String = %s,  Address = %p\n", str, str);
 
   /* Reallocate memory */
   str = (char *) realloc(str, 25);
   strcat(str, ".com");
   printf("String = %s,  Address = %p\n", str, str);
 
   free(str);
   printf("String = %s,  Address = %p\n", str, str);
}

Output:

String = runoob,  Address = 0x5641c0caf2d0
String = runoob.com,  Address = 0x5641c0caf3a0
String = �֤DV,  Address = 0x5641c0caf3a0

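As the output shows, reallocation may change the address of the first byte; in that case the existing data is copied to the new block.

After free, the old contents must not be read any more. The block is not necessarily cleared, but reading it is undefined behavior, which is why garbage is printed above.

malloc can fail and return NULL, so always check the result before using it.

Best Practice: free every block obtained from malloc once it is no longer needed.

Note:

malloc and related functions allocate from the heap, not the stack, and free can only release heap memory, so the address of an ordinary variable must never be passed to free.
The heap is a large region of memory. Heap allocation is dynamic and not contiguous; a program can request heap memory on demand, but allocating it is slower than using the stack.
Data on the heap can live for a long time, and unused data must be released by the program itself. If a lot of unused data stays allocated, the result is a memory leak.
In short: the heap suits data with a long lifetime, a large size, or a size that is not known in advance.

If the block allocated by malloc is smaller than the data copied into it, the copy does not necessarily fail, but it may trigger a Seg Fault.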

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
 
int main()
{
   char *str;
   str = (char *) malloc(3);
   strcpy(str, "abcdefg");
   printf("String = %s,  Address = %p\n", str, str);
   free(str);
}

Output:

String = abcdefg,  Address = 0x55f14fd042e0

strcpy also copies the final \0, so the block allocated by malloc must be at least strlen(s) + 1 bytes.
The argument of malloc is usually n * sizeof(type).

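6. `valgrind`

Accessing memory that was never allocated does not always fail, so such bugs are not always found by ordinary debugging.

The valgrind tool helps detect memory-safety problems.

Recommended Valgrind Options:

valgrind --leak-resolution=high --leak-check=full
--show-reachable=yes --track-fds=yes ./myProgram

Taking the string copy above as an example: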

ubuntu@c-test-node:~/C/w4$ valgrind ./copy2
==32337== Memcheck, a memory error detector
==32337== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32337== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==32337== Command: ./copy2
==32337== 
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x1b
get_Form_szB: unhandled 27 (DW_FORM_addrx)
--32337-- WARNING: Serious error when reading debug info
--32337-- When reading debug info from /home/ubuntu/C/w4/copy2:
--32337-- get_Form_contents: unhandled DW_FORM
==32337== Invalid write of size 1
==32337==    at 0x484EE7C: strcpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134DD5: main (copy2.c:9)
==32337==  Address 0x4bdd233 is 0 bytes after a block of size 3 alloc'd
==32337==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134D88: main (copy2.c:8)
==32337== 
==32337== Invalid write of size 1
==32337==    at 0x484EE8E: strcpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134DD5: main (copy2.c:9)
==32337==  Address 0x4bdd237 is 4 bytes after a block of size 3 alloc'd
==32337==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134D88: main (copy2.c:8)
==32337== 
==32337== Invalid read of size 1
==32337==    at 0x484ED24: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x4A21DB0: __vfprintf_internal (vfprintf-internal.c:1517)
==32337==    by 0x4A226E4: buffered_vfprintf (vfprintf-internal.c:2261)
==32337==    by 0x4A0B81E: printf (printf.c:33)
==32337==    by 0x134DEB: main (copy2.c:10)
==32337==  Address 0x4bdd233 is 0 bytes after a block of size 3 alloc'd
==32337==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134D88: main (copy2.c:8)
==32337== 
==32337== Invalid read of size 1
==32337==    at 0x4A39030: _IO_default_xsputn (genops.c:394)
==32337==    by 0x4A39030: _IO_default_xsputn (genops.c:370)
==32337==    by 0x4A2208B: outstring_func (vfprintf-internal.c:239)
==32337==    by 0x4A2208B: __vfprintf_internal (vfprintf-internal.c:1517)
==32337==    by 0x4A226E4: buffered_vfprintf (vfprintf-internal.c:2261)
==32337==    by 0x4A0B81E: printf (printf.c:33)
==32337==    by 0x134DEB: main (copy2.c:10)
==32337==  Address 0x4bdd233 is 0 bytes after a block of size 3 alloc'd
==32337==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134D88: main (copy2.c:8)
==32337== 
==32337== Invalid read of size 1
==32337==    at 0x4A3903E: _IO_default_xsputn (genops.c:393)
==32337==    by 0x4A3903E: _IO_default_xsputn (genops.c:370)
==32337==    by 0x4A2208B: outstring_func (vfprintf-internal.c:239)
==32337==    by 0x4A2208B: __vfprintf_internal (vfprintf-internal.c:1517)
==32337==    by 0x4A226E4: buffered_vfprintf (vfprintf-internal.c:2261)
==32337==    by 0x4A0B81E: printf (printf.c:33)
==32337==    by 0x134DEB: main (copy2.c:10)
==32337==  Address 0x4bdd234 is 1 bytes after a block of size 3 alloc'd
==32337==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==32337==    by 0x134D88: main (copy2.c:8)
==32337== 
String = abcdefg,  Address = 0x4bdd230
==32337== 
==32337== HEAP SUMMARY:
==32337==     in use at exit: 0 bytes in 0 blocks
==32337==   total heap usage: 5 allocs, 5 frees, 210 bytes allocated
==32337== 
==32337== All heap blocks were freed -- no leaks are possible
==32337== 
==32337== For lists of detected and suppressed errors, rerun with: -s
==32337== ERROR SUMMARY: 14 errors from 5 contexts (suppressed: 0 from 0)

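Common problems:

Invalid read. Check the index being accessed.
Invalid write. Check the index being accessed.
Memory leak. Check that free() was called.
A memory leak means that heap memory the program allocated dynamically is not released, or can no longer be released, for some reason. This wastes system memory and can slow the program down or even crash the system.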
Reading freed variables
Reading uninitialized variables
Writing to uninitialized memory

Types of Memory Leaks

Still Reachable:
- Block is still pointed at, programmer could go back and free it before exiting
Definitely Lost
- No pointer to the block can be found
Indirectly Lost
- Block is “lost” because the blocks that point to it are themselves lost
Possibly Lost
- Pointer exists but it points to an internal part of the memory block

7. Garbage Values

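Global variables are initialized (to zero).

Local variables are not initialized.

An uninitialized local array can still be indexed, because its storage already exists. Its elements may contain garbage values, that is, data left behind by an earlier use of that memory.
An ordinary local variable should not be read before it is initialized.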

#include <stdio.h>

int a;
int b[3];

int main(void){
    int c;
    int d[3];
    printf("a: %i\n", a);
    for (int i = 0; i < 3; i++){
        printf("b[%i]: %i\n", i, b[i]);
    }
    // error: printf("c: %i", c);
    for (int i = 0; i < 3; i++){
        printf("d[%i]: %i\n", i, d[i]);
    }
}

Output:

a: 0
b[0]: 0
b[1]: 0
b[2]: 0
d[0]: 0
d[1]: 2113180960
d[2]: 32596

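Garbage values are visible here, which can be a security problem.

When an array of pointers is declared but not initialized, its elements may hold arbitrary addresses.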

#include <stdio.h>

int main(void){
    char * s[5];
    for (int i = 0; i < 5; i++){
        printf("%p\n", s[i]);
        // printf("%c", *s[i]); crashes if s[i] is a null pointer
    }
}

Output:

(nil)
(nil)
(nil)
(nil)
(nil)

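int x and int *x:

In common: both already have storage, so &x is well-defined. The value actually stored is a garbage value.
Difference: the risk is different. The garbage value of int x is often 0 and usually harmless. The garbage value of int *x may be a null pointer or an arbitrary address, so assigning to *x can fail at run time.
int *x is usually paired with malloc, for example int *x = (int *) malloc(sizeof(int));.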

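8. Passing Arguments

8.1 Pointers to Basic Data

When a basic value is passed, the function gets a new variable with its own address that holds a copy of the data (Pass By Value).

Changing the parameter inside the function does not affect the variable outside the function.

To change the value of a variable outside the function, pass a pointer to it (Pass By Reference).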

#include <stdio.h>

void swap1(int a, int b);
void swap2(int *a, int *b);

int main(void){
    int a = 100;
    int b = 200;
    printf("a: %i, b: %i\n", a, b);
    swap1(a, b);
    printf("a: %i, b: %i\n", a, b);
    swap2(&a, &b);
    printf("a: %i, b: %i\n", a, b);
}

void swap1(int a, int b){
    int tmp = a;
    a = b;
    b = tmp;
}

void swap2(int *a, int *b){
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

Output:

a: 100, b: 200
a: 100, b: 200
a: 200, b: 100

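8.2 Arrays

To pass a one-dimensional array to a function, declare the formal parameter in one of the following ways. They all have the same effect, because each tells the compiler that an int pointer will be received. A multidimensional array can be passed as a parameter in a similar way.

Pointer: void myFunction(int *param)
Sized array: void myFunction(int param[10])
Length given by another parameter: void myFunction(int l, int param[l])
Unsized array: void myFunction(int param[])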

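9. Return Values

A function should not return the address of a local variable, because the variable no longer exists after the function returns. The exception is a local variable declared static.

C does not allow a whole array to be returned from a function. However, you can return a pointer to an array by returning the array name without an index.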

int * myFunction()

10. Memory layout

Within our computer’s memory, different types of data that need to be stored for our program are organized into different sections:

The machine code section is our compiled program’s binary code. When we run our program, that code is loaded into memory.
Just below, or in the next part of memory, are global variables we declared in our program.
The heap section is an empty area from where malloc can get free memory for our program to use. As we call malloc, we start allocating memory from the top down.
The stack section is used by functions and local variables in our program as they are called, and grows upwards.

If we call malloc for too much memory, we will have a heap overflow, since we end up going past our heap. Or, if we call too many functions without returning from them, we will have a stack overflow, where our stack has too much memory allocated as well.

11. `scanf`

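The scanf function from stdio.h reads input from the keyboard.

int scanf(const char *format, ...): reads formatted input from standard input stdin.

format: a C string containing one or more of the following: whitespace characters, non-whitespace characters, and format specifiers.
A format specifier has the form: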
```
%[*][width][modifiers]type
```
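Additional arguments: depending on the format string, the function expects a series of additional arguments. Each one points to the location where the value parsed for the corresponding % specifier is stored. The number of arguments should match the number of % specifiers (except those written with *).

The additional arguments are all pointers: the addresses of the data to be filled in.

scanf can be thought of as the inverse of printf(), with the ability to parse strings.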

#include <stdio.h>

int main(void){
    int Y;
    int M;
    int D;
    char s[256]; // Note that 256 is an arbitrary choice here
    printf("Input: ");
    scanf("%i-%i-%i Comment: %255s", &Y, &M, &D, s); // width 255 keeps the input inside s
    printf("Y: %i\n", Y);
    printf("M: %i\n", M);
    printf("D: %i\n", D);
    printf("Comment: %s\n", s);
}

Output:

Input: 2020-8-12 Comment: apple pine
Y: 2020
M: 8
D: 12
Comment: apple

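A few notes on string input:

The size of the buffer must be decided in advance. A string is often read inside a helper function that only returns a pointer; if the buffer is a local array, it is released when that function returns and the pointer becomes invalid.
Make sure the buffer is at least as large as the longest possible input, otherwise a Seg Fault may occur. The best approach is to grow the allocation dynamically, which is how the cs50 library does it.
get_string from the CS50 library continuously allocates more memory as scanf reads in more characters, so it doesn’t have this issue.
%s skips leading whitespace and stops at the first whitespace character after that.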

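12. Files

Files are read and written with the functions in stdio.

12.1 Variable Types

FILE: an object type suitable for storing the information of a file stream.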

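12.2 Opening and Closing Files

FILE *fopen(const char *filename, const char *mode): opens the file named by filename with the given mode. Returns a FILE pointer on success; otherwise returns NULL and sets the global variable errno to indicate the error.
- filename: a string with the name of the file to open.
- mode: a string with the access mode, one of the values in the table below:

  | Mode | Description |
  | --- | --- |
  | "r" | Opens a file for reading. The file must exist. |
  | "w" | Creates an empty file for writing. If a file with the same name already exists, its contents are erased and it is treated as a new empty file. |
  | "a" | Appends to a file. Writes add data at the end of the file. The file is created if it does not exist. |
  | "r+" | Opens a file for update, both reading and writing. The file must exist. |
  | "w+" | Creates an empty file for both reading and writing. |
  | "a+" | Opens a file for reading and appending. |

int fclose(FILE *stream): closes the stream and flushes all its buffers. Returns zero on success and EOF on failure.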

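12.3 Positioning in a Stream

int fseek(FILE *stream, long int offset, int whence): sets the file position of stream to the given offset, counted in bytes from the position indicated by whence.
- stream: pointer to the FILE object that identifies the stream.
- offset: the offset relative to whence, in bytes.
- whence: the position from which offset is counted. It is usually one of the following constants:

  | Constant | Description |
  | --- | --- |
  | SEEK_SET | Beginning of the file |
  | SEEK_CUR | Current position of the file pointer |
  | SEEK_END | End of the file |

void clearerr(FILE *stream): clears the end-of-file and error indicators of the given stream. It returns nothing and does not fail.
int fflush(FILE *stream): flushes the output buffer of stream.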

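12.4 Reading and Writing Files

int feof(FILE *stream): tests the end-of-file indicator of the given stream. Returns a non-zero value when the indicator is set, otherwise zero.
It is typically used to test whether the end of the file has been reached (when reading byte by byte).
int ferror(FILE *stream): tests the error indicator of the given stream. Returns a non-zero value when the indicator is set, otherwise zero.
int fflush(FILE *stream): flushes the output buffer of stream. Returns zero on success. On error it returns EOF and sets the error indicator (the one tested by ferror).
long int ftell(FILE *stream): returns the current file position of the given stream.
It is typically combined with fseek to the end of the file to obtain the file size in bytes.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream): reads data from the given stream into the array pointed to by ptr.
- ptr: pointer to a block of memory of at least size*nmemb bytes.
- size: the size in bytes of each element to read.
- nmemb: the number of elements, each size bytes long.
- stream: pointer to the FILE object that specifies the input stream.
The number of elements successfully read is returned as a size_t, which is an integer type. If it differs from nmemb, either an error occurred or the end of the file was reached.
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream): writes the data in the array pointed to by ptr to the given stream.
- ptr: pointer to the array of elements to write.
- size: the size in bytes of each element to write.
- nmemb: the number of elements, each size bytes long.
- stream: pointer to the FILE object that specifies the output stream.
int fprintf(FILE *stream, const char *format, ...): sends formatted output to stream.
int fscanf(FILE *stream, const char *format, ...): reads formatted input from stream.
int fgetc(FILE *stream): gets the next character (an unsigned char converted to int) from stream and advances the position indicator.
char *fgets(char *str, int n, FILE *stream): reads one line from stream and stores it in the string pointed to by str. It stops after n-1 characters have been read, when a newline is read, or at end of file, whichever comes first. A null character is appended automatically (which is why at most n-1 characters are read).
- str: pointer to the char array that receives the string.
- n: the maximum number of characters to store (including the final null character). Usually the length of the array passed as str.
On success the function returns the same str argument. If the end of the file is reached before any character is read, the contents of str are unchanged and a null pointer is returned. A null pointer is also returned on error.
int fputc(int c, FILE *stream): writes the character c (converted to unsigned char) to stream and advances the position indicator.
int fputs(const char *str, FILE *stream): writes the string to stream, without the null character.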

12.5 File Operations

int remove(const char *filename): deletes the file named filename so that it can no longer be accessed.
int rename(const char *old_filename, const char *new_filename): renames the file old_filename to new_filename.

12.6 Examples

Sample file:

ubuntu@c-test-node:~/C/w4$ cat hello.txt 
Hi, This is Ray from Unimelb.
Nice to meet you.

Reading a file:

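#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   /* Open the file for reading */
   FILE *fp = fopen("hello.txt", "r");
   if (fp == NULL)
   {
      return 1;
   }

   /* Get the file size in bytes */
   fseek(fp, 0, SEEK_END);
   long len = ftell(fp);
   if (len < 0)
   {
      fclose(fp);
      return 1;
   }

   /* Seek back to the beginning of the file */
   fseek(fp, 0, SEEK_SET);

   /* Allocate len + 1 bytes so there is room for the terminating '\0' */
   char *buffer = malloc(len + 1);
   if (buffer == NULL)
   {
      fclose(fp);
      return 1;
   }

   /* Read and print the data */
   size_t n = fread(buffer, 1, len, fp);
   buffer[n] = '\0';
   printf("%s\n", buffer);

   /* Close the file and release the buffer */
   fclose(fp);
   free(buffer);
}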

Output:

Hi, This is Ray from Unimelb.
Nice to meet you.

Writing a file:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
 
int main(void)
{
   char buffer[] = "New line 1\n";

   /* Open the file for appending */
   FILE *fp = fopen("hello.txt", "a");
   if (fp == NULL)
   {
      return 1;
   }

   /* Seek to the end of the file (mode "a" always appends anyway) */
   fseek(fp, 0, SEEK_END);

   /* Write the data */
   fwrite(buffer, strlen(buffer), 1, fp);
   fputs("New Line 2\n", fp);
   fprintf(fp, "New Line %d\n", 3);

   /* Flush and close the file */
   fflush(fp);
   fclose(fp);
}

Output:

ubuntu@c-test-node:~/C/w4$ cat hello.txt 
Hi, This is Ray from Unimelb.
Nice to meet you.
New line 1
New Line 2
New Line 3

Copying a file:

#include <stdio.h>
#include <stdlib.h>
#define BUF_SIZE 1024

int copyFile(char *file1, char *file2){
   // Open the files; close the source again if the destination cannot be opened
   FILE *fp_src = fopen(file1, "r");
   if (fp_src == NULL) {
      return -1;
   }
   FILE *fp_dest = fopen(file2, "w");
   if (fp_dest == NULL) {
      fclose(fp_src);
      return -1;
   }

   // Set up the buffer
   char buffer[BUF_SIZE];

   // Copy the file line by line (suitable for text files)
   while (fgets(buffer, BUF_SIZE, fp_src) != NULL){
      fputs(buffer, fp_dest);
   }

   // Close the files (fclose flushes the output stream)
   fclose(fp_src);
   fclose(fp_dest);
   return 0;
}

int main(int argc, char **argv){
   // Expect exactly two file names
   if (argc != 3) {
      return 1;
   }
   return copyFile(argv[1], argv[2]);
}

Output:

ubuntu@c-test-node:~/C/w4$ ./copyFile hello.txt hello2.txt
ubuntu@c-test-node:~/C/w4$ cat hello2.txt 
Hi, This is Ray from Unimelb.
Nice to meet you.
New line 1
New Line 2
New Line 3

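12.7 Notes

The buffer pointer of fread and fwrite is void *, not char *, so these two functions have nothing to do with \0: whatever is in the buffer is read or written for the given length. The buffer can be of any type, which is very helpful for reading and writing structured files.
The buffer pointer only tells the functions at which memory address to start reading or writing; the length is determined by the other two parameters. To make the length a whole multiple of a given data type, use sizeof() to set the block size. How the bytes are interpreted afterwards is determined by the type of the pointer.
fgets appends \0 automatically; fputs does not write the \0.
Always check whether the FILE pointer is NULL.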

13. JPEG

Let’s look at a program that opens a file and tells us if it’s a JPEG file, a particular format for image files:

// Detects if a file is a JPEG
  
#include <stdint.h>
#include <stdio.h>
  
typedef uint8_t BYTE;
  
int main(int argc, char *argv[])
{
    // Check usage
    if (argc != 2)
    {
        return 1;
    }
  
    // Open file
    FILE *file = fopen(argv[1], "r");
    if (!file)
    {
        return 1;
    }
  
    // Read first three bytes
    BYTE bytes[3];
    fread(bytes, sizeof(BYTE), 3, file);
  
    // Check first three bytes
    if (bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff)
    {
        printf("Yes, possibly\n");
    }
    else
    {
        printf("No\n");
    }
  
    // Close file
    fclose(file);
}

First, we define a BYTE as 8 bits, so we can refer to a byte as a type more easily in C.
Then, we’ll read from a file with a function called fread.
We can compare the first three bytes (in hexadecimal) to the three bytes required to begin a JPEG file. If they’re the same, then our file is likely to be a JPEG file (though, other types of files may still begin with those bytes). But if they’re not the same, we know it’s definitely not a JPEG file.

It turns out that BMP files, another format for images, have even more bytes in its header, or beginning of the file.

We’ll learn more about these in this week’s problem set as well, and even implement our own version of image filters, like one that only shows the color red:

#include "helpers.h"
  
// Only let red through
void filter(int height, int width, RGBTRIPLE image[height][width])
{
    // Loop over all pixels
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            image[i][j].rgbtBlue = 0x00;
            image[i][j].rgbtGreen = 0x00;
        }
    }
}

Here, we have a loop that iterates over all the pixels in a two-dimensional array, and sets the blue and green values to 0.

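14. Parsing Binary Files

14.1 The Type of a `struct` Variable

A struct variable is not a pointer. It is a collection of basic data and pointers, and can be thought of as a special kind of basic data. The size in bytes of a struct is a multiple of the alignment of its largest member, for example: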

#include <stdint.h>
#include <stdio.h>

typedef uint8_t  BYTE;

typedef struct
{
    BYTE  rgbtBlue;
    BYTE  rgbtGreen;
    BYTE  rgbtRed;
    int i;
    char *s; 
} 
RGBTRIPLE;

void printSize(RGBTRIPLE c){
    printf("%p\n", &c);
    printf("%zu\n", sizeof(c));
}

int main(void){
    RGBTRIPLE c1 = {65, 66, 67, 68, "Apple"};
    printSize(c1);
    printf("%p\n", &c1);
}

Output:

0x7fff6ff7d670
16
0x7fff6ff7d690

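Here BYTE is 1 byte, int is 4 bytes and char* is 8 bytes, 15 bytes in total. The int must start at a multiple of 4, so one padding byte follows the three BYTE members, and the total size must be a multiple of the largest member's size (char*, 8 bytes). The struct therefore occupies 16 bytes.

After the struct is passed to a function, sizeof still reports 16 bytes, so what is passed is not a pointer but the data itself (Pass By Value). The addresses before and after the call also differ; only the contents are the same.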

14.2 `struct` Arrays

As with arrays of basic data, the array variable decays to a pointer to the first struct element.

int main(void){
    RGBTRIPLE cs[5];
    printf("%p\n", cs);
    printf("%p\n", cs + 1);
    printf("%p\n", cs + 2);
}

Output:

0x7ffe7e3b2270
0x7ffe7e3b2280
0x7ffe7e3b2290

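Because one RGBTRIPLE object here is 16 bytes, the addresses of neighboring elements of the RGBTRIPLE array also differ by 16 bytes.

For an array of pointers RGBTRIPLE *css[], css decays to a pointer to its first element, and every css[i] is a pointer, so the addresses of neighboring elements &css[i] (or css + i) differ by 8. css[i] points to the actual data, so the addresses &css[i][j] (or css[i] + j) differ by 16 (the size of the struct in bytes).

As with basic data, a struct array passed as an argument is passed as a pointer, so modifying the data inside the function affects the caller.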

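14.3 Parsing Binary Files

Text files and binary files alike are just a sequence of bytes. For any kind of file, the structure must be known in advance, for example:

CSV files use , as the field separator and \n as the line separator; with this rule the file can be read into a two-dimensional array.

Binary files also have a specific structure. In general there are at least two parts:

Header: the header usually has a fixed number of bytes, and the byte at each position has a fixed meaning. These bytes hold metadata and validation information, and can be converted to data according to fixed rules.
Data: the data part usually has no fixed length, but it normally consists of a repeated pattern. For example, an image repeats groups of three bytes, one each for R, G and B.

TCP/UDP packets follow the same kind of layout.

Because C can read any number of bytes and interpret them in any way (according to the data type of the pointer), fread() from stdio.h makes it easy to parse such files.

Take BMP (Bitmap Image File, a bitmap image format) as an example: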

Header: the header of a BMP file has two parts.

BITMAPFILEHEADER: 14 bytes in total.
- bfType (2 bytes, uint16_t)
- bfSize (4 bytes, uint32_t)
- bfReserved1 (2 bytes, uint16_t)
- bfReserved2 (2 bytes, uint16_t)
- bfOffBits (4 bytes, uint32_t)
BITMAPINFOHEADER: 40 bytes in total.
- biSize (4 bytes, uint32_t)
- biWidth (4 bytes, int32_t)
- biHeight (4 bytes, int32_t)
- biPlanes (2 bytes, uint16_t)
- biBitCount (2 bytes, uint16_t)
- biCompression (4 bytes, uint32_t)
- biSizeImage (4 bytes, uint32_t)
- biXPelsPerMeter (4 bytes, int32_t)
- biYPelsPerMeter (4 bytes, int32_t)
- biClrUsed (4 bytes, uint32_t)
- biClrImportant (4 bytes, uint32_t)

Since the number of header bytes and the meaning of each byte are fixed, we can define matching structs and pass their addresses to fread() to read the file. Recall that a struct variable in C is not a pointer but a collection of basic data.

#include <stdint.h>
#include <stdio.h>

typedef uint8_t  BYTE;
typedef uint32_t DWORD;
typedef int32_t  LONG;
typedef uint16_t WORD;

typedef struct
{
    WORD   bfType;
    DWORD  bfSize;
    WORD   bfReserved1;
    WORD   bfReserved2;
    DWORD  bfOffBits;
} __attribute__((__packed__))
BITMAPFILEHEADER;

typedef struct
{
    DWORD  biSize;
    LONG   biWidth;
    LONG   biHeight;
    WORD   biPlanes;
    WORD   biBitCount;
    DWORD  biCompression;
    DWORD  biSizeImage;
    LONG   biXPelsPerMeter;
    LONG   biYPelsPerMeter;
    DWORD  biClrUsed;
    DWORD  biClrImportant;
} __attribute__((__packed__))
BITMAPINFOHEADER;

...
  
// Read infile's BITMAPFILEHEADER
BITMAPFILEHEADER bf;
fread(&bf, sizeof(BITMAPFILEHEADER), 1, inptr);

// Read infile's BITMAPINFOHEADER
BITMAPINFOHEADER bi;
fread(&bi, sizeof(BITMAPINFOHEADER), 1, inptr);

This reads and parses the header.

Data: the BMP data is a sequence of RGB triples, which can be read three bytes at a time, so we define a matching struct to receive the data.

typedef struct
{
    BYTE  rgbtBlue;
    BYTE  rgbtGreen;
    BYTE  rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;

The header contains the height and width of the image, so we can use them to create the buffer:

 // Get image's dimensions
int height = abs(bi.biHeight);
int width = bi.biWidth;

// Allocate memory for image
RGBTRIPLE(*image)[width] = calloc(height, width * sizeof(RGBTRIPLE));
if (image == NULL)
{
  printf("Not enough memory to store image.\n");
  fclose(outptr);
  fclose(inptr);
  return 7;
}

Then read the file row by row:

// Determine padding for scanlines
int padding = (4 - (width * sizeof(RGBTRIPLE)) % 4) % 4;

// Iterate over infile's scanlines
for (int i = 0; i < height; i++)
{
  // Read row into pixel array
  fread(image[i], sizeof(RGBTRIPLE), width, inptr);

  // Skip over padding
  fseek(inptr, padding, SEEK_CUR);
}

The header usually also contains validation information, which can be checked before the data is read:

// Ensure infile is (likely) a 24-bit uncompressed BMP 4.0
if (bf.bfType != 0x4d42 || bf.bfOffBits != 54 || bi.biSize != 40 ||
    bi.biBitCount != 24 || bi.biCompression != 0)
{
  fclose(outptr);
  fclose(inptr);
  printf("Unsupported file format.\n");
  return 6;
}

Similarly, the header and data can be written back to a file.

// Write outfile's BITMAPFILEHEADER
fwrite(&bf, sizeof(BITMAPFILEHEADER), 1, outptr);

// Write outfile's BITMAPINFOHEADER
fwrite(&bi, sizeof(BITMAPINFOHEADER), 1, outptr);

// Write new pixels to outfile
for (int i = 0; i < height; i++)
{
  // Write row to outfile
  fwrite(image[i], sizeof(RGBTRIPLE), width, outptr);

  // Write padding at end of row
  for (int k = 0; k < padding; k++)
  {
    fputc(0x00, outptr);
  }
}

Reading files usually involves malloc and calloc, so the memory must be released at the end.

15.4 文件恢复

当用户把文件从硬盘删除的时候，并不是真正的删除，而是等效于：

删除文件的元文件信息，命名空间等等。
释放所占据的硬盘空间，允许其他文件覆盖这片硬盘区域。

而并没有真正的初始化这块空间（把每个字节重制为0x00）。因此，不小心误删数据的话是有可能可以恢复的，只要那块硬盘空间没有被新文件覆盖。

由于元文件信息都已经丢失，因此，为了寻找被删除的文件，只能解析硬盘，相当于解析一个庞大的二进制文件。如果丢失文件的类型已知，我们可以通过匹配Header信息，来寻找丢失的文件。如果发现一段字节和Header信息格式匹配，那么我们就（有可能）可以恢复其中一个文件。恢复完一个后，依次寻找下一个。

Example: recovering JPEG files.

Even though JPEGs are more complicated than BMPs, JPEGs have “signatures,” patterns of bytes that can distinguish them from other file formats. Specifically, the first three bytes of JPEGs are

0xff 0xd8 0xff

from first byte to third byte, left to right. The fourth byte, meanwhile, is either 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, or 0xef. Put another way, the fourth byte’s first four bits are 1110.

Fortunately, digital cameras tend to store photographs contiguously on memory cards, whereby each photo is stored immediately after the previously taken photo. Accordingly, the start of a JPEG usually demarks the end of another. However, digital cameras often initialize cards with a FAT file system whose “block size” is 512 bytes (B). The implication is that these cameras only write to those cards in units of 512 B. A photo that’s 1 MB (i.e., 1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 “blocks” on a memory card. But so does a photo that’s, say, one byte smaller (i.e., 1,048,575 B)! The wasted space on disk is called “slack space.” Forensic investigators often look at slack space for remnants of suspicious data.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#define BLOCK_SIZE 512

typedef uint8_t BYTE;

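int main(int argc, char *argv[])
{
    // Expect exactly one argument: the raw memory card image
    if (argc != 2){
        return 1;
    }

    // Open raw memory card in binary mode
    FILE *fp_in = fopen(argv[1], "rb");
    if (!fp_in){
        return 1;
    }

    // Configure output file; "./jpg/000.jpg" is 13 characters plus '\0'
    int file_index = 0;
    char *file_name = malloc(14);
    FILE *fp_out = NULL;

    BYTE *buffer = malloc(BLOCK_SIZE);
    if (!file_name || !buffer){
        free(file_name);
        free(buffer);
        fclose(fp_in);
        return 1;
    }

    // Read the card one 512-byte block at a time
    while (fread(buffer, 1, BLOCK_SIZE, fp_in) == BLOCK_SIZE){
        // A JPEG signature at the start of a block marks a new file
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0){
            if (fp_out){
                fclose(fp_out);
            }
            file_index += 1;
            // The ./jpg directory must already exist, otherwise fopen fails
            snprintf(file_name, 14, "./jpg/%03i.jpg", file_index);
            fp_out = fopen(file_name, "wb");
        }
        // Blocks before the first signature are skipped
        if (fp_out){
            fwrite(buffer, BLOCK_SIZE, 1, fp_out);
        }
    }
    if (fp_out){
        fclose(fp_out);
    }
    fclose(fp_in);
    free(buffer);
    free(file_name);
    return 0;
}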

15. `stdlib.h`

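Library types:

size_t: an unsigned integer type; it is the type of the result of the sizeof operator.

Library macros:

NULL: a macro that expands to a null pointer constant.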

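String conversion functions:

double atof(const char *str): converts the string pointed to by str to a floating-point number (type double).

int atoi(const char *str): converts the string pointed to by str to an integer (type int).

long int atol(const char *str): converts the string pointed to by str to a long integer (type long int).

double strtod(const char *str, char **endptr): converts the string pointed to by str to a floating-point number (type double). If endptr is not NULL, it is set to point at the first character that was not converted.

long int strtol(const char *str, char **endptr, int base): converts the string pointed to by str to a long integer (type long int), interpreting it in the given base.

unsigned long int strtoul(const char *str, char **endptr, int base): converts the string pointed to by str to an unsigned long integer (type unsigned long int), interpreting it in the given base.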

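Memory allocation functions:

void *calloc(size_t nitems, size_t size): allocates zero-initialized memory for nitems elements of size bytes each and returns a pointer to it.

void free(void *ptr): releases memory previously allocated by calloc, malloc, or realloc.

void *malloc(size_t size): allocates size bytes of uninitialized memory and returns a pointer to it.

void *realloc(void *ptr, size_t size): attempts to resize the memory block pointed to by ptr that was previously allocated with malloc, calloc, or realloc.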

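Math:

int abs(int x): returns the absolute value of x.

div_t div(int numer, int denom): divides numer by denom and returns both the quotient and the remainder in a div_t.

long int labs(long int x): returns the absolute value of x.

ldiv_t ldiv(long int numer, long int denom): divides numer by denom and returns both the quotient and the remainder in an ldiv_t.

int rand(void): returns a pseudo-random number in the range 0 to RAND_MAX.

void srand(unsigned int seed): seeds the random number generator used by rand.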
