WEEK 2 Arrays

1. Compilation

To those new to programming, compilation seems fairly simple. A naive compiler might read in every source file, translate everything into machine code, and write out an executable. This could work, but has two serious problems. First, for a large project, the computer may not have enough memory to read all of the source code at once. Second, if you make a change to a single source file, you would rather not have to recompile the entire application.

To deal with these problems, compilers break their job down into steps; for each source file (each .c file), the compiler reads the file, reads the files it references with #include, and translates it to machine code. The result of this is an "object file" (.o). Once every object file is made, a "linker" collects all of the object files and writes the actual program. This way, if you change one source file, only that file needs to be recompiled and then the application needs to be re-linked.

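The compilation process:

Preprocessing: loads the header files named by #include. They supply the prototypes needed so that the functions and variables used later in the file can compile.

Compiling: translates the source code into assembly language (ASM). Assembly is lower level and spells out detailed instructions for memory and the CPU; roughly, each line of code becomes specific operations on the CPU and memory.

Different CPUs recognize different sets of instructions.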

...
main:                         # @main
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp0:
    .cfi_def_cfa_offset 16
.Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp2:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    xorl    %eax, %eax
    movl    %eax, %edi
    movabsq    $.L.str, %rsi
    movb    $0, %al
    callq    get_string
    movabsq    $.L.str.1, %rdi
    movq    %rax, -8(%rbp)
    movq    -8(%rbp), %rsi
    movb    $0, %al
    callq    printf
    ...

Notice that we see some recognisable strings, like main, get_string, and printf. But the other lines are instructions for basic operations on memory and values, closer to the binary instructions that a computer’s processor can directly understand.

Assembling: translates the assembly language into machine code (0s and 1s).
It's important to note after discussing the basics that compilation is a "one way street". That is, compiling a C source file into machine code is easy, but "decompiling" (turning machine code into the C source that creates it) is not. Decompilers for C do exist, but the code they create is hard to understand and only useful for reverse engineering.
Linking: a program's source code may span multiple files, so the compiled results of the different files need to be linked together.
The last step is linking, where previously compiled versions of libraries that we included earlier, like cs50.c, are actually combined with the compiled binary of our program. In other words, linking is the process of combining all the machine code for hello.c, cs50.c, and stdio.c into our one binary file, a.out or hello.

A commonly used compiler for C is clang.

clang [args*] file

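Common arguments:

-o: specifies the output file name. The default file name is a.out.
-lxxx: links a library. The library name follows directly, for example -lcs50 or -lm (math).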

Automation: For large C projects, many programmers choose to automate compilation, both in order to reduce user interaction requirements and to speed up the process by only recompiling modified files. On UNIX-like systems, make and Makefiles are often used to accomplish the same. make is traditional and flexible and is available as one of the standard developer tools on most Unix and GNU distributions.

2. Memory

The earlier notes on data types already covered how many bytes each type occupies.

Note: A Boolean value can technically be represented with just a single bit, but for simplicity our computers use an entire byte.

Inside our computers, we have chips called RAM, random-access memory, that store zeroes and ones. We can think of bytes stored in RAM as though they were in a grid, one after the other.

In reality, there are millions or billions of bytes per chip.

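3. Arrays

3.1 Defining arrays

Why arrays? The elements of an array are stored contiguously in memory, so a single identifier can stand for the whole array.

Every element also occupies the same amount of memory, so an element at a given position can be found by indexing.

Ways to define an array: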

// Form 1
type name[num];
name[0] = ...;
name[1] = ...;

// Form 2
type name[] = {..., ..., ...};

// Form 3
type name[num] = {..., ..., ...}; // the number of values in {} must not exceed num

Arrays are usually traversed with a loop.

In particular, a character array can be defined directly from a string:

char myString[] = "Apple";

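This also means that strings in C are essentially character arrays, and each character can be reached by indexing.

Note: in C, an array variable is bound to its array object. Once defined, the name cannot be made to refer to a different array (its address is fixed), whereas a plain pointer variable can be assigned a different address.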

3.2 Multidimensional arrays

C supports multidimensional arrays. The general form of a multidimensional array declaration is:

type name[size1][size2]...[sizeN];

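The simplest multidimensional array is the two-dimensional array. A two-dimensional array is, in essence, a list of one-dimensional arrays. A two-dimensional array with x rows and y columns is declared as follows: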

type arrayName [x][y];

A multidimensional array can be initialized by giving the values of each row inside nested braces. Below is an array with 3 rows and 4 columns.

int a[3][4] = {  
 {0, 1, 2, 3} ,   /*  initializes the row at index 0 */
 {4, 5, 6, 7} ,   /*  initializes the row at index 1 */
 {8, 9, 10, 11}   /*  initializes the row at index 2 */
};

The inner braces are optional; the following initialization is equivalent to the one above:

int a[3][4] = {0,1,2,3,4,5,6,7,8,9,10,11};

Elements of a two-dimensional array are accessed with subscripts, that is, the row index and the column index. For example:

int val = a[2][3];

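The outer array does not store pointers. a[3][4] is a single contiguous block of 12 ints: each a[i] is itself an array of 4 ints, which is converted to a pointer to its first element when used in most expressions.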

4. Characters

4.1 Character representation

A char value can be written either as a number or as a character:

#include <stdio.h>

int main(void){
    char c1 = 'A';
    char c2 = 66;
    printf("%c %c %i %i\n", c1, c2, c1, c2);
}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./char
A B 65 66

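The null character \0 is the char whose eight bits are all 0. It is also the value a char gets when it is zero-initialized by default (for example, a global variable).

4.2 Case conversion

Case can be converted using the ASCII codes.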

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void){
    string a = "123appLE456";
    for(int i = 0, l = strlen(a); i < l; i++){
        if (a[i] >= 'a' && a[i] <= 'z'){
            printf("%c", a[i] - 32);
        } 
        else if (a[i] >= 'A' && a[i] <= 'Z'){
            printf("%c", a[i] + 32);
        } 
        else {
            printf("%c", a[i]);
        }
    }
    printf("\n");

}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./cases 
123APPle456

4.3 `ctype.h`

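The ctype.h header of the C standard library provides functions for testing and mapping characters.

These functions take an int argument, whose value must be EOF or representable as an unsigned char.

If the argument c satisfies the described condition, these functions return nonzero (true). Otherwise they return zero.

Instead of the comparison operators, the functions in ctype.h can be used: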

isalnum - check whether a character is alphanumeric
isalpha - check whether a character is alphabetical
isdigit - check whether a character is a digit
islower - check whether a character is lowercase
isspace - check whether a character is whitespace
isupper - check whether a character is uppercase
tolower - convert a char to lowercase
toupper - convert a char to uppercase

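5. Strings

5.1 Representation

A string is essentially an array of char, so the string type from cs50.h can be indexed with [].

A character array can be used to represent a string.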

#include <stdio.h>

int main(void){
    char s[] = "ABC";
    printf("%c %c %i %i\n", s[0], s[1], s[0], s[1]);
}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./string
A B 65 66

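In practice, every string literal ends with \0 (the null character, also called the terminator, abbreviated NUL), so a string of length n occupies n + 1 characters. Arrays of other types have no such property. A character array whose elements are listed explicitly, such as char s[3] = {'A', 'B', 'C'}, has no terminator either.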

#include <stdio.h>

int main(void){
    char s[] = "ABC";
    printf("%c %c %i %i\n", s[0], s[3], s[0], s[3]);
}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./string
A  65 0

Strings end with \0 because their length varies: nothing else records where a string stops, so \0 is needed to mark the end.

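5.2 Comparison

Strings cannot be compared with ==, which would compare addresses rather than characters. Include the string.h header and use the function:

int strcmp(const char *str1, const char *str2): compares the string pointed to by str1 with the string pointed to by str2.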

This function returns:

an int less than 0 if str1 comes before str2,
0 if str1 is the same as str2,
an int greater than 0 if str1 comes after str2.

The strings are compared using "ASCIIbetical" order, based on the ASCII values of their characters. For instance, "AAA" would come before "BBB", and "AAA" would also come before "aaa".

#include <stdio.h>
#include <cs50.h>
#include <string.h>

int main(void){
    string a = "apple";
    printf("%d\n", strcmp(a, "apple"));
    printf("%d\n", strcmp(a, "appLe"));
}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./stringcomp 
0
32

5.3 Traversal

Method 1: use the terminator \0.

#include <cs50.h>
#include <stdio.h>

int main(void){
    char a[] = "apple";
    int i = 0;
    while (a[i] != '\0'){
        printf("%c ", a[i]);
        i++;
    }
    printf("\n");

}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./iterString1 
a p p l e

The same method can be used to obtain the length of a string.

Method 2: use the strlen() function from string.h.

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void){
    string a = "apple";
    for(int i = 0, l = strlen(a); i < l; i++){
        printf("%c ", a[i]);
    }
    printf("\n");

}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./iterString2
a p p l e

5.4 `string.h`

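The string.h header defines one variable type, one macro, and various functions for manipulating character arrays.

Variable type:

size_t: an unsigned integer type; it is the type of the result of the sizeof operator.

Macro:

NULL: this macro expands to a null pointer constant.

Functions: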

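Function & purpose

strcpy(s1, s2); copies string s2 into string s1.

strcat(s1, s2); appends string s2 to the end of string s1.

strlen(s1); returns the length of string s1.

strcmp(s1, s2); returns 0 if s1 and s2 are the same, a value less than 0 if s1<s2, and a value greater than 0 if s1>s2.

strchr(s1, ch); returns a pointer to the first occurrence of the character ch in string s1.

strstr(s1, s2); returns a pointer to the first occurrence of string s2 in string s1.

strcasestr(s1, s2); returns a pointer to the first occurrence of string s2 in string s1, ignoring case. This one is a GNU/BSD extension, not part of standard C.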

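6. Command-line arguments

Simply change the parameters of the main function.

The usual form is int main(int argc, char* argv[]); int main(int argc, char ** argv) and, with cs50.h, int main(int argc, string argv[]) also work.

argc: the number of arguments, including the program name.
argv: the arguments themselves.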

#include <cs50.h>
#include <stdio.h>
int main(int argc, char* argv[]){
    for(int i = 0; i < argc; i++){
        printf("arg[%i]: %s\n", i, argv[i]);
    }
}

Output:

(base) ubuntu@hadoop-node-1:~/C/w2$ ./cliargs s1 s2 apple
arg[0]: ./cliargs
arg[1]: s1
arg[2]: s2
arg[3]: apple

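C does not bounds-check argv: indexing it beyond argc is not reported as an error, it is undefined behavior. It is therefore best to check argc before using argv.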

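7. Exit Status

Notice that the main function returns an int value.

Usually 0 means success (the default) and 1 means an error.

Different applications may interpret exit statuses differently.

To see the exit status of the last command run in the shell: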

echo $?

