钛节点

线程在线猛干，老李落泪回忆 --- 多线程系列（二）

发表@2021-05-17 09:27:34

更新@2023-04-27 20:11:53

大家好，我是文章格式越来越不修边幅、写法越来越随意的谢顶道人 --- 老李。最近有些人问老李，你是如何面对自己越来越大的年纪与自己越来越少的头发之间这种矛盾的，关于这件事儿，作为过来人的我自然是有一番心得的，不然怎么可能自称为道人？

往事不要再提

曾经有一天谢顶道人和他的老板原上草一起在树下等人。老板蓦然地嘬了一口香烟，看着远处刚刚走过去的小年轻，自顾自地说：你知道人类在未来几千年的进化中哪些地方将会最可能消失掉吗？

老李茫然的摇了摇头，眼神中放射出哈士奇般睿智的眼光。

![](https://ti-node.com/static/upload/PNP/a/img_66.png#width-full)

老板原上草嘴角一抽动，看着那些头发浓密的小年轻轻蔑地说道：根据达尔文进化学论，没有用的器官或部位会率先消失掉，比如头发！

老李更加茫然地摸了摸自己光头，在摸了一手脑油后赶紧把手又偷偷蹭了蹭裤子，眼神中的哈士奇光芒更加具备放射性了。

![](https://ti-node.com/static/upload/PNP/a/img_67.png#width-full)

原上草更加笃定地说：连人类影视作品中比人类先进的文明形象都是没有头发的，你感受一下。比如普罗米修斯里、比如外星人ET，哪个有头发？

![](https://ti-node.com/static/upload/PNP/a/img_68.png#width-full)

说完原上草倒换了一下夹着烟的右手，腾出来去掀了一下自己侧面最长的几缕头发试图掩盖一下头顶的空虚，看着老李笃定地说道：早进化早享受！

上篇整整叨逼叨了一整篇的《史记*Linux本记》，那你说这线程到底比进程能牛逼出多少啊，其实这个还是很容易就能测试出来的，都是提前进化早享受的道人了，写个最简单的测试还是问题不大的，线程切换与进程切换之间性能不好对比，创建还是相对来说很容易的。

下面两坨代码是分别创建500个进程与500个线程所需要耗费的时间，其中需要说明一下线程特意设置了未分离（不知道啥意思就不知道吧，老师说了理解不了先背过），其实这个也应该是默认项，代码可以粘贴走，只要不是Windows系统应该都能流畅跑起来。

```php
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#define MAX_LIMIT 500

void run();
long get_timestamp();

int main(int argc, char *argv[])
{
    int   counter, status;
    pid_t pid;
    long  begin_tick, end_tick;
    begin_tick = get_timestamp();
    for (counter = 1; counter <= MAX_LIMIT; counter++) {
        pid = fork();
        if (pid < 0) {
            printf("fork error\n");
            exit(-1);
        } else if (0 == pid) {
            run();
            exit(0);
        } else if (pid > 0) {
            waitpid(pid, &status, 0);
        }
    }
    end_tick = get_timestamp();
    printf("fork出%d个进程共需要%lu微秒\n", MAX_LIMIT, (end_tick - begin_tick));
    return 0;
}

void run()
{}

long get_timestamp()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec*1000 + tv.tv_usec/1000;
}
```

```php
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#define MAX_LIMIT 500

void *run();
long get_timestamp();

int main(int argc, char *argv[])
{
    int            counter, process_thread_ret;
    long           begin_tick, end_tick;
    pthread_t      new_thread_id;
    pthread_attr_t new_thread_attr;
    pthread_attr_init(&new_thread_attr);
    pthread_attr_setdetachstate(&new_thread_attr, PTHREAD_CREATE_JOINABLE);
    begin_tick = get_timestamp();
    for (counter = 1; counter <= MAX_LIMIT; counter++) {
        process_thread_ret = pthread_create(&new_thread_id, &new_thread_attr, run, NULL);
        if (process_thread_ret != 0)
            printf("Create new thread error:%d\n", process_thread_ret);
        process_thread_ret = pthread_join(new_thread_id, NULL);
        if (process_thread_ret != 0)
            printf("Create new thread error:%d\n", process_thread_ret);
    }
    end_tick = get_timestamp();
    printf("fork出%d个线程共需要%lu微秒\n", MAX_LIMIT, (end_tick - begin_tick));
    pthread_attr_destroy(&new_thread_attr);
    return 0;
}

void *run()
{
    return ((void *)NULL);
}

long get_timestamp()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec*1000 + tv.tv_usec/1000;
}
```

说实话，这两坨代码放在这里，整篇文章的篇幅瞬间就上来了、滚动条瞬间就被拉长了，我甚至都不想再接着写了。

上面代码跑一下，你们感受一下（PS 线程不叫fork）：

![](https://ti-node.com/static/upload/PNP/a/img_69.png#width-full)

就冲这个结果，难道你还没有强烈意愿去尝试一下多线程吗？你笑眯眯地看着我，平静地说道：这样的一个结果对于我们的价值是什么？

老李：啊，这样的结果还不够好吗？
你说：不，我不要你觉得，我要我觉得，你还没有经过过深入的思考。
老李：挖槽。。。这还不行？
你说：你的方案没有说服我，对于公司价值与生态落地的价值也没有得到体现。
老李：那我该怎么做...
你说：我不管你怎么做的，那是你的事情，如果你这点事情都做不到，那么你存在的价值是什么？
... ... ...
... ...
...
老李：
![](https://ti-node.com/static/upload/PNP/a/img_70.png#width-full)

尽管被你们PUA了这么久，但是还是要强忍着恶心，接着聊一下线程的创建与销毁，不过这里值得注意的是线程与线程之间不讲究什么大小父子规矩，全是平辈的，其次是一个进程中到底能创建多少个线程取决于系统的具体实现，所以我们先来尝试创建两个线程来做点儿事情，关键函数一共有这么几个：
一、pthread_create()
二、pthread_join()
三、pthread_exit()
四、gettid()

关键数据结构目前老李只说一个：
一、pthread_t

下面代码价值500万，其中注释至少值10块，无任何版权，你们可以直接复制张贴走：

```php
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef SYS_gettid
#error "你的系统上没有SYS_gettid，辣鸡"
#endif
#define gettid() ((pid_t)syscall(SYS_gettid))
/*
 * 下面代码就是搞出两个线程，然后两个线程分别做点儿什么事情....
 * 两个线程直接中直接运行run_for_nothing
 * run_for_nothing()实际上就是打空炮而已
 */
void * thread_creat_cb(void *);
void * thread_creat_cb1(void *);
void run_for_nothing();

pthread_t new_thread_id;
pthread_t new_thread_id1;

int main(int argc, char * argv[])
{
    int err;
    printf("主控制线程begin\n");
    // 严格意义上说，下面代码是有些问题的
    // 因为创建第一个线程成功后，第二个线程可能会创建失败
    // 这样第一个线程可能无法会被回收，程序就直接退出运行了
    // 你先能看明白再说...
    
    // 首先创建一个线程A
    err = pthread_create(&new_thread_id, NULL, thread_creat_cb, NULL);
    if (err != 0) {
        printf("创建线程失败：%d", err);
        exit(err);
    }
    // 然后再创建一个线程B
    err = pthread_create(&new_thread_id1, NULL, thread_creat_cb1, NULL);
    if (err != 0) {
        printf("创建线程失败：%d", err);
        exit(err);
    }

// 等待线程A结束，回收线程A资源
    err = pthread_join(new_thread_id, NULL);
    if (err != 0) {
        printf("Join线程失败：%d", err);
        exit(err);
    }
    // 等待线程B结束，回收线程A资源
    err = pthread_join(new_thread_id1, NULL);
    if (err != 0) {
        printf("Join线程失败：%d", err);
        exit(err);
    }

printf("主控制线程end\n");
    return 0;
}

void * thread_creat_cb(void * arg)
{
    run_for_nothing(5);
    return NULL;
}
void * thread_creat_cb1(void * arg)
{
    run_for_nothing(3);
    return((void *)0);
}
// 这个函数其实本质上啥也没干，就是在不断打空炮
void run_for_nothing(int i)
{
    int          counter, os_thread_id;
    pid_t        process_id;
    pthread_t    thread_id;
    process_id   = getpid();
    os_thread_id = gettid();
    thread_id    = pthread_self();
    for (counter = 1; counter <= i; counter++) {
        sleep(1);
        printf("pid:%lu, os_tid:%d, tid:%lu\n", (unsigned long)process_id, os_thread_id, (unsigned long)thread_id);
    }
}
```

如果不出意外，运行结果大概如下，你们感受一下：

![](https://ti-node.com/static/upload/PNP/a/img_71.png#width-full)

好了，下面需要分批次解释一下其中的关键问题了。

线程创建：pthread_create()完成线程的创建，原型如下
```php
int pthread_create(
  // 创建完成后，给thread赋值
  pthread_t *restrict thread, 
  // 创建线程时候，可以给线程提前设置一些属性
  const pthread_attr_t *restrict attr, 
  // 线程创建OK后，线程自动执行的函数，有点儿类似于回调on
  void *(*start_routine)(void *), 
  // 传递给第三个参数执行函数的参数
  void *restrict arg
);
```

在我们的demo中只用到了第一个和第三个，因为这样也能用也能跑起来，不要在意太多细节。创建成功后，函数会返回int类型的返回码，其中0是成功，大于0的时候各有各失败的原因。这里有一个值得注意的问题是就是这个返回错误码和Linux/UNIX传统中的errno，这是个使用习惯的问题，尽管errno是可以兼容多线程环境的（errno本来是全局，很久之前压根是不支持多线程的），不过一直以来在多线程环境中尽量不使用errno来表达pthread系列的返回值了，而是直接自己启用一个局部变量即可，也就是上述demo中的err。

线程ID：正规的线程ID是由pthread_t这个数据结构来表达的，但是这个结构在不同操作系统中完全就是不同的面目，在Linux下是个unsigned long，但是在其他系统下就说不好是个啥玩意，也正是因为如此如果你想比较两个线程ID是否相同的话，就绝对不能用==来搞，而只能用pthread_equal(thread_id1, thread_id2)来进行比较。

获取线程ID：pthread_self()可以获取当前线程的线程ID，不过你注意到了还有一个函数叫做gettid()吗？这个函数是Linux系统专属，和pthread_self()区别是什么呢？pthread_self()返回的是pthread库维护的threadID，而getpid()是返回的线程在操作系统Kernel层次的标识。如果你仔细观察下图：

![](https://ti-node.com/static/upload/PNP/a/img_71.png#width-full)

注意第二列os_tid，你会觉得这个数字和第一列的PID好接近啊，是的。你还记得上节课说过Linux下NPTL实际上的实现依然是进程么？对于LinuxKernel而言，线程和进程一样都同样使用了task_struct，线程你可以理解为特殊的进程（可以共享很多信息的进程），通过clone浅拷贝实现的；而第三列就是由用户层的pthread库维护的线程ID了。

僵尸线程：这个玩意的概念和僵尸进程是一样一样的。话说回来啊，我发现很多泥腿子们总是记不清僵尸进程和孤儿进程，总是搞混了，尤其是企图在面试官面前装B骗工资的时候，一紧张就更容易搞混淆说反，其实很好搞定的，你就记住了死了没人管尸体变硬了就是僵尸。僵尸线程产生的原因也是由于主控制线程只管堆不管埋、穿起来裤子不认人导致的，所以为了避免产生僵尸线程，在上述demo里我们通过pthread_join()来解决这个问题，这个函数的原型是：

```php
int pthread_join(pthread_t thread, void **retval);
```

pthread_join()会将当前主线程阻塞挂起一直等待第一个参数所代表的线程结束退出，如果第一个参数所代表的线程已经退出的话该函数就会立马执行完毕返回。其中第二参数可以用来接受退出线程的返回值，大概类似于线程要走了留了份遗书可以放到retval中去，如果你很无情地将第二个参数直接码死成NULL，恭喜所有的线程遗书直接跟着线程去西天。

说到线程去西天，就不得不说线程的退出问题了，这个问题需要细腻对待一下，因为线程退出涉及到三个极为需要注意的地方：
一、清理回收
二、线程分离、连接状态
三、主线程中的pthread_exit

实在太长了，我TM实在编不下去了，直接开（三）吧。