C语言中一种优雅的链表实现方式——Linus way

C语言中一种优雅的链表实现方式——Linus way

在一个ted的采访中Linus Torvalds, 大约在14:20处,他快速的谈论了什么是编程的。其中他以链変举例,其中他提到了使用了二级指针,实现了一个链表的操作方式。

这方法我之前确实还不会,本来想自己写一篇博文,但是发现github上mkirchner已有一篇博文介绍了这个方法,作者TaivasJumala对其进行了翻译。因此就不再写,只是mark一下。

TL;DR

直接贴出Linus认为的优雅方式,方便日后查阅:

struct list_item {
int value;
struct list_item *next;
};
typedef struct list_item list_item;

void remove_elegant(list *l, list_item *target)
{
list_item **p = &l->head;
while (*p != target)
p = &(*p)->next;
*p = target->next;
}

备份

个人fork

英语原文备份:

Linked lists, pointer tricks and good taste

Introduction

In a 2016 [TED interview][ted] (14:10) Linus Torvalds speaks about what he
considers good taste in coding. As an example, he presents two
implementations of item removal in singly linked lists (reproduced below). In
order to remove the first item from a list, one of the implementations requires
a special case, the other one does not. Linus, obviously, prefers the latter.

His comment is:

[…] I don’t want you to understand why it doesn’t have the if statement.
But I want you to understand that sometimes you can see a problem in a
different way and rewrite it so that a special case goes away and becomes the
normal case, and that’s good code. […] – L. Torvalds

The code snippets he presents are C-style pseudocode and are simple enough to
follow. However, as Linus mentions in the comment, the snippets lack a
conceptual explanation and it is not immediately evident how the more elegant
solution actually works.

The next two sections look at the technical approach in detail and demonstrate
how and why the indirect addressing approach is so neat. The last section
extends the solution from item deletion to insertion.

The code

The basic data structure for a singly linked list of integers is shown in
Figure 1: Singly linked list of integers.

Numbers are arbitrarily chosen integer values and arrows indicate pointers.
head is a pointer of type list_item * and each of the boxes
is an instance of an list_item struct, each with a member variable (called
next in the code) of type list_item * that points to the next item.

The C implementation of the data structure is:

struct list_item {
int value;
struct list_item *next;
};
typedef struct list_item list_item;

struct list {
struct list_item *head;
};
typedef struct list list;

We also include a (minimal) API:

/* The textbook version */
void remove_cs101(list *l, list_item *target);
/* A more elegant solution */
void remove_elegant(list *l, list_item *target);

With that in place, let’s have a look at the implementations of
remove_cs101() and remove_elegant(). The code of these examples is true
to the pseudocode from Linus’ example and also compiles and runs.

The CS101 version

Figure 2: The conceptual model for the list data structure in the CS101 algorithm.

void remove_cs101(list *l, list_item *target)
{
list_item *cur = l->head, *prev = NULL;
while (cur != target) {
prev = cur;
cur = cur->next;
}
if (prev)
prev->next = cur->next;
else
l->head = cur->next;
}

The standard CS101 approach makes use of two traversal pointers cur and
prev, marking the current and previous traversal position in the list,
respectively. cur starts at the list head head, and advances until the
target is found. prev starts at NULL and is subsequently updated with the
previous value of cur every time cur advances. After the target is found,
the algorithm tests if prev is non-NULL. If yes, the item is not at the
beginning of the list and the removal consists of re-routing the linked list
around cur. If prev is NULL, cur is pointing to the first element in
the list, in which case, removal means moving the list head forward.

A more elegant solution

The more elegant version has less code and does not require a separate branch
to deal with deletion of the first element in a list.

void remove_elegant(list *l, list_item *target)
{
list_item **p = &l->head;
while (*p != target)
p = &(*p)->next;
*p = target->next;
}

The code uses an indirect pointer p that holds the address of a pointer to a
list item, starting with the address of head. In every iteration, that
pointer is advanced to hold the address of the pointer to the next list item,
i.e. the address of the next element in the current list_item.
When the pointer to the list item *p equals target, we exit the search
loop and remove the item from the list.

How does it work?

The key insight is that using an indirect pointer p has two conceptual
benefits:

  1. It allows us to interpret the linked list in a way that makes the head
    pointer an integral part the data structure. This eliminates the need
    for a special case to remove the first item.
  2. It also allows us to evaluate the condition of the while loop without
    having to let go of the pointer that points to target. This allows us to
    modify the pointer that points to target and to get away with a single
    iterator as opposed to prev and cur.

Let’s look each of these points in turn.

Integrating the head pointer

The standard model interprets the linked list as a sequence of list_item
instances. The beginning of the sequence can be accessed through a head
pointer. This leads to the conceptual model illustrated in Figure 2 above. The head pointer is
merely considered as a handle to access the start of the list. prev and cur
are pointers of type list_item * and always point to an item or NULL.

The elegant implementation uses indirect addressing scheme that yields a different
view on the data structure:

Figure 3: The conceptual model for the list data structure in the more elegant approach.

Here, p is of type list_item ** and holds the address of the pointer to
the current list item. When we advance the pointer, we forward to the address
of the pointer to the next list item.

In code, this translates to p = &(*p)->next, meaning we

  1. (*p): dereference the address to the pointer to the current list item
  2. ->next: dereference that pointer again and select the field that holds
    the address of the next list item
  3. &: take the address of that address field (i.e. get a pointer to it)

This corresponds to an interpretation of the data structure where the list is a
a sequence of pointers to list_items (cf. Figure 3).

Maintaining a handle

An additional benefit of that particular interpretation is that it supports
editing the next pointer of the predecessor of the current item throughout the
entire traversal.

With p holding the address of a pointer to a list item, the comparison in the
search loop becomes

while (*p != target)

The search loop will exit if *p equals target, and once it does, we are
still able to modify *p since we hold its address p. Thus, despite
iterating the loop until we hit target, we still maintain a handle (the
address of the next field or the head pointer) that can be used to directly
modify the pointer that points to the item.

This is the reason why we can modify the incoming pointer to an item to point
to a different location using *p = target->next and why we do not need prev
and cur pointers to traverse the list for item removal.

Going beyond

It turns out that the idea behind remove_elegant() can be applied to yield a
particularly concise implementation of another function in the list API:
insert_before(), i.e. inserting a given item before another one.

Inserting before existing items

First, let’s add the following declaration to the list API in list.h:

void insert_before(list *l, list_item *before, list_item *item);

The function will take a pointer to a list l, a pointer before to an
item in that list and a pointer to a new list item item that the function
will insert before before.

Quick refactor

Before we move on, we refactor the search loop into a separate
function


static inline list_item **find_indirect(list *l, list_item *target)
{
list_item **p = &l->head;
while (*p != target)
p = &(*p)->next;
return p;
}

and use that function in remove_elegant() like so

void remove_elegant(list *l, list_item *target)
{
list_item **p = find_indirect(l, target);
*p = target->next;
}

Implementing insert_before()

Using find_indirect(), it is straightforward to implement insert_before():

void insert_before(list *l, list_item *before, list_item *item)
{
list_item **p = find_indirect(l, before);
*p = item;
item->next = before;
}

A particularly beautiful outcome is that the implementation has consistent
semantics for the edge cases: if before points to the list head, the new item
will be inserted at the beginning of the list, if before is NULL or invalid
(i.e. the item does not exist in l), the new item will be appended at the
end.

Conclusion

The premise of the more elegant solution for item deletion is a single, simple
change: using an indirect list_item ** pointer to iterate over the pointers
to the list items. Everything else flows from there: there is no need for a
special case or branching and a single iterator is sufficient to find and
remove the target item.
It also turns out that the same approach provides an elegant solution for item
insertion in general and for insertion before an existing item in particular.

So, going back to Linus’ initial comment: is it good taste? Hard to say, but
it’s certainly a different, creative and very elegant solution to a well-known
CS task.