在计算进程D状态持续时间及等IO的时间遇到的一处问题
一、背景
之前的博客 线程每次iodelay监控及D状态开始和结束监控并做堆栈记录-CSDN博客 里,我们讲到了通过内核模块抓取D状态的进程和等IO事件的方法,里面也用到了通过获取rq的symbol,再去获取rq里的rq_clock_task时间的方法(内核模块里访问struct rq及获取rq_clock_task时间的方法-CSDN博客),通过获取到rq_clock_task时间,可以在同一个时间轴上去判断和计算D状态的持续时间和等IO的持续时间。但是会遇到如下的情况,即抓出的等io的时间或者D状态的时间是一个很大的数值,如下图:

这个数其实就是long类型的负数,为什么会造成这种情况,我们会在第二章里做实验,然后得出结果。然后在第三章里,改造这个抓取D状态持续时间和等io时间的内核模块。
二、进一步实验及原因分析
如第一章里所说,异常时,这个数值其实就是一个long类型的负数,取绝对值后,其值并不大,所以,减数和被减数在时间轴上的位置并不很远。而之前的博客 与调度相关的内核时间接口的分析及实现介绍_中断里的逻辑耗时超过500us-CSDN博客 里,我们讲到了rq_clock_task接口获取到的时间的原理,它是一个单调递增的时间,但是它只是相对于每个核心上,它是单调递增的,另外,rq_clock_task的更新并不像local_clock那样每次返回回来基本都会增加,rq_clock_task的增加的节奏是由调度逻辑被动触发的,并不是实时的。
2.1 尝试使用local_clock来替代my_rq_clock_task来作为被减数
实验方式如下:

虽然这样并不容易复现减数大于被减数的情况,但是运行很久时间以后,仍然还是复现了:

另外,从原理上,它也是不成立的,因为不同的核之间的local_clock在时间轴上的比如时间A和时间B这两个时刻,local_clock并不能确保靠后的时间B这个时刻上的所有的核心上的local_clock数值都比时间A时刻的所有核心上的数值都高。更多的这块细节见之前的博客 与调度相关的内核时间接口的分析及实现介绍_中断里的逻辑耗时超过500us-CSDN博客。
另外,在之前的博客 内核模块里访问struct rq及获取rq_clock_task时间的方法-CSDN博客 里,我们也讲到,如果直接用local_clock来作为被减数来进行运算在某些内核选项打开下是不行的。我们还是得用rq_clock_task这个时间轴来作为减数。
2.1 增加运行的核的判断,如果发生任务迁移则不打印
如下修改:

判断是否上一次运行的核心和进行唤醒时的核心是否不一样,如果不一样则不打印。
这样修改之后,就不在出现异常的打印了。基本也符合理论上的解释,因为不同的核心的rq_clock_task的更新节奏是不一样的,且更新的时间轴也是不一样的;但是同一个核心上rq_clock_task则是单调递增的。
三、改造后的抓取D状态和等io时间的内核模块
我们根据 2.1 里的实验的结果,修改一下被减数,从my_rq_clock_task改成任务原运行核心上的rq_clock_task:

如上图修改,修改了被减数,变成了原任务核心上的rq_clock_task数值,另外,去掉了是否发生迁移的检查,如果发生减数大于被减数的情况则进行打印,可以发现长时间运行后并没有异常情况出现了。
修改过后的内核模块完整源码如下:
#include <linux/module.h>
#include <linux/capability.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/proc_fs.h>
#include <linux/ctype.h>
#include <linux/seq_file.h>
#include <linux/poll.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/errno.h>
#include <linux/stddef.h>
#include <linux/lockdep.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/wait.h>
#include <linux/init.h>
#include <asm/atomic.h>
#include <trace/events/workqueue.h>
#include <linux/sched/clock.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/tracepoint.h>
#include <trace/events/osmonitor.h>
#include <trace/events/sched.h>
#include <trace/events/irq.h>
#include <trace/events/kmem.h>
#include <linux/ptrace.h>
#include <linux/uaccess.h>
#include <asm/processor.h>
#include <linux/sched/task_stack.h>
#include <linux/nmi.h>
#include <asm/apic.h>
#include <linux/version.h>
#include <linux/sched/mm.h>
#include <asm/irq_regs.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/stop_machine.h>MODULE_LICENSE("GPL");
MODULE_AUTHOR("zhaoxin");
MODULE_DESCRIPTION("Module for monitor D tasks.");
MODULE_VERSION("1.0");#define IODELAY_TRACEPOINT_ENABLE#define TEST_STACK_TRACE_ENTRIES 32typedef unsigned int (*stack_trace_save_tsk_func)(struct task_struct *task,unsigned long *store, unsigned int size,unsigned int skipnr);
stack_trace_save_tsk_func _stack_trace_save_tsk;typedef int (*get_cmdline_func)(struct task_struct *task, char *buffer, int buflen);
get_cmdline_func _get_cmdline_func;#define TESTDIOMONITOR_SAMPLEDESC_SWDSTART "swDstart"
#define TESTDIOMONITOR_SAMPLEDESC_WADSTOP "waDstop"
#define TESTDIOMONITOR_SAMPLEDESC_SWDIOSTART "swDiostart"
#define TESTDIOMONITOR_SAMPLEDESC_WADIOSTOP "waDiostop"
#define TESTDIOMONITOR_SAMPLEDESC_DEXCEED "Dexceed"
#define TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED "Dioexceed"
#define TESTDIOMONITOR_SAMPLEDESC_IOEXCEED "Ioexceed"// 1ms
//#define TESTDIOMONITOR_DEXCEED_THRESHOLD 1000ull//1000000ullstruct uclamp_bucket {unsigned long value : bits_per(SCHED_CAPACITY_SCALE);unsigned long tasks : BITS_PER_LONG - bits_per(SCHED_CAPACITY_SCALE);
};struct uclamp_rq {unsigned int value;struct uclamp_bucket bucket[UCLAMP_BUCKETS];
};/* CFS-related fields in a runqueue */
struct cfs_rq {struct load_weight load;unsigned int nr_running;unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */unsigned int idle_nr_running; /* SCHED_IDLE */unsigned int idle_h_nr_running; /* SCHED_IDLE */u64 exec_clock;u64 min_vruntime;
#ifdef CONFIG_SCHED_COREunsigned int forceidle_seq;u64 min_vruntime_fi;
#endif#ifndef CONFIG_64BITu64 min_vruntime_copy;
#endifstruct rb_root_cached tasks_timeline;/** 'curr' points to currently running entity on this cfs_rq.* It is set to NULL otherwise (i.e when none are currently running).*/struct sched_entity *curr;struct sched_entity *next;struct sched_entity *last;struct sched_entity *skip;#ifdef CONFIG_SCHED_DEBUGunsigned int nr_spread_over;
#endif#ifdef CONFIG_SMP/** CFS load tracking*/struct sched_avg avg;
#ifndef CONFIG_64BITu64 last_update_time_copy;
#endifstruct {raw_spinlock_t lock ____cacheline_aligned;int nr;unsigned long load_avg;unsigned long util_avg;unsigned long runnable_avg;} removed;#ifdef CONFIG_FAIR_GROUP_SCHEDunsigned long tg_load_avg_contrib;long propagate;long prop_runnable_sum;/** h_load = weight * f(tg)** Where f(tg) is the recursive weight fraction assigned to* this group.*/unsigned long h_load;u64 last_h_load_update;struct sched_entity *h_load_next;
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* CONFIG_SMP */#ifdef CONFIG_FAIR_GROUP_SCHEDstruct rq *rq; /* CPU runqueue to which this cfs_rq is attached *//** leaf cfs_rqs are those that hold tasks (lowest schedulable entity in* a hierarchy). Non-leaf lrqs hold other higher schedulable entities* (like users, containers etc.)** leaf_cfs_rq_list ties together list of leaf cfs_rq's in a CPU.* This list is used during load balance.*/int on_list;struct list_head leaf_cfs_rq_list;struct task_group *tg; /* group that "owns" this runqueue *//* Locally cached copy of our task_group's idle value */int idle;#ifdef CONFIG_CFS_BANDWIDTHint runtime_enabled;s64 runtime_remaining;u64 throttled_pelt_idle;
#ifndef CONFIG_64BITu64 throttled_pelt_idle_copy;
#endifu64 throttled_clock;u64 throttled_clock_pelt;u64 throttled_clock_pelt_time;int throttled;int throttle_count;struct list_head throttled_list;
#ifdef CONFIG_SMPstruct list_head throttled_csd_list;
#endif
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
};struct rt_prio_array {DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); /* include 1 bit for delimiter */struct list_head queue[MAX_RT_PRIO];
};/* Real-Time classes' related field in a runqueue: */
struct rt_rq {struct rt_prio_array active;unsigned int rt_nr_running;unsigned int rr_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHEDstruct {int curr; /* highest queued rt task prio */
#ifdef CONFIG_SMPint next; /* next highest */
#endif} highest_prio;
#endif
#ifdef CONFIG_SMPunsigned int rt_nr_migratory;unsigned int rt_nr_total;int overloaded;struct plist_head pushable_tasks;#endif /* CONFIG_SMP */int rt_queued;int rt_throttled;u64 rt_time;u64 rt_runtime;/* Nests inside the rq lock: */raw_spinlock_t rt_runtime_lock;#ifdef CONFIG_RT_GROUP_SCHEDunsigned int rt_nr_boosted;struct rq *rq;struct task_group *tg;
#endif
};/* Deadline class' related fields in a runqueue */
struct dl_rq {/* runqueue is an rbtree, ordered by deadline */struct rb_root_cached root;unsigned int dl_nr_running;#ifdef CONFIG_SMP/** Deadline values of the currently executing and the* earliest ready task on this rq. Caching these facilitates* the decision whether or not a ready but not running task* should migrate somewhere else.*/struct {u64 curr;u64 next;} earliest_dl;unsigned int dl_nr_migratory;int overloaded;/** Tasks on this rq that can be pushed away. They are kept in* an rb-tree, ordered by tasks' deadlines, with caching* of the leftmost (earliest deadline) element.*/struct rb_root_cached pushable_dl_tasks_root;
#elsestruct dl_bw dl_bw;
#endif/** "Active utilization" for this runqueue: increased when a* task wakes up (becomes TASK_RUNNING) and decreased when a* task blocks*/u64 running_bw;/** Utilization of the tasks "assigned" to this runqueue (including* the tasks that are in runqueue and the tasks that executed on this* CPU and blocked). Increased when a task moves to this runqueue, and* decreased when the task moves away (migrates, changes scheduling* policy, or terminates).* This is needed to compute the "inactive utilization" for the* runqueue (inactive utilization = this_bw - running_bw).*/u64 this_bw;u64 extra_bw;/** Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM* tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).*/u64 max_bw;/** Inverse of the fraction of CPU utilization that can be reclaimed* by the GRUB algorithm.*/u64 bw_ratio;
};struct rq {/* runqueue lock: */raw_spinlock_t __lock;/** nr_running and cpu_load should be in the same cacheline because* remote CPUs use both these fields when doing load calculation.*/unsigned int nr_running;
#ifdef CONFIG_NUMA_BALANCINGunsigned int nr_numa_running;unsigned int nr_preferred_running;unsigned int numa_migrate_on;
#endif
#ifdef CONFIG_NO_HZ_COMMON
#ifdef CONFIG_SMPunsigned long last_blocked_load_update_tick;unsigned int has_blocked_load;call_single_data_t nohz_csd;
#endif /* CONFIG_SMP */unsigned int nohz_tick_stopped;atomic_t nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */#ifdef CONFIG_SMPunsigned int ttwu_pending;
#endifu64 nr_switches;#ifdef CONFIG_UCLAMP_TASK/* Utilization clamp values based on CPU's RUNNABLE tasks */struct uclamp_rq uclamp[UCLAMP_CNT] ____cacheline_aligned;unsigned int uclamp_flags;
#define UCLAMP_FLAG_IDLE 0x01
#endifstruct cfs_rq cfs;struct rt_rq rt;struct dl_rq dl;#ifdef CONFIG_FAIR_GROUP_SCHED/* list of leaf cfs_rq on this CPU: */struct list_head leaf_cfs_rq_list;struct list_head *tmp_alone_branch;
#endif /* CONFIG_FAIR_GROUP_SCHED *//** This is part of a global counter where only the total sum* over all CPUs matters. A task can increase this counter on* one CPU and if it got migrated afterwards it may decrease* it on another CPU. Always updated under the runqueue lock:*/unsigned int nr_uninterruptible;struct task_struct __rcu *curr;struct task_struct *idle;struct task_struct *stop;unsigned long next_balance;struct mm_struct *prev_mm;unsigned int clock_update_flags;u64 clock;/* Ensure that all clocks are in the same cache line */u64 clock_task ____cacheline_aligned;u64 clock_pelt;unsigned long lost_idle_time;atomic_t nr_iowait;#ifdef CONFIG_SCHED_DEBUGu64 last_seen_need_resched_ns;int ticks_without_resched;
#endif#ifdef CONFIG_MEMBARRIERint membarrier_state;
#endif#ifdef CONFIG_SMPstruct root_domain *rd;struct sched_domain __rcu *sd;unsigned long cpu_capacity;unsigned long cpu_capacity_orig;struct callback_head *balance_callback;unsigned char nohz_idle_balance;unsigned char idle_balance;unsigned long misfit_task_load;/* For active balancing */int active_balance;int push_cpu;struct cpu_stop_work active_balance_work;/* CPU of this runqueue: */int cpu;int online;struct list_head cfs_tasks;struct sched_avg avg_rt;struct sched_avg avg_dl;
#ifdef CONFIG_HAVE_SCHED_AVG_IRQstruct sched_avg avg_irq;
#endif
#ifdef CONFIG_SCHED_THERMAL_PRESSUREstruct sched_avg avg_thermal;
#endifu64 idle_stamp;u64 avg_idle;unsigned long wake_stamp;u64 wake_avg_idle;/* This is used to determine avg_idle's max value */u64 max_idle_balance_cost;#ifdef CONFIG_HOTPLUG_CPUstruct rcuwait hotplug_wait;
#endif
#endif /* CONFIG_SMP */#ifdef CONFIG_IRQ_TIME_ACCOUNTINGu64 prev_irq_time;
#endif
#ifdef CONFIG_PARAVIRTu64 prev_steal_time;
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTINGu64 prev_steal_time_rq;
#endif/* calc_load related fields */unsigned long calc_load_update;long calc_load_active;#ifdef CONFIG_SCHED_HRTICK
#ifdef CONFIG_SMPcall_single_data_t hrtick_csd;
#endifstruct hrtimer hrtick_timer;ktime_t hrtick_time;
#endif#ifdef CONFIG_SCHEDSTATS/* latency stats */struct sched_info rq_sched_info;unsigned long long rq_cpu_time;/* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? *//* sys_sched_yield() stats */unsigned int yld_count;/* schedule() stats */unsigned int sched_count;unsigned int sched_goidle;/* try_to_wake_up() stats */unsigned int ttwu_count;unsigned int ttwu_local;
#endif#ifdef CONFIG_CPU_IDLE/* Must be inspected within a rcu lock section */struct cpuidle_state *idle_state;
#endif#ifdef CONFIG_SMPunsigned int nr_pinned;
#endifunsigned int push_busy;struct cpu_stop_work push_work;#ifdef CONFIG_SCHED_CORE/* per rq */struct rq *core;struct task_struct *core_pick;unsigned int core_enabled;unsigned int core_sched_seq;struct rb_root core_tree;/* shared state -- careful with sched_core_cpu_deactivate() */unsigned int core_task_seq;unsigned int core_pick_seq;unsigned long core_cookie;unsigned int core_forceidle_count;unsigned int core_forceidle_seq;unsigned int core_forceidle_occupation;u64 core_forceidle_start;
#endif
};// runqueues (not export symbol)
struct rq* _prq = NULL;struct rq* my_cpu_rq(int i_cpu)
{return per_cpu_ptr(_prq, i_cpu);
}u64 my_rq_clock_task(void)
{struct rq* prq = my_cpu_rq(smp_processor_id());return prq->clock_task;
}typedef struct testdiomonitor_sample {struct timespec64 time;int cpu;int pid;int tgid;int ppid;char comm[TASK_COMM_LEN];char ppidcomm[TASK_COMM_LEN];// 0 or 1int bin_iowait;/** "swDstart" // 在sched_switch里* "waDstop" // 在sched_waking里* "swDiostart" // 在sched_switch里* "waDiostop" // 在sched_waking里* "Dexceed" // 超出阈值,非iowait* "Dioexceed" // 超出阈值,iowait*/const char* desc;u64 dtimens; // 纳秒单位,D状态持续的时间u64 iowaittimens; // 纳秒单位,等待io的时间int stackn;void* parray_stack[TEST_STACK_TRACE_ENTRIES];u32 writedone; // 0 or 1
} testdiomonitor_sample;#define TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT 8192typedef struct testdiomonitor_sample_ringbuff {testdiomonitor_sample* parray_sample;volatile u64 wp; // Index is wp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1).volatile u64 rp; // Index is rp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1).u32 skipcount; // 0 means no skip any abnormal event
} testdiomonitor_sample_ringbuff;#define TESTDIOMONITOR_LINEBUFF 1024typedef struct testdiomonitor_env {struct file* file;char file_linebuff[TESTDIOMONITOR_LINEBUFF];int headoffset;loff_t file_pos;testdiomonitor_sample_ringbuff ringbuff;
} testdiomonitor_env;static testdiomonitor_env _env;static struct delayed_work work_write_file;
static struct workqueue_struct *wq_write_file;#define FILENAME "test.txt"void init_file(void)
{_env.file = filp_open(FILENAME, O_WRONLY | O_CREAT | O_TRUNC, 0644);if (IS_ERR(_env.file)) {_env.file = NULL;}
}void exit_file(void)
{if (_env.file) {filp_close(_env.file, NULL);}
}void testdiomonitor_write_file(char* i_pchar, int i_size)
{if (_env.file) {kernel_write(_env.file, i_pchar, i_size, &_env.file_pos);}
}void testdiomonitor_write_file_emptyline(void)
{testdiomonitor_write_file("\n", strlen("\n"));
}void testdiomonitor_file_oneline(const char* i_format, ...)
{char* pcontent = &_env.file_linebuff[_env.headoffset];va_list args;va_start(args, i_format);vsnprintf(pcontent, TESTDIOMONITOR_LINEBUFF - _env.headoffset, i_format, args);va_end(args);testdiomonitor_write_file(_env.file_linebuff, strlen(_env.file_linebuff));
}void testdiomonitor_replace_null_with_space(char *str, int n) {for (int i = 0; i < n - 1; i++) {if (str[i] == '\0') {str[i] = ' ';}}
}void testdiomonitor_set_cmdline(char* i_pbuff, int i_buffsize, struct task_struct* i_ptask)
{int ret = _get_cmdline_func(i_ptask, i_pbuff, i_buffsize);if (ret <= 0) {i_pbuff[0] = '\0';return;}testdiomonitor_replace_null_with_space(i_pbuff, ret);i_pbuff[ret - 1] = '\0';
}void testdiomonitor_checkget_parentinfo_and_cmdline(testdiomonitor_sample* io_psample, struct task_struct* i_ptask)
{struct task_struct* parent;rcu_read_lock();parent = rcu_dereference(i_ptask->real_parent);io_psample->ppid = parent->pid;strlcpy(io_psample->ppidcomm, parent->comm, TASK_COMM_LEN);rcu_read_unlock();
}#define TESTDIOMONITOR_COMMANDLINE_MAX 128static void write_file(struct work_struct *w)
{ssize_t ret;u32 index;testdiomonitor_sample* psample;struct tm t;char timestr[64];char exceedstr[64];char temp_commandline[TESTDIOMONITOR_COMMANDLINE_MAX];struct pid* pid_struct;struct task_struct* ptask;int stacki;while (_env.ringbuff.rp != _env.ringbuff.wp) {index = (_env.ringbuff.rp & (TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1));psample = &_env.ringbuff.parray_sample[index];if (psample->writedone != 1) {break;}testdiomonitor_write_file_emptyline();_env.headoffset = sprintf(_env.file_linebuff, "[%llu][%s] ", _env.ringbuff.rp, psample->desc);time64_to_tm(psample->time.tv_sec + 8 * 60 * 60, 0, &t);snprintf(timestr, 64, "%04ld-%02d-%02d-%02d_%02d_%02d.%09ld",1900 + t.tm_year, t.tm_mon + 1, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec, psample->time.tv_nsec);if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_DEXCEED) {snprintf(exceedstr, 64, "dtimens[%llu]", psample->dtimens);}else if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED) {snprintf(exceedstr, 64, "iowaittimens[%llu]", psample->iowaittimens);}else if (psample->desc == TESTDIOMONITOR_SAMPLEDESC_IOEXCEED) {snprintf(exceedstr, 64, "delayacct_iowaittimens[%llu]", psample->iowaittimens);}else {exceedstr[0] = '\0';}testdiomonitor_file_oneline("begin...time[%s]cpu[%d]desc[%s]%s\n", timestr, psample->cpu, psample->desc, exceedstr);testdiomonitor_file_oneline("tgid[%d]pid[%d]comm[%s]ppid[%d]ppidcomm[%s]\n",psample->tgid, psample->pid, psample->ppidcomm, psample->pid, psample->comm);pid_struct = find_get_pid(psample->pid);if (pid_struct) {ptask = get_pid_task(pid_struct, PIDTYPE_PID);if (ptask) {testdiomonitor_set_cmdline(temp_commandline, TESTDIOMONITOR_COMMANDLINE_MAX, ptask);put_task_struct(ptask);}else {temp_commandline[0] = '\0';}put_pid(pid_struct);}else {temp_commandline[0] = '\0';}testdiomonitor_file_oneline("commandline[%s]\n", temp_commandline);pid_struct = find_get_pid(psample->ppid);if (pid_struct) {ptask = get_pid_task(pid_struct, PIDTYPE_PID);if (ptask) {testdiomonitor_set_cmdline(temp_commandline, TESTDIOMONITOR_COMMANDLINE_MAX, ptask);put_task_struct(ptask);}else {temp_commandline[0] = '\0';}put_pid(pid_struct);}else {temp_commandline[0] = '\0';}testdiomonitor_file_oneline("ppid_commandline[%s]\n", temp_commandline);testdiomonitor_file_oneline("stack[%d]:\n", psample->stackn);for (stacki = 0; stacki < psample->stackn; stacki++) {testdiomonitor_file_oneline("%*c%pS\n", 5, ' ', (void *)psample->parray_stack[stacki]);}testdiomonitor_write_file_emptyline();psample->writedone = 0;_env.ringbuff.rp ++;}queue_delayed_work_on(nr_cpu_ids - 1, wq_write_file,&work_write_file, 1);
}static void init_write_file(void)
{init_file();wq_write_file = alloc_workqueue("testdiomonitor_write_file", WQ_MEM_RECLAIM, 0);INIT_DELAYED_WORK(&work_write_file, write_file);queue_delayed_work_on(nr_cpu_ids - 1, wq_write_file,&work_write_file, 3);
}static void exit_write_file(void)
{cancel_delayed_work_sync(&work_write_file);destroy_workqueue(wq_write_file);exit_file();
}void init_testdiomonitor_sample_ringbuff(void)
{testdiomonitor_sample* psample;_env.ringbuff.parray_sample = kvzalloc(sizeof(testdiomonitor_sample) * TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT, GFP_KERNEL);
}void exit_testdiomonitor_sample_ringbuff(void)
{kvfree(_env.ringbuff.parray_sample);
}testdiomonitor_sample* testdiomonitor_get_psample(void)
{u64 windex_raw, windex_raw_old;u32 windex;while (1) {windex_raw = _env.ringbuff.wp;if (windex_raw - _env.ringbuff.rp >= (u64)(TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT)) {_env.ringbuff.skipcount ++;return NULL;}// atomic_cmpxchg return old valuewindex_raw_old = atomic64_cmpxchg((atomic64_t*)&_env.ringbuff.wp,windex_raw, windex_raw + 1);if (windex_raw_old == windex_raw) {break;}}windex = (u32)(windex_raw & (u64)(TESTDIOMONITOR_SAMPLE_RINGBUFF_MAXCOUNT - 1));return &_env.ringbuff.parray_sample[windex];
}void testdiomonitor_add_sample(const char* i_desc, struct task_struct* i_task, u64 i_timens)
{testdiomonitor_sample* psample = testdiomonitor_get_psample();if (!psample) {return;}ktime_get_real_ts64(&psample->time);psample->cpu = task_cpu(i_task);psample->pid = i_task->pid;psample->tgid = i_task->tgid;strlcpy(psample->comm, i_task->comm, TASK_COMM_LEN);testdiomonitor_checkget_parentinfo_and_cmdline(psample, i_task);psample->bin_iowait = i_task->in_iowait;psample->desc = i_desc;if (i_desc == TESTDIOMONITOR_SAMPLEDESC_DEXCEED) {psample->dtimens = i_timens;}else if (i_desc == TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED || i_desc == TESTDIOMONITOR_SAMPLEDESC_IOEXCEED) {psample->iowaittimens = i_timens;}psample->stackn = _stack_trace_save_tsk(i_task, (unsigned long*)psample->parray_stack, TEST_STACK_TRACE_ENTRIES, 0);psample->writedone = 1;
}static void cb_sched_switch(void *i_data, bool i_preempt,struct task_struct *i_prev,struct task_struct *i_next,unsigned int i_prev_state)
{void* parray_stack[TEST_STACK_TRACE_ENTRIES];int num_stack;int stacki;if (i_prev_state == TASK_UNINTERRUPTIBLE) {if (i_prev->in_iowait) {testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_SWDIOSTART, i_prev, 0);}else {testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_SWDSTART, i_prev, 0);}}
}static void cb_sched_waking(void *i_data, struct task_struct *i_p) {if (i_p->__state == TASK_UNINTERRUPTIBLE) {//u64 currns = my_rq_clock_task();struct rq* prq = my_cpu_rq(task_cpu(i_p));u64 currns = prq->clock_task;u64 local_c = local_clock();int cpuid = smp_processor_id();if (i_p->in_iowait) {testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_WADIOSTOP, i_p, 0);testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_DIOEXCEED, i_p, currns - i_p->se.exec_start);if (i_p->se.exec_start > currns) {//if (task_cpu(i_p) == cpuid) {printk("comm[%s]pid[%d]exec_start[%llu]currns[%llu]local_clock[%llu]last_cpu[%d]cpuid[%d]\n", i_p->comm, i_p->pid, i_p->se.exec_start, currns, local_c, task_cpu(i_p), cpuid);}}}else {testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_WADSTOP, i_p, 0);testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_DEXCEED, i_p, my_rq_clock_task() - i_p->se.exec_start);if (i_p->se.exec_start > currns) {//if (task_cpu(i_p) == cpuid) {printk("comm[%s]pid[%d]exec_start[%llu]currns[%llu]local_clock[%llu]last_cpu[%d]cpuid[%d]\n", i_p->comm, i_p->pid, i_p->se.exec_start, currns, local_c, task_cpu(i_p), cpuid);}}}}
}static void cb_iodelay_account(void *i_data, struct task_struct *i_curr,unsigned long long i_delta)
{testdiomonitor_add_sample(TESTDIOMONITOR_SAMPLEDESC_IOEXCEED, i_curr, i_delta);
}struct kern_tracepoint {void *callback;struct tracepoint *ptr;bool bregister;
};
static void clear_kern_tracepoint(struct kern_tracepoint *tp)
{if (tp->bregister) {tracepoint_probe_unregister(tp->ptr, tp->callback, NULL);}
}#define INIT_KERN_TRACEPOINT(tracepoint_name) \static struct kern_tracepoint mykern_##tracepoint_name = {.callback = NULL, .ptr = NULL, .bregister = false};#define TRACEPOINT_CHECK_AND_SET(tracepoint_name) \static void tracepoint_name##_tracepoint_check_and_set(struct tracepoint *tp, void *priv) \{ \if (!strcmp(#tracepoint_name, tp->name)) \{ \((struct kern_tracepoint *)priv)->ptr = tp; \return; \} \}INIT_KERN_TRACEPOINT(sched_switch)
TRACEPOINT_CHECK_AND_SET(sched_switch)
INIT_KERN_TRACEPOINT(sched_waking)
TRACEPOINT_CHECK_AND_SET(sched_waking)
#ifdef IODELAY_TRACEPOINT_ENABLE
INIT_KERN_TRACEPOINT(iodelay_account)
TRACEPOINT_CHECK_AND_SET(iodelay_account)
#endiftypedef unsigned long (*kallsyms_lookup_name_func)(const char *name);
kallsyms_lookup_name_func _kallsyms_lookup_name_func;void* get_func_by_symbol_name_kallsyms_lookup_name(void)
{int ret;void* pfunc = NULL;struct kprobe kp;memset(&kp, 0, sizeof(kp));kp.symbol_name = "kallsyms_lookup_name";kp.pre_handler = NULL;kp.addr = NULL; // 作为强调,提示使用symbol_nameret = register_kprobe(&kp);if (ret < 0) {printk("register_kprobe fail!\n");return NULL;}printk("register_kprobe succeed!\n");pfunc = (void*)kp.addr;unregister_kprobe(&kp);return pfunc;
}void* get_func_by_symbol_name(const char* i_symbol)
{if (_kallsyms_lookup_name_func == NULL) {return NULL;}return _kallsyms_lookup_name_func(i_symbol);
}static int __init testdiomonitor_init(void)
{_kallsyms_lookup_name_func = get_func_by_symbol_name_kallsyms_lookup_name();_prq = get_func_by_symbol_name("runqueues");if (_prq == NULL) {printk(KERN_ERR "get_func_by_symbol_name runqueues failed!\n");return -1;}init_testdiomonitor_sample_ringbuff();init_write_file();_stack_trace_save_tsk = get_func_by_symbol_name("stack_trace_save_tsk");if (_stack_trace_save_tsk == NULL) {printk(KERN_ERR "get_func_by_symbol_name stack_trace_save_tsk failed!\n");return -1;}_get_cmdline_func = get_func_by_symbol_name("get_cmdline");if (_get_cmdline_func == NULL) {printk(KERN_ERR "get_func_by_symbol_name get_cmdline failed!\n");return -1;}mykern_sched_switch.callback = cb_sched_switch;for_each_kernel_tracepoint(sched_switch_tracepoint_check_and_set, &mykern_sched_switch);if (!mykern_sched_switch.ptr) {printk(KERN_ERR "mykern_sched_switch register failed!\n");return -1;}else {printk(KERN_INFO "mykern_sched_switch register succeeded!\n");}tracepoint_probe_register(mykern_sched_switch.ptr, mykern_sched_switch.callback, NULL);mykern_sched_switch.bregister = 1;mykern_sched_waking.callback = cb_sched_waking;for_each_kernel_tracepoint(sched_waking_tracepoint_check_and_set, &mykern_sched_waking);if (!mykern_sched_waking.ptr) {printk(KERN_ERR "mykern_sched_waking register failed!\n");return -1;}else {printk(KERN_INFO "mykern_sched_waking register succeeded!\n");}tracepoint_probe_register(mykern_sched_waking.ptr, mykern_sched_waking.callback, NULL);mykern_sched_waking.bregister = 1;#ifdef IODELAY_TRACEPOINT_ENABLEmykern_iodelay_account.callback = cb_iodelay_account;for_each_kernel_tracepoint(iodelay_account_tracepoint_check_and_set, &mykern_iodelay_account);if (!mykern_iodelay_account.ptr) {printk(KERN_ERR "mykern_iodelay_account register failed!\n");return -1;}else {printk(KERN_INFO "mykern_iodelay_account register succeeded!\n");}tracepoint_probe_register(mykern_iodelay_account.ptr, mykern_iodelay_account.callback, NULL);mykern_iodelay_account.bregister = 1;
#endifreturn 0;
}static void __exit testdiomonitor_exit(void)
{clear_kern_tracepoint(&mykern_sched_switch);clear_kern_tracepoint(&mykern_sched_waking);
#ifdef IODELAY_TRACEPOINT_ENABLEclear_kern_tracepoint(&mykern_iodelay_account);
#endiftracepoint_synchronize_unregister();exit_write_file();exit_testdiomonitor_sample_ringbuff();
}module_init(testdiomonitor_init);
module_exit(testdiomonitor_exit);
相关文章:
在计算进程D状态持续时间及等IO的时间遇到的一处问题
一、背景 之前的博客 线程每次iodelay监控及D状态开始和结束监控并做堆栈记录-CSDN博客 里,我们讲到了通过内核模块抓取D状态的进程和等IO事件的方法,里面也用到了通过获取rq的symbol,再去获取rq里的rq_clock_task时间的方法(内核…...
Android11-12-13 替换系统默认壁纸
替换默认壁纸,是客需中再普通不过的需求,这里整理作为笔记记录 文章目录 需求场景 关联资料需求实现拓展总结 需求 客制化客户壁纸需求,替换客户定制的壁纸。 场景 手机-平板相关产品,各种广告机、工控、消费级产品,…...
Buffer overFolw---Kryo序列化出现缓冲区溢出的问题解决
问题: 由于我的数据量太大,我设置批次为10000万,50w数据大概有400M左右,然后进行spark数据处理时候报错为org.apache.spark.SparkException:Kryo serialization failed:Buffer overFolw.Available:0,rquired 58900977,To …...
什么是索引?为什么要使用B树作为索引数据结构?
MySQL的事务特性 1.原子性:原子性就是这个事件要么执行完,要么没执行,不会存在中间状态,与C中华那个加锁避免多线程竞争是一个道理; 2.一致性:保持事件的操作对象双方某数据之和是不变的,就以转账为例,A转给B100块,那么A的余额多100,B的余额就必须少100; 3.隔离性:隔离就是独…...
【Linux】线程基础
🔥个人主页:Quitecoder 🔥专栏:linux笔记仓 目录 01.背景知识02.线程概念简单使用线程线程调度成本更低 01.背景知识 OS进行内存管理,不是以字节为单位的,而是以内存块为单位的,默认大小为4kb&…...
TCP netstat TIME_WAIT CLOSE_WAIT
TIME_WAIT is different from CLOSE_WAIT, and it’s not necessarily a client-side fault. It happens due to how TCP handles connection termination. Key Differences TCP StateCauseWho is responsible?Fix/OptimizationCLOSE_WAITServer didn’t close() the socket…...
「宇树科技」13家核心零部件供应商梳理!
2025年2月6日,摩根士丹利(Morgan Stanley)发布最新人形机器人研报:Humanoid 100: Mapping the Humanoid Robot Value Chain(人形机器人100:全球人形机器人产业链梳理)。 2025年2月20日…...
Spring Boot 项目打包运行
打包成jar包,执行 java -jar 包名; 保证打出的jar包是独立可运行的包,需要xml中添加插件配置 <!-- SpringBoot应用打包插件--> <build><plugins><plugin><groupId>org.springframework.boot</groupId&…...
数据结构八股
线性数据结构 数组:数组的内存空间是连续的,随机访问的时间复杂度是01,适用于需要按索引访问元素的场景,但是插入和删除元素较慢,时间复杂度是On链表:链表是由节点组成,节点之间是分散存储的,内存不连续&a…...
7.2 控件和组件
版权声明:本文为博主原创文章,转载请在显著位置标明本文出处以及作者网名,未经作者允许不得用于商业目的 C#工具箱位于编辑窗口的左侧,它默认内置了大量的控件和组件。控件一般派生于System.Windows.Forms.Control类,显…...
自动驾驶背后的数学:ReLU,Sigmoid, Leaky ReLU, PReLU,Swish等激活函数解析
随着自动驾驶技术的飞速发展,深度学习在其中扮演着至关重要的角色。而激活函数作为神经网络中的关键组件,直接影响着模型的性能和效果。前面几篇博客 自动驾驶背后的数学:特征提取中的线性变换与非线性激活 , 「自动驾驶背后的数学࿱…...
transform
http://zhihu.com/question/445556653/answer/3254012065 西科技的文章 视频讲解 小白也能听懂的 transformer模型原理详解 self- attention 多头注意力机制 encoder decoder 机器翻译_哔哩哔哩_bilibili...
在 Simulink 里构建输水隧洞充水过程模型的基本步骤与思路
下面为你介绍在 Simulink 里构建输水隧洞充水过程模型的基本步骤与思路,不过由于没办法直接生成 Simulink 模型文件,这里会给出一个模拟该过程的 Matlab 脚本代码示例。 建模思路 输水隧洞充水过程一般能够用一阶常微分方程来描述,其方程如…...
详解堆排序(超详细)
堆排序 一、 堆的基本概念二、 堆排序的主要步骤:1. 构建大顶堆(Max-Heap)2. 排序过程3. 堆化的详细过程4. 堆排序的完整步骤总结5. 堆排序的时间复杂度6. 堆排序的空间复杂度 三、 举例讲解示例数组第一步:构建大顶堆1.1 找到最后…...
思库拉水厂开业庆典千人大会回顾
近日,思库拉离子水厂在广州隆重举办了开业盛典,现场汇聚了逾千名嘉宾。此次盛会不仅是对思库拉离子水厂正式投产的庆祝,更是对思库拉品牌未来蓝图的一次展示。 现场氛围热烈,洋溢着浓厚的喜庆气息。参与者来自五湖四海,既有思库拉的忠实拥趸,也有对思库拉产品充满兴趣的潜在消费…...
react 大屏根据屏幕分辨率缩放
记录,以防忘记 const DataLargeScreen () > {const layoutRef useRef<any>();// ui稿宽度const width useRef(1920).current;// ui稿高度const height useRef(1080).current;const [scaleValue, setScaleValue] useState(1);const useWhichScaleValu…...
JAVA学习*Object类
Object类 Object类是所有类的父类 类中有一些方法(都需要掌握) toString()方法 在学习类的对象的时候有介绍过了,当我们重新给此方法就会打印类与对象的信息 equals()方法 在Java中的比较, 如果左右两侧是基本类型变量&#…...
基于python脚本实现的打砖块小游戏
目录 1. 打砖块游戏 2. 初始化 Pygame 和设置屏幕 3. 定义游戏对象 3.1 定义玩家操作的paddle 3.2 定义球(Ball) 3.3 砖块(Bricks) 4. 游戏主循环 4.1 事件处理 4.2 板子移动 4.3 球移动和碰撞检测 4.4 绘制游戏对象 …...
20250317-vue-Prop4
运行时类型检查 校验选项中的 type 可以是下列这些原生构造函数: StringNumberBooleanArrayObjectDateFunctionSymbolError 另外,type 也可以是自定义的类或构造函数,Vue 将会通过 instanceof 来检查类型是否匹配。例如下面这个类…...
地理信息系统(GIS)在智慧城市中的40个应用场景案例
在智慧城市发展进程中,地理信息系统(GIS)作为关键技术之一,正扮演着不可或缺的角色,堪称智慧城市的神经中枢。通过空间数据分析优化城市管理,GIS技术为智慧城市的构建提供了强大的支持。 本文分享了GIS在智…...
XSS Game(DOM型) 靶场 通关
目录 靶场网址 Ma Spaghet! 分析 解题 Jefff 分析 解题 方法一 方法二 Ugandan Knuckles 分析 解题 Ricardo Milos 分析 解题 Ah Thats Hawt 分析 解题 方法一 方法二 Ligma 分析 解题 Mafia 分析 解题 方法一:构造函数 方法二…...
【大模型基础_毛玉仁】3.5 Prompt相关应用
目录 3.5 相关应用3.5.1 基于大语言模型的Agent3.5.2 数据合成3.5.3 Text-to-SQL3.5.4 GPTs 3.5 相关应用 Prompt工程应用广泛,能提升大语言模型处理基础及复杂任务的能力,在构建Agent、数据合成、Text-to-SQL转换和设计个性化GPTs等方面不可或缺。 . …...
《Python全栈开发》第12课:RESTful API设计 - 构建现代化接口
🌟 课程目标 理解REST设计原则掌握Flask-RESTful开发实现JWT认证接口构建标准化API文档一、REST是什么?(餐厅点餐系统比喻) 1.1 REST核心原则 #mermaid-svg-0rLbveAhUdJCLKTy {font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;…...
深入解析libsunrpc:构建分布式系统的核心RPC库
深入解析libsunrpc:构建分布式系统的核心RPC库 引言 在分布式系统开发中,远程过程调用(Remote Procedure Call, RPC) 是连接不同节点、实现跨网络服务调用的关键技术。作为SUN公司开源的经典RPC实现,libsunrpc 凭借其…...
deepseek搭建本地私有知识库dify安装介绍docker compose图文教程
按照官方介绍,打开教程安装。下载源码, # 假设当前最新版本为 0.15.3 git clone https://github.com/langgenius/dify.git --branch 0.15.3 进入docker目录, cd dify/docker 网络科学的直接执行命令就可以了。 docker compose up -d 镜…...
C语言动态内存管理深度解析与嵌入式开发实战
C语言动态内存管理深度解析与嵌入式开发实战 (高级嵌入式软件开发工程师视角) 一、动态内存函数原理与差异 malloc 核心机制:从堆区分配指定字节的未初始化内存,返回void*指针。失败时返回NULL,必须检查返回值…...
右击没有Word、PPT、Excel功能
右击没有Word、PPT、Excel功能 导航 文章目录 右击没有Word、PPT、Excel功能导航一、问题描述二、事情经过三、解决方案其他思路分享 一、问题描述 在安装并激活了office之后,业务反馈右击没有出现新建Word功能,仅有Word文档 二、事情经过 按道…...
無人機高空收集地形之linux server 的應用部署
如何在Linux服务器上部署无人机高空地形测量应用? 一、技术实现步骤 系统环境搭建 操作系统与ROS安装 在Linux服务器(推荐Ubuntu LTS版本)上安装ROS(机器人操作系统),例如ROS Noetic或ROS2 Humble1。ROS提…...
DeepSeek R1 本地部署指南 (6) - Windows 本地部署使用 GPU 运行
DeepSeek R1 本地部署指南 (1) - Windows 本地部署 上一篇,安装好 Windows 本地步骤后,如果发现在任务管理器中 GPU 显示 0%。 1.在命令行中输入: ollama ps 显示: PROCESSOR CPU 2.安装 CUDA Toolkit CUDA Toolkit Downloads htt…...
鸿蒙进行视频上传,使用 request.uploadFile方法
一.拉起选择器进行视频选择,并且创建文件名称 async getPictureFromAlbum() {// 拉起相册,选择图片let PhotoSelectOptions new photoAccessHelper.PhotoSelectOptions();PhotoSelectOptions.MIMEType photoAccessHelper.PhotoViewMIMETypes.VIDEO_TY…...
