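The Linux kernel hardlockup mechanism:
hardlockup is a debugging facility within the watchdog framework. It targets the case where a CPU stays occupied after an interrupt fires, so that other interrupts can no longer be serviced. The timeout is typically 10 s and can be changed through the sysctl watchdog_thresh.
When a hardlockup is detected, the kernel prints the current call stack. If it is configured to panic, it triggers a panic and prints the current stack as well. This behavior can be changed at run time through the sysctl hardlockup_panic, and its default is set with CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.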
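As a small illustration of the two knobs above, the following sketch reads them from procfs. It assumes the kernel exposes them as /proc/sys/kernel/watchdog_thresh and /proc/sys/kernel/hardlockup_panic (the latter exists only when the hard lockup detector is built in). The helper names read_int_file and print_hardlockup_settings are made up for this example.

```c
#include <stdio.h>

/* Read one integer from a procfs file. Returns 0 on success, -1 on failure. */
int read_int_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the two sysctls discussed above, if this kernel exposes them. */
void print_hardlockup_settings(void)
{
    /* Assumed procfs locations of the sysctls. */
    static const char *const paths[] = {
        "/proc/sys/kernel/watchdog_thresh",
        "/proc/sys/kernel/hardlockup_panic",
    };
    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        long v;
        if (read_int_file(paths[i], &v) == 0)
            printf("%s = %ld\n", paths[i], v);
        else
            printf("%s: not available\n", paths[i]);
    }
}
```

Calling print_hardlockup_settings() from a main function prints whichever of the two values the running kernel provides.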
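What the hardlockup mechanism is built on:
The hardlockup implementation depends on the following:
a) The kernel watchdog framework.
b) The high-resolution timer framework: the high-resolution timer (hrtimer) is backed by different hardware on different architectures.
c) The perf event framework: perf events are likewise implemented differently on each architecture, and both b) and c) depend on the specific architecture. We gave a brief analysis of how ARM implements perf events in an earlier article; see that article for details.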
Framework diagram of the hardlockup implementation:
[Figure: hardlockup implementation mechanism]
Source walkthrough of how hardlockup works (for the architecture-dependent perf event part, ARM's PMU is used as the example):
Starting the watchdog hrtimer and creating the perf event proceeds as follows:
//kernel/watchdog.c
void __init lockup_detector_init(void)
{
    ...
    if (!watchdog_nmi_probe())  //create the corresponding perf event
        nmi_watchdog_available = true;
    lockup_detector_setup();    //start the hrtimer-based watchdog and enable the perf event
}
Next, let us look at how the perf event is created.
//kernel/watchdog_hld.c
int __init hardlockup_detector_perf_init(void)
{
    int ret = hardlockup_detector_event_create();  //hardlockup creates its perf event here
    ...
}

//type and config used when creating the perf event
static struct perf_event_attr wd_hw_attr = {
    .type     = PERF_TYPE_HARDWARE,
    .config   = PERF_COUNT_HW_CPU_CYCLES,
    .size     = sizeof(struct perf_event_attr),
    .pinned   = 1,
    .disabled = 1,
};

static int hardlockup_detector_event_create(void)
{
    ...
    struct perf_event_attr *wd_attr;
    struct perf_event *evt;

    wd_attr = &wd_hw_attr;
    //This line is architecture-specific: for ARM's PMU, the threshold is
    //converted into a cycle-counter value.
    wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);

    /* Try to register using hardware perf events */
    /* watchdog_overflow_callback is the handler invoked when the cycle counter
     * overflows. In terms of the earlier article "PMU, the cornerstone of
     * Perf Event", it is reached from armv8pmu_handle_irq through the call
     * to perf_event_overflow. */
    evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
                                           watchdog_overflow_callback, NULL);
    ...
    return 0;
}
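To make the conversion in hw_nmi_get_sample_period concrete, here is a minimal sketch of the arithmetic, under the assumption that the period is simply the CPU frequency multiplied by the threshold in seconds. The real conversion is architecture-specific, and the function name sample_period_cycles is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of CPU cycles the counter has to count before
 * it overflows, assuming period = frequency (Hz) * threshold (seconds).
 */
uint64_t sample_period_cycles(uint64_t cpu_freq_hz, unsigned int thresh_sec)
{
    return cpu_freq_hz * thresh_sec;
}
```

For an assumed 2 GHz core and the default 10 s threshold, this gives 20,000,000,000 cycles between overflow interrupts.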
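We describe the creation itself in detail later. For now it is enough to know that the callback is dispatched through perf_event_overflow: when the PMU counter overflows, it raises an interrupt (a non-maskable interrupt, NMI, where the platform supports it), and the handler for that interrupt ends up calling watchdog_overflow_callback. Let us first look at watchdog_overflow_callback itself: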
//kernel/watchdog_hld.c
/* Note that this function's parameters match the ones passed by
 * perf_event_overflow, which is called from armv8pmu_handle_irq.
 * We look later at how this function gets attached to the perf event. */
static void watchdog_overflow_callback(struct perf_event *event,
                                       struct perf_sample_data *data,
                                       struct pt_regs *regs)
{
    ...
    //watchdog_nmi_touch is set by touch_nmi_watchdog(), which some code paths
    //call to suppress a report; we do not discuss it here
    if (__this_cpu_read(watchdog_nmi_touch) == true) {
        __this_cpu_write(watchdog_nmi_touch, false);
        return;
    }

    //ignore an overflow that arrives too soon after the previous check
    if (!watchdog_check_timestamp())
        return;

    /* is_hardlockup compares hrtimer_interrupts with
     * hrtimer_interrupts_saved, the value stored at the previous check.
     * If they are equal, the hrtimer has not run in between,
     * which means a hardlockup has been detected. */
    if (is_hardlockup()) {
        ...
        /* only print hardlockups once */
        if (__this_cpu_read(hard_watchdog_warn) == true)
            return;

        //show the registers, or dump the stack
        if (regs)
            show_regs(regs);
        else
            dump_stack();
        ...
        if (hardlockup_panic)
            nmi_panic(regs, "Hard LOCKUP");  //trigger the kernel panic
        ...
    }
}
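The comparison behind is_hardlockup can be modelled in user space. The sketch below is a simplified model rather than the kernel's code: the struct cpu_watch and the functions timer_tick and check_hardlockup are names invented for this example, standing in for the per-CPU counters, the hrtimer callback and the NMI-side check.

```c
#include <stdbool.h>

/* Simplified per-CPU state (hypothetical struct for this example). */
struct cpu_watch {
    unsigned long hrtimer_interrupts;       /* incremented by the timer callback */
    unsigned long hrtimer_interrupts_saved; /* snapshot taken at the last check  */
};

/* Timer side: every hrtimer expiry bumps the counter. */
void timer_tick(struct cpu_watch *w)
{
    w->hrtimer_interrupts++;
}

/* NMI side: report a lockup if the counter has not moved since the last check. */
bool check_hardlockup(struct cpu_watch *w)
{
    unsigned long now = w->hrtimer_interrupts;

    if (w->hrtimer_interrupts_saved == now)
        return true;                  /* the timer never ran: CPU is stuck */

    w->hrtimer_interrupts_saved = now; /* remember for the next check */
    return false;
}
```

If timer_tick runs at least once between two calls to check_hardlockup, the check returns false. Two checks in a row with no tick in between make the second one return true.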
Now let us see how hrtimer_interrupts and hrtimer_interrupts_saved are updated.
//kernel/watchdog.c
lockup_detector_init
  -->lockup_detector_setup
    -->lockup_detector_reconfigure
      -->softlockup_start_all
        -->smp_call_on_cpu  //run once on every CPU
          -->watchdog_enable

//With CPU hotplug support, this is also invoked when a CPU comes online
static void watchdog_enable(unsigned int cpu)
{
    struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
    struct completion *done = this_cpu_ptr(&softlockup_completion);
    ...
    /* Start the timer first to prevent the NMI watchdog triggering
     * before the timer has a chance to fire.
     */
    /* watchdog_timer_fn fires at an interval of
     * sample_period = watchdog_thresh*2*NSEC_PER_SEC/5,
     * i.e. every 4 s with the default watchdog_thresh of 10 s. */
    hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    hrtimer->function = watchdog_timer_fn;
    hrtimer_start(hrtimer, ns_to_ktime(sample_period), HRTIMER_MODE_REL_PINNED);
    ...
    //Enable the perf event: start the perf event created earlier,
    //creating it first if it does not exist yet
    if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
        watchdog_nmi_enable(cpu);
}

//watchdog kicker functions
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    ...
    /* kick the hardlockup detector */
    watchdog_interrupt_count();  //updates hrtimer_interrupts
    ...
}
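The hrtimer interval quoted in the comment above can be checked with a few lines of arithmetic. This is a sketch of the formula as stated in this article (watchdog_thresh*2*NSEC_PER_SEC/5). The function name hrtimer_sample_period_ns is made up for the example.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: hrtimer interval in nanoseconds for a given
 * watchdog_thresh (in seconds), following
 * sample_period = watchdog_thresh * 2 * NSEC_PER_SEC / 5.
 */
uint64_t hrtimer_sample_period_ns(unsigned int watchdog_thresh)
{
    return (uint64_t)watchdog_thresh * 2 * (NSEC_PER_SEC / 5);
}
```

With watchdog_thresh = 10 the result is 4,000,000,000 ns, i.e. 4 s. On a healthy CPU the timer therefore fires at least twice within each 10 s window checked by the NMI side, so the counter comparison does not report a lockup.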
That is the code behind the "hardlockup implementation mechanism" figure shown earlier. Now let us examine the other key point: how the perf event is created, that is, how perf_event_create_kernel_counter is implemented.
//kernel/events/core.c
/**
 * perf_event_create_kernel_counter
 *
 * @attr: attributes of the counter to create
 * @cpu: cpu in which the counter is bound
 * @task: task to profile (NULL for percpu)
 */
struct perf_event *
perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
                                 struct task_struct *task,
                                 perf_overflow_handler_t overflow_handler,
                                 void *context)
{
    struct perf_event_context *ctx;
    struct perf_event *event;
    ...
    /* Create a cycle counter with type PERF_TYPE_HARDWARE and
     * config PERF_COUNT_HW_CPU_CYCLES, whose period corresponds to
     * 10 s (the default watchdog_thresh). */
    event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler, context, -1);
    ...
    //find (or allocate) the matching context
    ctx = find_get_context(event->pmu, task, event);
    ...
    perf_install_in_context(ctx, event, cpu);
    perf_unpin_context(ctx);
    ...
    return event;
}
/* Allocate and initialize the perf event */
static struct perf_event *
perf_event_alloc(struct perf_event_attr *attr, int cpu, struct task_struct *task,
                 struct perf_event *group_leader, struct perf_event *parent_event,
                 perf_overflow_handler_t overflow_handler, void *context, int cgroup_fd)
{
    struct pmu *pmu;
    struct perf_event *event;
    struct hw_perf_event *hwc;
    ...
    //allocate memory for the perf_event
    event = kzalloc(sizeof(*event), GFP_KERNEL);
    ...  //initialize fields
    init_waitqueue_head(&event->waitq);
    init_irq_work(&event->pending, perf_pending_event);
    ...
    /* Initialize the perf_event, down to the config of the specific type:
     * -->perf_init_event
     *   -->perf_try_init_event
     *     -->pmu->event_init(event)
     */
    pmu = perf_init_event(event);
    ...
}
//drivers/perf/arm_pmu.c
static int armpmu_event_init(struct perf_event *event)
{
    ...
    /* Per the code analysis in "PMU, the cornerstone of Perf Event",
     * map_event matches against armv8_pmuv3_perf_map in the PMU driver.
     * Since the config we passed in is PERF_COUNT_HW_CPU_CYCLES,
     * the resulting PMU event is ARMV8_PMUV3_PERFCTR_CPU_CYCLES. */
    if (armpmu->map_event(event) == -ENOENT)
        return -ENOENT;
    return __hw_perf_event_init(event);
}
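The lookup performed by map_event can be pictured as a small table indexed by the generic event ID. The sketch below is a reduced illustration, not the driver's code. The table and function names (sketch_perf_map, sketch_map_event) and the two-entry enum are made up for this example. The event numbers 0x11 (CPU_CYCLES) and 0x08 (INST_RETIRED) are the ARMv8 PMUv3 common event numbers.

```c
#include <errno.h>

/* First two generic hardware events, in the same order as enum perf_hw_id. */
enum { HW_CPU_CYCLES = 0, HW_INSTRUCTIONS = 1, HW_MAX = 2 };

/* Hypothetical, reduced stand-in for a table like armv8_pmuv3_perf_map. */
static const unsigned int sketch_perf_map[HW_MAX] = {
    [HW_CPU_CYCLES]   = 0x11, /* ARMv8 PMUv3 CPU_CYCLES   */
    [HW_INSTRUCTIONS] = 0x08, /* ARMv8 PMUv3 INST_RETIRED */
};

/* Return the PMU event number for a generic config, or -ENOENT if unmapped. */
int sketch_map_event(unsigned int config)
{
    if (config >= HW_MAX)
        return -ENOENT;
    return (int)sketch_perf_map[config];
}
```

Passing config 0 (the value of PERF_COUNT_HW_CPU_CYCLES) yields 0x11, while a config outside the table yields -ENOENT, which mirrors the early return in armpmu_event_init.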
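At this point the PERF_COUNT_HW_CPU_CYCLES perf event has been created. The rest of the flow works as discussed in the article "PMU, the cornerstone of Perf Event".
Summary:
hardlockup is essentially a mechanism for debugging a CPU that is hung in interrupt handling. It uses an NMI (non-maskable interrupt) to check periodically whether the hrtimer interrupt count has advanced within the monitoring window. If the count has not advanced, something has gone wrong, and what happens next depends on the configuration.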