当前位置: 首页 > article >正文

ORACLE数据库实例报错ORA-00470: LGWR process terminated with error宕机问题分析报告

  • 服务概述

10月21号03:22分,BOSS数据库实例发生异常宕机;工程师及时响应此问题并对此故障原因进行分析及相关建议,详细的故障情况及相关日志、TRACE文件的分析及总结、建议,请参阅本文档。

  • hzboss数据库实例宕机分析

4.1 数据库层面日志的分析

4.1.1故障数据库实例alert日志

数据库实例的Alert日志

对故障时间点附近的ALERT日志分析如下:

Wed Oct 21 03:06:22 2024
Thread 1 advanced to log sequence 254206 (LGWR switch)
  Current log# 9 seq# 254206 mem# 0: /boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2024
WARNING: aiowait timed out 1 times --该时间段系统出现aio等待的告警,该告警由_iowait_timeouts隐含参数控制,超过该参数设置的值后会导致系统宕机,这里并未超过系统设置值。但是提示我们当时系统在io调度上出现了问题。
Wed Oct 21 03:22:11 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367' --22分17秒系统抛错,进程367持有controlfile的enqueue lock时间被其他进程(arc0)等待超时,该enqueue lock持有时间由隐含参数_controlfile_enqueue_timeout控制,默认为900秒
Wed Oct 21 03:22:17 2024
System State dumped to trace file /oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc
Wed Oct 21 03:22:50 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367' --22分50秒,进程367持有的controlfile的enqueue lock时间被其他进程(lgwr)等待超时抛错
Wed Oct 21 03:22:55 2024
Killing enqueue blocker (pid=367) on resource CF-00000000-00000000 --LGWR进程发现异常查杀持有controlfile equeue lock的进程367
 by killing session 1648.1
Wed Oct 21 03:22:55 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_lgwr_365.trc:
ORA-07445: exception encountered: core dump [ksuklms()+672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc: --后续一系列报错宕机开始
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:18 2024
PMON: terminating instance due to error 470
Wed Oct 21 03:23:18 2024
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2024
Errors in file /oracle/admin/dtvboss/bdump/dtvboss_j006_22335.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00470: LGWR process terminated with error
Wed Oct 21 03:23:22 2024
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2024
Instance termination failed to kill one or more processes

4.1.2 故障时trace文件分析

故障时,节点数据库实例的alert日志中提到了如下TRACE文件:dtvboss_arc0_404.trc

分析如下:

/oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/product/10.2.0/db_1

System name: SunOS

Node name: hzboss-ora1

Release: 5.10

Version: Generic_144488-12

Machine: sun4u

Instance name: dtvboss

Redo thread mounted by this instance: 1

Oracle process number: 23

Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)

*** SERVICE NAME:(SYS$BACKGROUND) 2024-10-21 03:22:11.264

*** SESSION ID:(1646.3) 2024-10-21 03:22:11.264

*** 2024-10-21 03:22:11.264

Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)

Possible local blocker ospid=367 sid=1648 sser=1 time_held=37 secs (ges mode req=6 held=4)

DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER

  possible owner[17.367] on resource CF-00000000-00000000

Dumping process info of pid[17.367] requested by pid[23.404]

ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 367'

-------------------------------------------------------------------------------

ENQUEUE [CF] HELD FOR TOO LONG

 

enqueue holder: 'inst 1, osid 367'

 

Process 'inst 1, osid 367' is holding an enqueue for maximum allowed time.

The process will be terminated.

 

Oracle Support Services triaging information: to find the root-cause, look

at the call stack of process 'inst 1, osid 367' located below. Ask the

developer that owns the first NON-service layer in the stack to investigate.

Common service layers are enqueues (ksq), latches (ksl), library cache

pins and locks (kgl), and row cache locks (kqr).

 

Dumping process 17.367 info:

*** 2024-10-21 03:22:11.300

---------------------------------------------------------------

从这一段信息可以看到,此TRACE文件是ARC0进程的TRACE,ARC0进程的ORACLE SID是23,OS PID是404;

ARC0进程无法获取资源:CF-00000000-00000000 (controlfile enqueue 队列锁),此资源的持有者是pid[17.367]

ARC0进程的TRACE分析

PROCESS 23:

  ----------------------------------------

  SO: 8a9004f88, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=23, calls cur/top: 8a95c6650/8a95c6650, flag: (2) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 24

              last post received-location: ksasnd

              last process to post me: b66002018 1 6

              last post sent: 0 0 24

              last post sent-location: ksasnd

              last process posted by me: b66005770 1 2

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 404

    OSD pid info: Unix process pid: 404, image: oracle@hzboss-ora1 (ARC0)

Dump of memory from 0x00000005F033B1F0 to 0x00000005F033B3F8

5F033B1F0 0000000C 00000000 00000008 AF739C70  [.............s.p]

………………………………

5F033B3E0 00000005 000313A7 00000008 AF739C70  [.............s.p]

5F033B3F0 00000006 000313A7                    [........]        

    (FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control03.ctl

     fno=2 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control02.ctl

     fno=1 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control01.ctl

     fno=0 lblksz=16384 fsiz=1308

    ----------------------------------------

    SO: 8a9778500, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,

                       event: 21, last message event: 21,

                       last message waited event: 21, messages read: 0

                       channel: (5eb52cc20) scumnt mount lock

                                scope: 1, event: 22, last mesage event: 20,

                                publishers/subscribers: 0/20,

                                messages published: 1

    ----------------------------------------

    SO: 8a95b7328, type: 4, owner: 8a9004f88, flag: INIT/-/-/0x00

    (session) sid: 1646 trans: 0, creator: 8a9004f88, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-0017-0000000D, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    last wait for 'enq: CF - contention' blocking sess=0x0 seq=43322 wait_time=2929723 seconds since wait started=36

                name|mode=43460004, 0=0, operation=0

    Dumping Session Wait History

     for 'enq: CF - contention' count=1 wait_time=2929723

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929723

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929725

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929726

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929724

                name|mode=43460004, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929750

                name|mode=43460004, 0=0, operation=0

    temporary object counter: 0

      ----------------------------------------

      UOL used : 0 locks(used=0, free=0)

      KGX Atomic Operation Log 5eaffc848

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5eaffc890

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5eaffc8d8

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 1646 efd 0 whr 0 slp 0

      ----------------------------------------

      SO: 8afaa4330, type: 41, owner: 8a95b7328, flag: INIT/-/-/0x00

      (dummy) nxc=0, nlb=0   

    ----------------------------------------

    SO: 8a97783d8, type: 11, owner: 8a9004f88, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a9004f88,

                       event: 24, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: 8a95c6650, type: 3, owner: 8a9004f88, flag: INIT/-/-/0x00

    (call) sess: cur 8a95b7328, rec 8a9591bc8, usr 8a95b7328; depth: 0

      ----------------------------------------

      SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, lock_flag: 0x0

      own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8

      ----------------------------------------

      SO: 8a9591bc8, type: 4, owner: 8a95c6650, flag: INIT/-/-/0x00

      (session) sid: 1618 trans: 0, creator: 0, flag: (2) -/REC -/-/-/-/-/-

                DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

                txn branch: 0

                oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

      temporary object counter: 0

    ----------------------------------------

    SO: 8a97ad460, type: 16, owner: 8a9004f88, flag: INIT/-/-/0x00

    (osp req holder)

此段信息可以看到,ARC0进程等待controlfile enqueue 队列锁。等待事件为enq: CF - contention

 

SO: 8a95d1fb8, type: 5, owner: 8a95c6650, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0017-0000000D

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, lock_flag: 0x0

      own: 0x8a95b7328, sess: 0x8a95b7328, prv: 0x8a95d1fe8

这里说明arc0在申请的资源是:res:0x5eb40f608

而在process 17号的ckpt中可以看到0x5eb40f608资源被ckptmode: SSX方式持有持有

CKPT进程的TRACE分析

PROCESS 17:

  ----------------------------------------

  SO: 8a90037d0, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=17, calls cur/top: 8a95c5df8/8a95c5df8, flag: (6) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 114

              last post received-location: kcbbza

              last process to post me: 8a9001830 1 6

              last post sent: 0 0 9

              last post sent-location: ksqrcl

              last process posted by me: b66011d18 91 0

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 367

    OSD pid info: Unix process pid: 367, image: oracle@hzboss-ora1 (CKPT)

Dump of memory from 0x00000005EF308A18 to 0x00000005EF308C20

5EF308A10                   0000000C 00000000          [........]

…………………………………………

    ----------------------------------------

    SO: 8a95b9df8, type: 4, owner: 8a90037d0, flag: INIT/-/-/0x00

    (session) sid: 1648 trans: 0, creator: 8a90037d0, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-0011-00000005, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    waiting for 'direct path write' blocking sess=0x0 seq=39369 wait_time=0 seconds since wait started=37

                file number=72, first dba=1, block cnt=1

……………………………………

    ----------------------------------------

    SO: 8a9777a30, type: 11, owner: 8a90037d0, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 8a90037d0,

                       event: 14, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: 8a95c5df8, type: 3, owner: 8a90037d0, flag: INIT/-/-/0x00

    (call) sess: cur 8a95b9df8, rec 8a95b5dc0, usr 8a95b9df8; depth: 0

      ----------------------------------------

      SO: 8a95d1f20, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000004 DID: 0001-0011-00000005

      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb507858, mode: S, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb507868

      ----------------------------------------

      SO: 8a95d1cc0, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-0011-00000005

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, mode: SSX, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f618

      ----------------------------------------

      SO: 8a95b5dc0, type: 4, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (session) sid: 1645 trans: 0, creator: 0, flag: (2) -/REC -/-/-/-/-/-

                DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

                txn branch: 0

                oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

      temporary object counter: 0

      ----------------------------------------

      SO: 8a95cb468, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) RS-00000019-00000001 DID: 0001-0011-00000005

      lv: 20 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f8c8, mode: SS, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x5eb40f8d8

      ----------------------------------------

      SO: 8a95cb338, type: 5, owner: 8a95c5df8, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0000-0011-00000004

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, mode: SS, lock_flag: 0x0

      own: 0x8a95b9df8, sess: 0x8a95b9df8, proc: 0x8a90037d0, prv: 0x8a95d1cf0

    ----------------------------------------

    SO: 8a97ad2c8, type: 16, owner: 8a90037d0, flag: INIT/-/-/0x00

    (osp req holder)

此处可以看到CKPT进程持有(enqueue) CF-00000000-00000000,持有模式是:res: 0x5eb40f608, mode: SSX, lock_flag: 0x0,可以对应到ARC0进程请求的资源。

LGWR进程的TRACE分析

PROCESS 10:

  ----------------------------------------

  SO: b66002018, type: 2, owner: 0, flag: INIT/-/-/0x00

  (process) Oracle pid=10, calls cur/top: b665c6de0/b665c6de0, flag: (6) SYSTEM

            int error: 0, call error: 0, sess error: 0, txn error 0

  (post info) last post received: 0 0 24

              last post received-location: ksasnd

              last process to post me: 8a9000860 1 6

              last post sent: 109 0 4

              last post sent-location: kslpsr

              last process posted by me: 8a902ef90 105 0

    (latch info) wait_event=0 bits=0

    Process Group: DEFAULT, pseudo proc: 5eb3388f0

    O/S info: user: oracle, term: UNKNOWN, ospid: 365

    OSD pid info: Unix process pid: 365, image: oracle@hzboss-ora1 (LGWR)

Dump of memory from 0x00000005EF308810 to 0x00000005EF308A18

5EF308810 00000015 00000000 0000000B 6C1C1158  [............l..X]

  ………………

    ----------------------------------------

    SO: b665cc938, type: 5, owner: b66002018, flag: INIT/-/-/0x00

    (enqueue) RT-00000001-00000000 DID: 0000-000A-00000005

    lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

    res: 0x5eb40f828, mode: X, lock_flag: 0x0

    own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f838

    (FOB) flags=2 fib=8af713780 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control03.ctl

     fno=2 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af7133e0 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control02.ctl

     fno=1 lblksz=16384 fsiz=1308

    (FOB) flags=2 fib=8af713040 incno=0 pending i/o cnt=0

     fname=/boss_sysdata/oradata/dtvboss/control01.ctl

     fno=0 lblksz=16384 fsiz=1308

    ----------------------------------------

    SO: b66778b58, type: 11, owner: b66002018, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: b66002018,

                       event: 12, last message event: 20,

                       last message waited event: 20, messages read: 1

                       channel: (5eb52cc20) scumnt mount lock

                                scope: 1, event: 22, last mesage event: 20,

                                publishers/subscribers: 0/20,

                                messages published: 1

    ----------------------------------------

    SO: b665bf398, type: 4, owner: b66002018, flag: INIT/-/-/0x00

    (session) sid: 824 trans: 0, creator: b66002018, flag: (51) USR/- BSY/-/-/-/-/-

              DID: 0001-000A-00000006, short-term DID: 0000-0000-00000000

              txn branch: 0

              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

    service name: SYS$BACKGROUND

    waiting for 'enq: CF - contention' blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0

                name|mode=43460005, 0=0, operation=0

    Dumping Session Wait History

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929716

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929716

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929718

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929719

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929713

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929715

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929722

                name|mode=43460005, 0=0, operation=0

     for 'enq: CF - contention' count=1 wait_time=2929736

                name|mode=43460005, 0=0, operation=0

    temporary object counter: 0

      ----------------------------------------

      UOL used : 0 locks(used=0, free=0)

      KGX Atomic Operation Log 5ebff7cd0

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5ebff7d18

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      KGX Atomic Operation Log 5ebff7d60

       Mutex 0(0, 0) idn 0 oper NONE

       Library Cache uid 824 efd 0 whr 0 slp 0

      ----------------------------------------

      SO: b6caa57e0, type: 41, owner: b665bf398, flag: INIT/-/-/0x00

      (dummy) nxc=0, nlb=0   

    ----------------------------------------

    SO: b66778a48, type: 11, owner: b66002018, flag: INIT/-/-/0x00

    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: b66002018,

                       event: 13, last message event: 27,

                       last message waited event: 27, messages read: 1

                       channel: (5eb528a48) system events broadcast channel

                                scope: 2, event: 2923059, last mesage event: 27,

                                publishers/subscribers: 0/421,

                                messages published: 1

    ----------------------------------------

    SO: b665c6de0, type: 3, owner: b66002018, flag: INIT/-/-/0x00

    (call) sess: cur b665bf398, rec 0, usr b665bf398; depth: 0

      ----------------------------------------

      SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-000A-00000006

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, req: SSX, lock_flag: 0x0

      own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

    ----------------------------------------

    SO: b667ae720, type: 16, owner: b66002018, flag: INIT/-/-/0x00

    (osp req holder)

从LGWR进程的TRACE中可以看到:waiting for 'enq: CF - contention' blocking sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0

说明LGWR进程也在等待controlfile enqueue 队列锁,等待事件为enq: CF  contention。

SO: b665cce90, type: 5, owner: b665c6de0, flag: INIT/-/-/0x00

      (enqueue) CF-00000000-00000000 DID: 0001-000A-00000006

      lv: 00 79 4a fb 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x2

      res: 0x5eb40f608, req: SSX, lock_flag: 0x0

      own: 0xb665bf398, sess: 0xb665bf398, proc: 0xb66002018, prv: 0x5eb40f628

从此段可以看到,LGWR进程请求的CF锁资源是0x5eb40f608,并且是以SSX模式请求;通过CKPT进程的TRACE可以看到其已经own: 0x8a95b9df8 持有此资源res: 0x5eb40f608, mode: SSX, lock_flag: 0x0 SSX模式持有此锁;因此LGWR进程也处于等待此资源的状态。

4.2 主机负载情况分析

从ALERT日志中提到的故障发生时的TRACE文件:

/oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc

/oracle/admin/dtvboss/bdump/dtvboss_arc0_404.trc

中可以找到故障发生时的主机负载情况:

*** 2024-10-21 03:22:11.300

Dumping diagnostic information for ospid 367:

OS pid = 367

loadavg : 0.44 0.62 1.82

swap info: free_mem = 1435.35M rsv = 59461.24M

           alloc = 55941.14M avail = 55124.91 swap_free = 58645.01M

Dumping diagnostic information for oracle@hzboss-ora1:

OS pid = 24887

loadavg : 1.91 1.01 1.84

swap info: free_mem = 7661.14M rsv = 53520.42M

           alloc = 50905.74M avail = 62991.89 swap_free = 65606.57M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 S   oracle 24887     1   0  39 20        ? 4115427        ?   Sep 24 ?          29:24 oracledtvboss (LOCAL=NO)

pstack: cannot examine 24887: no such process

*** 2024-10-21 03:23:53.644

ksuitm_check: OS PID=8690 is still alive

*** 2024-10-21 03:23:53.644

Dumping diagnostic information for oracle@hzboss-ora1 (J002):

OS pid = 8690

loadavg : 1.93 1.02 1.84

swap info: free_mem = 7660.87M rsv = 53524.68M

           alloc = 50905.95M avail = 62987.56 swap_free = 65606.30M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 O   oracle  8690     1   2  39 20        ? 4121237          00:57:24 ?          29:43 ora_j002_dtvboss

pstack: cannot examine 8690: no such process

*** 2024-10-21 03:23:54.751

ksuitm_check: OS PID=21430 is still alive

*** 2024-10-21 03:23:54.751

Dumping diagnostic information for oracle@hzboss-ora1 (J003):

OS pid = 21430

loadavg : 1.94 1.02 1.84

swap info: free_mem = 7660.27M rsv = 53521.18M

           alloc = 50906.49M avail = 62991.00 swap_free = 65605.69M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 O   oracle 21430     1   2  39 20        ? 4119652          02:59:44 ?           0:27 ora_j003_dtvboss

pstack: cannot examine 21430: unanticipated system error

*** 2024-10-21 03:23:57.839

ksuitm_check: OS PID=367 is still alive

*** 2024-10-21 03:23:57.840

Dumping diagnostic information for oracle@hzboss-ora1 (CKPT):

OS pid = 367

loadavg : 1.98 1.04 1.84

swap info: free_mem = 7660.46M rsv = 53520.98M

           alloc = 50906.02M avail = 62990.88 swap_free = 65605.85M

 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD

 0 S   oracle   367     1   0  39 20        ? 4116160        ?   Mar 30 ?         498:14 ora_ckpt_dtvboss

pstack: cannot examine 367: no such process

*** 2024-10-21 03:23:58.926

从这些负载信息可以看到,整个系统当时的空闲内存和swap还留有余地,足够当时系统的使用,所以内存不足的原因可以排除。

4.3 数据库版本情况

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/product/10.2.0/db_1

System name: SunOS

Node name: hzboss-ora1

Release: 5.10

Version: Generic_144488-12

Machine: sun4u

Instance name: dtvboss

Redo thread mounted by this instance: 1

Oracle process number: 2

COMP_ID    COMP_NAME                           VERSION         STATUS   MODIFIED                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

---------- ----------------------------------- --------------- -------- -----------------------                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

SDO        Spatial                             10.2.0.4.0      VALID    18/FEB/2011:21:05:30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

ORDIM      Oracle.interMedia                   10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

AMD        OLAP.Catalog                        10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

EM         Oracle.Enterprise.Manager           10.2.0.4.0      VALID    18/FEB/2011:21:04:33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XDB        Oracle.XML.Database                 10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CONTEXT    Oracle.Text                         10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

EXF        Oracle.Expression.Filter            10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

RUL        Oracle.Rule.Manager                 10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

OWM        Oracle.Workspace.Manager            10.2.0.4.3      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

ODM        Oracle.Data.Mining                  10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATALOG    Oracle.Database.Catalog.Views       10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATPROC    Oracle.Database.Packages.and.Types  10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

JAVAVM     JServer.JAVA.Virtual.Machine        10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XML        Oracle.XDK                          10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

CATJAVA    Oracle.Database.Java.Packages       10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

APS        OLAP.Analytic.Workspace             10.2.0.4.0      VALID    18/FEB/2011:21:05:29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

XOQ        Oracle.OLAP.API                     10.2.0.4.0      VALID    18/FEB/2011:21:05:29

  • 总结与建议

针对此次数据库实例宕机问题,总结及建议如下:

5.1 问题分析汇总

5.1.1 宕机问题总结

综上可以得出结论,由于arc0进程要写日志需要等待控制文件CF锁,lgwr估计要写日志或者切换也在等待控制文件CF锁,而ckpt进程持有controlfile enqueue lock时间过长,arch0等待超过900秒,导致oracle数据库系统判断异常。

此时oracle数据库系统会让lgwr进程kill持有controlfile enqueue lock的进程,相当于lgwr进程杀了ckpt进程,而ckpt进程又为oracle关键进程,连锁反应,引起宕机。

5.1.2 宕机的问题根源判断

在宕机前系统发出告警aiowait timed out 1 times,初步判断系统在当时的io存在一些问题。而当时系统的内存和swap资源都充足没有出现严重问题,trace中的loadavg : 0.44 0.62 1.82则说明了当时的cpu负载低,这里排除了内存和cpu的问题. 

初步建议检测当时的redo日志的切换和产生量,当时的数据库服务器的io调用频繁度,因此可以参考下数据库的2点到3点间的awr信息(如果有数据库监控系统那么则可以分析到故障发生时间点的io信息,但是当前环境未部署)。

controlfile equeue lock的超时管理机制中,受_kill_controlfile_enqueue_blocker=false,_controlfile_enqueue_timeout,_kill_enqueue_blocker这3个隐含参数的影响,当前环境中我不建议增加_controlfile_enqueue_timeout,而是建议在通过控制_kill_controlfile_enqueue_blocker,_kill_enqueue_blocker参数来实现该机制的管理,以避免宕机的风险。


控制队列锁管理机制的bug以及避免:

控制文件队列锁有一个相关的匹配Bug 7692631 - 'DATABASE CRASHES WITH ORA-494 AFTER UPGRADE TO 10.2.0.4',这里可以通过以下2个参数规避。

_kill_controlfile_enqueue_blocker值分为true和false,默认为true,如果设置为false则不会去kill超时的进程,此时_kill_controlfile_enqueue_blocker=false参数失效。

_kill_enqueue_blocker值分为别0,1,2,3. 当设置为1的时候,保证后台进程不会被kill,而只会kill前台进程,这样可以避免宕机。

5.2 建议

5.2.1在数据库层上的调整建议

经过沟通当时正在跑批,所以当时的日志切换会比较频繁,因此我们建议对此时的redo log的大小匹配处理进行调整,适当增加日志组以及日志文件大小。控制日志切换保持在20-30分钟一次。

同时建议使用数据库监控工具,以便能在故障时候能及时获取到故障前的监控数据。

在参数调整角度上我们可以将_kill_enqueue_blocker设置为1或者将_kill_controlfile_enqueue_blocker设置为false。

5.2.2在系统层上的调整建议

同时检查系统io是否存在问题,

同时建议部署io层面的监控工具。

相关文章:

ORACLE数据库实例报错ORA-00470: LGWR process terminated with error宕机问题分析报告

服务概述 10月21号03:22分,BOSS数据库实例发生异常宕机;工程师及时响应此问题并对此故障原因进行分析及相关建议,详细的故障情况及相关日志、TRACE文件的分析及总结、建议,请参阅本文档。 hzboss数据库实例宕机分析 4.1 数据库层面日志的分…...

【前端优化】vue2 webpack4项目升级webpack5,大大提升运行速度

记录一下过程 手里有个老项目,vue2webpack4 项目很大,每次运行、运行都要将近10分钟 现在又要往里面写很多东西,再不优化,开发着会更难受,所以决定先将它升级至webpack5 最初失败的尝试 直接在项目里安装了webpack5 但…...

Nginx应用场景详解与配置指南

1. 什么是Nginx? Nginx(发音为"engine-x")是一个高性能的HTTP和反向代理服务器,也是一个IMAP/POP3/SMTP代理服务器。它以高性能、稳定性、丰富的功能集、简单的配置和低资源消耗而闻名。 2. Nginx的主要应用场景 2.1 …...

vue2 切换主题色以及单页面好使方法

今天要新增一个页面要根据不同公司切换不同页面主题色&#xff0c;一点一点来&#xff0c;怎么快速更改 el-pagination 分页组件主题色。 <el-pagination :page-size"pageSize" :pager-count"pageCount"layout"sizes, prev, pager, next, jumper,…...

React学习———CSS Modules(样式模块化)

CSS Modules CSS Modules&#xff08;样式模块化&#xff09;是一种用于模块化和局部作用域化CSS样式的技术&#xff0c;让CSS只在当前组件内生效&#xff0c;避免全局样式冲突的技术方案 工作原理 文件命名&#xff1a;通常以.module.css、.module.less、.module.scss等结尾…...

MCP 与 Cloudflare 的结合:网络安全的新高度

MCP 与 Cloudflare 的结合:网络安全的新高度 在数字化时代,网络安全已经不只是某些行业的“专属问题”,而是所有企业、个人都必须面对的核心挑战。从 DDoS 攻击、数据泄露,到身份盗用,每一种网络威胁都可能带来巨大的损失。而这时候,微软 MCP(Microsoft Cloud Platform…...

JavaScript入门【1】概述

1.JavaScript是什么? <font style"color:rgb(38,38,38);">Javascript &#xff08;简称“JS”&#xff09;是⼀种直译式脚本语⾔&#xff0c;⼀段脚本其实就是⼀系列指令&#xff0c;计算机通过这些指令来达成⽬标。它⼜是⼀种动态类型的编程语⾔。JS⽤来在⽹…...

PyQt5 的使用

PyQt5 是 Python 里基于 Qt 框架的 GUI 开发工具&#xff0c;能做桌面应用&#xff0c;跨平台&#xff08;Windows/macOS/Linux 都能用&#xff09;。你可能想知道&#xff1a;怎么开始用&#xff1f;有哪些核心组件&#xff1f;怎么写界面逻辑&#xff1f;别急&#xff0c;咱们…...

JavaScript【6】事件

1.概述&#xff1a; 在 JavaScript 中&#xff0c;事件&#xff08;Event&#xff09;是浏览器或 DOM&#xff08;文档对象模型&#xff09;与 JavaScript 代码之间交互的一种机制。它代表了在浏览器环境中发生的特定行为或者动作&#xff0c;比如用户点击鼠标、敲击键盘、页面…...

STM32F10xx 参考手册

6. 什么是寄存器 本章参考资料&#xff1a;《STM32F10xx 参考手册》、《STM32F10xx数据手册》、 学习本章时&#xff0c;配合《STM32F10xx 参考手册》“存储器和总线架构”及“通用I/O(GPIO)”章节一起阅读&#xff0c;效果会更佳&#xff0c;特别是涉及到寄存器说明的部分。…...

使用Docker部署Nacos

sudo systemctl start docker sudo systemctl enable docker docker --version 步骤 2: 拉取 Nacos Docker 镜像 拉取 Nacos 镜像&#xff1a; 你可以从 Docker Hub 上拉取官方的 Nacos 镜像&#xff0c;使用以下命令&#xff1a; docker pull nacos/nacos-server 这会从 …...

深度学习中ONNX格式的模型文件

一、模型部署的核心步骤 模型部署的完整流程通常分为以下阶段&#xff0c;用 “跨国旅行” 类比&#xff1a; 步骤类比解释技术细节1. 训练模型学会一门语言&#xff08;如中文&#xff09;用 PyTorch/TensorFlow 训练模型2. 导出为 ONNX翻译成国际通用语言&#xff08;如英语…...

TIFS2024 | CRFA | 基于关键区域特征攻击提升对抗样本迁移性

Improving Transferability of Adversarial Samples via Critical Region-Oriented Feature-Level Attack 摘要-Abstract引言-Introduction相关工作-Related Work提出的方法-Proposed Method问题分析-Problem Analysis扰动注意力感知加权-Perturbation Attention-Aware Weighti…...

Redis 发布订阅模式深度解析:原理、应用与实践

在现代分布式系统架构中&#xff0c;实时消息传递机制扮演着至关重要的角色。Redis 作为一款高性能的内存数据库&#xff0c;其内置的发布订阅(Pub/Sub)功能提供了一种轻量级、高效的消息通信方案。本文将全面剖析 Redis 发布订阅模式&#xff0c;从其基本概念、工作原理到实际…...

环形缓冲区 ring buffer 概述

环形缓冲区 ring buffer 概述 1. 简介 环形缓冲区&#xff08;ring buffer&#xff09;&#xff0c;是一种用于表示一个固定尺寸、头尾相连的缓冲区的数据结构&#xff0c;适合缓存数据流。也称作环形缓冲区&#xff08;circular buffer&#xff09;&#xff0c;环形队列&…...

飞帆控件 post or get it when it has get

我在这里分享两个链接&#xff1a; post_get_it 设计 - 飞帆 有人看出来这个控件是干什么用吗&#xff1f; 控件的配置&#xff1a;...

SQL里where条件的顺序影响索引使用吗?

大家好&#xff0c;我是锋哥。今天分享关于【SQL里where条件的顺序影响索引使用吗&#xff1f;】面试题。希望对大家有帮助&#xff1b; SQL里where条件的顺序影响索引使用吗&#xff1f; 1000道 互联网大厂Java工程师 精选面试题-Java资源分享网 在 SQL 查询中&#xff0c;W…...

SAP学习笔记 - 开发豆知识02 - com.sap.cds.services.cds.CdsService 废止,那么用什么代替呢?

我看很多人代码里面用的都是这个CdsService类&#xff0c;可以自从2.0版本往上这个类就没了啊。 它的代替是什么呢&#xff1f;用的CqnService 那么怎么查的呢&#xff1f; 我也是下载好几个包&#xff0c;解压看&#xff0c;然后发现这里还可以直接看&#xff0c;挺方便的。…...

OpenResty 深度解析:构建高性能 Web 服务的终极方案

引言 openresty是什么&#xff1f;在我个人对它的理解来看相当于嵌入了lua的nginx; 我们在nginx中嵌入lua是为了不需要再重新编译,我们只需要重新修改lua脚本,随后重启即可; 一.lua指令序列 我们分别从初始化阶段&#xff0c;重写/访问阶段&#xff0c;内容阶段&#xff0c;日志…...

什么是路由器环回接口?

路由器环回接口&#xff08;LoopbackInterface&#xff09;是网络设备中的一种逻辑虚拟接口&#xff0c;不依赖物理硬件&#xff0c;但在网络配置和管理中具有重要作用。以下是其核心要点&#xff1a; 一、基本特性 1.虚拟性与稳定性 环回接口是纯软件实现的逻辑接口&#x…...

OpenHarmony:开源操作系统重塑产业数字化底座

OpenHarmony&#xff1a;开源操作系统重塑产业数字化底座 引言&#xff1a;当操作系统成为数字公共品 在万物智联时代&#xff0c;操作系统不再是科技巨头的专属领地。华为捐赠的OpenHarmony项目&#xff0c;正以开源协作模式重构操作系统产业格局。这个脱胎于商业版本的开源…...

【MySQL进阶】如何在ubuntu下安装MySQL数据库

前言 &#x1f31f;&#x1f31f;本期讲解关于如何在ubuntu环境下安装mysql的详细介绍~~~ &#x1f308;感兴趣的小伙伴看一看小编主页&#xff1a;GGBondlctrl-CSDN博客 &#x1f525; 你的点赞就是小编不断更新的最大动力 &#x1f3…...

【数据结构】_二叉树

1.二叉树链式结构的实现 1.1 前置说明 在学习二叉树的基本操作前&#xff0c;需先要创建一棵二叉树&#xff0c;然后才能学习其相关的基本操作。由于现在大家对二叉树结构掌握还不够深入&#xff0c;为了降低大家学习成本&#xff0c;此处手动快速创建一棵简单的二叉树&#x…...

给图表组件上点“颜色” —— 我与 CodeBuddy 的合作记录

我正在参加CodeBuddy「首席试玩官」内容创作大赛&#xff0c;本文所使用的 CodeBuddy 免费下载链接&#xff1a;腾讯云代码助手 CodeBuddy - AI 时代的智能编程伙伴 前段时间&#xff0c;我在开发一个 Vue3 项目的时候&#xff0c;碰到了一个小小的挑战&#xff1a;我想做一个可…...

使用 YOLO 结合 PiscTrace 实现股票走势图像识别

在智能投研和金融分析中&#xff0c;自动识别图表中的模式&#xff08;如 K 线走势、支撑/阻力位、形态结构&#xff09;成为一种新兴手段。传统的技术分析依赖大量人工判断&#xff0c;而计算机视觉技术的发展&#xff0c;特别是 YOLO 模型在图像识别领域的高效表现&#xff0…...

OpenCV中的光流估计方法详解

文章目录 一、引言二、核心算法原理1. 光流法基本概念2. 算法实现步骤 三、代码实现详解1. 初始化设置2. 特征点检测3. 光流计算与轨迹绘制 四、实际应用效果五、优化方向六、结语 一、引言 在计算机视觉领域&#xff0c;运动目标跟踪是一个重要的研究方向&#xff0c;广泛应用…...

OpenCL C++ 常见属性与函数

核心对象与属性 对象/属性描述示例cl::Platform表示OpenCL平台cl::Platform::get(&platforms)cl::Device表示计算设备cl::Device::getDefault()cl::Context管理设备、内存和命令队列的上下文cl::Context(contextDevices)cl::CommandQueue命令队列,用于提交命令cl::Command…...

Android核心系统服务:AMS、WMS、PMS 与 system_server 进程解析

1. 引言 在 Android 系统中&#xff0c;ActivityManagerService (AMS)、WindowManagerService (WMS) 和 PackageManagerService (PMS) 是三个最核心的系统服务&#xff0c;它们分别管理着应用的生命周期、窗口显示和应用包管理。 但你是否知道&#xff0c;这些服务并不是独立…...

18.自动化生成知识图谱的多维度质量评估方法论

文章目录 一、结构维度评估1.1 拓扑结构评估1.1.1 基础图论指标1.1.2 层级结构指标 1.2 逻辑一致性评估1.2.1 形式逻辑验证1.2.2 约束满足度 二、语义维度评估2.1 语义一致性评估2.1.1 标签语义分析2.1.2 关系语义评估 2.2 语义表示质量2.2.1 嵌入质量2.2.2 上下文语义评估 三、…...

【行为型之命令模式】游戏开发实战——Unity可撤销系统与高级输入管理的架构秘钥

文章目录 ⌨️ 命令模式&#xff08;Command Pattern&#xff09;深度解析一、模式本质与核心价值二、经典UML结构三、Unity实战代码&#xff08;可撤销的建造系统&#xff09;1. 定义命令接口与接收者2. 实现具体命令3. 命令管理器&#xff08;Invoker&#xff09;4. 客户端使…...