一套Windows上的11.2.0.1单实例数据库在database open阶段出现了ORA-07445:core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []内部错误,具体的出错日志如下:
LOG CONTENT=======================ALERT.LOG============================Starting ORACLE instance (normal)LICENSE_MAX_SESSION = 0LICENSE_SESSIONS_WARNING = 0Picked latch-free SCN scheme 2Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DESTARCH: Warning; less destinations available than specifiedby LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameterAutotune of undo retention is turned on.IMODE=BRILAT =842011-08-01 13:13:47.068000 +08:00LICENSE_MAX_USERS = 0SYS auditing is disabledStarting up:Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing options.Using parameter settings in server-side spfile C:\APP\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEG11R2.ORASystem parameters with non-default values: _spin_count = 2000 processes = 500 event = "10500 trace name context forever,level 8:10013 trace name context forever,level 10:10015 trace name context forever,level 10" sga_max_size = 600M shared_pool_size = 152M large_pool_size = 32M java_pool_size = 4M streams_pool_size = 0 _db_file_direct_io_count = 12 sga_target = 0 memory_target = 0 control_files = "C:\APP\ORADATA\G11R2\CONTROLFILE\O1_MF_6VWCSH9J_.CTL" control_files = "C:\APP\FLASH_RECOVERY_AREA\G11R2\CONTROLFILE\O1_MF_6VWCSHNF_.CTL" db_block_checksum = "TRUE" db_block_size = 8192 db_cache_size = 196M _shared_io_pool_size = 0 compatible = "11.2.0.0.0" log_archive_dest_2 = "service=stdby optional lgwr sync affirm valid_for=(online_logfiles,all_roles)" log_buffer = 10485760 db_create_file_dest = "C:\app\oradata" db_recovery_file_dest = "C:\app\flash_recovery_area" db_recovery_file_dest_size= 500000M undo_tablespace = "UNDOTBS1" _kgl_bucket_count = 2 remote_login_passwordfile= "EXCLUSIVE" db_domain = "" session_cached_cursors = 300 audit_file_dest = "C:\APP\ADMIN\G11R2\ADUMP" optimizer_features_enable= "10.2.0.4" audit_trail = "DB" cell_offload_plan_display= "ALWAYS" db_name = "G11R2" open_cursors = 3000 _optimizer_extended_cursor_sharing_rel= "NONE" pga_aggregate_target = 300M diagnostic_dest = "C:\APP"2011-08-01 13:13:48.164000 +08:00PMON started with pid=2, OS id=984VKTM started with pid=3, OS id=3656 at elevated priorityVKTM running at (10)millisec precision with DBRM quantum (100)msGEN0 started with pid=4, OS id=5824DIAG started with pid=5, OS id=5832DBRM started with pid=6, OS id=2784PSP0 started with pid=7, OS id=2500DIA0 started with pid=8, OS id=5320MMAN started with pid=9, OS id=4128DBW0 started with pid=10, OS id=5852LGWR started with pid=11, OS id=3960CKPT started with pid=12, OS id=4472SMON started with pid=13, OS id=5788RECO started with pid=14, OS id=6036MMON started with pid=15, OS id=5740MMNL started with pid=16, OS id=2112ORACLE_BASE from environment = C:\appalter database mount exclusive2011-08-01 13:13:52.390000 +08:00Sweep [inc][135908]: completedNSS2 started with pid=19, OS id=2728Sweep [inc][135901]: completedSuccessful mount of redo thread 1, with mount id 2704081164Database mounted in Exclusive Mode2011-08-01 13:13:53.413000 +08:00Lost write protection disabled2011-08-01 13:13:54.578000 +08:00Sweep [inc][135897]: completedSweep [inc2][135908]: completedSweep [inc2][135901]: completedSweep [inc2][135897]: completed2011-08-01 13:13:55.788000 +08:00Completed: alter database mount exclusivealter database openBeginning crash recovery of 1 threads parallel recovery started with 3 processes2011-08-01 13:13:56.959000 +08:00Started redo scanCompleted redo scan read 0 KB redo, 0 data blocks need recoveryStarted redo application at Thread 1: logseq 867, block 88140, scn 9122496Recovery of Online Redo Log: Thread 1 Group 3 Seq 867 Reading mem 0 Mem# 0: C:\APP\ORADATA\G11R2\ONLINELOG\O1_MF_3_6VWCSMPO_.LOG Mem# 1: C:\APP\FLASH_RECOVERY_AREA\G11R2\ONLINELOG\O1_MF_3_6VWCSNGX_.LOGCompleted redo application of 0.00MBCompleted crash recovery at Thread 1: logseq 867, block 88140, scn 9142497 0 data blocks read, 0 data blocks written, 0 redo k-bytes read2011-08-01 13:13:58.738000 +08:00LGWR: STARTING ARCH PROCESSESARC0 started with pid=22, OS id=47842011-08-01 13:13:59.765000 +08:00ARC0: Archival startedLGWR: STARTING ARCH PROCESSES COMPLETEARC0: STARTING ARCH PROCESSESARC1 started with pid=24, OS id=2780ARC2 started with pid=25, OS id=1288ARC1: Archival startedLGWR: Primary database is in MAXIMUM AVAILABILITY modeARC2: Archival startedARC1: Becoming the 'no FAL' ARCHARC1: Becoming the 'no SRL' ARCHARC2: Becoming the heartbeat ARCHLGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWRARC3 started with pid=26, OS id=38762011-08-01 13:14:00.828000 +08:00ARC3: Archival startedARC0: STARTING ARCH PROCESSES COMPLETENSS2 started with pid=19, OS id=51562011-08-01 13:14:29.008000 +08:00ORA-16198: LGWR received timedout error from KSR2011-08-01 13:14:35.980000 +08:00Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_lgwr_3960.trc:ORA-16198: Timeout incurred on internal channel during remote archivalLGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZEDLGWR: Continuing...ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch2011-08-01 13:14:38.629000 +08:00Trying to expand controlfile section 11 for Oracle Managed FilesException [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_arc0_4784.trc (incident=136091):ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136091\g11r2_arc0_4784_i136091.trc2011-08-01 13:14:40.283000 +08:00Trace dumping is performing id=[cdmp_20110801131440]2011-08-01 13:14:52.417000 +08:00Sweep [inc][136091]: completedSweep [inc2][136091]: completed2011-08-01 13:14:59.805000 +08:00ARC2: Detected ARCH process failureARC2: STARTING ARCH PROCESSESARC0 started with pid=19, OS id=50162011-08-01 13:15:00.836000 +08:00ARC0: Archival startedARC2: STARTING ARCH PROCESSES COMPLETE2011-08-01 13:15:36.689000 +08:00Deleted Oracle managed file C:\APP\FLASH_RECOVERY_AREA\G11R2\ARCHIVELOG\2011_08_01\O1_MF_1_866_73DFKWRK_.ARC2011-08-01 13:15:38.013000 +08:00Error 12154 received logging on to the standbyErrors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc:ORA-12154: TNS:could not resolve the connect identifier specifiedARCH: Error 12154 Creating archive log file to 'stdby'Trying to expand controlfile section 11 for Oracle Managed FilesException [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc (incident=136051):ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc2011-08-01 13:15:39.680000 +08:00Trace dumping is performing id=[cdmp_20110801131539]2011-08-01 13:15:42.782000 +08:00PMON (ospid: 984): terminating the instance due to error 3972011-08-01 13:15:50.520000 +08:00Instance terminated by PMON, pid = 984=============================g11r2_ora_4852_i136051.trc=============================Dump file c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trcOracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsWindows NT Version V6.1 Service Pack 1CPU : 4 - type 586, 2 Physical CoresProcess Affinity : 0x0x00000000Memory (Avail/Total): Ph:2122M/3566M, Ph+PgF:5413M/7130M, VA:1084M/2047MInstance name: g11r2Redo thread mounted by this instance: 1Oracle process number: 17Windows thread id: 4852, image: ORACLE.EXE (SHAD)*** 2011-08-01 13:15:38.527*** SESSION ID:(197.1) 2011-08-01 13:15:38.527*** CLIENT ID:() 2011-08-01 13:15:38.527*** SERVICE NAME:() 2011-08-01 13:15:38.527*** MODULE NAME:(oradim.exe) 2011-08-01 13:15:38.527*** ACTION NAME:() 2011-08-01 13:15:38.527Dump continued from file: c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trcORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []========= Dump for incident 136051 (ORA 7445 [kcflfi()+466]) ========----- Beginning of Customized Incident Dump(s) -----Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionProcess Id: 0x000010bc Thread Id : 0x000012f4 Time : Mon Aug 01 13:15:38Excp. Code: 0xc0000094 Excp. Type: INT_DIVIDE Flags: 0x00000000------------------- Registers ----------------------------eip = 0500282e esp = 0d9f525c ebp = 0d9f577c edi = 37eefe00 esi = 00000265eax = 00000265 ebx = 00000000 ecx = 089ee234 edx = 00000000ecs = 0000001b eds = 00000023 ees = 00000023 ess = 00000023egs = 00000000 efs = 0000003beflags = 00010246------------------- End of Registers ---------------------*** 2011-08-01 13:15:38.536dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----alter database open----------- messages from pre-loading .sym files:Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are differentSymbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different----------- end of messages from pre-loading .sym files----- Call Stack Trace -----calling call entry argument values in hexlocation type point (? means dubious value)-------------------- -------- -------------------- ----------------------------Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are differentSymbol file C:\app\product\11.2.0\dbhome_1\BIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are differentEnumerateLoadedModules64 failed with error -1073741819Symbol file oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are differentSymbol file oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different__VInfreq__kcflfi() 00000000+466_kccrszf()+287 CALLrel _kcflfi() 0 318345B8 34 31C0DD40 4000 265 4 7FFFFFFF 1 0 0_kccrsd_expd()+1418 CALLrel _kccrszf() D9F7CEC 268 264_kccwnc_reuse_expan CALLrel _kccrsd_expd() D9F7CEC B 38d()+640__VInfreq__kccwnc() CALLrel _kccwnc_reuse_expan D9F7CEC B 26+235 d()_krse_arc_complete( CALLrel _kccwnc() D9F7CEC D9F6D38 B)+1615_krse_arc_driver_co CALLrel _krse_arc_complete( D9F78ACre()+1307 )_krse_arc_driver()+ CALLrel _krse_arc_driver_co D9F7CEC 1 D9F7C6C 0 0 D9F7CC8274 re() 0 0 0 0 0 0 0_krsq_arch_to_force CALLrel _krse_arc_driver() D9F7CEC 1 D9F7C6C 0 0 D9F7CC8_switch()+196 0 0 0 0 0 0 0__VInfreq__kcttsc() CALLrel _krsq_arch_to_force D9F7CEC 1+129 _switch()_kcfopd()+1504 CALLrel _kcttsc() 2_adbdrv()+16700 CALLrel _kcfopd() 0 0 0 0 D9FBBF8_opiexe()+13594 CALLrel _adbdrv() 4A C0000094 33644518 D9FBD38 6D60697 2F3FC5F0_opiosq0()+6248 CALLrel _opiexe() 4 0 D9FC704_kpooprx()+277 CALLrel _opiosq0() 3 E D9FC970 A4 0_kpoal8()+632 CALLrel _kpooprx() D9FF074 D9FD3F8 13 1 0 A4_opiodr()+1248 CALLreg 00000000 5E 1C D9FF070___dyn_tls_init_cal CALLreg 00000000 5E 1C D9FF070 1lback()+2935122_opitsk()+1404 CALL??? 00000000 C9A10E8 5E D9FF070 0 D9FED00 D9FF19C 53E52E 0 D9FF1C8_opiino()+980 CALLrel _opitsk() 0 0_opiodr()+1248 CALLreg 00000000 3C 4 D9FFBC4_opidrv()+1201 CALLrel _opiodr() 3C 4 D9FFBC4 0_sou2o()+55 CALLrel _opidrv() 3C 4 D9FFBC4_opimai_real()+124 CALLrel _sou2o() D9FFBD4 3C 4 D9FFBC4_opimai()+125 CALLrel _opimai_real() 2 D9FFBFC_OracleThreadStart@ CALLrel _opimai() 2 D9FFF3C 0 70 FFFFFFFF4()+830 FFFFFFFF___dyn_tls_init_cal CALLptr 00000000 901FF6C D9FFFD4 776437F5lback()+366382316 901FF6C 765D34CB 0___dyn_tls_init_cal CALLreg 00000000 901FF6C 765D34CB 0 0 901FF6Clback()+367384440 0___dyn_tls_init_cal CALLrel ___dyn_tls_init_cal 401326 901FF6C 0 0 0 0lback()+367384392 lback()+36738440300000000 CALL??? 00000000--------------------- Binary Stack Dump ---------------------..................从以上日志中可以看到在”Trying to expand controlfile section 11 for Oracle Managed Files“扩扎控制文件过程中出现了 _kccwnc_reuse_expan->_kccrsd_expd->_kccrszf->_kcflfi->_VInfreq__kcflfi() 函数的7445错误,kcf意为(manages and coordinates operations on the control file(s),kcf.c),是在处理日志文件中引发了INT_DIVIDE_BY_ZERO除数为零的代码bug。 通过7445和kcflfi关键词在MOS上搜索没有太大的发现,说明该Bug的处罚几率非常低,正好让我碰到说明是某些特殊参数的设置引起了该问题。 目标锁定启动日志中的非默认隐藏参数"_db_file_direct_io_count",该参数决定了直接路径读写的IO大小,从9i开始该参数的单位调整为bytes而非原先的blocks,之前因为对该参数进行一些测试所以设置了一个较小值。
Parameter: DB_FILE_DIRECT_IO_COUNT~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Versions: 8.0 - 8.1 This parameter is hidden in 9.0 onwards. Parameter type: integer Parameter class: dynamic, scope = ALTER SYSTEM DEFERRED Default value: 64 Range of values: operating system-dependentDescription:~~~~~~~~~~~~DB_FILE_DIRECT_IO_COUNT is used to specify the number of blocks to be usedfor IO operations done by backup, restore or direct path read and writefunctions. The IO buffer size is a product of DB_FILE_DIRECT_IO_COUNT andDB_BLOCK_SIZE. The IO buffer size cannot exceed max_IO_size for yourplatform.Assigning a high value to this parameter results in greater use of PGA orSGA memory.o In Oracle8i, minimize the number of I/O requests by setting the DB_FILE_DIRECT_IO_COUNT instance parameter so that DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system In Oracle8i the default for this is 64 blocks. (In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs the size of direct I/Os in BYTES (not blocks). The default is 1Mb but will be sized down if the max_io_size of the system is smaller.)ORA-19863 during RMAN duplicateApplies to:Oracle Server - Enterprise Edition - Version: 10.2.0.3This problem can occur on any platform.Symptoms-- Problem Statement:Duplicate failed during the datafile restore stage:Starting restore at 2008-Apr-09 09:28:24using channel ORA_AUX_DISK_1channel ORA_AUX_DISK_1: starting datafile backupset restorechannel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup setrestoring datafile 00003 to /u06/oradata/hcmprdc/sysaux01.dbf...restoring datafile 00121 to /u06/oradata/hcmprdc/waapp.dbfchannel ORA_AUX_DISK_1: reading from backup piece/u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of Duplicate Db command at 04/09/2008 09:28:28RMAN-03015: error occurred in stored script Memory ScriptORA-19870: error reading backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1ORA-19863: device block size 1040384 is larger than max allowed: 262144CauseThe database parameter _db_file_direct_io_count in the target and auxiliary instance does not match.Solution-- To implement the solution:Ensure that parameter _db_file_direct_io_count on the target and auxiliary database the same_DB_FILE_DIRECT_IO_COUNT need to be set to the same value between the source databasewhere the backup was taken and the target database where the backup is being restored.2.0 Size of Input/Output Buffers================================a. input buffers----------------NOTE : DB_FILE_DIRECT_IO_COUNT is not available in Oracle9i onwards. In Oracle9i, it is replaced by a hidden _DB_FILE_DIRECT_IO_COUNT which governs the size of direct I/Os in BYTES (not blocks). The default is 1Mb butwill be sized down if the max_io_size of the system is smaller.The input buffer size is: buffersize = db_block_size * db_file_direct_io_countAs there are 4 input buffers, the total input buffer memory use per channel is: memory(input) = #buffers * #files * buffersize = 4 * #files * buffersizeFor example, if 2 channels are used, and each of these channels backs up 3files, then for each channel memory(input) = 4 * 3 * db_block_size * db_file_direct_io_countb. output buffers-----------------For disk channels, the output buffer size is: buffersize = db_block_size * db_file_direct_io_countFor SBT_TAPE channels, the output buffer size in Oracle8/8i is o/s dependant. (On Solaris,this defaults to 64k) On 9i/10g it defaults to 256k for all platforms. The BLKSIZE argument to 'allocate channel...' can beused to override the default value.As there are 4 output buffers, memory(output) = #buffers * buffersize = 4 * buffersize一般来说使用该隐藏参数的默认值即可,通过重置该参数后修复启动问题:
SQL> select * from v$version;BANNER--------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - ProductionPL/SQL Release 11.2.0.1.0 - ProductionCORE 11.2.0.1.0 ProductionTNS for 32-bit Windows: Version 11.2.0.1.0 - ProductionNLSRTL Version 11.2.0.1.0 - ProductionSQL> alter system reset "_db_file_direct_io_count" scope=spfile;System altered.SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ FROM SYS.x$ksppi x, SYS.x$ksppcv y WHERE x.inst_id = USERENV ('Instance') AND y.inst_id = USERENV ('Instance') AND x.indx = y.indxAND x.ksppinm LIKE '%db_file_direct_io_count%'/NAME VALUE DESCRIB------------------------------ -------------------- ------------------------------_db_file_direct_io_count 1048576 Sequential I/O buf sizewindows上的11gr2默认该参数为1MB