DATA GUARD TROUBLESHOOTING
------------------------------------------------------
1) Determine if archive logs are successfully being transferred to the standby by performing a log switch on the primary and running the following query:
select dest_id,status,error from v$archive_dest where target='STANDBY';
If all remote destinations have a status of VALID then proceed to step-2.
Else proceed to Troubleshooting Log Transport Services.
2) Determine if the standby is a Physical standby or a Logical Standby. To determine the standby type run the following query on the standby:
select database_role from v$database;
If the standby is a physical standby then proceed to "Troubleshooting Redo Apply". Else proceed to "Troubleshooting Logical Apply".
Troubleshooting Log Transport Services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) Verify that the primary database is in archive log mode and has automatic archiving enabled:
select log_mode from v$database;
2) Verify that the sufficient space exist in the local archive destination as well as all destinations marked as mandatory. The following query can be used to determine all local and mandatory destinations that need to be checked:
select dest_id, destination from v$archive_dest where schedule='ACTIVE' and (binding='MANDATORY' or target='PRIMARY');
3) Determine if the last log switch to any remote destinations resulted in an error. Immediately following a log switch run the following query:
alter system switch logfile;
select dest_id, status, error from v$archive_dest where target='STANDBY';
Address any errors that are returned in the error column. Perform a log switch and re-query to determine if the issue has been resolved.
4) Determine if any error conditions have been reached by querying the v$dataguard_status view (view only available in 9.2.0 and above):
select message, to_char(timestamp,'HH:MI:SS') timestamp from v$dataguard_status where severity in ('Error','Fatal') order by timestamp;
5) Gather information about how the remote destinations are performing the archival:
select dest_id, archiver, transmit_mode, affirm,net_timeout, delay_mins, async_blocks from v$archive_dest where target='STANDBY'
6) Run the following query to determine the current sequence number, the last sequence archived, and the last sequence applied to a standby:
select ads.dest_id, max(sequence#) "Current Sequence", max(log_sequence) "Last Archived", max(applied_seq#) "Last Sequence Applied"
from v$archived_log al, v$archive_dest ad, v$archive_dest_status ads
where ad.dest_id=al.dest_id and al.dest_id=ads.dest_id group by ads.dest_id
If you are remotely archiving using the LGWR process then the archived sequence should be one higher than the current sequence. If remotely archiving using the ARCH process then the archived sequence should be equal to the current sequence. The applied sequence information is updated at log switch time.
Troubleshooting Redo Apply Services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Verify that the last sequence# received and the last sequence# applied to standby database by running the following query:
select max(al.sequence#) "Last Seq Recieved", max(lh.sequence#) "Last Seq Applied" from v$archived_log al, v$log_history lh;
If the two numbers are the same then the standby has applied all redo sent by the primary. If the numbers differ by more than 1 then proceed to step-2.
2. Verify that the standby is in the mounted state:
select open_mode from v$database;
3. Determine if there is an archive gap on your physical standby database by querying the V$ARCHIVE_GAP view as shown in the following query:
select * from v$archive_gap;
The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next gap that is currently blocking redo apply from continuing. After resolving the identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again on the physical standby database to determine the next gap sequence, if there is one. Repeat this process until there are no more gaps.
4. Verify that managed recovery is running:
select process,status from v$managed_standby;
When managed recovery is running you will see an MRP process. If you do not see an MRP process then start managed recovery by issuing the following command:
recover managed standby database disconnect;
Some possible statuses for the MRP are listed below:
ERROR - This means that the process has failed. See the alert log or v$dataguard_status for further information.
WAIT_FOR_LOG - Process is waiting for the archived redo log to be completed. Switch an archive log on the primary and requery v$managed_standby to see if the status changes to APPLYING_LOG.
WAIT_FOR_GAP - Process is waiting for the archive gap to be resolved. Review the alert log to see if FAL_SERVER has been called to resolve the gap.
APPLYING_LOG - Process is applying the archived redo log to the standby database.
Troubleshooting SQL Apply services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Verify that log apply services on the standby are currently running.
To verify that logical apply is currently available to apply changes perform the following query:
SQL> SELECT PID, TYPE, STATUS, HIGH_SCN FROM V$LOGSTDBY;
When querying the V$LOGSTDBY view, pay special attention to the HIGH_SCN column. This is an activity indicator. As long as it is changing each time you query the V$LOGSTDBY view, progress is being made. The STATUS column gives a text description of the current activity.
If the query against V$LOGSTDBY returns no rows then logical apply is not running. Start logical apply by issuing the following statement:
SQL> alter database start logical standby apply;
If the query against V$LOGSTDBY continues to return no rows then proceed to step-2.
2. To determine if there is an archive gap, query the DBA_LOGSTDBY_LOG view on the logical standby database.
select substr(file_name,1,25) file_name, substr(sequence#,1,4) ""seq#"", first_change#, next_change#, to_char(timestamp, 'hh:mi:ss') timestamp, dict_begin beg, dict_end end, substr(thread#,1,4) ""thr#"" from dba_logstdby_log order by sequence#;
Copy the missing logs to the logical standby system and register them using the ALTER DATABASE REGISTER LOGICAL LOGFILE statement on your logical standby database. For example:
SQL> ALTER DATABASE REGISTER LOGICAL LOGFILE '/u01/oradata/arch/1_57.arc';
After you register these logs on the logical standby database, you can restart log apply services. The DBA_LOGSTDBY_LOG view on a logical standby database only returns the next gap that is currently blocking SQL apply operations from continuing. After resolving the identified gap and starting log apply services, query the DBA_LOGSTDBY_LOG view again on the logical standby database to determine the next gap sequence, if there is one.
>>Repeat this process until there are no more gaps.
6. Determine is logical apply is receiving errors while performing apply operations.
Log apply services cannot apply unsupported DML statements, DDL statements, and Oracle supplied packages to a logical standby database in SQL apply mode. When an unsupported statement or package is encountered, SQL apply operations stop. To determine if SQL apply has stopped due to errors we should query the DBA_LOGSTDBY_EVENTS view. When querying the view, select the columns in order by EVENT_TIME. This ordering ensures that a shutdown failure appears last in the view. For example:
select xidusn, xidslt, xidsqn, status, status_code from dba_logstdby_events where event_time = (select max(event_time) from dba_logstdby_events);
If an error requiring database management occurred (such as adding a tablespace, datafile, or running out of space in a tablespace), then you can fix the problem manually and resume SQL apply.
If an error occurred because a SQL statement was entered incorrectly, conflicted with an existing object, or violated a constraint then enter the correct SQL statement and use the DBMS_LOGSTDBY.SKIP_TRANSACTION procedure to ensure that the incorrect statement is ignored the next time SQL apply operations are run.
7. Query DBA_LOGSTDBY_PROGRESS to verify that log apply services is progressing.
The DBA_LOGSTDBY_PROGRESS view describes the progress of SQL apply operations on the logical standby databases. For example:
SELECT APPLIED_SCN, APPLIED_TIME, READ_SCN, READ_TIME, NEWEST_SCN, NEWEST_TIME FROM DBA_LOGSTDBY_PROGRESS;
The APPLIED_SCN indicates that committed transactions at or below that SCN have been applied. The NEWEST_SCN is the maximum SCN to which data could be applied if no more logs were received. This is usually the MAX(NEXT_CHANGE#)-1 from DBA_LOGSTDBY_LOG when there are no gaps in the list. When the value of NEWEST_SCN and APPLIED_SCN are the equal then all available changes have been applied. If you APPLIED_SCN is below NEWEST_SCN and is increasing then SQL apply is currently processing changes.
8. Verify that the table that is not receiving rows is not listed in the DBA_LOGSTDBY_UNSUPPORTED.
The DBA_LOGSTDBY_USUPPORTED view lists all of the tables that contain datatypes not supported by logical standby databases in the current release.
These tables are not maintained (will not have DML applied) by the logical standby database. Query this view on the primary database to ensure that, those tables necessary for critical applications are not in this list. If the primary database includes unsupported tables that are critical, consider using a physical standby database.
=========================
TROUBLESHOOTING A PHYSICAL STANDBY DATABASE:
NOTE: Pls check Metalink
232649.1 (Data Guard Gap Detection and Resolution)
On Standby server:
Run the below query to check the type of Standby database, PHYSCIAL or LOGICAL:
sqlplus "/ as sysdba"
select database_role from v$database;
If Physical Standby then follow:
Step1: Check which logs have not been applied:
======
alter session set nls_date_format='YYYY-MM-DD HH24:MI.SS';
SELECT SEQUENCE#, APPLIED, completion_time FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
Step2:Check if there is a gap in the archive logs:
======
SELECT * FROM V$ARCHIVE_GAP;
If there is a gap, then it is most likely that the log has been compressed on the Primary server, and the Standby FAL service cannot retrieve the log.If so, then temporarily stop archivelog compression job on the primary and unzip the required archive logs. After a few minutes, the FAL service will retrieve the log and the Standby apply services will resume.Check the progress by running the SQL in step-1 above.
If the logs haven't been processed after 5-10 minutes, then you will have to perform the following tasks:
Step3: Copy the (zipped) log to the standby archive log destination on the Standby server, (unzip the archive), and register,
ALTER DATABASE REGISTER LOGFILE '/u01/oradata/stby/arch/arch_1_443.arc';
Step4: Check if this is a 'real-time apply standby:
=======
select recovery_mode from V$ARCHIVE_DEST_STATUS;
Step5: Stop/restart the standby apply services:
=======
alter database recover managed standby database cancel;
If a real-time apply standby then:
alter database recover managed standby database using current logfile disconnect from session;
Found this:
RECOVER MANAGED STANDBY DATABASE cancel;
ORA-16136: Managed Standby Recovery not active
RECOVER MANAGED STANDBY DATABASE disconnect from session;
Media recovery complete.
Else (non- realtime apply):
alter database recover managed standby database disconnect from session;
Check the progress by running the SQL in step-1 above.
Useful Standby query:
----------------------------
Startup standby database
startup nomount;
alter database mount standby database;
alter database recover managed standby database disconnect;
To remove a delay from a standby
alter database recover managed standby database cancel;
alter database recover managed standby database nodelay disconnect;
Cancel managed recovery
alter database recover managed standby database cancel;
Register a missing log file
alter database register physical logfile '<fullpath/filename>';
If FAL doesn't work and it says the log is already registered
alter database register or replace physical logfile '<fullpath/filename>';
If that doesn't work, try this...
shutdown immediate
startup nomount
alter database mount standby database;
alter database recover automatic standby database;
>> wait for the recovery to finish - then cancel
shutdown immediate
startup nomount
alter database mount standby database;
alter database recover managed standby database disconnect;
Check which logs are missing (Run this on the standby)
select local.thread#, local.sequence# from
(select thread#, sequence# from v$archived_log where dest_id=1) local where local.sequence# not in
(select sequence# from v$archived_log where dest_id=2 and thread# = local.thread#);
Disable/Enable archive log destinations
alter system set log_archive_dest_state_2 = 'defer';
alter system set log_archive_dest_state_2 = 'enable';
Turn on fal tracing on the primary db
alter system set LOG_ARCHIVE_TRACE = 128;
Stop the Data Guard broker
alter system set dg_broker_start=false;
Show the current instance role
select name, open_mode, database_role from v$database;
=====
Logical standby apply stop/start
Stop Logical standby >> alter database stop logical standby apply;
Start Logical standby >> alter database start logical standby apply;
See how up to date a physical standby is: (Run this on the primary)
set numwidth 15
select max(sequence#) current_seq from v$log;
Then run this on the
standby
set numwidth 15
select max(applied_seq#) last_seq from v$archive_dest_status;
Display info about all log destinations (run on the primary)
set lines 100 set numwidth 15 column ID format 99 column "SRLs" format 99 column active format 99 col type format a4
select ds.dest_id id , ad.status , ds.database_mode db_mode , ad.archiver type , ds.recovery_mode , ds.protection_mode , ds.standby_logfile_count "SRLs" , ds.standby_logfile_active active , ds.archived_seq# from v$archive_dest_status ds , v$archive_dest ad where ds.dest_id = ad.dest_id and ad.status != 'INACTIVE' order by ds.dest_id;
Display log destinations options (run on the primary)
set numwidth 8 lines 100 column id format 99
select dest_id id , archiver , transmit_mode , affirm , async_blocks async , net_timeout net_time , delay_mins delay , reopen_secs reopen , register,binding from v$archive_dest order by dest_id;
List any standby redo logs
set lines 100 pages 999 col member format a70
select st.group# , st.sequence# , ceil(st.bytes / 1048576) mb , lf.member from v$standby_log st , v$logfile lf where st.group# = lf.group#;
Script for Standby archivelog monitoring….(removed the duplicate rows)
select arch.thread# "Thread", arch.sequence# "Last Sequence Received", appl.sequence# "Last Sequence Applied", (arch.sequence# - appl.sequence#) "Difference" from
(select thread# ,sequence# from v$archived_log where (thread#,first_time ) in (select thread#,max(first_time) from v$archived_log group by thread#)) arch,
(select thread# ,sequence# from v$log_history where (thread#,first_time ) in (select thread#,max(first_time) from v$log_history group by thread#)) appl
where arch.thread# = appl.thread#
order by 1;