Thursday, October 12, 2017

Greenplum 4.x backup error

What to check when a backup fails in Greenplum

1. gpcrondump log
20170920:15:20:20:015168 gpcrondump:mdw:gpadmin-[INFO]:-Starting Dump process
20170920:15:28:56:022485 gpcrondump:mdw:gpadmin-[INFO]:-Starting gpcrondump with args: -x edu --table-file=/home/gpadmin/utilities/command/backup_tables_list_p1997 --prefix p1997 -u /data/backup -a
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Directory /data/backup/db_dumps/20170920 exists
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Checked /data/backup on master
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Configuring for single-database, include-table dump
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Validating disk space
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Creating filter file: /data/backup/db_dumps/20170920/p1997_gp_dump_20170920152856_filter
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Creating filter file: /data/backup/db_dumps/20170920/p1997_gp_dump_20170920152856_table
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Adding compression parameter
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Adding --prefix
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Adding --no-expand-children
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Dump process command line gp_dump -p 5432 -U gpadmin --gp-d=/data/backup/db_dumps/20170920 --gp-r=/data/backup/db_dumps/20170920 --gp-s=p --gp-k=20170920152856 --no-lock --gp-c --prefix=p1997_ --no-expand-children "edu" --table-file=/tmp/include_dump_tables_filewsQMRa
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Starting Dump process
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin- [WARNING]:-Dump process returned exit code 1
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[INFO]:-Timestamp key = 20170920152856
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[INFO]:-Checked master status file and master dump file.
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[INFO]:-Inserted dump record into public.gpcrondump_history in edu database
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[WARNING]:-Dump request was incomplete, not rolling back because -r option was not supplied
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[INFO]:-Sending mail to slee@pivotal.io
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[ ERROR]:-gpcrondump error: Dump incomplete, rollback not processed
[gpadmin@mdw gpAdminLogs]$
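
When the gpcrondump log is long, it can help to pull out only the non-INFO lines first. The following is a minimal Python sketch, not part of the Greenplum tooling: the function name `problem_lines` and the regular expression are my own assumptions, based only on the line layout visible in the log above. The pattern tolerates optional whitespace around the level tag.

```python
import re

# Assumed layout, taken from the log excerpt in this post:
#   YYYYMMDD:HH:MM:SS:pid program:host:user-[LEVEL]:-message
LINE_RE = re.compile(
    r"^(?P<ts>\d{8}:\d{2}:\d{2}:\d{2}):\d+\s+"
    r"(?P<prog>\w+):(?P<host>[^:]+):(?P<user>\w+)-\s*"
    r"\[\s*(?P<level>[A-Z]+)\s*\]:-(?P<msg>.*)$"
)


def problem_lines(log_text, levels=("WARNING", "ERROR")):
    """Return (timestamp, level, message) for every line at one of `levels`."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            hits.append((m.group("ts"), m.group("level"), m.group("msg")))
    return hits


# Three lines copied from the log above.
sample_log = """\
20170920:15:28:57:022485 gpcrondump:mdw:gpadmin-[INFO]:-Starting Dump process
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin- [WARNING]:-Dump process returned exit code 1
20170920:15:39:04:022485 gpcrondump:mdw:gpadmin-[ ERROR]:-gpcrondump error: Dump incomplete, rollback not processed
"""

for ts, level, msg in problem_lines(sample_log):
    print(ts, level, msg)
```

Here the only useful hint is that the dump process returned exit code 1, so the next place to look is the gp_dump report.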


2. gp_dump log
[gpadmin@mdw 20170920]$ cat p1997_gp_dump_20170920152856.rpt

Greenplum Database Backup Report
Timestamp Key: 20170920152856
gp_dump Command Line: -p 5432 -U gpadmin --gp-d=/data/backup/db_dumps/20170920 --gp-r=/data/backup/db_dumps/20170920 --gp-s=p --gp-k=20170920152856 --no-lock --gp-c --prefix=p1997_ --no-expand-children edu --table-file=/tmp/include_dump_tables_filewsQMRa
Pass through Command Line Options: --prefix p1997_ --table-file /tmp/include_dump_tables_filewsQMRa
Compression Program: gzip
Backup Type: Full

Individual Results
        segment 2 (dbid 4) Host sdw3.gphd.local Port 40000 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_2_4_20170920152856.gz: Failed with error:
{ Lost response from dump agent with dbid 4 on host sdw3.gphd.local after 10 minutes.
}
        segment 1 (dbid 3) Host sdw2.gphd.local Port 40000 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_1_3_20170920152856.gz: Succeeded
        segment 0 (dbid 2) Host sdw1.gphd.local Port 40000 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_0_2_20170920152856.gz: Succeeded
        Master (dbid 1) Host mdw Port 5432 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_-1_1_20170920152856.gz: Succeeded
        Master (dbid 1) Host mdw Port 5432 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_-1_1_20170920152856.gz_post_data: Succeeded

gp_dump utility finished unsuccessfully with  1  failures.
[gpadmin@mdw 20170920]$
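
The "Individual Results" section of the report shows which segment to investigate. As a rough sketch, the failed entries can be extracted as below. The function name `failed_segments` and the regular expression are assumptions based on the report layout shown above; other Greenplum versions may format the report differently.

```python
import re

# Assumed layout of one "Individual Results" line, as shown in the report above.
RESULT_RE = re.compile(
    r"^\s*(?P<role>Master|segment\s+-?\d+)\s+\(dbid\s+(?P<dbid>\d+)\)\s+"
    r"Host\s+(?P<host>\S+)\s+Port\s+(?P<port>\d+)\s+"
    r"Database\s+(?P<db>\S+)\s+BackupFile\s+(?P<file>\S+):\s+"
    r"(?P<status>Succeeded|Failed)"
)


def failed_segments(report_text):
    """Return (role, dbid, host) for every result line marked Failed."""
    failed = []
    for line in report_text.splitlines():
        m = RESULT_RE.match(line)
        if m and m.group("status") == "Failed":
            failed.append((m.group("role"), int(m.group("dbid")), m.group("host")))
    return failed


# Lines copied from the report above.
sample_report = """\
        segment 2 (dbid 4) Host sdw3.gphd.local Port 40000 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_2_4_20170920152856.gz: Failed with error:
{ Lost response from dump agent with dbid 4 on host sdw3.gphd.local after 10 minutes.
}
        segment 1 (dbid 3) Host sdw2.gphd.local Port 40000 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_1_3_20170920152856.gz: Succeeded
        Master (dbid 1) Host mdw Port 5432 Database edu BackupFile /data/backup/db_dumps/20170920/p1997_gp_dump_-1_1_20170920152856.gz: Succeeded
"""

print(failed_segments(sample_report))
```

For this report the result points at segment 2 (dbid 4) on sdw3.gphd.local, which is why the next step is the pg_log on sdw3.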

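3. pg_log on sdw3
I reproduced this on VMs on my PC, so the timestamps do not line up with the backup logs above, but these are the errors that were raised at the time of the failure.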

[gpadmin@sdw3 pg_log]$ gplogfilter -t gpdb-2017-09-20*.csv

----------  /data/primary/gpseg2/pg_log/gpdb-2017-09-20_120055.csv ----------
2017-09-20 12:00:55.590986 KST|gpadmin|postgres|p4513|th-1626073312|[local]||2017-09-20 12:00:55 KST|0|||seg-1|||||FATAL: |57M01|the database system is in mirror or uninitialized mode|||||||0||postmaster.c|2972|
2017-09-20 13:03:45.880945 KST|gpadmin|edu|p15859|th-1626073312|172.16.150.133|53745|2017-09-20 13:03:45 KST|1010065|con165||seg-1|||x1010065|sx1|FATAL: |28000|no pg_hba.conf entry for host "172.16.150.133", user "gpadmin", database "edu", SSL off|||||||0||auth.c|608|
2017-09-20 13:08:25.416389 KST|gpadmin|edu|p16729|th-1626073312|172.16.150.133|53747|2017-09-20 13:08:25 KST|1010139|con183||seg-1|||x1010139|sx1|FATAL: |28000|no pg_hba.conf entry for host "172.16.150.133", user "gpadmin", database "edu", SSL off|||||||0||auth.c|608|
       in:      89 lines,      89 log entries; timestamps from 2017-09-20 12:00:55.590986 to 2017-09-20 13:09:11.920432
    match:       3 lines,       3 log entries; timestamps from 2017-09-20 12:00:55.590986 to 2017-09-20 13:08:25.416389
      out:       3 lines,       3 log entries; timestamps from 2017-09-20 12:00:55.590986 to 2017-09-20 13:08:25.416389
----------  /data/primary/gpseg2/pg_log/gpdb-2017-09-20_130947.csv ----------
2017-09-20 13:09:48.439091 KST|gpadmin|postgres|p17142|th-1876691168|[local]||2017-09-20 13:09:48 KST|0|||seg-1|||||FATAL: |57M01|the database system is in mirror or uninitialized mode|||||||0||postmaster.c|2972|
       in:      95 lines,      95 log entries; timestamps from 2017-09-20 13:09:48.439091 to 2017-09-20 14:02:21.339605
    match:       1 lines,       1 log entries; timestamps from 2017-09-20 13:09:48.439091 to 2017-09-20 13:09:48.439091
      out:       1 lines,       1 log entries; timestamps from 2017-09-20 13:09:48.439091 to 2017-09-20 13:09:48.439091
[gpadmin@sdw3 pg_log]$
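
The lines that matter here are the "no pg_hba.conf entry" rejections. A small Python sketch for summarizing them from gplogfilter output follows. The function name `hba_rejections` is hypothetical, and the pattern assumes the message wording shown in the log above.

```python
import re
from collections import Counter

# Message wording as it appears in the pg_log lines above.
HBA_RE = re.compile(
    r'no pg_hba\.conf entry for host "(?P<host>[^"]+)", '
    r'user "(?P<user>[^"]+)", database "(?P<db>[^"]+)"'
)


def hba_rejections(log_text):
    """Count rejected connections per (client host, user, database)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = HBA_RE.search(line)
        if m:
            counts[(m.group("host"), m.group("user"), m.group("db"))] += 1
    return counts


# Lines copied from the gplogfilter output above.
sample_pg_log = """\
2017-09-20 12:00:55.590986 KST|gpadmin|postgres|p4513|th-1626073312|[local]||2017-09-20 12:00:55 KST|0|||seg-1|||||FATAL: |57M01|the database system is in mirror or uninitialized mode|||||||0||postmaster.c|2972|
2017-09-20 13:03:45.880945 KST|gpadmin|edu|p15859|th-1626073312|172.16.150.133|53745|2017-09-20 13:03:45 KST|1010065|con165||seg-1|||x1010065|sx1|FATAL: |28000|no pg_hba.conf entry for host "172.16.150.133", user "gpadmin", database "edu", SSL off|||||||0||auth.c|608|
2017-09-20 13:08:25.416389 KST|gpadmin|edu|p16729|th-1626073312|172.16.150.133|53747|2017-09-20 13:08:25 KST|1010139|con183||seg-1|||x1010139|sx1|FATAL: |28000|no pg_hba.conf entry for host "172.16.150.133", user "gpadmin", database "edu", SSL off|||||||0||auth.c|608|
"""

for (host, user, db), n in hba_rejections(sample_pg_log).items():
    print(host, user, db, n)
```

The rejected client address is the value to compare against the host list of the cluster.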

4. Environment
VM configuration
mdw: 172.16.150.130
sdw1: 172.16.150.131
sdw2: 172.16.150.132
sdw3: 172.16.150.133

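5. Cause
The VMs were originally set up with DHCP, so their current IPs differ from the ones they had at initial configuration.
During a backup, a segment instance connects to its own instance, and in some cases the master connects to it as well.

In this case pg_hba.conf still held the IPs from the initial configuration. When an error is raised at the point a backup starts or finishes,
be sure to check both the DB log of the segment that failed and the DB log of the master.

Here the error was raised when the sdw3 host (172.16.150.133) connected to sdw3 itself.
In other words, check that the pg_hba.conf files on the Master / Standby Master / Segments grant access to the current IPs.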

## pg_hba.conf file on sdw3
# "local" is for Unix domain socket connections only
local   all         all                               trust
host    all         all         127.0.0.1/24          trust
# IPv6 local connections:
host    all         all         ::1/128               trust
host all all ::1/128 trust
host all all 172.16.150.130/32 trust    ### IP added
host all all 172.16.150.133/32 trust    ### IP added
host all all fe80::20c:29ff:fe7c:7d38/128 trust
host all gpadmin 172.16.150.131/32 trust
host all gpadmin ::1/128 trust
host all gpadmin fe80::20c:29ff:fe78:1aca/128 trust
# standby master host ip addresses
host    all     gpadmin 172.16.150.129/32       trust
# standby master host ip addresses
host    all     gpadmin 172.16.150.130/32       trust
# standby master host ip addresses

host    all     gpadmin 172.16.150.128/32       trust
[gpadmin@sdw3 gpseg2]$

After adding these IPs, the backup ran normally.
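
To check ahead of time whether a given client IP would be accepted, the `host` rules can be evaluated with Python's standard `ipaddress` module. This is a simplified sketch under stated assumptions, not a reimplementation of PostgreSQL's matching: the function name `first_match` is hypothetical, only `host` lines with CIDR addresses are handled, and keywords such as `sameuser`, `+role`, `@file`, host names, and the separate netmask column are ignored. The "before" text is an assumption too: it is simply a few lines of the file above without the two added entries.

```python
import ipaddress


def _field_matches(field, value):
    """True if a comma-separated pg_hba field is 'all' or lists the value."""
    names = field.split(",")
    return "all" in names or value in names


def first_match(hba_text, client_ip, user, database):
    """Return the auth method of the first matching `host` rule, or None."""
    ip = ipaddress.ip_address(client_ip)
    for raw in hba_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 5 or fields[0] != "host":
            continue
        _, dbs, users, addr, method = fields[:5]
        try:
            # strict=False accepts entries such as 127.0.0.1/24
            net = ipaddress.ip_network(addr, strict=False)
        except ValueError:
            continue  # host name or unsupported address form
        if net.version != ip.version:
            continue
        if ip in net and _field_matches(dbs, database) and _field_matches(users, user):
            return method
    return None


# Assumed "before" state: lines from the file above, minus the added entries.
hba_before = """\
local   all         all                               trust
host    all         all         127.0.0.1/24          trust
host    all         all         ::1/128               trust
host all gpadmin 172.16.150.131/32 trust
"""
hba_after = hba_before + "host all all 172.16.150.133/32 trust    ### IP added\n"

print(first_match(hba_before, "172.16.150.133", "gpadmin", "edu"))
print(first_match(hba_after, "172.16.150.133", "gpadmin", "edu"))
```

Running the same check for every host IP in the cluster against each instance's pg_hba.conf would flag stale entries after an IP change. pg_hba.conf is only read at startup or on reload, so an edited file normally needs a configuration reload before it takes effect.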
