1. PLR 설치
1) 파일 다운 로드
- network.pivotal.io 에서 버전에 맞는 pl/r 다운로드
- Data > Pivotal Greenplum > 6.1.0 > Greenplum Procedural Languages > PL/R for RHEL 7
2) 파일 Copy
$ scp ~/Downloads/plr-3.0.3-gp6-rhel7-x86_64.gppkg gpadmin@172.16.25.141:/home/gpadmin
3) PLR 설치
## Greenplum 버전 확인
$ psql -c "select version();"
## plr 패키지 설치
[gpadmin@mdw ~]$ gppkg -i plr-3.0.3-gp6-rhel7-x86_64.gppkg
20191203:09:12:24:084482 gppkg:mdw:gpadmin-[INFO]:-Starting gppkg with args: -i plr-3.0.3-gp6-rhel7-x86_64.gppkg
20191203:09:12:24:084482 gppkg:mdw:gpadmin-[INFO]:-Installing package plr-3.0.3-gp6-rhel7-x86_64.gppkg
20191203:09:12:25:084482 gppkg:mdw:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-6.1.0/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-6.1.0/.tmp/plr-3.0-1.x86_64.rpm --dbpath /usr/local/greenplum-db-6.1.0/share/packages/database --prefix /usr/local/greenplum-db-6.1.0'
20191203:09:12:35:084482 gppkg:mdw:gpadmin-[INFO]:-Installing plr-3.0.3-gp6-rhel7-x86_64.gppkg locally
20191203:09:12:36:084482 gppkg:mdw:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db-6.1.0/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-6.1.0/.tmp/plr-3.0-1.x86_64.rpm --dbpath /usr/local/greenplum-db-6.1.0/share/packages/database --prefix /usr/local/greenplum-db-6.1.0'
20191203:09:12:36:084482 gppkg:mdw:gpadmin-[INFO]:-Installing rpms cmdStr='rpm -i /usr/local/greenplum-db-6.1.0/.tmp/R-3.3.3-1.x86_64.rpm /usr/local/greenplum-db-6.1.0/.tmp/plr-3.0-1.x86_64.rpm --dbpath /usr/local/greenplum-db-6.1.0/share/packages/database --prefix=/usr/local/greenplum-db-6.1.0'
20191203:09:12:41:084482 gppkg:mdw:gpadmin-[INFO]:-Completed local installation of plr-3.0.3-gp6-rhel7-x86_64.gppkg.
20191203:09:12:41:084482 gppkg:mdw:gpadmin-[INFO]:--
==========================================================================
PL/R installation is complete! To proceed, create PL/R language in the
target database with:
"CREATE EXTENSION plr;"
--------------------------------------------------------------------------
DEPRECATED
In an old way, you can enable PL/R helper functions by running
"psql -f $GPHOME/share/postgresql/extension/plr.sql -d mydatabase"
==========================================================================
20191203:09:12:41:084482 gppkg:mdw:gpadmin-[INFO]:-plr-3.0.3-gp6-rhel7-x86_64.gppkg successfully installed.
[gpadmin@mdw ~]$
4) DB Restart & plr 생성
[gpadmin@mdw ~]$ gpstop -af
[gpadmin@mdw ~]$ . /usr/local/greenplum-db/greenplum_path.sh ##> plr 설정 환경 셋팅을 다시 읽어와야 함.
[gpadmin@mdw ~]$ gpstart -a
[gpadmin@mdw ~]$ psql
psql (9.4.24)
Type "help" for help.
gpadmin=# CREATE EXTENSION plr;
CREATE EXTENSION
gpadmin=# \q
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ echo $R_HOME
/usr/local/greenplum-db/./ext/R-3.3.3
[gpadmin@mdw ~]$
[gpadmin@mdw ~]$ gpssh -f hostfile
=> . /usr/local/greenplum-db/greenplum_path.sh
[sdw1]
[sdw2]
[ mdw]
=> echo $R_HOME
[sdw1] /usr/local/greenplum-db/./ext/R-3.3.3
[sdw2] /usr/local/greenplum-db/./ext/R-3.3.3
[ mdw] /usr/local/greenplum-db/./ext/R-3.3.3
=>
2. plr 테스트
1) 패키지 로딩 테스트
CREATE OR REPLACE FUNCTION r_test_require(fname text)
RETURNS boolean AS
$BODY$
return(require(fname,character.only=T))
$BODY$
LANGUAGE 'plr';
2) 마스터 로딩 테스트
=# SELECT R_test_require('rpart');
3) 모든 세그먼트 로딩 테스트
=# CREATE TABLE simple_series AS (SELECT generate_series(0,1000) AS id) distributed by (id);
=# select gp_segment_id, R_test_require('rpart'), count(*) from simple_series group by 1, 2;
gp_segment_id | r_test_require | count
---------------+----------------+-------
0 | t | 257
1 | t | 252
2 | t | 252
3 | t | 240
=> rpart 라이브러리가 정상적으로 로딩되었을 경우, "t"로 출력, 로딩되지 않았을 경우 "f"로 출력
4) plr 환경 확인
# SELECT * FROM plr_environ();
3. R 최신 버전 설치
기존에 Greenplum에 인스톨된 R-3.3.3 이외에 신규 버전 설치하는 방법
1) R 설치를 위해 사전에 필요한 rpm 설치
* 설치시 모든 노드에 설치 필요
$ su -
# yum install readline-devel -y
# yum install gcc-gfortran -y
# yum install gcc-c++ -y
# yum install zlib-devel -y
# yum install bzip2-devel -y
# yum install xz-devel -y
# yum install pcre-devel -y
# yum install libcurl-devel -y
## RPM 설치시, R-3.6.0 (현재까지)
# yum install -y epel-release
# yum install -y R
2) R 테스트
$ cd /usr/lib64/R/bin
$ ./R
> library(spatial)
3) 권한
* 현재 root 권한으로 설치되어 있음.
* 만약 gpadmin 계정으로 권한을 변경하면 root 계정없이 R 확장 패키지를 gpadmin 계정으로 추가할 수 있음.
* 선택 사항
# chown -R gpadmin:gpadmin /usr/lib64/R
# chown -R gpadmin:gpadmin /usr/share/doc/R-3.6.0
4. Greenplum R 환경 설정 변경
1) 설정 파일 변경
$ cp /usr/local/greenplum-db/greenplum_path.sh /usr/local/greenplum-db/greenplum_path.sh.org
$ vi /usr/local/greenplum-db/greenplum_path.sh
## 아래 부분 수정
#export R_HOME=$GPHOME/ext/R-3.3.3
export R_HOME=/usr/lib64/R
#export LD_LIBRARY_PATH=$GPHOME/ext/R-3.3.3/lib:$GPHOME/ext/R-3.3.3/extlib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$R_HOME/lib:$LD_LIBRARY_PATH
#export PATH=$GPHOME/ext/R-3.3.3/bin:$PATH
export PATH=$R_HOME/bin:$PATH
## 모든 노드에 Copy
$ scp /usr/local/greenplum-db/greenplum_path.sh sdw1:/usr/local/greenplum-db/greenplum_path.sh
$ scp /usr/local/greenplum-db/greenplum_path.sh sdw2:/usr/local/greenplum-db/greenplum_path.sh
..
모든 노드에 copy
2) 적용
$ gpstop -af
$ source /usr/local/greenlum-db/greenplum_path.sh
$ gpstart -a
3) 확인
## 마스터 인스턴스
$ psql
gpadmin=# SELECT * FROM plr_environ();
name | value
-----------------------+-------------------------------------------------------------------------------------------------------------------------------------
XDG_SESSION_ID | 64018
HOSTNAME | mdw
GPERA | 8e80243d0c4c752f_191205202827
MASTER_DATA_DIRECTORY | /data/master/gpxeg-1
SHELL | /bin/bash
TERM | xterm-256color
HISTSIZE | 1000
PG_GRANDPARENT_PID | 85665
GPHOME | /usr/local/greenplum-db/.
USER | gpadmin
LD_LIBRARY_PATH | /usr/lib64/R/lib:/usr/local/greenplum-db/./lib:/usr/local/greenplum-db/./ext/python/lib:
LS_COLORS | rs=0:di=38;5;27:ln=38;5;51:................:
MAIL | /var/spool/mail/gpadmin
PATH | /usr/local/greenplum-cc-web-6.0.0/bin:/usr/lib64/R/bin:/usr/local/greenplum-db/./bin:/usr/local/greenplum-db/./ext/python/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/gpadmin/.local/bin:/home/gpadmin/bin
_ | /bin/env
PWD | /home/gpadmin
LANG | en_US.UTF-8
PYTHONHOME | /usr/local/greenplum-db/./ext/python
PGSYSCONFDIR | /usr/local/greenplum-db-6.1.0/etc/postgresql
HISTCONTROL | ignoredups
SHLVL | 2
HOME | /home/gpadmin
PYTHONPATH | /usr/local/greenplum-db/./lib/python
LOGNAME | gpadmin
GPSESSID | 0000000000
PGDATA | /data/master/gpxeg-1
LESSOPEN | ||/usr/bin/lesspipe.sh %s
R_HOME | /usr/lib64/R
XDG_RUNTIME_DIR | /run/user/1000
GPCC_HOME | /usr/local/greenplum-cc-web-6.0.0
LC_COLLATE | en_US.utf8
LC_CTYPE | en_US.utf8
LC_MESSAGES | en_US.utf8
LC_MONETARY | C
LC_NUMERIC | C
LC_TIME | C
(36 rows)
Time: 63.645 ms
## Segment 인스턴스의 plr 설정 확인
$ gpstate -c
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:--Primary list [Mirror not used]
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:- Primary Datadir Port
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:- sdw1 /data/primary/gpxeg0 6000
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:- sdw1 /data/primary/gpxeg1 6001
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:- sdw2 /data/primary/gpxeg2 6000
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:- sdw2 /data/primary/gpxeg3 6001
20191206:09:05:11:012575 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
$ PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p 6000
[gpadmin@mdw lib]$ PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p 6000
Timing is on.
psql (9.4.24)
Type "help" for help.
gpadmin=# SELECT * FROM plr_environ();
name | value
--------------------+-----------------------------------------------------------------------------------------------------------------
XDG_SESSION_ID | 43504
GPERA | 8e80243d0c4c752f_191205202827
SHELL | /bin/bash
SSH_CLIENT | 172.16.25.141 27704 22
PG_GRANDPARENT_PID | 42323
LD_PRELOAD | /lib64/libz.so.1
GPHOME | /usr/local/greenplum-db/.
USER | gpadmin
LD_LIBRARY_PATH | /usr/lib64/R/lib:/usr/local/greenplum-db/./lib:/usr/local/greenplum-db/./ext/python/lib:
MAIL | /var/mail/gpadmin
PATH | /usr/lib64/R/bin:/usr/local/greenplum-db/./bin:/usr/local/greenplum-db/./ext/python/bin:/usr/local/bin:/usr/bin
_ | /usr/bin/env
PWD | /home/gpadmin
LANG | en_US.UTF-8
PYTHONHOME | /usr/local/greenplum-db/./ext/python
PGSYSCONFDIR | /usr/local/greenplum-db-6.1.0/etc/postgresql
SHLVL | 2
HOME | /home/gpadmin
PYTHONPATH | /usr/local/greenplum-db/./lib/python
LOGNAME | gpadmin
GPSESSID | 0000000000
SSH_CONNECTION | 172.16.25.141 27704 172.16.25.133 22
PGDATA | /data/primary/gpxeg0
LESSOPEN | ||/usr/bin/lesspipe.sh %s
R_HOME | /usr/lib64/R
XDG_RUNTIME_DIR | /run/user/1000
LC_COLLATE | en_US.utf8
LC_CTYPE | en_US.utf8
LC_MESSAGES | en_US.utf8
LC_MONETARY | C
LC_NUMERIC | C
LC_TIME | C
(32 rows)
Time: 80.369 ms
gpadmin=#
4) 패키지 버전 확인
$ gpssh -f hostfile
=> echo "packageVersion('rpart')" | R --no-save
[sdw1] > packageVersion('rpart')
[sdw2] [1] ‘4.1.15’
[sdw2] > packageVersion('rpart')
[sdw2] [1] ‘4.1.15’
[sdw2] >
5) PLR 로딩 테스트는 2.1)를 참조
5. R 패키지 다운로드 및 설치 (의존성 포함)
- 개별적으로 패키지를 다운로드 받을 수 있지만, 의존성이 안 맞을 경우 에러 발생
- 의존성까지 포함된 R 패키지까지 일괄 다운로드
1) R 패키지 다운로드
$ R
> mainDir <- "/tmp"
> subDir <- "r-pkg"
> dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
> getPackages <- function(packs){
packages <- unlist(
# Find (recursively) dependencies or reverse dependencies of packages.
tools::package_dependencies(packs, available.packages(), which=c("Depends", "Imports"), recursive=TRUE)
)
packages <- union(packs, packages)
return(packages)
}
## 1개의 패키지 download 시
> packages <- getPackages(c("data.table"))
> download.packages(packages, destdir=file.path(mainDir, subDir))
Content type 'application/x-gzip' length 4944372 bytes (4.7 MB)
==================================================
downloaded 4.7 MB
## multi 패키지 download 시, getPackages에서 다운 받을 패키지를 아래와 같은 형태로 추가
> packages <- getPackages(c("forecast", "lubridate"))
> download.packages(packages, destdir=file.path(mainDir, subDir))
trying URL 'https://cran.seoul.go.kr/src/contrib/data.table_1.12.6.tar.gz'
Content type 'application/x-gzip' length 4944372 bytes (4.7 MB)
==================================================
downloaded 4.7 MB
trying URL 'https://cran.seoul.go.kr/src/contrib/PivotalR_0.1.18.3.1.tar.gz'
Content type 'application/x-gzip' length 1888833 bytes (1.8 MB)
==================================================
downloaded 1.8 MB
....
....
....
[,1] [,2]
[1,] "data.table" "/data/r-pkg/data.table_1.12.6.tar.gz"
[2,] "PivotalR" "/data/r-pkg/PivotalR_0.1.18.3.1.tar.gz"
[3,] "forecast" "/data/r-pkg/forecast_8.10.tar.gz"
[4,] "tseries" "/data/r-pkg/tseries_0.10-47.tar.gz"
[5,] "TTR" "/data/r-pkg/TTR_0.23-5.tar.gz"
.....
.....
[51,] "stringi" "/data/r-pkg/stringi_1.4.3.tar.gz"
[52,] "backports" "/data/r-pkg/backports_1.1.5.tar.gz"
[53,] "ellipsis" "/data/r-pkg/ellipsis_0.3.0.tar.gz"
[54,] "zeallot" "/data/r-pkg/zeallot_0.1.0.tar.gz"
2) R 다운로드 패키지 모든 노드에 복사 및 설치
$ cat hostfile
sdw1
sdw2
$ gpssh -f hostfile -e "mkdir -p /tmp/r-pkg"
$ gpscp -f hostfile /tmp/r-pkg/* =:/tmp/r-pkg
$ gpssh -f hostfile -e "ls -la /tmp/r-pkg"
$ gpssh -f hostfile
$ R CMD INSTALL `ls *.gz | perl -pe 's/\n/ /g'`
$ R CMD INSTALL `ls *.gz | egrep -v "cli|ellipsis|forecast" | perl -pe 's/\n/ /g'`
3) Install 최적 방안
- 의존성이 있어서, 순서가 중요함.
- 에러 발생될 때 마다 해당 라이브러리를 먼저 설치한다.
- 아래 스크립트를 사용하면 보다 빨리 설치가 가능함.
[gpadmin@sdw1 r-pkg]$ cat install.R
#! /usr/bin/env Rscript
download.path <- "/tmp/r-pkg"
pk.list <- list.files(download.path)
pk.list <- setdiff(pk.list, installed.packages()[, "Package"])
for(i in pk.list){
pk.path <- file.path(download.path, i)
install.packages(pk.path, repos = NULL, type="source")
}
$ Rscript install.R
## 실행 결과
...
1: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/download.R' had non-zero exit status
2: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/forecast_8.10.tar.gz' had non-zero exit status
3: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/ggplot2_3.2.1.tar.gz' had non-zero exit status
4: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/install.R' had non-zero exit status
5: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/pillar_1.4.2.tar.gz' had non-zero exit status
6: In install.packages(pk.path, repos = NULL, type = "source") :
installation of package '/tmp/r-pkg/tibble_2.1.3.tar.gz' had non-zero exit status
## 설치가 되지 않은 패키지에 대해서 수작업으로 설치
# Rscript install.R 으로 설치해도 되나, 시간을 줄이기 위해서 에러난 패키지만 수동으로 설치
$ R CMD INTALL /tmp/r-pkg/tibble_2.1.3.tar.gz
$ R CMD INTALL /tmp/r-pkg/pillar_1.4.2.tar.gz
## 설치가 되는 부분까지 끝나면 다시 install.R 수행
$ Rscript install.R
4) 기타 설치 방안
- 3번 안 설치를 권고.. 해보니깐 3)안이 가장 빠른 방법
- 3번으로 설치시 skip 해도 됨.
** 실행
$ R CMD INSTALL `ls *.gz | perl -pe 's/\n/ /g'`
** 에러 발생시
ERROR: dependency ‘zoo’ is not available for package ‘lmtest’
* removing ‘/usr/lib64/R/library/lmtest’
Making 'packages.html' ... done
$ R CMD INSTALL zoo_1.8-6.tar.gz
$ R CMD INSTALL lmtest_0.9-37.tar.gz
5) R 패키지 로딩 확인
$ gpssh -f hostfile
=> echo "packageVersion('forecast')" | R --no-save | egrep "forecast|\["
[sdw1] > packageVersion('forecast')
[sdw1] [1] ‘8.10’
[sdw2] > packageVersion('forecast')
[sdw2] [1] ‘8.10’
[ mdw] > packageVersion('forecast')
[ mdw] [1] ‘8.10’
=>
## 정상일때
$ ssh sdw1
# R
> library(forecast)
Registered S3 method overwritten by 'xts':
method from
as.zoo.xts zoo
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Registered S3 methods overwritten by 'forecast':
method from
fitted.fracdiff fracdiff
residuals.fracdiff fracdiff
>
## 정상 로딩이 안 되었을때
$ ssh mdw
# R
> library(forecast)
Error in library(forecast) : there is no package called ‘forecast’
>
## 라이브러리 갯수 확인할 때
[gpadmin@mdw r-pkg]$ gpssh -f ~gpadmin/hostfile
=> cd /usr/lib64/R/library
[sdw1]
[sdw2]
[ mdw]
=> ls -la | wc -l
[sdw1] 85
[sdw2] 85
[ mdw] 85
=>
참고 자료
1) PLR 기본 개념
- http://pivotalsoftware.github.io/gp-r/
2) R 패키지 다운로드
- https://rfriend.tistory.com/441
3) R 의존성 인스톨 방법
- http://oskardevelopers.blogspot.com/2018/03/rit-offline-r-cran-repository.html
기타 사항
1) R 패키지를 Source로 다운하여 설치를 하였는데, R 사용까지는 문제가 없으나,
Greenplum의 plr로 사용시 의존성이 이전 버전을 참고하는 경우가 있어서 yum install 을 권고