
Connecting Sqoop with MS-SQL

Excelsior-JH 2016. 12. 8. 14:04

1. MS-SQL Configuration

To connect with Sqoop, a few settings need to be changed on the MS-SQL Server side.


First, open [Security]-[Logins] on the database server and configure the login as follows.

Uncheck 'Enforce password policy'. A new login named hadoop2 was created for the Sqoop connection.
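The same setup can also be done from a query tool instead of the SSMS dialogs. The sketch below is only illustrative and assumes sqlcmd and an administrative (sa) login are available; the passwords are placeholders. CHECK_POLICY = OFF corresponds to unchecking 'Enforce password policy', and the second command maps the login to a user in the test database used later.

sqlcmd -S localhost -U sa -P <sa-password> -Q "CREATE LOGIN hadoop2 WITH PASSWORD = '<password>', CHECK_POLICY = OFF"
sqlcmd -S localhost -U sa -P <sa-password> -d test -Q "CREATE USER hadoop2 FOR LOGIN hadoop2"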


Next, open [SQL Server]-[Properties]-[Security] and set 'Login auditing' to None.


Finally, check the network protocol configuration of SQL Server. Sqoop connects to SQL Server over TCP/IP, so the TCP/IP protocol must be enabled.
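Enabling TCP/IP is done in SQL Server Configuration Manager, and the change takes effect after the SQL Server service is restarted. As a quick sanity check, the Hadoop node should be able to reach the listener on the port used in the connect string; 1433 is assumed here because it is the default port, and nc is assumed to be available on the Hadoop node.

nc -zv <server-address> 1433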



2. Connecting Sqoop with MS-SQL

Verify that the connection works with the following command.

/usr/local/hadoop2/sqoop-1.4.6.bin__hadoop-2.0.4-alpha# bin/sqoop list-databases --connect jdbc:sqlserver://<server-address>:1433 --username hadoop2 --password <password>
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/12/08 12:40:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/08 12:40:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/08 12:40:47 INFO manager.SqlManager: Using default fetchSize of 1000

master
tempdb
model
msdb
test
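As the warning in the log points out, passing --password on the command line is insecure. Sqoop can instead prompt for the password with -P, or read it from a file with --password-file; the file path below is only an example and should be readable only by the Sqoop user.

bin/sqoop list-databases --connect jdbc:sqlserver://<server-address>:1433 --username hadoop2 -P
bin/sqoop list-databases --connect jdbc:sqlserver://<server-address>:1433 --username hadoop2 --password-file /user/root/sqoop.password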


3. Importing Data

1) Importing data into HDFS

Import the customer table from the test database. In the Sqoop installation directory, create a file named customer_import.sh with the following contents (in a Sqoop options file, each option and each value goes on its own line).

Sqoop looks up the minimum and maximum values of the primary key and then generates its import queries, so the column to split on is specified with the --split-by option.

--username
hadoop2
--password
<password>
--connect
jdbc:sqlserver://<server-address>
--table
customer
--split-by
ID
-m
1

After creating customer_import.sh, run Sqoop's import tool and pass the configuration file with the --options-file option.

/usr/local/hadoop2/sqoop-1.4.6.bin__hadoop-2.0.4-alpha# bin/sqoop import --options-file customer_import.sh
16/12/08 13:50:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/08 13:50:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/08 13:50:29 INFO manager.SqlManager: Using default fetchSize of 1000
16/12/08 13:50:29 INFO tool.CodeGenTool: Beginning code generation
16/12/08 13:50:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [customer] AS t WHERE 1=0
16/12/08 13:50:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/112e84e10b6942d8fa6ccf97bfc8581c/customer.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/08 13:50:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/112e84e10b6942d8fa6ccf97bfc8581c/customer.jar
16/12/08 13:50:31 INFO mapreduce.ImportJobBase: Beginning import of customer
16/12/08 13:50:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/12/08 13:50:31 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/12/08 13:50:31 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/12/08 13:50:35 INFO db.DBInputFormat: Using read commited transaction isolation
16/12/08 13:50:35 INFO mapreduce.JobSubmitter: number of splits:1
16/12/08 13:50:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1481171359329_0002
16/12/08 13:50:35 INFO impl.YarnClientImpl: Submitted application application_1481171359329_0002
16/12/08 13:50:35 INFO mapreduce.Job: The url to track the job: http://0.0.0.0:8089/proxy/application_1481171359329_0002/
16/12/08 13:50:35 INFO mapreduce.Job: Running job: job_1481171359329_0002
16/12/08 13:50:40 INFO mapreduce.Job: Job job_1481171359329_0002 running in uber mode : false
16/12/08 13:50:40 INFO mapreduce.Job: map 0% reduce 0%
16/12/08 13:50:45 INFO mapreduce.Job: map 100% reduce 0%
16/12/08 13:50:45 INFO mapreduce.Job: Job job_1481171359329_0002 completed successfully
16/12/08 13:50:45 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138521
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=1504907
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2528
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=2528
        Total vcore-milliseconds taken by all map tasks=2528
        Total megabyte-milliseconds taken by all map tasks=2588672
    Map-Reduce Framework
        Map input records=101096
        Map output records=101096
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=47
        CPU time spent (ms)=2180
        Physical memory (bytes) snapshot=186568704
        Virtual memory (bytes) snapshot=1925181440
        Total committed heap usage (bytes)=146800640
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=1504907
16/12/08 13:50:45 INFO mapreduce.ImportJobBase: Transferred 1.4352 MB in 13.9622 seconds (105.2585 KB/sec)
16/12/08 13:50:45 INFO mapreduce.ImportJobBase: Retrieved 101096 records.
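No --target-dir was given in this run, so Sqoop wrote the output to a directory named after the table under the running user's HDFS home directory (/user/root/customer, as listed below). To place the files somewhere else, a --target-dir can be added on top of the options file; the path in this sketch is only an example.

bin/sqoop import --options-file customer_import.sh --target-dir /data/mssql/customer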

After Sqoop finishes, listing the output path in HDFS shows that the data was imported.

/usr/local/hadoop2# bin/hdfs dfs -ls /user/root/customer/
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-12-08 13:50 /user/root/customer/_SUCCESS
-rw-r--r--   1 root supergroup    1504907 2016-12-08 13:50 /user/root/customer/part-m-00000
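As a quick sanity check, the first few imported records can be printed directly from the part file. By default Sqoop writes plain text with comma-separated fields, so each line corresponds to one row of the customer table.

/usr/local/hadoop2# bin/hdfs dfs -cat /user/root/customer/part-m-00000 | head -n 5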


