Connecting Sqoop with MS-SQL
1. MS-SQL Configuration
To connect with Sqoop, a few settings must first be configured on the MS-SQL Server side.
First, go to [Security]-[Logins] in the database and configure the login as follows:
a new account named hadoop2 was created for the Sqoop integration, and 'Enforce password policy' was unchecked for it.
Next, open [SQL Server]-[Properties]-[Security] and set 'Login auditing' to None.
Finally, check the network protocol settings of SQL Server.
Sqoop communicates with SQL Server over the TCP/IP protocol, so TCP/IP must be enabled.
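For reference, the same login can also be created from a command prompt. The following is a minimal sketch using sqlcmd, not the author's exact steps; the sa credentials, the password placeholder, and the grant on the test database are assumptions to adapt to your environment.

```bash
# Hypothetical sqlcmd sketch: create the hadoop2 login with the password
# policy disabled (the T-SQL equivalent of unchecking 'Enforce password
# policy'), then give it read access to the test database used below.
sqlcmd -S localhost -U sa -P '<sa-password>' -Q "
CREATE LOGIN hadoop2 WITH PASSWORD = '<password>', CHECK_POLICY = OFF;
USE test;
CREATE USER hadoop2 FOR LOGIN hadoop2;
ALTER ROLE db_datareader ADD MEMBER hadoop2;
"
```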
2. Connecting Sqoop to MS-SQL
Run the command below to verify that the connection works.
```
/usr/local/hadoop2/sqoop-1.4.6.bin__hadoop-2.0.4-alpha# bin/sqoop list-databases --connect jdbc:sqlserver://<address>:1433 --username hadoop2 --password <password>
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/hadoop-2.7.3/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/12/08 12:40:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/08 12:40:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/08 12:40:47 INFO manager.SqlManager: Using default fetchSize of 1000
master
tempdb
model
msdb
test
```
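Note the warning in the log: passing --password on the command line is insecure. Sqoop's own suggestion is the -P flag, which prompts for the password interactively instead:

```bash
# Same check as above, but let Sqoop prompt for the password (-P)
# rather than exposing it in the shell history and process list.
bin/sqoop list-databases \
    --connect "jdbc:sqlserver://<address>:1433" \
    --username hadoop2 \
    -P
```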
3. Importing Data
1) Importing data into HDFS
Let's import the customer table from the test DB. In the Sqoop installation directory, create a file named customer_import.sh with the contents below.
Sqoop first looks up the minimum and maximum values of the split column (e.g. SELECT MIN(ID), MAX(ID) FROM customer) and generates its split queries from that range, so the column is specified with the --split-by option.
```
--username
hadoop2
--password
<password>
--connect
jdbc:sqlserver://<address>
--table
customer
--split-by
ID
-m
1
```
After creating customer_import.sh, run Sqoop's import tool and point it at the options file with the --options-file option.
```
/usr/local/hadoop2/sqoop-1.4.6.bin__hadoop-2.0.4-alpha# bin/sqoop import --options-file customer_import.sh
16/12/08 13:50:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/08 13:50:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/08 13:50:29 INFO manager.SqlManager: Using default fetchSize of 1000
16/12/08 13:50:29 INFO tool.CodeGenTool: Beginning code generation
16/12/08 13:50:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [customer] AS t WHERE 1=0
16/12/08 13:50:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/112e84e10b6942d8fa6ccf97bfc8581c/customer.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/08 13:50:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/112e84e10b6942d8fa6ccf97bfc8581c/customer.jar
16/12/08 13:50:31 INFO mapreduce.ImportJobBase: Beginning import of customer
16/12/08 13:50:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/12/08 13:50:31 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/12/08 13:50:31 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/12/08 13:50:35 INFO db.DBInputFormat: Using read commited transaction isolation
16/12/08 13:50:35 INFO mapreduce.JobSubmitter: number of splits:1
16/12/08 13:50:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1481171359329_0002
16/12/08 13:50:35 INFO impl.YarnClientImpl: Submitted application application_1481171359329_0002
16/12/08 13:50:35 INFO mapreduce.Job: The url to track the job: http://0.0.0.0:8089/proxy/application_1481171359329_0002/
16/12/08 13:50:35 INFO mapreduce.Job: Running job: job_1481171359329_0002
16/12/08 13:50:40 INFO mapreduce.Job: Job job_1481171359329_0002 running in uber mode : false
16/12/08 13:50:40 INFO mapreduce.Job:  map 0% reduce 0%
16/12/08 13:50:45 INFO mapreduce.Job:  map 100% reduce 0%
16/12/08 13:50:45 INFO mapreduce.Job: Job job_1481171359329_0002 completed successfully
16/12/08 13:50:45 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=138521
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=1504907
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2528
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=2528
		Total vcore-milliseconds taken by all map tasks=2528
		Total megabyte-milliseconds taken by all map tasks=2588672
	Map-Reduce Framework
		Map input records=101096
		Map output records=101096
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=47
		CPU time spent (ms)=2180
		Physical memory (bytes) snapshot=186568704
		Virtual memory (bytes) snapshot=1925181440
		Total committed heap usage (bytes)=146800640
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=1504907
16/12/08 13:50:45 INFO mapreduce.ImportJobBase: Transferred 1.4352 MB in 13.9622 seconds (105.2585 KB/sec)
16/12/08 13:50:45 INFO mapreduce.ImportJobBase: Retrieved 101096 records.
```
Once Sqoop finishes, listing the target directory in HDFS shows that the data was imported:
```
/usr/local/hadoop2# bin/hdfs dfs -ls /user/root/customer/
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-12-08 13:50 /user/root/customer/_SUCCESS
-rw-r--r--   1 root supergroup    1504907 2016-12-08 13:50 /user/root/customer/part-m-00000
```
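To spot-check the imported data itself, the part file can be read straight out of HDFS. A small sketch; the path comes from the listing above, and the row count should match the 101,096 records Sqoop reported:

```bash
# Peek at the first few imported rows (text lines written by the map task).
bin/hdfs dfs -cat /user/root/customer/part-m-00000 | head -n 5

# Count the rows; this should match the record count in the import log.
bin/hdfs dfs -cat /user/root/customer/part-m-00000 | wc -l
```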