Excelsior-JH 2016. 11. 11. 22:24

1. Saving a file to HDFS

  • Store the 'NOTICE.txt' file located in /usr/local/hadoop2/ in HDFS.
  • Unlike Hadoop 1, Hadoop 2 requires the administrator to create the /user directory manually.
  • Invoking the file system shell as hadoop dfs is deprecated in Hadoop 2, so the dfs option of the hdfs command is used instead.
bin/hdfs dfs -mkdir /user 
bin/hdfs dfs -mkdir /user/root  ## must be /user/xxxx, where xxxx matches the user name
bin/hdfs dfs -mkdir /user/root/conf  ## creates the directories user -> root -> conf
bin/hdfs dfs -put NOTICE.txt /user/root/conf  ## stores the NOTICE.txt file
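
The directory under /user has to match the user name because HDFS resolves relative paths (such as the conf and output arguments used in step 2) against the home directory /user/<user name>. The following is a minimal sketch of that rule in plain shell. It does not call HDFS, and the function name resolve_hdfs_path is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: mimics how a relative HDFS path is resolved
# against the user's home directory, /user/<user name>.
resolve_hdfs_path() {
  user="$1"
  path="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;                  # absolute paths are used as-is
    *)  printf '/user/%s/%s\n' "$user" "$path" ;; # relative paths go under the home directory
  esac
}

resolve_hdfs_path root conf      # prints /user/root/conf
resolve_hdfs_path root output    # prints /user/root/output
resolve_hdfs_path root /tmp/x    # prints /tmp/x
```

For the root user, the relative path conf therefore points at the /user/root/conf directory created above.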

2. Running the wordcount example

# Input is read from conf/; results are saved to the output directory (created automatically, so it must not already exist)
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount conf output 


16/11/11 22:11:52 INFO client.RMProxy: Connecting to ResourceManager at localhost/
16/11/11 22:11:52 INFO input.FileInputFormat: Total input paths to process : 1
16/11/11 22:11:53 INFO mapreduce.JobSubmitter: number of splits:1
16/11/11 22:11:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478867956244_0002
16/11/11 22:11:53 INFO impl.YarnClientImpl: Submitted application application_1478867956244_0002
16/11/11 22:11:53 INFO mapreduce.Job: The url to track the job:
16/11/11 22:11:53 INFO mapreduce.Job: Running job: job_1478867956244_0002
16/11/11 22:12:00 INFO mapreduce.Job: Job job_1478867956244_0002 running in uber mode : false
16/11/11 22:12:00 INFO mapreduce.Job:  map 0% reduce 0%
16/11/11 22:12:04 INFO mapreduce.Job:  map 100% reduce 0%
16/11/11 22:12:09 INFO mapreduce.Job:  map 100% reduce 100%
16/11/11 22:12:09 INFO mapreduce.Job: Job job_1478867956244_0002 completed successfully
16/11/11 22:12:09 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=11392
		FILE: Number of bytes written=260741
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=15090
		HDFS: Number of bytes written=8969
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1854
		Total time spent by all reduces in occupied slots (ms)=2388
		Total time spent by all map tasks (ms)=1854
		Total time spent by all reduce tasks (ms)=2388
		Total vcore-milliseconds taken by all map tasks=1854
		Total vcore-milliseconds taken by all reduce tasks=2388
		Total megabyte-milliseconds taken by all map tasks=1898496
		Total megabyte-milliseconds taken by all reduce tasks=2445312
	Map-Reduce Framework
		Map input records=437
		Map output records=1682
		Map output bytes=20803
		Map output materialized bytes=11392
		Input split bytes=112
		Combine input records=1682
		Combine output records=614
		Reduce input groups=614
		Reduce shuffle bytes=11392
		Reduce input records=614
		Reduce output records=614
		Spilled Records=1228
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=72
		CPU time spent (ms)=1210
		Physical memory (bytes) snapshot=455045120
		Virtual memory (bytes) snapshot=3837673472
		Total committed heap usage (bytes)=349700096
	Shuffle Errors
	File Input Format Counters 
		Bytes Read=14978
	File Output Format Counters 
		Bytes Written=8969
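
A few of the counters above can be read together. The sketch below pulls values out of an inline excerpt of the log and does the arithmetic in shell. The counter function is a hypothetical helper, and the per-container memory figure is inferred from the two counters rather than stated anywhere in the log.

```shell
# Excerpt of the counters printed above, kept inline so the sketch runs on its own.
log='Map output records=1682
Combine output records=614
Reduce output records=614
Total time spent by all map tasks (ms)=1854
Total megabyte-milliseconds taken by all map tasks=1898496'

# Hypothetical helper: print the value of the counter whose name matches exactly.
counter() {
  printf '%s\n' "$log" | awk -F= -v name="$1" '$1 == name { print $2 }'
}

# The mapper emits one record per token; the combiner collapses them to one per distinct word.
tokens=$(counter 'Map output records')
distinct=$(counter 'Combine output records')
echo "tokens=$tokens distinct=$distinct"

# megabyte-milliseconds divided by task milliseconds gives the MB allocated to the map container.
mb_ms=$(counter 'Total megabyte-milliseconds taken by all map tasks')
ms=$(counter 'Total time spent by all map tasks (ms)')
echo "map container MB=$(( mb_ms / ms ))"
```

With the numbers from this run, the job saw 1682 tokens and 614 distinct words, and the map task appears to have run in a 1024 MB container.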

3. Checking the results

bin/hdfs dfs -cat output/part-r-00000 | tail -5

works	1
writing,	1
written	7
you	2
zlib	1
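
Each output line is a word and its count separated by a tab. The entry writing, keeps its comma because the example splits the input on whitespace only. As a rough local stand-in for the job (not the Hadoop implementation itself), the same counting can be sketched with standard shell tools. The function name local_wordcount and the sample input are made up for illustration.

```shell
# Local stand-in for the wordcount job: split on whitespace, count each token,
# and print "word<TAB>count" sorted by byte order.
# Assumption: tokens are whitespace-separated, so punctuation stays attached ("writing,").
local_wordcount() {
  tr -s '[:space:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Made-up sample input, not taken from NOTICE.txt.
printf 'you written you\nwriting, written\n' | local_wordcount
```

On the sample input this lists writing, once and written and you twice each, in the same word-then-count layout as part-r-00000.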
