README.txt
上传用户:quxuerui
上传日期:2018-01-08
资源大小:41811k
文件大小:2k
- *************************************************
- *** Input Files (Note: tab-separated columns) ***
- *************************************************
- [:~]$ cat datajoin/input/A
- A.a11 A.a12
- A.a21 A.a22
- B.a21 A.a32
- A.a31 A.a32
- B.a31 A.a32
- [:~]$ cat datajoin/input/B
- A.a11 B.a12
- A.a11 B.a13
- B.a11 B.a12
- B.a21 B.a22
- A.a31 B.a32
- B.a31 B.a32
- *****************************
- *** Invoke SampleDataJoin ***
- *****************************
- [:~]$ $HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input datajoin/output Text 1 org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
- Using TextInputFormat: Text
- Using TextOutputFormat: Text
- 07/06/01 19:58:23 INFO mapred.FileInputFormat: Total input paths to process : 2
- Job job_kkzk08 is submitted
- Job job_kkzk08 is still running.
- 07/06/01 19:58:24 INFO mapred.LocalJobRunner: collectedCount 5
- totalCount 5
- 07/06/01 19:58:24 INFO mapred.LocalJobRunner: collectedCount 6
- totalCount 6
- 07/06/01 19:58:24 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
- 07/06/01 19:58:24 INFO mapred.LocalJobRunner: actuallyCollectedCount 5
- collectedCount 7
- groupCount 6
- > reduce
- *******************
- *** Output File ***
- *******************
- [:~]$ cat datajoin/output/part-00000
- A.a11 A.a12 B.a12
- A.a11 A.a12 B.a13
- A.a31 A.a32 B.a32
- B.a21 A.a32 B.a22
- B.a31 A.a32 B.a32