README for “Hadoop Data Warehousing with Hive”
Strata + Hadoop World 2012 Tutorial Exercises
Dean Wampler, academy@thinkbiganalytics.com, @thinkBigA

Welcome! Please follow these instructions to download the tutorial presentation and exercises.
About this Hive Tutorial
This Hive Tutorial is adapted from a longer Think Big Academy course on Hive. (The Academy is the education arm of Think Big Analytics.) We offer various public and private courses on Hadoop programming, Hive, Pig, etc. We also provide consulting on Big Data problems and their solutions, especially using Hadoop. If you want to learn more, visit thinkbiganalytics.com or send us email.
We’ll log into Amazon Elastic MapReduce (EMR) clusters[1] to do the exercises. Feel free to pair program with a neighbor, if you want.
NOTE: The exercises should work with any version of Hive, v0.7.1 or later.
Getting Started
Download the tutorial zip file, tutorial.zip, which contains a PDF of the tutorial presentation, the exercises, the data used for the exercises, and a Hive cheat sheet.
Unzip tutorial.zip in a convenient place on your laptop.
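For example, on Linux or Mac OS X you can unzip it from a terminal, assuming you downloaded the file to your current directory:

unzip tutorial.zip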
If you are on Windows, you’ll need the ssh client application PuTTY to log into the EMR servers. You can download and install it from the PuTTY web site.
Manifest for Tutorial Zip File

Item                               Whazzat?
README.html                        What you’re reading!
ThinkBigAcademy-Hive-Tutorial.pdf  The tutorial presentation.
exercises                          The exercises we’ll use. They are also installed on the clusters, but you’ll open them “locally” in an editor, then use copy and paste.
data                               The data files we’ll use. They are here only for your reference later. We’ll use the copies already on the clusters.
HiveCheatSheat.html                A Hive cheat sheet.
exercises/.hiverc                  Drop this file in the home directory on any machine where you will normally run the hive command-line interface (CLI). Hive will run the commands it contains when it starts. This file is a great place to put commands you always run on startup, such as property settings. Already on the cluster.
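For illustration, a .hiverc might contain property settings like these (hypothetical contents; the actual file shipped in the zip may differ):

-- Hypothetical example .hiverc; not necessarily the file in the zip.
-- Print column headers in query results.
SET hive.cli.print.header=true;
-- Show the current database name in the CLI prompt (Hive 0.8+).
SET hive.cli.print.current.db=true;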
Log into one of the Amazon Elastic MapReduce Clusters

We have several EMR clusters running, and you’ll log into one of them according to the first one or two letters of your last name, using the following table[2]:
Letters    Server Name                                 JobFlow ID
A          ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Ba - Bh    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Bi - Bz    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Ca - Ch    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Ci - Cz    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
D          ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
E - F      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
G          ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
H          ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
I - J      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
K - L      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Ma - Mh    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Mi - Mz    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
N - P      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Q - R      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Sa - Sh    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Si - Sz    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
T - V      ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Wa - Wh    ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
Wi - Z     ec2-50-19-185-170.compute-1.amazonaws.com   j-1R3E26P0T3IBK
(We’ll explain the JobFlow ID later.)
Once you have picked the correct server, use the following ssh command on Linux or Mac OS X, or the equivalent putty command, to log into your server. You’ll be the user hadoop:

ssh hadoop@ec2-NN-NN-NNN-NNN.compute-1.amazonaws.com

The password is: strata
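If you prefer to launch PuTTY from the Windows command line rather than through its GUI, the rough equivalent (assuming putty.exe is on your PATH) is:

putty -ssh hadoop@ec2-NN-NN-NNN-NNN.compute-1.amazonaws.com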
Finally, since you are sharing the primary user account on the cluster, create a personal work directory with mkdir for any file editing you’ll do today. Pick a directory name without spaces, e.g., like a typical user name. You will use that same name for another purpose shortly, as we’ll see. After creating it, change to that directory with the cd command:

mkdir myusername
cd myusername
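As an optional sanity check (our suggestion, not a step from the original instructions), you can confirm that the Hive CLI works by running a trivial query with its -e option:

hive -e 'SHOW TABLES;'

If Hive is set up correctly, this prints the table list (possibly empty) and exits.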
Please don’t break anything! ;^) Remember, you’re sharing this cluster.
Feel free to snoop around if you’re waiting for others. Note that all the Hadoop software is installed in the hadoop user’s $HOME directory, /home/hadoop.

Quick Cheat Sheet on Linux Shell Commands
If you’re not accustomed to the Linux or Mac OS X bash shell, here are a few hints[3]:

Print your current working directory:

pwd

List the contents of a directory. Add the -l option to show a longer listing with more information. If you omit the directory, the current directory is used:

ls some-directory
ls -l some-directory

Change to a different directory. Four variants, using i) an absolute path, ii) a subdirectory of the current directory, iii) the parent directory of the current directory, and iv) your home directory:

cd /home/hadoop
cd exercises
cd ..
cd ~

Page through the contents of a file. Hit the space bar to page, q to quit:

more some-file

Dump the contents without paging, i.e., “concatenate” or “cat” the file:

cat some-file
For More Information
For more information on Amazon Elastic MapReduce commands, see the Quick Reference Guide and the Developer Guide.
For more details on Hive, see Programming Hive or the Hive Wiki.
[1] Visit The AWS EMR Page and the EMR Documentation page for more information about EMR.
[2] I used the following information to determine a good distribution of users across these clusters. Note that these EMR clusters will only be available during the time of the tutorial.
[3] You should learn how to use bash if you want to use Hadoop.