
Hadoop Data Types and Custom Data Types

mapreduce 字母哥

Hadoop data types

Built-in Hadoop data types

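BooleanWritable: a standard boolean value
ByteWritable: a single-byte value
DoubleWritable: a double-precision floating-point value
FloatWritable: a single-precision floating-point value
IntWritable: an integer (int)
LongWritable: a long integer (long)
Text: text stored in UTF-8 format
NullWritable: a placeholder used when the key or value of a <key, value> pair is empty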

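Implementing user-defined data types

1. A data type that only ever appears as a "value" just needs to implement the Writable interface.
2. A data type that may appear as a "key" needs to implement the WritableComparable interface, because keys are sorted.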

1. Implementing the Writable interface


    /* DataInput and DataOutput are interfaces in java.io */
    public interface Writable {
        void readFields(DataInput in) throws IOException;
        void write(DataOutput out) throws IOException;
    }

Here is a small example:


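    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class Point3D implements Writable {
        public float x, y, z;

        public Point3D(float fx, float fy, float fz) {
            this.x = fx;
            this.y = fy;
            this.z = fz;
        }

        // Hadoop creates instances through the no-argument constructor
        // and then fills them in with readFields().
        public Point3D() {
            this(0.0f, 0.0f, 0.0f);
        }

        // Fields must be read in the same order in which write() emits them.
        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readFloat();
            y = in.readFloat();
            z = in.readFloat();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeFloat(x);
            out.writeFloat(y);
            out.writeFloat(z);
        }

        @Override
        public String toString() {
            return Float.toString(x) + ", " + Float.toString(y) + ", " + Float.toString(z);
        }
    }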
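To see what write() and readFields() actually do, the following sketch serializes a Point3D into a byte array and reads it back using only java.io. It is an illustration under stated assumptions, not part of the original example: the `Writable` interface here is a local stand-in that mirrors the two methods shown above so that the snippet compiles without Hadoop on the classpath, and `RoundTripDemo`, `serialize`, and `deserialize` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class for the demonstration.
class RoundTripDemo {
    // Writes the object into an in-memory buffer and returns the raw bytes.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Overwrites the fields of an existing object from the raw bytes.
    static void deserialize(Writable target, byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new Point3D(1.0f, 2.0f, 3.0f));
        Point3D copy = new Point3D();   // the no-argument constructor is needed here
        deserialize(copy, bytes);
        System.out.println(bytes.length + " bytes -> " + copy);
    }
}

// Local stand-in for org.apache.hadoop.io.Writable (assumption: used only so
// this sketch compiles without Hadoop; a real job imports Hadoop's interface).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Same fields and serialization order as the Point3D example above.
class Point3D implements Writable {
    public float x, y, z;

    public Point3D() { this(0.0f, 0.0f, 0.0f); }

    public Point3D(float fx, float fy, float fz) { x = fx; y = fy; z = fz; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
    }

    @Override
    public String toString() { return x + ", " + y + ", " + z; }
}
```

Three floats of 4 bytes each give a 12-byte record. The record carries no field names or type tags, which is why readFields() has to consume the fields in exactly the order write() produced them.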

2. Implementing the WritableComparable interface


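    /* In Hadoop, WritableComparable<T> extends both Writable and
     * java.lang.Comparable<T>, so an implementation must provide
     * these three methods. */
    public interface WritableComparable<T> {
        void readFields(DataInput in) throws IOException;
        void write(DataOutput out) throws IOException;
        int compareTo(T other);
    }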

A simple example comes first; explanation and extensions follow.


    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    public class Point3D implements WritableComparable<Point3D> {
        public float x, y, z;

        public Point3D(float fx, float fy, float fz) {
            this.x = fx;
            this.y = fy;
            this.z = fz;
        }

        public Point3D() {
            this(0.0f, 0.0f, 0.0f);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readFloat();
            y = in.readFloat();
            z = in.readFloat();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeFloat(x);
            out.writeFloat(y);
            out.writeFloat(z);
        }

        @Override
        public String toString() {
            return Float.toString(x) + ", " + Float.toString(y) + ", " + Float.toString(z);
        }

        public float distanceFromOrigin() {
            return (float) Math.sqrt(x * x + y * y + z * z);
        }

        // Determines the sort order of the map output. The default is
        // ascending; negate the return value for descending order.
        @Override
        public int compareTo(Point3D other) {
            return Float.compare(distanceFromOrigin(), other.distanceFromOrigin());
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point3D)) {
                return false;
            }
            Point3D other = (Point3D) o;
            return this.x == other.x && this.y == other.y && this.z == other.z;
        }

        /* Implementing hashCode() is important:
         * Hadoop's Partitioners use this method; more on that later. */
        @Override
        public int hashCode() {
            return Float.floatToIntBits(x) ^ Float.floatToIntBits(y) ^ Float.floatToIntBits(z);
        }
    }
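The effect of compareTo() and hashCode() can be tried out locally, without a cluster. The sketch below is an illustration under assumptions: it implements java.lang.Comparable directly instead of Hadoop's WritableComparable (serialization is left out for brevity), `KeyPoint`, `KeyOrderDemo`, `sortedAscending`, and `partitionFor` are hypothetical names, and `partitionFor` mirrors the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; run main() to see the ordering and partitions.
class KeyOrderDemo {
    // Same arithmetic as the default HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns a sorted copy; the order comes from KeyPoint.compareTo().
    static List<KeyPoint> sortedAscending(List<KeyPoint> keys) {
        List<KeyPoint> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<KeyPoint> keys = new ArrayList<>();
        keys.add(new KeyPoint(3.0f, 4.0f, 0.0f));   // distance 5
        keys.add(new KeyPoint(1.0f, 0.0f, 0.0f));   // distance 1
        keys.add(new KeyPoint(0.0f, 2.0f, 0.0f));   // distance 2

        List<KeyPoint> sorted = sortedAscending(keys);
        System.out.println("ascending: " + sorted);

        for (KeyPoint k : sorted) {
            System.out.println(k + " -> reduce task " + partitionFor(k, 4));
        }
    }
}

// Stand-in for the Point3D key above, without the Hadoop dependency.
class KeyPoint implements Comparable<KeyPoint> {
    final float x, y, z;

    KeyPoint(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    float distanceFromOrigin() {
        return (float) Math.sqrt(x * x + y * y + z * z);
    }

    @Override
    public int compareTo(KeyPoint other) {
        return Float.compare(distanceFromOrigin(), other.distanceFromOrigin());
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeyPoint)) {
            return false;
        }
        KeyPoint other = (KeyPoint) o;
        return x == other.x && y == other.y && z == other.z;
    }

    @Override
    public int hashCode() {
        return Float.floatToIntBits(x) ^ Float.floatToIntBits(y) ^ Float.floatToIntBits(z);
    }

    @Override
    public String toString() {
        return "(" + x + ", " + y + ", " + z + ")";
    }
}
```

Two points to take from this: keys with equal coordinates always land on the same reduce task because they share a hash code, and two different points at the same distance from the origin compare as equal under this compareTo(), which matters when keys are grouped for the reducer.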

After defining custom Hadoop data types, you have to tell Hadoop explicitly to use them. That is the job of JobConf:

void setOutputKeyClass(Class<?> theClass)
void setOutputValueClass(Class<?> theClass)

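By default, these setters apply to the output of both the map and the reduce phase. When the map output types differ from the final output types, the map side can be set separately with setMapOutputKeyClass() and setMapOutputValueClass().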
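As a rough sketch of how these setters fit together in a driver written against the old `org.apache.hadoop.mapred` API: the class name `Point3DDriver`, the job name, and the choice of `IntWritable` and `Text` for the other types are assumptions made for illustration, and `Point3D` is the WritableComparable class defined earlier. Mapper, reducer, and input/output paths are omitted, so this is a configuration fragment rather than a complete job, and it needs Hadoop on the classpath to compile.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver class; only the type configuration is shown.
public class Point3DDriver {
    public static JobConf configure() {
        JobConf conf = new JobConf(Point3DDriver.class);
        conf.setJobName("point3d-example");

        // Map output: the custom key type with a built-in value type.
        conf.setMapOutputKeyClass(Point3D.class);
        conf.setMapOutputValueClass(IntWritable.class);

        // Final (reduce) output types.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        return conf;
    }
}
```

If the map and reduce phases emit the same types, the two setMapOutput... calls can be dropped and the setOutput... calls cover both.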
