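The UDF in this example comes from one of the examples in Chapter 13, on user-defined functions, of Programming Hive (《Hive编程指南》).
Following the book's steps, first create a project and add a GenericUDFNvl class.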
/**
 * NVL implemented as a generic UDF.
 * Author's note on the version originally tested: it could not handle
 * a NULL first argument and did not work well.
 */
package hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

@Description(
    name = "nvl",
    value = "_FUNC_(value,default_value) - Returns default value if value is nul else returns value",
    extended = "Example:\n> SELECT _FUNC_(NULL, 'bla') FROM src LIMIT 1;")
public class GenericUDFNvl extends GenericUDF {
  private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
  private ObjectInspector[] argumentOIs;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    argumentOIs = arguments;
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("The operator 'NVL' accepts 2 arguments.");
    }
    // The resolver checks that both arguments share a common return type.
    returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
    if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1]))) {
      throw new UDFArgumentTypeException(2,
          "The 1st and 2nd args of function NVL should have the same type, "
          + "but they are different: \"" + arguments[0].getTypeName()
          + "\" and \"" + arguments[1].getTypeName() + "\"");
    }
    return returnOIResolver.get();
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // Call get() on each DeferredObject to obtain the actual value;
    // fall back to the second argument only when the first is NULL.
    Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
    if (retVal == null) {
      retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
    }
    return retVal;
  }

  @Override
  public String getDisplayString(String[] children) {
    StringBuilder sb = new StringBuilder();
    sb.append("if ");
    sb.append(children[0]);
    sb.append(" is null ");
    sb.append("returns ");
    sb.append(children[1]);
    return sb.toString();
  }
}
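The core of evaluate() is a "first value unless NULL, otherwise the default" rule applied to lazily evaluated arguments. The sketch below illustrates that rule in plain Java without any Hive dependency. It is only an analogy: the NvlSketch class and its nvl method are hypothetical names, and java.util.function.Supplier stands in for Hive's DeferredObject, whose get() likewise produces the value on demand.

```java
import java.util.function.Supplier;

// Hypothetical stand-alone illustration; not part of Hive or of the book's example.
class NvlSketch {

    /**
     * Returns the first value unless it is null; only then is the
     * default supplier evaluated and its result returned.
     */
    static <T> T nvl(Supplier<T> value, Supplier<T> defaultValue) {
        T first = value.get();
        return first != null ? first : defaultValue.get();
    }

    public static void main(String[] args) {
        String kept = nvl(() -> "a", () -> "bla");
        String replaced = nvl(() -> null, () -> "bla");
        System.out.println(kept);
        System.out.println(replaced);
    }
}
```

Because the default is wrapped in a Supplier, it is never computed when the first value is present, which mirrors why the UDF only calls get() on the second argument inside the null check.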
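Once the class is created, right-click the project and choose Export -> JAR file. In the next step, select the file you just created and export it as a .jar file.
Next, start the Hive CLI and run: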
hive> add jar /home/user/udfnvl.jar;
hive> create temporary function nvl as "hive.udf.GenericUDFNvl";
hive> desc function nvl;
OK
nvl(value,default_value) - Returns default value if value is nul else returns value
Time taken: 0.169 seconds
hive> desc function extended nvl;
OK
nvl(value,default_value) - Returns default value if value is nul else returns value
Example:
> SELECT nvl(NULL, 'bla') FROM src LIMIT 1;
Time taken: 0.051 seconds
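The whole process above is fairly simple. Many more UDF examples can be found on GitHub, for example https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEncode.java
There is one thing to watch for, though: when exporting the project's JAR, check the JDK version. The classes must not be compiled for a newer Java version than the one in the runtime environment (keeping the two identical is simplest), otherwise you get an error such as Unsupported major.minor version 52.0. Class-file version 52.0 corresponds to Java 8, so this particular message means the JAR was built for Java 8 but is being loaded by an older JVM.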
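To check which Java version a compiled class targets before deploying the JAR, you can read the version fields from the class-file header. The following is a minimal sketch using only the JDK standard library; the ClassFileVersion class and its method names are made up for this illustration. It relies on the class-file layout (a 4-byte magic number 0xCAFEBABE, then a 2-byte minor and a 2-byte major version) and on the fact that the major version minus 44 gives the Java release, so 52 maps to Java 8.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Hive.
class ClassFileVersion {

    /** Reads the major version from a class-file header: magic (4 bytes), minor (2), major (2). */
    static int readMajorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version
    }

    /** Maps a class-file major version to the Java release it belongs to (52 -> 8). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion <path to .class file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int major = readMajorVersion(in);
            System.out.println("major version " + major + " -> Java " + javaRelease(major));
        }
    }
}
```

Run it against a .class file extracted from the exported JAR and compare the reported release with the Java version used by the Hive runtime.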