Iceberg supports reading and writing Iceberg tables through Hive by using a StorageHandler. Here is the current compatibility matrix for Iceberg Hive support:

[Feature compatibility matrix]

## Enabling Iceberg support in Hive

### Loading runtime jar

To enable Iceberg support in Hive, the `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath. These are provided by the `iceberg-hive-runtime` jar file. For example, if using the Hive shell, this can be achieved by issuing a statement like so:

```
add jar /path/to/iceberg-hive-runtime.jar;
```

There are many other ways to achieve this, including adding the jar file to Hive's auxiliary classpath so it is available by default. Please refer to Hive's documentation for more information.

### Enabling support

If the Iceberg storage handler is not in Hive's classpath, then Hive cannot load or update the metadata for an Iceberg table when the storage handler is set. To avoid the appearance of broken tables in Hive, Iceberg will not add the storage handler to a table unless Hive support is enabled. The storage handler is kept in sync (added or removed) every time Hive engine support for the table is updated, i.e. turned on or off in the table properties. There are two ways to enable Hive support: globally in the Hadoop configuration and per-table using a table property.

#### Hadoop configuration

To enable Hive support globally for an application, set `iceberg.engine.hive.enabled=true` in its Hadoop configuration. For example, setting this in the hive-site.xml loaded by Spark will enable the storage handler for all tables created by Spark.

Starting with Apache Iceberg 0.11.0, when using Hive with Tez you also have to disable vectorization (`hive.vectorized.execution.enabled=false`).

#### Table property configuration

Alternatively, the property `engine.hive.enabled` can be set to `true` and added to the table properties when creating the Iceberg table. Here is an example of doing it programmatically:

```java
Catalog catalog = ...;
Map<String, String> tableProperties = new HashMap<>();
tableProperties.put(TableProperties.ENGINE_HIVE_ENABLED, "true"); // engine.hive.enabled=true
catalog.createTable(tableId, schema, spec, tableProperties);
```

The table level configuration overrides the global Hadoop configuration.

#### Hive on Tez configuration

To use the Tez engine on Hive 3.1.2 or later, Tez needs to be upgraded to >= 0.10.1, which contains a necessary fix, TEZ-4248.

To use the Tez engine on Hive 2.3.x, you will need to manually build Tez from the `branch-0.9` branch due to a backwards incompatibility issue with Tez 0.10.1.

You will also need to set the following property in the Hive configuration: `tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids`.

## Catalog Management

### Global Hadoop Configuration

From the Hive engine's perspective, there is only one global data catalog that is defined in the Hadoop configuration in the runtime environment. In contrast, Iceberg supports multiple different data catalog types such as Hive, Hadoop, AWS Glue, or custom catalog implementations. Iceberg also allows loading a table directly based on its path in the file system. Those tables do not belong to any catalog. Users might want to read these cross-catalog and path-based tables through the Hive engine for use cases like join.

To support this, a table in the Hive metastore can represent three different ways of loading an Iceberg table, depending on the table's `iceberg.catalog` property:

1. The table will be loaded using a HiveCatalog that corresponds to the metastore configured in the Hive environment if no `iceberg.catalog` is set.
2. The table will be loaded using a custom catalog if `iceberg.catalog` is set to a catalog name (see below).
3. The table can be loaded directly using the table's root location if `iceberg.catalog` is set to `location_based_table`.

For cases 2 and 3 above, users can create an overlay of an Iceberg table in the Hive metastore, so that different table types can work together in the same Hive environment. See CREATE EXTERNAL TABLE and CREATE TABLE for more details.

To globally register different catalogs, set the following Hadoop configurations:

| Config Key | Description |
| --- | --- |
| `iceberg.catalog.<catalog_name>.type` | Type of catalog: `hive`, `hadoop`, or left unset if using a custom catalog |
| `iceberg.catalog.<catalog_name>.catalog-impl` | Catalog implementation, must not be null if type is empty |
| `iceberg.catalog.<catalog_name>.<key>` | Any config key and value pairs for the catalog |

Register a HiveCatalog called `another_hive`:

```
SET iceberg.catalog.another_hive.type=hive;
SET iceberg.catalog.another_hive.warehouse=hdfs://<host>:8020/warehouse;
```

Register a HadoopCatalog called `hadoop`:

```
SET iceberg.catalog.hadoop.type=hadoop;
SET iceberg.catalog.hadoop.warehouse=hdfs://<host>:8020/warehouse;
```

Register an AWS GlueCatalog called `glue`:

```
SET iceberg.catalog.glue.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog;
```

## CREATE EXTERNAL TABLE

The `CREATE EXTERNAL TABLE` command is used to overlay a Hive table "on top of" an existing Iceberg table. Iceberg tables are created using either a Catalog, or an implementation of the Tables interface, and Hive needs to be configured accordingly to operate on these different types of table.

### Hive catalog tables

As described before, tables created by the HiveCatalog with the Hive engine feature enabled are directly visible by the Hive engine, so there is no need to create an overlay.
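For the global approach, the settings described in this section are typically placed in the Hadoop/Hive configuration files. Below is a minimal hive-site.xml sketch, using only the property names and values given in the text; it is an illustrative fragment, not a complete configuration:

```xml
<!-- Sketch: enable the Iceberg storage handler globally for this application -->
<property>
  <name>iceberg.engine.hive.enabled</name>
  <value>true</value>
</property>
<!-- Iceberg 0.11.0+ with the Tez engine: vectorization must be disabled -->
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>false</value>
</property>
```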
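To make the catalog-overlay workflow concrete, here is a hedged sketch of the two overlay cases (custom catalog and path-based table). The catalog name `hadoop`, the database/table names, and the HDFS paths are illustrative assumptions, not values from the text; the storage handler class is the `HiveIcebergStorageHandler` mentioned earlier:

```sql
-- Case 2: overlay a table managed by a globally registered custom catalog.
-- Assumes a catalog was registered under the name "hadoop" in the Hadoop configuration.
CREATE EXTERNAL TABLE database_a.table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='hadoop');

-- Case 3: overlay a path-based table, loaded directly from its root location.
CREATE EXTERNAL TABLE table_b
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://<host>:8020/warehouse/database_a/table_b'
TBLPROPERTIES ('iceberg.catalog'='location_based_table');
```

Once an overlay exists, the table can be queried like any other Hive table, e.g. `SELECT count(*) FROM database_a.table_a;`.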