Hive container is running beyond physical memory limits

Categories: BigData

I use the Hive command-line tool to run queries against Hive tables. Recently, a query failed with the error message “container is running beyond physical memory limits”.

It took me quite a while to figure out what was happening and how to work around it. My notes can be found here.

It’s a shame that Tez/Hive don’t handle this automatically. Relational databases never report “out of memory” when running a query just because the source table is particularly large. On the other hand, this table was so large that no relational database could ever have held it…

UPDATE: Shortly after solving the above problem, I struck another out-of-memory problem in Hive, which is discussed here. Fun, fun, fun…