Logging in to internet sites (and private servers) with just a password is really not acceptable these days, at least for someone (like me) claiming to be interested in IT security. I therefore recently bought a Yubikey-4 authentication token.
Sadly, the documentation available from the manufacturer, and the internet in general, was not very helpful. I have therefore created some extensive notes on the Yubikey-4 which may be useful if you are also considering buying one (or have already done so).
I recently had a customer who suggested (for various reasons) storing large amounts of write-once data in HBase, using an (implicit) schema with long and complicated column names. I had immediate concerns about efficient use of disk storage with this approach (these were quite large amounts of data). Various sites warn about long column-names with HBase, but I could not find any actual statistics on it. A colleague and I therefore measured the efficiency of HBase with various column name lengths, and compared it to Avro.
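To see why column-name length matters, here is a rough back-of-envelope sketch. It assumes the classic uncompressed HBase KeyValue layout, where every cell repeats the row key, column family and column qualifier alongside the value. The function names and the example sizes are my own illustrations, not measurements, and the estimate ignores tags, MVCC data, block encoding and compression (which can reduce the cost of repeated names considerably).

```python
# Fixed bytes per cell in the classic (uncompressed) HBase KeyValue layout:
# 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
# + 8 (timestamp) + 1 (key type)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate on-disk size in bytes of one cell, before encoding/compression."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def overhead_ratio(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> float:
    """Fraction of the cell that is NOT the actual value."""
    total = cell_size(row_len, family_len, qualifier_len, value_len)
    return (total - value_len) / total


if __name__ == "__main__":
    # Hypothetical sizes: 16-byte row key, 1-byte family, 8-byte value.
    for qualifier_len in (2, 40):
        size = cell_size(16, 1, qualifier_len, 8)
        ratio = overhead_ratio(16, 1, qualifier_len, 8)
        print(f"qualifier={qualifier_len:>2} bytes -> cell={size} bytes, overhead={ratio:.0%}")
```

With these made-up sizes, an 8-byte value costs 47 bytes with a 2-byte column name and 85 bytes with a 40-byte one. Real numbers depend heavily on the encoding and compression settings, which is exactly why measuring is worthwhile.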
I was part of a project that tried to do stream processing with Spark a year or so ago. That didn’t go at all well; we had few resources and little time, and (IMO) Spark-streaming was simply not mature enough for production.
One of the nasty problems we had was that landing data into Hive created large numbers of small files; Walmart Labs solve that by using KairosDB as the target storage instead. KairosDB is a layer on Cassandra, ie HBase-like.
Another serious problem with Spark-streaming is session-detection; it is possible but only with significant complexity. If I understand correctly, they solve that via the lambda architecture - rough session detection in streaming, and better detection in the batch pass.
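For comparison, session detection in a batch pass is simple, because all events for a user are available at once. The sketch below is my own minimal illustration (not the approach from the article): it splits a list of event timestamps into sessions whenever the gap between consecutive events exceeds a threshold. The function name and the `gap` parameter are made up for the example. The streaming version is harder because a session can span several micro-batches, so the "last event seen" state has to be carried between batches and late events have to be dealt with.

```python
from typing import Iterable, List


def sessionize(timestamps: Iterable[float], gap: float) -> List[List[float]]:
    """Group event timestamps into sessions separated by more than `gap` of inactivity."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        # Extend the current session if this event is close enough to the previous one.
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            # Otherwise the inactivity gap was exceeded: start a new session.
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    events = [100, 0, 25, 10, 400, 110]  # unordered, as events often arrive
    print(sessionize(events, gap=30))
    # prints [[0, 10, 25], [100, 110], [400]]
```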
They still apparently had to fiddle with lots of Spark-streaming parameters though (batch duration, memory.fraction, locality.wait, executor/core ratios), and write custom monitoring code. And they were running on a dedicated Spark cluster, not YARN. My conclusion from this is: yes, Spark-streaming can work for production use-cases, but it is hard.
After my experiences, and some confirmation from this article, a solution based on Flink, Kafka Streams, or maybe Apache Beam seems simpler to me. Those are all robust enough to process data fully in streaming mode, ie the kappa architecture.
Java bytecode (production) - includes Java, Scala, Groovy, Kotlin
LLVM bitcode, ie apps compiled from C, C++, Rust and other languages via the LLVM compiler (experimental)
Python, Ruby, and R (experimental)
Code in these languages can call into other code running within Graal, regardless of the language it was written in! Arranging for additional libraries (including the language standard libraries) to be available requires some steps, but is possible.
Not only does this allow running apps in a “standalone” environment, it means that any larger software package which embeds the Graal VM and allows user code to run in that VM can support any language that Graal supports. Examples include database servers which embed the VM for stored procedure logic.
With Oracle, it is important to look at the licencing terms-and-conditions. This does initially seem to be OK; the code is completely licensed under the GPL2-with-classpath-exception, like OpenJDK. Oracle does warn that there is “no support” for the open-source code (aka “community edition”) and recommends that a support licence be bought for the “enterprise edition” instead - but OpenJDK is reliable enough, and so the Graal “community edition” will hopefully be so too.