Blog posts
- 2023-03-08 Artificial Intelligence According to John Oliver
- 2019-03-18 Talend Software Suite
- 2018-09-09 More databases - MemSQL and RocksDB
- 2018-07-21 The Snowflake Data Warehouse
- 2018-07-19 Storage Space Efficiency in Avro and HBase
- 2018-07-08 A Lambda Architecture with Spark Streaming from Walmart Labs
- 2018-04-21 Google Cloud Functions, BigQuery, and Related Matters
- 2018-04-14 Apache Beam and Google Dataflow Overview
- 2018-04-12 Apache Beam and CSV Headers
- 2018-04-11 Google Cloud Platform and AppEngine
- 2018-01-05 Accessing HDFS from Spark in Zeppelin
- 2017-12-06 Remapping DNS Lookups in a JRE
- 2017-09-20 Accessing Hive via JDBC
- 2017-09-03 Hive container is running beyond physical memory limits
- 2017-08-27 Spark Overview
- 2017-08-09 Kafka Connect JDBC Source Where Clauses
- 2017-05-18 Kafka Manager
- 2017-05-16 RabbitMQ Threading Model
- 2017-05-07 Vagrant, Kafka, Kerberos
- 2017-04-23 Apache NiFi
- 2017-03-27 Kafka Connect
- 2017-03-18 The Lambda and Kappa Design Patterns for Persistent State
- 2017-03-11 Elasticsearch5 TransportClient Mode
- 2016-10-17 Elasticsearch Overview
- 2016-06-26 Apache HBase Overview
- 2016-06-14 Apache Hive v2.0 Released
- 2016-06-06 The Kafka Message Broker
- 2016-05-16 OrangeFS
- 2016-05-07 Apache Tez - an Overview
- 2016-04-17 NoSQL - an Overview
- 2015-11-04 Big Data Overview
Articles
- 2019-03-18 Gitblit - a simple Git Repository Manager
- 2019-03-18 Talend Basic Install on Linux - Wizard
- 2019-03-18 Talend Basic Install on Linux - Manual
- 2019-03-18 Talend Basic Install on Linux
- 2019-03-18 Talend Suite Overview
- 2018-11-28 Introduction to Data Vault for Data Warehousing
- 2018-09-02 RocksDB key/value store Overview
- 2018-09-02 MemSQL Database Overview
- 2018-08-25 Apache Kudu Overview
- 2018-07-21 The Snowflake Data Warehouse
- 2018-07-19 Storage Space Efficiency in Avro and HBase
- 2018-05-06 Google Databases Overview
- 2018-04-23 Google Cloud Storage Overview
- 2018-04-21 Dealing with Mutable Records in a BigQuery Data Warehouse
- 2018-04-19 Analytic Functions, Partitioning and Windowing in SQL and BigQuery
- 2018-04-18 Google BigQuery Overview
- 2018-04-17 Google Cloud Functions Overview
- 2018-04-13 Apache Beam and Google Dataflow Overview
- 2018-04-12 Apache Beam - Reading the First Line of a File
- 2017-09-20 Hive, JDBC and Array-typed Columns
- 2017-09-15 Hive/Tez tasks with OutOfMemoryError
- 2017-09-03 Hive container is running beyond physical memory limits
- 2017-09-01 Spark RDD Random Notes
- 2017-08-27 Spark Overview
- 2017-05-08 The RabbitMQ Threading Model
- 2017-05-08 RabbitMQ Exchanges and Queues
- 2017-05-07 Vagrant, Kafka and Kerberos
- 2017-04-18 Kafka Serialization and the Schema Registry
- 2017-04-17 Apache NiFi Architecture
- 2017-03-27 Kafka Connect
- 2017-03-18 Lambda and Kappa Architectures
- 2017-03-11 An Elasticsearch5 Transport Client
- 2017-01-30 Elasticsearch Aliases
- 2016-10-23 Apache Cassandra Overview
- 2016-10-17 Elasticsearch Overview
- 2016-10-01 Big-Data-related Links
- 2016-06-28 Apache Hive Overview
- 2016-06-20 Apache HBase Overview
- 2016-06-06 Apache Kafka 0.10.0 Overview
- 2016-05-05 Apache Tez Overview
- 2016-04-16 NoSQL Overview
- 2016-01-28 Apache HDFS Overview
- 2016-01-28 Apache YARN Overview
- 2016-01-28 Apache Hadoop Overview
- 2016-01-28 Apache Hadoop MapReduce
- 2016-01-12 Storage Area Networks and Associated Filesystems
- 2015-10-28 Big Data Processing
- 2015-10-28 Column Store Databases
- 2015-10-28 Big Data Storage
- 2015-05-19 Zookeeper Overview