Thomas S. Hubregtsen

contact information

Research Staff Member
Austin Research Laboratory, Austin, TX, USA


profile


I am a Research Staff Member at the IBM Research Austin Laboratory, where my current work focuses on big data analytics and high-performance computing. I am also a PhD student at Delft University of Technology under the supervision of H. Peter Hofstee.

Before joining IBM as a full-time employee, I completed two internships in which I investigated whether user-addressable flash could close the performance gap in Apache Spark between data objects (Spark Resilient Distributed Datasets, or RDDs) held in memory and the same objects written out, or spilled, to a file system.

During this research I combined four new technologies:
1. POWER8 systems (announced in April 2014)
2. The OpenPOWER firmware/OS stack (brand new as of April 2014)
3. Apache Spark (version 1.0 as of May 2014)
4. IBM FlashSystem 840 enabled via CAPI (Coherent Accelerator Processor Interface), not yet generally available (GA) at the time

The flash system was connected to the Spark core by modifying both the Spark and the flash source code, and the experiments were run on Ubuntu 14.04 LE on a POWER8 system. This removed 70% of the overhead due to spilling, leading to a 1.15x speedup. Future work will focus on removing the remaining spilling overhead and on integrating the flash system more deeply into Apache Spark. The expectation is that user-addressable flash, combined with a relatively small amount of DRAM, will allow Apache Spark, and big data systems in general, to match the performance of in-memory computation with an order of magnitude more storage capacity, making it possible to solve bigger problems at lower cost.

Thesis: Evaluation of different storage systems for Apache Spark and Apache Hadoop

Previous work includes setting up a simulation environment for data center network topologies for the Hong Kong University of Science and Technology, evaluating GPU technology for big data computation on oil fields with PDS/Shell, and developing a soft real-time de-interlacer for the broadcasting industry with Axon.

In my free time I love to travel, hike and run. I am also a certified snowboard instructor.