Theodoros (Theo) Rekatsinas


Update: I am on leave at Apple, where I lead the Knowledge Platform - ML team. I will also be joining the Department of Computer Science at ETH Zurich.

I am an Assistant Professor at UW-Madison. I’ve also had the pleasure of cofounding Inductiv, a company built around HoloClean that is now part of Apple. My lab works on the foundations of structured intelligence systems:

  • Software 2.0 for Data Quality: We are exploring the fundamental connections between data cleaning and machine learning. The HoloClean project introduced machine learning to the problem of data cleaning: we showed how to model data cleaning as a statistical learning problem and how attention-based mechanisms and self-supervised learning can automate data cleaning, and we introduced multiple theoretical results on how to deal with noisy or dirty data. More recently, we have been exploring the synergies between data cleaning and machine learning deployments in the Picket project. This talk at the Stanford MLSys Seminar provides an overview.

  • Deep Learning over Billion-scale Structured Data: We are developing a system to make the use of deep learning models over billion-edge structured data easier, faster, and cheaper. We have started with the Marius project, which focuses on a key bottleneck in the development of machine learning systems over large-scale graph data: data movement during training. Marius addresses this bottleneck with a novel data flow architecture that maximizes resource utilization across the entire memory hierarchy (including disk, CPU, and GPU memory). Marius is under active development and available as an open-source project. You can learn more about Marius from our recent OSDI '21 and MLOpsWorld talks.

News

    • January, 2022 Excited to talk about Data Debugging in ML at my alma mater, ECE @ NTUA.
    • June, 2021 New talk about Marius and Machine Learning Over Billion-Edge Graphs at MLOpsWorld.
    • March, 2021 Excited to be talking about Software 2.0 for Data Quality at the Stanford MLSys Seminar.
    • February, 2021 Excited to talk about our work on Data Quality at CMU (ML with Large Datasets).
Contact: theo.rekatsinas [at] inf.ethz.ch
  • thodrek