Update: I am currently on leave at Apple, where I lead the Knowledge Platform - Graph ML team. I will also be joining the Department of Computer Science at ETH Zurich, where I will be part of the Systems Group.
Previously, I was an Assistant Professor at UW-Madison and a member of the Database Group. I also had the pleasure of co-founding Inductiv (acquired by Apple), a company developing AI for identifying and correcting errors in data.
I am always looking for good students! If you are interested in working on the topics below, please reach out at theo.rekatsinas[at]inf.ethz.ch.
My lab focuses on the foundations of structured intelligence systems:
Software 2.0 for Data Quality: We are exploring the fundamental connections between data cleaning and machine learning. The HoloClean project introduced machine learning to the problem of data cleaning: we showed how to model data cleaning as a statistical learning problem and how attention-based mechanisms and self-supervised learning can automate data cleaning, and we introduced multiple theoretical results on how to deal with noisy or dirty data. More recently, we have been exploring the synergies between data cleaning and machine learning deployments in the Picket project. This talk at the Stanford MLSys Seminar provides an overview.
Neural Relational Engines over Billion-scale Data: We are developing a new paradigm of systems to make the use of deep learning models over billion-scale structured data easier, faster, and cheaper. We have started with the Marius project, which focuses on a key bottleneck in the development of machine learning systems over large-scale graph data: data movement during training. Marius addresses this bottleneck with a novel data flow architecture that maximizes resource utilization across the entire memory hierarchy (including disk, CPU, and GPU memory). Marius is under active development and available as an open-source project. You can learn more about Marius from our recent OSDI '21 and MLOpsWorld talks.
- January 2022: Excited to talk about Data Debugging in ML at my alma mater, ECE @ NTUA.
- June 2021: New talk about Marius and Machine Learning Over Billion-Edge Graphs at MLOpsWorld.
- March 2021: Excited to be talking about Software 2.0 for Data Quality at the Stanford MLSys Seminar.
- February 2021: Excited to talk about our work on Data Quality at CMU (ML with Large Datasets).