Update: I am on leave at Apple, where I lead the Knowledge Platform - ML team. I will also be joining the Department of Computer Science at ETH Zurich.
I am an Assistant Professor at UW-Madison. I’ve also had the pleasure of cofounding Inductiv, a company built around HoloClean that is now part of Apple. My lab works on the foundations of structured intelligence systems:
Software 2.0 for Data Quality: We are exploring the fundamental connections between data cleaning and machine learning. The HoloClean project introduced machine learning to the problem of data cleaning: we showed how to model data cleaning as a statistical learning problem and how attention-based mechanisms and self-supervised learning can automate data cleaning, and we introduced multiple theoretical results on how to deal with noisy or dirty data. More recently, we are exploring the synergies between data cleaning and machine learning deployments in the Picket project. This talk at the Stanford MLSys Seminar provides an overview.
Deep Learning over Billion-scale Structured Data: We are developing a system to make the use of deep learning models over billion-edge structured data easier, faster, and cheaper. We have started with the Marius project, which focuses on a key bottleneck in the development of machine learning systems over large-scale graph data: data movement during training. Marius addresses this bottleneck with a novel data flow architecture that maximizes resource utilization across the entire memory hierarchy (including disk, CPU, and GPU memory). Marius is under active development and available as an open-source project. You can learn more about Marius from our recent OSDI '21 and MLOpsWorld talks.
- January, 2022 Excited to talk about Data Debugging in ML at my alma mater, ECE @ NTUA.
- June, 2021 New talk about Marius and Machine Learning Over Billion-Edge Graphs at MLOpsWorld.
- March, 2021 Excited to be talking about Software 2.0 for Data Quality at the Stanford MLSys Seminar.
- February, 2021 Excited to talk about our work on Data Quality at CMU (ML with Large Datasets).