Selected Papers
2022
- Saga: A Platform for Continuous Construction and Serving of Knowledge At Scale Ihab Ilyas, Theodoros Rekatsinas, Vishnu Konda, Jeffrey Pound, Xiaoguang Qi, Mohamed Soliman. SIGMOD 2022
- Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine Roger Waleffe et al., In Submission
- Machine Learning and Data Cleaning: Which Serves the Other? Ihab Ilyas and Theodoros Rekatsinas Journal of Data and Information Quality 2022 (invited)
- Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins Sahaana Suri et al. VLDB 2022
- Can Transfer Learning be used to build a Query Optimizer? Yunjia Zhang et al. CIDR 2022
2021
- Picket: Guarding Against Corrupted Data in Tabular Data during Learning and Inference Zifan Liu et al. VLDB Journal 21
- Demo of Marius: A System for Large-scale Graph Embeddings Anze Xie et al. VLDB demo 21
- Marius: Learning Massive Graph Embeddings on a Single Machine Jason Mohoney et al. OSDI 21
- On Robust Mean Estimation under Coordinate-level Corruption Zifan Liu et al. ICML 21
2020
- Unsupervised Relation Extraction from Language Models using Constrained Cloze Completion Ankur Goswami et al. EMNLP-Findings 20
- Data-Dependent Differentially Private Parameter Learning for Directed Graphical Models Amrita Chowdhury et al. ICML 20
- A Statistical Perspective on Discovering Functional Dependencies in Noisy Data Yunjia Zhang et al. SIGMOD 20
- Attention-based Learning For Missing Data Imputation in HoloClean Richard Wu et al. MLSys 20
- Principal Component Networks: Parameter Reduction Early in Training Roger Waleffe and Theodoros Rekatsinas Manuscript
2019
- Approximate Inference in Structured Instances with Noisy Categorical Observations Alireza Heidari et al. UAI 19
- HoloDetect: Few-Shot Learning for Error Detection Alireza Heidari et al. SIGMOD 19
- A Formal Framework For Probabilistic Unclean Databases De Sa et al. ICDT 19
- Data Integration and Machine Learning: A Natural Synergy Xin Luna Dong and Theodoros Rekatsinas VLDB, SIGMOD, KDD Tutorials 19
2018
- Deep Learning For Entity Matching: A Design Space Exploration Sidharth Mudgal et al. SIGMOD 18
- Fonduer: Knowledge Base Construction from Richly Formatted Data Sen Wu et al. SIGMOD 18
2017 and older
- HoloClean: Holistic Data Repairs with Probabilistic Inference Theodoros Rekatsinas et al. VLDB 17
- SLiMFast: Guaranteed Results for Data Fusion and Source Reliability Theodoros Rekatsinas et al. SIGMOD 17
- Forecasting Rare Disease Outbreaks from Open Source Indicators Theodoros Rekatsinas et al. Journal of Statistical Analysis and Data Mining, Best of SDM Special Issue, 2016
- SourceSight: Enabling Effective Source Selection Theodoros Rekatsinas et al. SIGMOD demo 16
- HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades Xinran He, Theodoros Rekatsinas et al. ICML 15
- SourceSeer: Forecasting Rare Disease Outbreaks Using Multiple Data Sources Rekatsinas et al. SDM 15 (Best Paper Award)
- Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration Rekatsinas et al. CIDR 15
- Characterizing and Selecting Fresh Data Sources Rekatsinas et al. SIGMOD 14
- SPARSI: Partitioning Sensitive Data Amongst Multiple Adversaries Rekatsinas et al. VLDB 13
- Local Structure and Determinism in Probabilistic Databases Rekatsinas et al. SIGMOD 12