Tutorial - Data Integration and Machine Learning: A Natural Synergy

In the era of Software 2.0, the ties between machine learning and data integration have grown stronger. For machine learning to be effective, it must draw on data from the widest possible variety of sources, which is why data integration plays a key role. At the same time, machine learning is driving automation in data integration, reducing overall integration costs and improving accuracy. The way data integration and machine learning make each other more effective is a true example of a powerful synergy.

Download slides (PDF, 21 MB)

Tutorial at SIGMOD 2018 on June 12, 2018

Table of contents

The tutorial is organized into three parts, as described below. For more details, see the overview paper.

Topic                               | Presenter       | Slides         | References
------------------------------------+-----------------+----------------+-------------------
Introduction & preliminaries        | Luna Dong       | section slides | section references
ML for DI: Overview                 | Luna Dong       | section slides |
ML for DI: ML for Entity Linkage    | Luna Dong       | section slides | section references
ML for DI: ML for Data Extraction   | Luna Dong       | section slides | section references
ML for DI: ML for Data Fusion       | Theo Rekatsinas | section slides | section references
ML for DI: Conclusion               | Theo Rekatsinas | section slides |
DI for ML: Training Data Creation   | Theo Rekatsinas | section slides | section references
DI for ML: Data Cleaning            | Theo Rekatsinas | section slides | section references
Conclusions and research directions | Theo Rekatsinas | section slides |

References | All slides

Citation

If you wish to cite the tutorial in a scientific publication, please cite our overview paper:

@inproceedings{Dong:2018:DIM:3183713.3197387,
  author = {Dong, Xin Luna and Rekatsinas, Theodoros},
  title = {Data Integration and Machine Learning: A Natural Synergy},
  booktitle = {Proceedings of the 2018 International Conference on Management of Data},
  series = {SIGMOD '18},
  year = {2018},
  isbn = {978-1-4503-4703-7},
  location = {Houston, TX, USA},
  pages = {1645--1650}
}

Organizers