Big Data Integration
Abstract – Until recently, structured (e.g., relational) and unstructured (e.g., textual) data were managed very differently: structured data was queried declaratively using languages such as SQL, while unstructured data was searched using boolean queries over inverted indices. Today, we are witnessing the rapid emergence of entity-centric techniques that bridge the gap between these types of content and manage both structured and unstructured data more effectively in Big Data environments.
I will start this talk by giving a few examples of entity-centric data management. I will then describe two recent systems built in my lab that revolve around entity-centric data management techniques: ZenCrowd, a socio-technical platform that automatically connects textual content to structured entities, and TripleProv, a scalable, efficient, and provenance-enabled back-end for managing graphs of entities.
Biography – Philippe Cudre-Mauroux is the director of the eXascale Infolab at the University of Fribourg and the CTO of Scigility. He received his Ph.D. from the Swiss Federal Institute of Technology in Lausanne (EPFL), where he won both the Doctorate Award and the EPFL Press Mention in 2007. Before joining the University of Fribourg, he worked on information management infrastructures at IBM Watson Research (NY), Microsoft Research Asia, and MIT. He was Program Chair of the International Semantic Web Conference in 2012 and General Chair of the International Symposium on Data-Driven Process Discovery and Analysis in 2012 and 2013. He recently won the Verisign Internet Infrastructures Award, a Swiss National Center in Research award, and a Google Faculty Research Award. His research interests are in next-generation Big Data management infrastructures for non-relational data.