Datalab Seminar: Entity Matching with Transformer Architectures

Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture, in combination with pre-training on large amounts of text, led to the recent breakthrough and to a variety of slightly different implementations. In this talk we analyze how well four of the most recent attention-based transformer architectures, BERT, XLNet, RoBERTa and DistilBERT, perform on the task of entity matching, a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is a challenging task if the data instances consist of long textual data or if the data instances are "dirty" due to misplaced values. To evaluate the capability of transformer architectures and transfer learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical, state-of-the-art deep learning methods in EM by an average margin of 27.5.
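To make the task definition concrete, here is a minimal sketch of entity matching on "dirty" records. It is not the method presented in the talk: it uses a simple token-overlap (Jaccard) baseline instead of a transformer, and the record fields, the 0.5 threshold and the function names are assumptions chosen for illustration. The `pair_text` helper only shows how two records could be flattened into a single text pair of the kind a sequence-pair classifier might consume; the `[SEP]` marker is a placeholder here.

```python
def serialize(record: dict) -> str:
    """Flatten a record into one lowercase string.

    Attribute names are dropped on purpose, so a value that was entered
    in the wrong field (a "dirty" record) still contributes its tokens.
    """
    return " ".join(str(v) for v in record.values() if v).lower()


def pair_text(a: dict, b: dict, sep: str = " [SEP] ") -> str:
    """Join two serialized records into one text pair (illustrative format)."""
    return serialize(a) + sep + serialize(b)


def jaccard(a: dict, b: dict) -> float:
    """Token-set overlap between two records, between 0.0 and 1.0."""
    tokens_a = set(serialize(a).split())
    tokens_b = set(serialize(b).split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def is_match(a: dict, b: dict, threshold: float = 0.5) -> bool:
    """Baseline decision: same entity if the token overlap is high enough."""
    return jaccard(a, b) >= threshold


# Hypothetical product records: in `dirty` the brand sits in the title field.
dirty = {"title": "Apple iPhone 11 64GB black", "brand": ""}
clean = {"title": "iPhone 11 64GB", "brand": "Apple"}
other = {"title": "Galaxy S10 128GB", "brand": "Samsung"}

print(pair_text(dirty, clean))
print(jaccard(dirty, clean), is_match(dirty, clean))
print(jaccard(dirty, other), is_match(dirty, other))
```

Because the baseline ignores attribute boundaries, the misplaced brand value does not hurt it in this example. It has no notion of meaning, however, so long textual descriptions or paraphrased values, which the abstract names as the hard cases, are where a learned model would be expected to help.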

Speaker:

Ursin Brunner is Lead ML Engineer at ti&m AG and is currently finishing his part-time Master's at Zurich University of Applied Sciences (ZHAW). After studying Computer Science at ZHAW, he worked as a professional Software Engineer at Zühlke Engineering and several other companies over the past several years. Amazed by the recent successes in the field of Machine Learning / AI, he decided in 2016 to start a part-time Master's in Data Science while continuing to work in the private sector. With a focus on Natural Language Processing and Deep Learning, Ursin is interested in any ML technology that makes a difference: from predicting avalanches with ML to beating the pros in StarCraft 2 with Reinforcement Learning, the field is having more and more impact on our lives.

Prof. Dr. Kurt Stockinger is Professor of Computer Science, Director of Studies in Data Science at Zurich University of Applied Sciences (ZHAW) and Deputy Head of the ZHAW Datalab. His research focuses on Data Science with an emphasis on Big Data, data warehousing, advanced analytics and natural language query processing on knowledge bases. He is also on the Advisory Board of Callista Group AG. Previously, Kurt Stockinger worked at Credit Suisse in Zurich, Switzerland, at Lawrence Berkeley National Laboratory in Berkeley, California, at the California Institute of Technology, California, and at CERN in Geneva, Switzerland. He holds a Ph.D. in computer science from CERN / University of Vienna.

Datalab Seminars

Date

Start date: 26 February 2020, 12.00 pm

Location

Room TB 610, Technikumstrasse 9, 8400 Winterthur