Dr. Jan Milan Deriu
ZHAW
School of Engineering
Centre for Artificial Intelligence
Technikumstrasse 71
8400 Winterthur
Projects
- DeepText: Intelligent Text Analysis with Deep Learning / Deputy project leader / completed
- Call-E – Virtual Call Agent / Team member / completed
- Unified Model for Evaluation of Text Generation Systems (UniVal) / Deputy project leader / ongoing
- Pre-Study on Generation of Hockey News / Deputy project leader / completed
- End-to-End Low-Resource Speech Translation for Swiss German Dialects / Deputy project leader / completed
- Holistic Analysis of Organised Misinformation Activity in Social Networks (HAMiSoN) / Project leader / ongoing
Publications
-
Zhang, Yi; Deriu, Jan Milan; Katsogiannis-Meimarakis, George; Kosten, Catherine; Koutrika, Georgia; Stockinger, Kurt,
2024.
ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems.
Proceedings of the VLDB Endowment.
17(4), pp. 685-698.
Available from: https://doi.org/10.14778/3636218.3636225
-
Deriu, Jan Milan; Rodrigo, Alvaro; Otegi, Arantxa; Echegoyen, Guillermo; Rosset, Sophie; Agirre, Eneko; Cieliebak, Mark,
2020.
Survey on evaluation methods for dialogue systems.
Artificial Intelligence Review.
54(1), pp. 755-810.
Available from: https://doi.org/10.1007/s10462-020-09866-x
-
Luley, Paul-Philipp; Deriu, Jan Milan; Yan, Peng; Schatte, Gerrit A.; Stadelmann, Thilo,
2023.
From concept to implementation : the data-centric development process for AI in industry [paper].
In:
2023 10th IEEE Swiss Conference on Data Science (SDS).
10th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 22-23 June 2023.
IEEE.
pp. 73-76.
Available from: https://doi.org/10.1109/SDS57534.2023.00017
-
Bollinger, Tobias; Deriu, Jan Milan; Vogel, Manfred,
2023.
Text-to-speech pipeline for Swiss German : a comparison [paper].
In:
8th Swiss Text Analytics Conference – SwissText 2023, Neuchâtel, Switzerland, 12-14 June 2023.
arXiv.
Available from: https://doi.org/10.48550/arXiv.2305.19750
-
Deriu, Jan; von Däniken, Pius; Tuggener, Don; Cieliebak, Mark,
2023.
Correction of errors in preference ratings from automated metrics for text generation [paper].
In:
Rogers, Anna; Boyd-Graber, Jordan; Okazaki, Naoaki, eds.,
Findings of the Association for Computational Linguistics: ACL 2023.
61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, 9-14 July 2023.
Association for Computational Linguistics.
pp. 6456-6474.
Available from: https://doi.org/10.18653/v1/2023.findings-acl.404
-
Peñas, Anselmo; Deriu, Jan; Sharma, Rajesh; Valentin, Guilhem; Reyes-Montesinos, Julio,
2023.
Holistic analysis of organised misinformation activity in social networks [paper].
In:
Ceolin, Davide; Caselli, Tommaso; Tulin, Marina, eds.,
Disinformation in Open Online Media.
5th Multidisciplinary International Symposium on Disinformation in Open Online Media, Amsterdam, The Netherlands, 21-22 November 2023.
Cham:
Springer.
pp. 132-143.
Lecture Notes in Computer Science ; 14397.
Available from: https://doi.org/10.1007/978-3-031-47896-3_10
-
Plüss, Michel; Deriu, Jan Milan; Schraner, Yanick; Paonessa, Claudio; Hartmann, Julia; Schmidt, Larissa; Scheller, Christian; Hürlimann, Manuela; Samardžić, Tanja; Vogel, Manfred; Cieliebak, Mark,
2023.
STT4SG-350 : a speech corpus for all Swiss German dialect regions [paper].
In:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, 9-14 July 2023.
Association for Computational Linguistics.
pp. 1763-1772.
Available from: https://doi.org/10.18653/v1/2023.acl-short.150
-
von Däniken, Pius; Deriu, Jan Milan; Cieliebak, Mark,
2023.
ZHAW-CAI at CheckThat! 2023 : ensembling using kernel averaging [paper].
In:
Aliannejadi, Mohammad; Faggioli, Guglielmo; Ferro, Nicola; Vlachos, Michalis, eds.,
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023).
14th Conference and Labs of the Evaluation Forum (CLEF), Thessaloniki, Greece, 18-21 September 2023.
CEUR Workshop Proceedings.
pp. 534-545.
CEUR Workshop Proceedings ; 3497.
Available from: https://doi.org/10.21256/zhaw-29046
-
von Däniken, Pius; Deriu, Jan Milan; Agirre, Eneko; Brunner, Ursin; Cieliebak, Mark; Stockinger, Kurt,
2022.
Improving NL-to-Query systems through re-ranking of semantic hypothesis [paper].
In:
Abbas, Mourad; Freihat, Abed Alhakim, eds.,
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022).
5th International Conference on Natural Language and Speech Processing (ICNLSP), online, 16-17 December 2022.
Association for Computational Linguistics.
pp. 57-67.
Available from: https://doi.org/10.21256/zhaw-26147
-
Plüss, Michel; Hürlimann, Manuela; Cuny, Marc; Stöckli, Alla; Kapotis, Nikolaos; Hartmann, Julia; Ulasik, Malgorzata Anna; Scheller, Christian; Schraner, Yanick; Jain, Amit; Deriu, Jan Milan; Cieliebak, Mark; Vogel, Manfred,
2022.
SDS-200 : a Swiss German speech to Standard German text corpus [paper].
In:
Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022).
13th Language Resources and Evaluation Conference (LREC), Marseille, France, 20-25 June 2022.
European Language Resources Association.
pp. 3250-3256.
Available from: https://doi.org/10.21256/zhaw-26131
-
Deriu, Jan Milan; Tuggener, Don; von Däniken, Pius; Cieliebak, Mark,
2022.
Probing the robustness of trained metrics for conversational dialogue systems [paper].
In:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, 22-27 May 2022.
Association for Computational Linguistics.
pp. 750-761.
Available from: https://doi.org/10.18653/v1/2022.acl-short.85
-
Tuggener, Don; Mieskes, Margot; Deriu, Jan Milan; Cieliebak, Mark,
2021.
Are we summarizing the right way? : a survey of dialogue summarization data sets [paper].
In:
Proceedings of the Third Workshop on New Frontiers in Summarization.
Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic (online), 7-11 November 2021.
Association for Computational Linguistics.
pp. 107-118.
Available from: https://doi.org/10.21256/zhaw-23506
-
Ulasik, Malgorzata Anna; Hürlimann, Manuela; Dubel, Bogumila; Kaufmann, Yves; Rudolf, Silas; Deriu, Jan Milan; Mlynchyk, Katsiaryna; Hutter, Hans-Peter; Cieliebak, Mark,
2021.
ZHAW-CAI : ensemble method for Swiss German speech to Standard German text [paper].
In:
Benites de Azevedo e Souza, Fernando; Tuggener, Don; Hürlimann, Manuela; Cieliebak, Mark; Vogel, Manfred, eds.,
Proceedings of the Swiss Text Analytics Conference 2021.
Swiss Text Analytics Conference – SwissText 2021, Online, 14-16 June 2021.
CEUR Workshop Proceedings.
Available from: https://doi.org/10.21256/zhaw-23889
-
Deriu, Jan Milan; Tuggener, Don; von Däniken, Pius; Campos, Jon Ander; Rodrigo, Alvaro; Belkacem, Thiziri; Soroa, Aitor; Agirre, Eneko; Cieliebak, Mark,
2020.
In:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16-20 November 2020.
Association for Computational Linguistics.
pp. 3971-3984.
Available from: https://doi.org/10.18653/v1/2020.emnlp-main.326
-
Deriu, Jan Milan; Mlynchyk, Katsiaryna; Schläpfer, Philippe; Rodrigo, Alvaro; von Grünigen, Dirk; Kaiser, Nicolas; Stockinger, Kurt; Agirre, Eneko; Cieliebak, Mark,
2020.
A methodology for creating question answering corpora using inverse data annotation [paper].
In:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), online, 5-10 July 2020.
Association for Computational Linguistics.
pp. 897-911.
Available from: https://doi.org/10.18653/v1/2020.acl-main.84
-
Campos, Jon Ander; Otegi, Arantxa; Soroa, Aitor; Deriu, Jan Milan; Cieliebak, Mark; Agirre, Eneko,
2020.
DoQA : accessing domain-specific FAQs via conversational QA [paper].
In:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), online, 5-10 July 2020.
Association for Computational Linguistics.
pp. 7302-7314.
Available from: https://doi.org/10.18653/v1/2020.acl-main.652
-
Sileo, Damien; Pradel, Camille; Peñas, Anselmo; Echegoyen, Guillermo; Otegi, Arantxa; Deriu, Jan Milan; Cieliebak, Mark; Barrena, Ander; Agirre, Eneko,
2019.
Matching words and knowledge graph entities with meta-embeddings [paper].
In:
Proceedings of CAp2019.
Conference on Machine Learning (CAp), Toulouse, 3-5 July 2019.
PFIA.
pp. 34-39.
-
Deriu, Jan Milan; Cieliebak, Mark,
2019.
Towards a metric for automated conversational dialogue system evaluation and improvement [paper].
In:
12th International Conference on Natural Language Generation (INLG 2019), Tokyo, Japan, 29 October - 1 November 2019.
Available from: https://www.inlg2019.com/assets/papers/132_Supplementary_Attachment.pdf
-
Cieliebak, Mark; Galibert, Olivier; Deriu, Jan Milan,
2019.
Towards understanding lifelong learning for dialogue systems [paper].
In:
IWSDS 2019 Proceedings.
IWSDS 2019 : International Workshop on Spoken Dialogue Systems Technology, Siracusa, Italy, 24-26 April 2019.
IWSDS.
-
Grubenmann, Ralf; Tuggener, Don; von Däniken, Pius; Deriu, Jan Milan; Cieliebak, Mark,
2018.
SB-CH : a Swiss German corpus with sentiment annotations [poster].
In:
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018.
11th Edition of the Language Resources and Evaluation Conference (LREC 2018), Miyazaki, 7-12 May 2018.
European Language Resources Association.
-
Deriu, Jan Milan; Cieliebak, Mark,
2018.
Syntactic manipulation for generating more diverse and interesting texts [paper].
In:
Proceedings of the 11th International Conference on Natural Language Generation.
11th International Conference on Natural Language Generation (INLG 2018), Tilburg, The Netherlands, 5-8 November 2018.
Association for Computational Linguistics.
pp. 22-34.
Available from: https://doi.org/10.21256/zhaw-4875
-
Benites de Azevedo e Souza, Fernando; Grubenmann, Ralf; von Däniken, Pius; von Grünigen, Dirk; Deriu, Jan Milan; Cieliebak, Mark,
2018.
Twist Bytes : German dialect identification with data mining optimization [paper].
In:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018).
27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, 20-26 August 2018.
VarDial.
pp. 218-227.
Available from: https://doi.org/10.21256/zhaw-4850
-
Cieliebak, Mark; Deriu, Jan Milan; Egger, Dominic; Uzdilli, Fatih,
2017.
A Twitter corpus and benchmark resources for German sentiment analysis [paper].
In:
5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017.
Association for Computational Linguistics.
pp. 45-51.
Available from: https://doi.org/10.18653/v1/W17-1106
-
Graf, Hans Daniel; Koc, Yusuf; Panighetti, Sandro; Togni, Matteo; von Grünigen, Dirk; Weilenmann, Martin; Xhoxhaj, Erland; Zürrer, Daniel; Benites de Azevedo e Souza, Fernando; Deriu, Jan Milan; Neureiter, Nico; von Däniken, Pius; Cieliebak, Mark; Eich, Walter; Neuhaus, Stephan; Stockinger, Kurt,
2017.
Four different ways to build a chatbot about movies [poster].
In:
SwissText 2017: 2nd Swiss Text Analytics Conference, Winterthur, 9 June 2017.
-
von Grünigen, Dirk; Weilenmann, Martin; Deriu, Jan Milan; Cieliebak, Mark,
2017.
Potential and limitations of cross-domain sentiment classification [paper].
In:
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media.
Fifth International Workshop on Natural Language Processing for Social Media, Valencia, Spain, 3-7 April 2017.
Stroudsburg:
Association for Computational Linguistics.
pp. 17-24.
Available from: https://doi.org/10.18653/v1/W17-1103
-
Müller, Simon; Huonder, Tobias; Deriu, Jan Milan; Cieliebak, Mark,
2017.
In:
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
11th International Workshop on Semantic Evaluation, Vancouver, Canada, 3-4 August 2017.
Association for Computational Linguistics.
pp. 766-771.
Available from: https://doi.org/10.21256/zhaw-1529
-
Deriu, Jan Milan; Cieliebak, Mark,
2016.
In:
Basili, Roberto; Montemagni, Simonetta, eds.,
Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016).
Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Napoli, Italy, 5-7 December 2016.
Italian Journal of Computational Linguistics.
Available from: https://doi.org/10.21256/zhaw-1527
-
Paonessa, Claudio; Schraner, Yanick; Deriu, Jan Milan; Hürlimann, Manuela; Vogel, Manfred; Cieliebak, Mark,
2023.
Dialect transfer for Swiss German speech translation.
arXiv.
Available from: https://doi.org/10.48550/arXiv.2310.09088
-
von Däniken, Pius; Deriu, Jan Milan; Tuggener, Don; Cieliebak, Mark,
2022.
On the effectiveness of automated metrics for text generation systems [paper].
In:
Findings of the Association for Computational Linguistics: EMNLP 2022.
Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, United Arab Emirates, 7-11 December 2022.
Association for Computational Linguistics.
pp. 1503-1522.
Available from: https://doi.org/10.21256/zhaw-27042
-
Deriu, Jan Milan; Rodrigo, Alvaro; Otegi, Arantxa; Echegoyen, Guillermo; Rosset, Sophie; Agirre, Eneko; Cieliebak, Mark, eds.,
2019.
Survey on evaluation methods for dialogue.
ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
Available from: https://doi.org/10.21256/zhaw-18985
-
Venzin, Valentin; Deriu, Jan Milan; Didier, Orel; Cieliebak, Mark,
2019.
Fact-aware abstractive text summarization using a pointer-generator network [paper].
In:
4th Swiss Text Analytics Conference (SwissText 2019), Winterthur, 18-19 June 2019.
Swisstext.
Available from: https://doi.org/10.21256/zhaw-18988
-
Deriu, Jan Milan; Cieliebak, Mark,
2017.
End-to-end trainable system for enhancing diversity in natural language generation [paper].
In:
End-to-End Natural Language Generation Challenge (E2E NLG), 2017.
ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
Available from: https://doi.org/10.21256/zhaw-4889
-
Deriu, Jan Milan; Lucchi, Aurelien; De Luca, Valeria; Severyn, Aliaksei; Müller, Simone; Cieliebak, Mark; Hofmann, Thomas; Jaggi, Martin,
2017.
Leveraging large amounts of weakly supervised data for multi-language sentiment classification [paper].
In:
Proceedings of the 26th International Conference on World Wide Web.
26th International World Wide Web Conference Committee (IW3C2), Perth, Australia, 3-7 April 2017.
Association for Computing Machinery.
pp. 1045-1052.
Available from: https://doi.org/10.1145/3038912.3052611
-
Deriu, Jan Milan; Cieliebak, Mark,
2017.
In:
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
SemEval 2017 - International Workshop on Semantic Evaluation, Vancouver, Canada, 3-4 August 2017.
Association for Computational Linguistics.
pp. 334-338.
Available from: https://doi.org/10.18653/v1/S17-2054