Text and Data Mining

Text and data mining (TDM) searches through large amounts of text and data using computer-assisted processes. Unstructured data is processed and automatically examined for patterns, trends and connections.

Text and data mining (TDM) refers to various methods for searching through and evaluating large quantities of text or data. Computer-assisted analysis procedures first convert mostly unstructured data into a systematic, machine-readable form and then automatically analyse it for patterns, trends and other research-relevant correlations.
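The two stages described above (preparing unstructured text, then analysing it automatically) can be illustrated with a minimal Python sketch. The sample sentences, the stop-word list and the function names `prepare` and `term_frequencies` are assumptions made up for this illustration; a real TDM project would typically use a dedicated text-processing library and a much larger corpus.

```python
import re
from collections import Counter

# Tiny illustrative corpus (invented for this sketch).
documents = [
    "Text mining finds patterns in large text collections.",
    "Data mining finds trends in large data sets.",
    "Patterns and trends support research.",
]

# Hypothetical stop-word list; real projects use a fuller one.
STOP_WORDS = {"in", "and", "the", "a", "of"}


def prepare(text):
    """Preparation step: turn raw text into lowercase tokens without stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(docs):
    """Analysis step: count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(prepare(doc))
    return counts


frequencies = term_frequencies(documents)
print(frequencies.most_common(5))
```

Term counting is only one of many possible analyses. The point of the sketch is the separation between a preparation step and an analysis step.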

When using copyrighted materials such as texts, images or audiovisual media as a data source for TDM, both the legal and the technical terms of use must be observed. Generally speaking, the web interfaces of the respective providers are not suitable for directly downloading large quantities of data. If, for example, you would like to analyse large amounts of content from licensed e-resources of the University Library, please note the information we provide in the Self-Service portal (KI 3355).

Many publishers have general rules on the use of text and data mining in their publications. There, you will often also find information on interfaces and their use (registration, default request and download rates, etc.).
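Staying within a provider's download rate usually means pausing between requests. The following Python sketch shows one possible way to do this. The class `Throttle`, the function `download_all` and the `fetch` callable are hypothetical names for this illustration, and the actual permitted interval must be taken from the respective provider's terms.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def download_all(urls, fetch, throttle):
    """Fetch each URL in turn, pausing as required by the throttle."""
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)
    return results
```

In practice, `fetch` could be a small wrapper around `urllib.request.urlopen` that also sends any API key obtained during registration, and `Throttle(1.0)` would allow at most one request per second.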

In addition to licensed content, there are also freely accessible databases that allow the use of TDM (list not exhaustive):

  • arXiv
    Free access to preprints from the fields of physics, mathematics, computer science, statistics, financial mathematics and biology.
  • BioMed Central
    Open access journals from BioMed Central, Chemistry Central and SpringerOpen from the fields of biology and medicine.
  • Europeana
    Digital library with digitised material on scientific and cultural heritage from more than 2,000 European institutions.
  • HathiTrust Digital Library
    Digitised material from more than 100 academic institutions around the world.
  • Public Library of Science (PLOS)
    Access to content from the journals of the Public Library of Science, an open-access scientific publisher.
  • PubMed Central: Databases and Text Mining Tools
    Various freely accessible mining tools that can be used to search through PubMed Central, an archive with freely accessible content from the fields of biology and biomedicine.

Open access to self-created content in the sense of open science facilitates TDM processes. Clear rights management with standardised, machine-readable and open-content Creative Commons licences helps to ensure the legally secure application of TDM methods to data and text corpora.
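Machine-readable licence information makes it possible to select suitable material automatically. The Python sketch below filters a corpus by Creative Commons licence URL. The record structure with a `license` field, the example records and the choice of accepted licences are assumptions for this illustration only and are not legal advice. Which licences permit a given TDM use has to be checked for each project.

```python
# Licence URL prefixes treated as open in this sketch (an assumption).
OPEN_LICENCE_PREFIXES = (
    "https://creativecommons.org/licenses/by/",
    "https://creativecommons.org/licenses/by-sa/",
    "https://creativecommons.org/publicdomain/zero/",
)


def is_openly_licensed(record):
    """Return True if the record's licence URL starts with an accepted prefix."""
    url = (record.get("license") or "").strip().lower()
    url = url.replace("http://", "https://", 1)  # treat both schemes alike
    return url.startswith(OPEN_LICENCE_PREFIXES)


# Invented example records; real metadata formats differ between repositories.
corpus = [
    {"title": "Paper A", "license": "https://creativecommons.org/licenses/by/4.0/"},
    {"title": "Paper B", "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/"},
    {"title": "Paper C", "license": None},
    {"title": "Dataset D", "license": "http://creativecommons.org/publicdomain/zero/1.0/"},
]

usable = [r["title"] for r in corpus if is_openly_licensed(r)]
print(usable)
```

Records without licence information are excluded here, which reflects the cautious default that missing rights metadata does not imply permission.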

Additional information: