Data

Biomedical diachronic concept embeddings
The biomedical diachronic concept embeddings are embeddings trained on different PubMed subsets to detect and explore changes in medical knowledge. Biomedical publications were extracted from PubMed, sliced according to their date of publication, and mapped to their corresponding medical concepts using MetaMap. We then trained embeddings on each slice, so that each slice encapsulates the semantic knowledge about a concept within a particular period of time. To generate the chronological embeddings we followed the procedure stated here. The embeddings are provided as a compressed file; after uncompressing it with the "tar" utility, the resulting HDF5 file can be accessed with Python's h5py package. Attached is the logical structure of the file, in which the names of the leaf nodes are exactly the same as those of the tables (t_period). All 14 tables (including the complete embedding set) are stored in one group named "embedding_group". Download the diachronic embeddings here.
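As a rough illustration of the access steps described above, the following sketch unpacks the archive with Python's standard tarfile module and then reads the tables with h5py. The archive name, the output directory and the helper function names are assumptions made for this example; only the group name "embedding_group" comes from the description above. The internal layout of each table is not specified here, so the sketch only lists table names and prints their shapes.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack the downloaded archive (the Python equivalent of `tar -xf`)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:  # compression is auto-detected
        archive.extractall(target)
    # Return every extracted file so the caller can pick the HDF5 file.
    return sorted(p for p in target.rglob("*") if p.is_file())


def list_tables(h5_file, group_name="embedding_group"):
    """Return the names of all tables stored in the embedding group."""
    return sorted(h5_file[group_name].keys())


def load_table(h5_file, table_name, group_name="embedding_group"):
    """Read one table completely into memory (a NumPy array with h5py)."""
    return h5_file[group_name][table_name][()]


def print_table_shapes(h5_path):
    """Open the HDF5 file read-only and print the shape of each table."""
    import h5py  # third-party package: pip install h5py

    with h5py.File(h5_path, "r") as h5_file:
        for name in list_tables(h5_file):
            print(name, load_table(h5_file, name).shape)
```

With a hypothetical archive name, usage would look like `files = extract_archive("diachronic_embeddings.tar.gz", "embeddings")` followed by `print_table_shapes(files[0])`. Loading one table at a time keeps memory use low, which may matter because the file also contains the complete embedding set.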

fiktive Nephrologie-Verlaufsnotizen (fictitious clinical notes)
This small dataset is a collection of fictitious clinical notes from the nephrology domain, written by students of medicine and linguistics. The documents imitate the style and content of nephrology clinical notes and are therefore suitable for testing NLP applications, since real patient data is difficult to share. Note that our fictitious clinical notes are not necessarily correct from a medical perspective. Download

German NegEx trigger set
This set of trigger words has been created for negation detection in German clinical notes and discharge summaries. More information can be found here. Download
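To give an idea of how such a trigger set can be used, here is a strongly simplified NegEx-style sketch: every token that follows a negation trigger within a fixed window is marked as negated. The trigger words in the code are illustrative examples only and are not taken from the published trigger set; the function names and the window size are likewise assumptions. A full NegEx implementation also handles post-negation triggers, pseudo-negations and scope termination terms, which are omitted here.

```python
import re

# Illustrative pre-negation triggers only; NOT the published trigger set.
EXAMPLE_TRIGGERS = ["kein", "keine", "keinen", "ohne", "nicht"]


def find_negated_tokens(sentence, triggers=EXAMPLE_TRIGGERS, window=5):
    """Return the tokens that fall inside the scope of a negation trigger.

    The scope is simply the next `window` tokens after a trigger; a new
    trigger inside the scope restarts the window.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    trigger_set = {trigger.lower() for trigger in triggers}
    negated = []
    remaining = 0  # tokens still covered by the most recent trigger
    for token in tokens:
        if token in trigger_set:
            remaining = window
        elif remaining > 0:
            negated.append(token)
            remaining -= 1
    return negated


def is_negated(sentence, concept, triggers=EXAMPLE_TRIGGERS, window=5):
    """Check whether a single-token concept mention is negated."""
    return concept.lower() in find_negated_tokens(sentence, triggers, window)
```

For example, `is_negated("Kein Fieber, keine Dyspnoe.", "Fieber")` returns `True`, whereas `is_negated("Patient hat Fieber.", "Fieber")` returns `False`. To use the downloaded trigger set, pass its entries as the `triggers` argument.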

Tools & Models

Graph-KD
Graph-KD is a tool for exploring graph structures for knowledge discovery. It is based on Neo4j and includes shortest paths, node exploration and knowledge inference. The tool can be downloaded here and tested online here. Further details can be found in our paper Graph-KD: Exploring Relational Information for Knowledge Discovery, which was presented at ISWC 2019 Posters & Demonstrations [pdf].
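The shortest-path idea can be illustrated independently of the tool. The sketch below is not Graph-KD's implementation and does not use Neo4j; it runs a plain breadth-first search over a small in-memory graph whose node names are invented placeholders, to show how an indirect connection between two concepts can be surfaced.

```python
from collections import deque

# Toy graph with placeholder node names; not real biomedical relations.
TOY_GRAPH = {
    "drug_A": ["protein_B", "gene_D"],
    "protein_B": ["disease_C"],
    "gene_D": [],
}


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable.

    `graph` maps each node to a list of its directly connected neighbours.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

Calling `shortest_path(TOY_GRAPH, "drug_A", "disease_C")` returns `["drug_A", "protein_B", "disease_C"]`, which exposes the intermediate node linking the two end points.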

Dependency Tree Parser for German medical text
Using the Stanford parser, we created a domain-adapted dependency tree parser specialized for German medical text. The model was pre-trained on a large general-domain German dataset and then re-trained on a small set of clinical documents from the nephrology domain. The model and a more detailed description can be found here.

Biomedical-CharTranslator
Many NLP tasks apply concept normalization (alignment), which links a given mention to the matching concept within an ontology. Applying this task to a language other than English can be more challenging, as non-English data is often underrepresented. In addition, many terms in the biomedical domain are of Greek and Latin origin. Taking this into account, and knowing the characteristics of both languages, a large range of biomedical terms can easily be translated from one language into the other. Our biomedical CharTranslator is based on this idea and uses a simple neural translator at the character level. In this way, concept normalization can be improved by translating "unknown" words and extending the search to include English data. The tool and models can be found here.