Among the large amounts of unstructured data generated and available worldwide today, textual data represent an important source of information. This is particularly true in the biomedical domain, where demand for access to textual content is constantly increasing, notably for accessing and processing Electronic Health Records, online discussion forums, and scientific literature. Dealing with biomedical texts therefore requires taking into account a great variety of texts, languages and users.
For several years now, much NLP research has focused on mining and retrieving information (i.e., medical entities and domain-specific relations) that is relevant for biologists, physicians, terminologists, epidemiologists, and patients. We will give an overview of the NLP methods used to tackle several such research problems through text mining applications. First, we will present the resources and rule-based approaches we designed for extracting drug-related information from clinical texts, and for acquiring domain-specific semantic relations from digital libraries. Then we will present the cross-lingual approach we are developing for building multilingual terminologies from a patient-centered Ukrainian corpus.


Thierry Hamon is an Associate Professor in Computer Science at Université Paris 13 and a member of the LIMSI-CNRS research lab. He received his PhD in computer science in 2000 for a dissertation on semantic variation and compositionality in specialized corpora. He is a member of the executive board of the French NLP association (ATALA) and the moderator of the mailing list LN. His main research interests concern the design of Natural Language Processing approaches for terminology building from English, French and Ukrainian specialized textual corpora, as well as for terminology matching. He also proposes approaches for text mining, information retrieval, and layman understanding of technical terms. His experiments are usually performed on clinical texts, online discussion forums, or scientific literature. Thierry develops software that he makes publicly available as Perl modules on CPAN, for example a term extractor (YaTeA), a system for querying biomedical Linked Data in natural language, and a platform that combines existing NLP tools for linguistic and semantic annotation of specialized corpora.