Nexus Linguarum project
Project factsheet
English name: Nexus Linguarum: European network for Web-centred linguistic data science
Project type: COST Action CA 18209
Duration: 28 October 2019 – 27 April 2024
Extended information:
Chair of the Action: Jorge Gracia
Polish members of the Management Committee
- Agata Filipowska (Poznań University of Economics), MC Member
- Krzysztof Nowak (Institute of Polish Language, Polish Academy of Sciences), MC Member
- Barbara Lewandowska-Tomaszczyk (University of Łódź), MC Substitute
- Maciej Ogrodniczuk (Institute of Computer Science, Polish Academy of Sciences), MC Substitute
- Arleta Suwalska (University of Łódź), MC Substitute
Project description
The main aim of this Action is to promote synergies across Europe between linguists, computer scientists, terminologists, and other stakeholders in industry and society, in order to investigate and extend the area of linguistic data science. We understand linguistic data science as a subfield of the emerging field of "data science", which focuses on the systematic, large-scale analysis of the structure and properties of data, along with methods and techniques for extracting new knowledge and insights from it. Linguistic data science is a specific case concerned with providing a formal basis for the analysis, representation, integration and exploitation of language data (syntax, morphology, lexicon, etc.). The specific characteristics of linguistic data have so far remained largely unexplored in a big data context.
To support the study of linguistic data science as efficiently and productively as possible, a mature, holistic ecosystem of multilingual and semantically interoperable linguistic data is required at Web scale. Such an ecosystem, which does not exist today, is needed to foster the systematic cross-lingual discovery, exploration, exploitation, extension, curation and quality control of linguistic data. We argue that linked data (LD) technologies, combined with natural language processing (NLP) techniques and multilingual language resources (LRs) such as bilingual dictionaries, multilingual corpora and terminologies, can enable such an ecosystem. By addressing the semantic interoperability problem, it would allow information to flow transparently across linguistic data sources in multiple languages.