Project factsheet
English name: Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams
Project type: A European collaborative project, FP7-ICT-2011-7 (grant agreement number 287863)
Duration: 1 November 2011 – 31 October 2014
Project Web page:
Principal investigator: Thierry Declerck
Polish participation
Polish name: Wielkoskalowa, wielojęzyczna analiza trendów i agregacja strumieni danych w czasie rzeczywistym
Project type: Support from the Ministry of Science and Higher Education for Polish participation in the project (grant agreement W137/7.PR/2014)
Duration: 1 November 2013 – 31 October 2014
Principal investigator: Maciej Ogrodniczuk
Institution: Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences
Project summary
The recent massive growth in online media and the rise of user-authored content (e.g. weblogs, Twitter, Facebook) have raised the challenge of accessing and interpreting these strongly multilingual data in a timely, efficient, and affordable manner. Scientifically, streaming online media pose new challenges due to their shorter, noisier, and more colloquial nature. Moreover, they form a temporal stream strongly grounded in events and context. Consequently, existing language technologies fall short on accuracy, scalability, and portability.
The goal of this project is to deliver innovative, portable, open-source, real-time methods for cross-lingual mining and summarisation of large-scale stream media. TrendMiner achieves this through an interdisciplinary approach that combines deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human-annotated data are required, because time-series data (e.g. financial markets, political polls) are used as a proxy. A key novelty is a set of weakly supervised machine learning algorithms for the automatic discovery of new trends and correlations. Scalability and affordability are addressed through a cloud-based infrastructure for real-time text mining from stream media.
Results are validated in two high-profile case studies: financial decision support (with analysts, traders, regulators, and economists) and political analysis and monitoring (with politicians, economists, and political journalists). The techniques are generic and have many business applications, such as business intelligence, customer relationship management, and community support. The project also benefits society and ordinary citizens by enabling enhanced access to government data archives, summarisation of online health information, and tracking of hot societal issues.