The goal of LivingKnowledge is to bring a new quality to search and knowledge management technology, yielding more concise, complete and contextualised search results.
The focus of the second LivingKnowledge application, "Media Content Analysis", is to improve work processes in the area of media content analysis. The significance of this methodology lies in its capacity to describe mediated public discourse and the various forms and aspects of diversity within it. LivingKnowledge (LK) technology will, for the first time, enable social scientists to analyse large text corpora in a timely fashion. The work includes a complete mock-up of two use cases, with identification and analysis performed manually as the application requires. This mock-up demonstrates the requirements and the planned outcome of the automatic and semi-automatic analysis that versions V1 and V2 of the web media content analyser prototype will perform later in the project.
Knowledge is a very complex object made up of many different components: natural language is used to state knowledge, facts describe concrete or abstract truths about the world, and classifications are used to organise knowledge.
Our goal is to study the foundations of, and to develop formalisms, mechanisms and structures for, the effective representation and management of knowledge that is aware of time and evolution, opinions, diversity and bias.
Text is the main source of information for the LivingKnowledge case study. We therefore employ state-of-the-art information extraction technology together with advanced linguistic analysis such as syntactic parsing and semantic role labeling. These linguistic levels play a twofold role: (i) they are the basic building blocks for extracting more advanced knowledge, for example through opinionated-sentence classifiers or event extractors, and (ii) they feed the subsequent modules of the LivingKnowledge architecture, e.g. opinion clustering and aggregation. The ultimate goal of text analysis in LivingKnowledge will be to model events and opinions at the discourse level, so that dependencies and implications between facts and opinions can be extracted automatically.
The focus here is on developing methods for diversity-aware search and exploration of information. Diversifying search results is an important and challenging problem: it is needed to increase user satisfaction for ambiguous queries and for queries whose results span multiple subtopics or opinions. Diversification can be achieved by detecting and removing near-duplicates among the query results to reduce redundancy, and by increasing the dissimilarity of the retrieved results. Beyond search, navigation and browsing are important for exploring large data sets; here, the focus will be on identifying the main dimensions of diversity and supporting faceted exploration of results along these dimensions.
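The two diversification steps named above, removing near-duplicates and increasing the dissimilarity of retrieved results, can be illustrated with a small sketch. The text does not say which techniques LivingKnowledge uses, so the following is an assumption throughout. Near-duplicates are detected with Jaccard similarity over word shingles, a common generic measure. Dissimilarity is then increased by greedy Maximal Marginal Relevance (MMR) re-ranking, a well-known technique swapped in here for illustration. The function names, the 0.6 threshold and the trade-off weight `lam` are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Hypothetical helper: set of k-word shingles of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(docs: Sequence[str], threshold: float = 0.8, k: int = 3) -> List[int]:
    """Return indices of docs kept, in rank order; a doc is dropped if its shingle
    similarity to any already-kept doc reaches the (assumed) threshold."""
    kept: List[int] = []
    kept_sh: List[Set[Shingle]] = []
    for i, doc in enumerate(docs):
        sh = shingles(doc, k)
        if all(jaccard(sh, other) < threshold for other in kept_sh):
            kept.append(i)
            kept_sh.append(sh)
    return kept


def mmr_rerank(docs: Sequence[str], relevance: Sequence[float],
               lam: float = 0.5, top_n: int = None, k: int = 1) -> List[int]:
    """Greedy MMR: repeatedly pick the doc maximising
    lam * relevance - (1 - lam) * (max similarity to docs already picked)."""
    sh = [shingles(d, k) for d in docs]
    top_n = len(docs) if top_n is None else top_n
    selected: List[int] = []
    candidates = list(range(len(docs)))

    def mmr_score(i: int) -> float:
        redundancy = max((jaccard(sh[i], sh[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < top_n:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy ranked results for the ambiguous query "jaguar" (invented example data).
    results = [
        "jaguar the big cat lives in the rainforest",
        "jaguar the big cat lives in the rainforest of brazil",
        "jaguar cars unveils a new electric model",
        "the jaguar is a big cat found in south america",
    ]
    scores = [0.9, 0.88, 0.7, 0.85]
    keep = remove_near_duplicates(results, threshold=0.6)
    order = mmr_rerank([results[i] for i in keep], [scores[i] for i in keep])
    for pos in order:
        print(results[keep[pos]])
```

In this toy run the second result is dropped as a near-duplicate of the first (shingle overlap 0.75), and MMR then promotes the car-related result above the second cat-related one despite its lower relevance score. That is the kind of subtopic coverage the paragraph describes. A production system would more likely use MinHash or SimHash for near-duplicate detection at scale and richer similarity than word overlap.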
We will explore requirements and evaluate our progress in the LivingKnowledge testbed, which in turn will constitute the basic building block for two complementary applications. Our Future Predictor will combine and test all methods needed to answer factual queries about future events and statements, based on information already available on the Web. Our Media Research Analyser will address questions such as the public image of a company and how it changes over time, or the effectiveness of a PR campaign as reflected in user-generated content in blogs and other public forums.
Application Scenario: Future Predictor
Application Scenario: Media Content Analysis
Although natural language is the main vehicle for conveying information in web documents, people increasingly use visual information to express ideas or support their message. A photograph or illustration can sometimes have a much greater impact than a large volume of text, so the project does not ignore the multimedia nature of documents. Extracting information from images is more challenging than extracting it from text; nevertheless, we are extending and applying current research on image analysis to support text analysis and to better understand the nature of facts and opinions, diversity and bias.