Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as those used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure, and then show how this structure can be used to detect language patterns that a lightweight parser can process with an average accuracy of 96.82%. This allows for a deeper understanding of the semantics of natural language metadata, which we show can improve by almost 18% the accuracy of the automatic translation of classifications into the lightweight ontologies required by semantic matching, search, and classification algorithms.
The LK project is funded by the European Commission under Project No. 231126.