Works in Progress Webinar: Ariadne, an innovative approach to scalable semantic embedding
The OCLC Research project Ariadne Semantic Embedding demonstrates a practical way for libraries to apply semantic embedding at scale. Join this webinar to see the potential of this scalable semantic embedding method for applications such as entity disambiguation, citation recommendation, clustering and collection exploration.
Shenghui Wang, Research Scientist, OCLC Research
Being able to measure similarity or relatedness is important to many tasks in modern digital library systems, including information retrieval, entity disambiguation, de-duplication, clustering, recommendation and subject prediction. Big search engines like Google already use semantic embedding technologies to improve information retrieval. These technologies let us represent words, entities and bibliographic records in compact, semantically meaningful vector spaces, where semantic similarity or relatedness can be computed and readily used for the tasks mentioned above.
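To make "computable similarity" concrete, here is a minimal sketch in Python of one common way to compare embedding vectors: cosine similarity. It is an illustration only and not the Ariadne method; the record names, the four-dimensional vectors and the helper functions `cosine_similarity` and `most_similar` are all hypothetical and invented for this example.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for illustration only.
# Real embeddings would typically have hundreds of dimensions.
embeddings = {
    "record_a": [0.9, 0.1, 0.0, 0.2],
    "record_b": [0.8, 0.2, 0.1, 0.3],
    "record_c": [0.0, 0.9, 0.8, 0.1],
}


def most_similar(query_id, vectors):
    """Rank every other item by cosine similarity to the query, highest first."""
    query = vectors[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other_id, score in most_similar("record_a", embeddings):
        print(f"{other_id}: {score:.3f}")
```

In this toy data, record_a and record_b point in nearly the same direction and so rank as most similar, while record_c points elsewhere. The same ranking step underlies tasks such as de-duplication, recommendation and clustering once records have been embedded.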
The recent success of Word2Vec has spurred the development of complex and powerful deep learning models. However, deep learning models require substantial computation when applied to large bibliographic collections and need careful tuning to find optimal hyperparameter settings. Unfortunately, most libraries (and even large-scale aggregators) have neither the processing capacity nor the skills to embrace powerful deep learning, and therefore stick with the traditional keyword-based approach.
The OCLC Research project Ariadne Semantic Embedding is a demonstration of a practical solution to support libraries in this field. We revisit previously used embedding methods and propose a conceptually simple and computationally lightweight approach. In our experiments, its results are highly competitive with various state-of-the-art embedding methods on different tasks, including the standard Semantic Textual Similarity (STS) benchmark and an extreme multi-label classification task (automatic MeSH subject prediction), at a fraction of the computational cost. We will show the potential of this scalable semantic embedding method for other applications such as entity disambiguation, citation recommendation, clustering and collection exploration.
This webinar will be of interest to those following the future of search and discovery, as well as those interested in data science topics.
Works in Progress: An OCLC Research Occasional Webinar Series
These webinars are exclusively for OCLC Research Library Partners, but the recordings are publicly available.
Works in Progress is an occasional webinar series about work happening in OCLC Research. We'd like to present our work informally and get feedback from you, our Partners. We'd also like this to be a venue for Partner institutions. What are you working on that everyone should know about? What input would help you move forward? Let us know!