License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2017.18
URN: urn:nbn:de:0030-drops-79515

Devezas, José ; Nunes, Sérgio

Information Extraction for Event Ranking



Search engines are evolving towards richer and stronger semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. To support semantic search, search engines increasingly rely on robust information extraction systems, and most modern search engines already depend heavily on a well-curated knowledge base. Nevertheless, they still lack the ability to effectively and automatically take advantage of multiple heterogeneous data sources. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, and integrating multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities such as events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information: supporting the ranking of upcoming events displayed on the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news, covering event announcements from multiple faculties and organic units of the institution, and use it to train and evaluate the named entity recognition module of the pipeline. We rank events by using identified entities and partOf relations to compute an entity popularity score, together with an entity click score based on implicit feedback from clicks in the institutional search engine. We then combine these two scores with the number of days to the event, obtaining a final ranking for the three most relevant upcoming events.
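
The final ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two entity scores and the number of days to the event are combined, so everything here is an assumption made for illustration: the `Event` fields, the equal weights, the normalization of both scores to [0, 1], and the `1 / (1 + days)` time discount are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Event:
    title: str
    popularity: float   # entity popularity score (assumed normalized to [0, 1])
    clicks: float       # entity click score (assumed normalized to [0, 1])
    days_to_event: int  # days until the event; negative means already past


def combined_score(event: Event, w_pop: float = 0.5, w_click: float = 0.5) -> float:
    """Hypothetical combination: weighted sum of the two entity scores,
    discounted so that events further in the future score lower."""
    entity_score = w_pop * event.popularity + w_click * event.clicks
    return entity_score / (1 + event.days_to_event)


def top_events(events: list[Event], k: int = 3) -> list[Event]:
    """Return the k highest-scoring upcoming events (past events are dropped)."""
    upcoming = [e for e in events if e.days_to_event >= 0]
    return sorted(upcoming, key=combined_score, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Event("Seminar A", popularity=0.9, clicks=0.8, days_to_event=10),
        Event("Workshop B", popularity=0.4, clicks=0.2, days_to_event=1),
        Event("Talk C", popularity=0.6, clicks=0.6, days_to_event=4),
        Event("Open Day D", popularity=0.1, clicks=0.1, days_to_event=0),
        Event("Past Event E", popularity=1.0, clicks=1.0, days_to_event=-2),
    ]
    for event in top_events(sample):
        print(f"{event.title}: {combined_score(event):.3f}")
```

Under these assumed weights, a nearby event with modest entity scores can outrank a distant event with high scores. That matches the intent of limiting the landing page to three upcoming events, though the actual weighting in the paper may differ.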

BibTeX - Entry

  author =	{Jos{\'e} Devezas and S{\'e}rgio Nunes},
  title =	{{Information Extraction for Event Ranking}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{18:1--18:14},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Ricardo Queir{\'o}s and M{\'a}rio Pinto and Alberto Sim{\~o}es and Jos{\'e} Paulo Leal and Maria Jo{\~a}o Varanda},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-79515},
  doi =		{10.4230/OASIcs.SLATE.2017.18},
  annote =	{Keywords: Named Entity Recognition, Relation Extraction, Knowledge Base Population, Entity-Based Ranking, Academic Events}

Keywords: Named Entity Recognition, Relation Extraction, Knowledge Base Population, Entity-Based Ranking, Academic Events
Collection: 6th Symposium on Languages, Applications and Technologies (SLATE 2017)
Issue Date: 2017
Date of publication: 04.10.2017
