Empirical Analysis of Ranking Models for an Adaptable Dataset Search

NEVES, A. B.; GUERRA, R.; LEME, L. A. P. P.; LOPES, G. R.; NUNES, B. P.; CASANOVA, M. A. Empirical Analysis of Ranking Models for an Adaptable Dataset Search. In: The Semantic Web – 15th International Conference, ESWC 2018, 2018, Heraklion. Berlin, Heidelberg: Springer, 2018. p. 50-64. doi: 10.1007/978-3-319-93417-4_4



Authors

Angelo B. Neves (UFF)
Rodrigo G. G. de Oliveira (UFF)
Luiz André P. Paes Leme (UFF)
Giseli Rabello Lopes (UFRJ)
Bernardo P. Nunes (PUC-Rio & UNIRIO)
Marco A. Casanova (PUC-Rio)

Abstract

Currently available datasets still have a large unexplored potential for interlinking. Ranking techniques contribute to this task by scoring datasets according to the likelihood of finding entities related to those of a target dataset. Ranked datasets can either be manually selected for standalone link discovery tasks or automatically inspected by programs that walk through the ranking looking for entity links. This work presents empirical comparisons between different ranking models and argues that different algorithms could be used depending on whether the ranking is handled manually or automatically and, also, on the metadata available for the datasets. Experiments indicate that the ranking algorithms that perform best with respect to nDCG do not always have the best recall at position k for high recall levels. The best ranking model for the manual use case (with respect to nDCG) may need 13% more datasets to reach 90% recall: it would require almost 47% of the ranking, instead of just the top 34% of the datasets reached by the best model for the automatic use case (with respect to recall@k).
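The abstract contrasts two standard ranking-evaluation metrics, nDCG and recall at position k. A minimal sketch of both, using binary relevance labels and a toy ranking (the data and function names are illustrative, not from the paper):

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain: relevance discounted by log2 of rank
    # (rank i is 1-based, so position i uses log2(i + 1)).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def recall_at_k(relevances, k):
    # Fraction of all relevant items that appear in the top k positions.
    total = sum(1 for r in relevances if r > 0)
    hits = sum(1 for r in relevances[:k] if r > 0)
    return hits / total if total > 0 else 0.0

# Hypothetical ranking with binary relevance (1 = relevant dataset, 0 = not).
ranking = [1, 0, 1, 1, 0, 0, 1, 0]
print(ndcg(ranking))             # closer to 1 when relevant items sit near the top
print(recall_at_k(ranking, 4))   # 3 of the 4 relevant items are in the top 4 -> 0.75
```

nDCG rewards placing relevant items near the top of the ranking, while recall@k measures how deep one must go to cover the relevant items; this is why a model that is best under one metric need not be best under the other, as the experiments in the paper show.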

Keywords:

Linked Data, Entity linking, Recommendation, Dataset, Ranking, Empirical evaluation

 
