Qichwabase

Revision as of 16:20, 1 September 2023 by Elwinlhq

Welcome to Qichwabase, a Wikibase instance hosted on wikibase.cloud. It aims to be a knowledge base for the Quechua language and community. It is a collaborative project developed by a team of researchers and volunteers. The main goal of Qichwabase is to model Quechua lexical data as a collection of Wikibase lexemes, to be transferred to Wikidata as soon as the dataset reaches the envisaged quality.

Qichwabase as a Wikibase instance

Qichwabase is a valuable resource for anyone interested in the Quechua language and Quechua knowledge. It can be used to learn about Quechua words and phrases, to find translations of Quechua words into other languages, and to explore how it can be used in various scenarios [1], such as Question Answering, Dialogue Systems, Entity Linking, Knowledge Validation, and Collaborative Community.

Creation

Qichwabase is still under development, but it already contains a significant amount of knowledge. We have started by modeling open Quechua lexical data from the Runasimi Dictionary, and we plan to include data from other sources, such as attestations and frequency information from Quechua web corpora [2]. Currently, Qichwabase includes:

  • Over 1 million triples (or statements) about Quechua words
  • Information about, and examples of, the usage of Quechua words
  • Translations of Quechua words into other languages, e.g. English, German, Italian, and Spanish
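To make "modeling lexical data as Wikibase lexemes" concrete, the sketch below turns one dictionary entry into a lexeme object shaped like the JSON that Wikidata exports for lexemes: a lemma, a language item, a lexical-category item, and senses with multilingual glosses. This is an illustration under stated assumptions, not the project's actual import pipeline: the item IDs (Q_QUECHUA, Q_NOUN, Q_VERB) are hypothetical placeholders for the real Qichwabase items, the function name is made up, and every gloss is attached to a single sense for simplicity.

```python
import json

# Hypothetical placeholder IDs: the real Qichwabase item IDs for the Quechua
# language and for each lexical category are not given on this page.
QUECHUA_LANGUAGE_ITEM = "Q_QUECHUA"
LEXICAL_CATEGORY_ITEMS = {"noun": "Q_NOUN", "verb": "Q_VERB"}


def to_lexeme_json(lemma: str, pos: str, glosses: dict[str, str]) -> dict:
    """Build a Wikidata-style lexeme object from one dictionary entry.

    `glosses` maps a language code (e.g. "en", "es") to a gloss text.
    Simplification: all glosses go into one sense. An unknown `pos`
    raises KeyError rather than silently producing an untyped lexeme.
    """
    return {
        "type": "lexeme",
        "lemmas": {"qu": {"language": "qu", "value": lemma}},
        "language": QUECHUA_LANGUAGE_ITEM,
        "lexicalCategory": LEXICAL_CATEGORY_ITEMS[pos],
        "senses": [
            {
                "glosses": {
                    lang: {"language": lang, "value": text}
                    for lang, text in glosses.items()
                }
            }
        ],
        "forms": [],  # inflected forms would be added here
    }


# Example entry: "wasi" (house).
entry = to_lexeme_json("wasi", "noun", {"en": "house", "es": "casa"})
print(json.dumps(entry, ensure_ascii=False, indent=2))
```

A real import would also need to match the exact payload expected by the Wikibase API used for writing, which this sketch does not attempt.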

Classes and Properties

The ontology classes and properties listed here do not include the OntoLex core classes used by default in Wikibase. For how lexicographical data is represented in a Wikibase, see the documentation pages on Wikidata.

  • Ontology classes and their instances (query).
  • Ontology properties (Special:ListProperties)
  • Ontological relations between items describing lexical categories (query).
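Relations between lexical-category items are what allow fine-grained parts of speech to be grouped into broader ones. The sketch below shows one way to do that roll-up from a list of (child, parent) pairs such as a query over those relations might return. The category names and hierarchy are hypothetical examples, the sketch assumes each category has a single parent, and it is not taken from Qichwabase's own queries.

```python
from collections import Counter

# Hypothetical (child, parent) pairs standing in for "subclass of"-style
# relations between lexical-category items.
SUBCLASS_OF = [
    ("proper noun", "noun"),
    ("common noun", "noun"),
    ("transitive verb", "verb"),
    ("intransitive verb", "verb"),
    ("noun", "lexical category"),
    ("verb", "lexical category"),
]
ROOT = "lexical category"


def broader_category(category: str, pairs: list[tuple[str, str]], root: str) -> str:
    """Walk up the hierarchy and return the ancestor directly below `root`.

    Assumes one parent per category; a category with no known parent
    is returned unchanged.
    """
    parent_of = dict(pairs)
    current = category
    seen = {current}
    while parent_of.get(current) not in (root, None):
        current = parent_of[current]
        if current in seen:
            raise ValueError(f"cycle in category hierarchy at {current!r}")
        seen.add(current)
    return current


def broad_counts(fine_counts: dict[str, int]) -> Counter:
    """Sum fine-grained category counts into their broader categories."""
    totals: Counter = Counter()
    for category, n in fine_counts.items():
        totals[broader_category(category, SUBCLASS_OF, ROOT)] += n
    return totals


print(broad_counts({"proper noun": 3, "common noun": 5, "transitive verb": 2}))
```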

Hosting

To host Qichwabase we chose Wikibase, which allows knowledge to be represented as semantically structured data. In particular, we rely on the SPARQL endpoint for querying and exploiting the knowledge. You can explore the queries in the SPARQL endpoint or try out the following queries:

  • See Quechua lexemes with their lemma and part of speech (POS) using this query.
  • See lexemes with senses and multilingual sense descriptions using this query.
  • See a bar chart of POS distribution (fine-grained categories) using this query.
  • See a bar chart of POS distribution (broader categories) using this query.
  • See lexemes that have usage examples, together with the example source references using this query.
  • See lexemes that have a Wikidata alignment, and retrieve translation equivalents from Wikidata using this federated query.
  • See lexemes that have lexical forms using this query.
  • See Quechua varieties (dialects) as described in Qichwabase using this query.
  • See a bar chart of the distribution of dialectal lemma variants using this query.

See also the SPARQL queries page.
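The queries above can also be run programmatically. The sketch below sends a lemma/POS query to the SPARQL endpoint over HTTP, flattens the standard SPARQL JSON results, and prints a text bar chart of the POS distribution. Assumptions: the endpoint URL follows the usual wikibase.cloud pattern and is not stated on this page; the query uses the standard Wikibase lexeme vocabulary (ontolex:LexicalEntry, wikibase:lemma, wikibase:lexicalCategory) and filters lemmas by the "qu" language tag, which may differ from how the Qichwabase queries themselves select Quechua lexemes. All function names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumed endpoint URL (usual wikibase.cloud layout).
ENDPOINT = "https://qichwabase.wikibase.cloud/query/sparql"

LEMMA_POS_QUERY = """
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?lexeme ?lemma ?pos WHERE {
  ?lexeme a ontolex:LexicalEntry ;
          wikibase:lemma ?lemma ;
          wikibase:lexicalCategory ?pos .
  FILTER(LANGMATCHES(LANG(?lemma), "qu"))
}
LIMIT 100
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "qichwabase-example/0.1",
        },
    )


def run_query(query: str) -> dict:
    """Execute the query (network access required) and decode the JSON."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return json.load(resp)


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings into {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def pos_distribution(results: dict) -> Counter:
    """Count lexemes per lexical-category IRI."""
    return Counter(row["pos"] for row in rows(results))


def print_bar_chart(counts: Counter, width: int = 40) -> None:
    """Print one '#' bar per category, scaled to the largest count."""
    if not counts:
        return
    top = max(counts.values())
    for label, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        print(f"{label:<30} {bar} {n}")
```

With network access, `print_bar_chart(pos_distribution(run_query(LEMMA_POS_QUERY)))` would print the chart; the category labels are item IRIs unless the query is extended to fetch their labels.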

Curation

Wikibase provides a set of tools that we can use to validate knowledge before it is entered into Qichwabase. For instance, we are defining EntitySchemas in order to create forms to be filled in. The ShEx constraints are defined on Project:Cradle and/or as EntitySchema wiki pages. See the EntitySchemas on the wiki for examples.
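Qichwabase expresses its constraints in ShEx. Purely to illustrate the kind of check a lexeme shape performs, the following plain-Python stand-in (not ShEx, and not one of the project's actual schemas) validates a lexeme object in the Wikidata-style JSON layout. The constraints shown (a Quechua lemma, a lexical category, at least one sense, a gloss on every sense) are hypothetical examples.

```python
def check_lexeme(lexeme: dict) -> list[str]:
    """Return constraint violations; an empty list means the lexeme passes.

    Hypothetical constraints, loosely mirroring what a lexeme shape
    might require.
    """
    problems = []
    lemmas = lexeme.get("lemmas", {})
    # Accept "qu" or a more specific Quechua tag such as "qu-..."
    if not any(code == "qu" or code.startswith("qu-") for code in lemmas):
        problems.append("missing Quechua lemma")
    if not lexeme.get("lexicalCategory"):
        problems.append("missing lexical category")
    senses = lexeme.get("senses", [])
    if not senses:
        problems.append("no senses")
    for i, sense in enumerate(senses):
        if not sense.get("glosses"):
            problems.append(f"sense {i} has no glosses")
    return problems
```

Returning a list of messages rather than a boolean mirrors how a validator report points an editor at every failing constraint at once.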

Origins of Qichwabase

Qichwabase is the product of a mini-project developed at SD-LLOD-22 in June 2022, where it was awarded the Best Project Prize. Its main goal is to model Quechua lexical data as a collection of Wikibase lexemes, to be transferred to Wikidata as soon as the dataset reaches the envisaged quality.

References

  1. Getting Quechua Closer to Final Users through Knowledge Graphs. arXiv. https://arxiv.org/abs/2208.12608
  2. Huaman et al. QICHWABASE: A Quechua Language and Knowledge Base for Quechua Communities. arXiv. https://arxiv.org/abs/2305.06173