This dataset is based on grammatical data from seven Indigenous languages spoken in the Caquetá-Putumayo River Basins of Colombia. Six of the languages belong to three language families: Witotoan (Murui-Muina, Ocaina, Nonuya), Boran (Bora, Muinane), and Arawak (Resígaro). The seventh, Andoke, is a linguistic isolate. Initially drawn from the Hunter-Gatherer Language Database (HGDL), the dataset has since been expanded and restructured using a set of grammatical categories specifically designed to capture key morphosyntactic features of languages from Northwest Amazonia.
The dataset was collected and verified by linguist Dr. Katarzyna I. Wojtylak through a rigorous process of cross-checking against existing linguistic literature, her own field expertise, and consultations with specialists in these languages. While the initial data drew from the HGDL framework, it became evident that additional grammatical categories were necessary to more accurately represent the structures of the region’s languages. As a result, a new typological framework was developed, incorporating features particularly relevant to the grammatical description of Northwest Amazonian languages, such as classifier systems, evidentiality, switch reference, and complex predicates.
The dataset is organized into a hierarchical structure reflecting these expanded grammatical categories. Each feature is systematically coded as present, absent, or unknown, allowing for comparative analysis both within and across language families. By distinguishing inherited traits from contact-induced features, this dataset provides insights into the historical and contemporary dynamics of grammatical structures in this multilingual region.
The dataset supports both quantitative and qualitative analyses of areal patterns, grammatical diffusion, and the interplay between genealogical inheritance and language contact. It offers a valuable resource for typologists, historical linguists, and researchers focusing on Amazonian languages.
The dataset is organized into CSV files that record each language’s grammatical features in a format suitable for comparative linguistic research. It is available under a CC BY license and represents a significant contribution to the understanding of grammatical diversity in Northwest Amazonia.
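As a rough illustration of how the present/absent/unknown coding might be used for comparison, the sketch below computes the share of features on which two languages agree, skipping any feature coded as unknown for either one. The CSV layout (one row per feature, one column per language), the column name `feature`, the placeholder names `LangA`, `LangB`, `LangC`, and all values are assumptions made for this example; they are not taken from the dataset, whose actual file layout may differ.

```python
import csv
import io

# Hypothetical layout: one row per feature, one column per language.
# Placeholder names and values only; none of this comes from the dataset.
SAMPLE_CSV = """feature,LangA,LangB,LangC
feature_1,present,present,absent
feature_2,absent,present,unknown
feature_3,present,present,present
feature_4,unknown,absent,absent
"""


def load_features(text):
    """Return {language: {feature: value}} parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    table = {}
    for row in reader:
        feature = row["feature"]
        for language, value in row.items():
            if language == "feature":
                continue
            table.setdefault(language, {})[feature] = value.strip().lower()
    return table


def agreement(table, lang_a, lang_b):
    """Share of features coded identically, skipping any 'unknown' value.

    Returns None when no feature is known for both languages.
    """
    shared = 0
    known = 0
    for feature, value_a in table[lang_a].items():
        value_b = table[lang_b].get(feature, "unknown")
        if "unknown" in (value_a, value_b):
            continue  # an unknown value carries no evidence either way
        known += 1
        if value_a == value_b:
            shared += 1
    return shared / known if known else None


table = load_features(SAMPLE_CSV)
score_ab = agreement(table, "LangA", "LangB")
score_ac = agreement(table, "LangA", "LangC")
```

Excluding unknown values from the denominator is one possible design choice: it keeps gaps in documentation from being counted as either agreement or disagreement, at the cost of scores that rest on different numbers of features for different language pairs.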
(2025-03-18)