Social Sciences in China (Chinese Edition)
No. 4, 2025
Theoretical Foundations of Data-Driven Linguistics
(Abstract)
Liu Haitao
Linguistic research in the era of data-based intelligence should be grounded in authentic language data, rooted in the fundamental nature of human language, and enriched by the accumulated insights of prior linguistic scholarship. The necessity and feasibility of extracting linguistic patterns from language data are evident, with linearity and systematicity serving as the two core principles of data-driven linguistics. This approach conceptualizes language as a human-driven probabilistic system, focusing on its probabilistic nature to explore both linear laws and systematic structural patterns of language. It examines the relationship between linear word chains and two-dimensional network structures. At the same time, it advances a dual perspective attentive to both human and machine needs in the context of data-based intelligence. Through the synergistic combination of data-driven and data-based methodologies, it employs systems science approaches to uncover the operational principles of human language systems and examine their applicability in artificial intelligence (AI) and related fields. This endeavor ultimately aims to establish a scientific discipline of “speech dynamics,” contributing to the development of explainable AI and deepening our understanding of the operational mechanisms of human language systems.
