Built by a linguist for linguists.
GlossKit was founded by Shuan Osman Karim, a historical linguist and language documentation researcher whose work focuses on Kurdish, Gorani, and other underdocumented Iranic languages. The platform grows out of practical research experience with recording, transcription, morphological analysis, and lexicon building, and out of the need to carry structured data forward into syntactic analysis and export.
Its design reflects the perspective of someone working across historical linguistics, documentary practice, and computational methods rather than a generic software template for language data.

Shuan Osman Karim, Ph.D.
Alexander von Humboldt Postdoctoral Fellow
Julius-Maximilians-Universität Würzburg
Founder story
Why this work led to GlossKit
Shuan Osman Karim's academic work sits at the intersection of historical linguistics, morphology, documentary practice, and Iranic language research. He completed his Ph.D. in Linguistics at The Ohio State University, worked at the University of Cambridge on the ERC-funded ALHOME project, and is now an Alexander von Humboldt Postdoctoral Fellow at Julius-Maximilians-Universität Würzburg.
Across that research path, one recurring problem is hard to ignore: the movement from recordings to transcripts, glossed texts, lexicon evidence, syntactic analysis, and export is often fragmented across separate tools and hand-built processes. Those tools can be powerful, but keeping data aligned across stages takes more time and technical work than many projects can afford.
GlossKit grows out of the need for infrastructure designed by someone who understands linguistic analysis from the inside. It is intended to support workflows where recordings, reviewed transcripts, morphology, lexicon curation, syntax initialization, and archival packaging remain connected rather than being repeatedly reconstructed from scratch.
That perspective is especially important for work on underdocumented and minoritized languages. GlossKit is not meant to replace human judgment, community governance, or careful analysis. It is meant to make serious documentation work more coherent, more traceable, and easier to carry forward into reusable research data.
Historical and comparative linguistics also depend on the quality and structure of primary documentation. The better recordings, transcripts, analyses, and metadata stay connected, the stronger the foundation becomes for lexicon building, morphosyntactic comparison, historical reconstruction, and future publication.
Research background
The research experience behind the product
The founder's background matters because GlossKit is not being designed as generic software for text processing. It is being shaped by research questions and documentation workflows that demand linguistic structure, review, and provenance.
Historical and comparative linguistics
Shuan Osman Karim's research focuses on historical reconstruction based on morphosyntactic variation among endangered, underdocumented, and minoritized Iranic languages. That background informs GlossKit's emphasis on preserving analytical structure, provenance, and revision history rather than flattening primary data into one-off outputs.
Kurdish, Gorani, and Iranic languages
His work on Kurdish, Gorani, and related Iranic languages keeps GlossKit grounded in the realities of documentation for underdocumented and minoritized language communities. The platform is shaped by workflows that need to stay useful across variation, dialect continua, and historically layered datasets.
Morphology and morphosyntax
Research on ezafe, definiteness, applicatives, and nominal morphosyntax directly informs GlossKit's attention to reviewed token-level analysis, structured glossing, and evidence-bearing lexicon work. The product is designed for linguists who need analysis workflows that remain interpretable and editable.
Language documentation and computational workflows
GlossKit also grows out of work on innovative documentation methods, machine learning, and data-processing bottlenecks in language research. That combination motivates a modular web workflow that can connect recording, transcription review, analysis, and export without replacing human linguistic judgment.
Selected publications
Research outputs that inform GlossKit
This is a short curated list rather than a full bibliography. For the complete publications record, visit the founder's publications page.
The synchrony and diachrony of New Western Iranian nominal morphosyntax
Karim, Shuan Osman. 2021. The synchrony and diachrony of New Western Iranian nominal morphosyntax. PhD diss., The Ohio State University.
Gorani in its Historical and Linguistic Context
Karim, Shuan Osman, and Saloumeh Gholami. 2024. Gorani in its Historical and Linguistic Context. Berlin: De Gruyter.
Pattern Borrowing in The Southern Kurdish Zone
Karim, Shuan Osman. 2024. “Pattern Borrowing in The Southern Kurdish Zone.” In Gorani in its Historical and Linguistic Context, edited by Shuan Osman Karim and Saloumeh Gholami. Berlin: De Gruyter.
An applicative analysis of Soranî "absolute prepositions"
Karim, Shuan Osman, and Ali Salehi. 2022. “An applicative analysis of Soranî ‘absolute prepositions’.” In Neglected Syntactic Functions and Non-Syntactic Functions of Applicative Morphology, edited by Sara Pacchiarotti and Fernando Zúñiga. Berlin: De Gruyter.
The typology of verbal person/number syncretism in Western Iranic languages
Mohammadirad, Masoud, and Shuan Osman Karim. 2025. “The typology of verbal person/number syncretism in Western Iranic languages.” Language Dynamics and Change.
GlossKit origin
From recurring bottlenecks to a modular workflow
GlossKit is the product of recurring documentation bottlenecks. Recording, transcription, morphology, lexicon work, syntax, and export are each demanding in their own right, but the larger challenge is how often those stages are disconnected in practice.
Existing tools remain valuable and often indispensable, yet many projects still depend on brittle conversions, duplicate metadata entry, and workflows that make it difficult to preserve review history across stages. GlossKit aims to coordinate those stages in one modular web-based workflow without pretending that every analytical decision can or should be automated.
The project is designed to preserve human review, linguistic transparency, and community-centered documentation. That means keeping analytical structure visible, making machine-assisted steps reviewable, and supporting workflows that can carry well-formed data toward export and future publication-oriented work.
GlossKit: From Recording to Publication
The founder's project list presents GlossKit as an ongoing computational linguistics project. On the public site, that translates into a clear product direction: bridge recording, reviewed analysis, and export-oriented data workflows without losing linguistic detail along the way.
Collaborators and team
Founder-led, shaped through research collaboration
GlossKit is currently founder-led and shaped through research collaboration, fieldwork experience, and feedback from linguists working with underdocumented languages.
The project is being developed in conversation with the practical needs of documentation and analysis work. It is not presented as a large formal company team, because that structure does not yet exist.
Interested in GlossKit or future collaboration?
GlossKit is being developed for researchers, documentation teams, and community-centered language projects that need better infrastructure for reviewed linguistic data.