A modular workflow for language documentation and linguistic analysis.

GlossKit is designed to coordinate recordings, transcript review, morphology, lexicon curation, syntax initialization, and export packaging through a shared project workspace.

Platform workflow overview

GlossKit organizes work around projects, texts, consultants, and modular services. Recordings or texts enter a project workflow, transcription proceeds through queue-backed processing and review, morphology produces approved analyses and finalized glosses, lexicon curation turns reviewed evidence into dictionary entries, syntax builds editable first-pass parses, and the export layer assembles structured outputs from the project state that exists across modules.

Core Implemented

Upload

Recording intake

Upload recordings or create text records inside a project-oriented workflow. Recording metadata and text state are meant to stay connected from the beginning.

In development

Segment

Segmentation workflow

Prepare recordings for transcript work and segment-oriented analysis. The backend includes segmentation machinery, but the end-to-end workflow still needs further validation.

Core Implemented

Transcribe

Queued ASR review

Queue transcription jobs, retrieve transcript outputs, and revise transcript text through a human review step. The system is designed around review rather than unattended automation.

Core Implemented

Gloss

Morphology workspace

Move from segmented text into token-level analysis, approval workflows, review cards, and finalized glosses for downstream use. The backend audit identifies Morphology as one of the more developed analysis modules.

Core Implemented

Curate Lexicon

Evidence-first curation

Import finalized glosses as evidence, group candidate items, and curate dictionary entries with linked examples. This keeps lexicon building tied to reviewed analytical evidence.

In development

Parse Syntax

UD-style initialization

Generate editable first-pass dependency parses from finalized morphology and review them sentence by sentence. The workflow is functional, but parse quality and export ergonomics remain under active development.

In development

Export

Structured outputs

Assemble analysis outputs and project metadata into export-oriented files and bundles. Export exists today, but coverage varies by module and data state.

In development

Archive / Publish

Packaging for deposit

Prepare project materials for archival handoff and future publication-oriented work. The platform provides a foundation for this, but comprehensive publication workflows are still planned.

Module overview

The platform currently centers on a gateway service that coordinates project state and module handoffs. Around that core, the present backend includes an ASR layer for ingestion and transcription workflows, a more developed morphology workspace for approval-based analysis, an evidence-first lexicon workflow, a rule-driven syntax module for editable UD-style initialization, and an archival service that assembles multi-module export packages.

Core Implemented

Gateway / Project Workspace

GlossKit includes a central project workspace layer that coordinates users, projects, collaborators, consultants, texts, and cross-module state. It acts as the organizing layer that connects recordings and texts to downstream analysis services.

Current backend foundation

Project and collaborator management, text and segment state, consultant and location metadata, and cross-module orchestration.

Still being developed

Stronger deployment hardening, tighter integration validation, and more robust service-boundary enforcement.

Technical credibility

The current backend already persists project, text, collaborator, consultant, and cross-module completion state through a dedicated gateway service.

In development

ASR & Segmentation

GlossKit includes an ASR layer for recording upload, media normalization, queued transcription jobs, transcript retrieval, and transcript editing. It already functions as a real ingestion and transcription component, but some training-related and legacy API paths remain partial or placeholder implementations.

Current backend foundation

Recording upload, audio normalization, queued transcription, transcript editing, and EAF-related export handling.

Still being developed

Stronger segmentation validation, tighter correction-to-model feedback loops, and clearer retirement of older placeholder route surfaces.

Technical credibility

The service already uses a queue-backed worker model for transcription rather than presenting transcription as a simple front-end mockup.
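
As a rough illustration of what a queue-backed worker model with a human review step can look like, here is a minimal Python sketch. It is not GlossKit's actual code: the `TranscriptionJob` fields, the status names, and the `fake_asr` stand-in are all assumptions made for this example, and a production system would more likely use a persistent job queue than an in-process one.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class TranscriptionJob:
    """Hypothetical job record; field names are illustrative only."""
    recording_id: str
    status: str = "queued"  # queued -> transcribed -> reviewed
    draft_text: str = ""
    reviewed_text: str = ""


def fake_asr(recording_id: str) -> str:
    """Stand-in for a real ASR model call (assumption for the sketch)."""
    return f"draft transcript for {recording_id}"


def worker(jobs: queue.Queue) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job.draft_text = fake_asr(job.recording_id)
        job.status = "transcribed"  # a draft exists, but no human has reviewed it
        jobs.task_done()


def review(job: TranscriptionJob, corrected_text: str) -> None:
    """Human review step: only this marks a transcript as reviewed."""
    if job.status != "transcribed":
        raise ValueError("job has no draft transcript to review")
    job.reviewed_text = corrected_text
    job.status = "reviewed"


def run_demo() -> list:
    """Queue two jobs, let the worker drain them, then review the first."""
    jobs: queue.Queue = queue.Queue()
    submitted = [TranscriptionJob("rec-001"), TranscriptionJob("rec-002")]
    thread = threading.Thread(target=worker, args=(jobs,))
    thread.start()
    for job in submitted:
        jobs.put(job)
    jobs.put(None)  # sentinel: tell the worker to stop
    jobs.join()
    thread.join()
    review(submitted[0], "corrected transcript")
    return submitted
```

The point of the sketch is the separation of states: the worker can only move a job to `transcribed`, and only the review function can move it to `reviewed`.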

Core Implemented

Morphology

GlossKit’s morphology workspace supports token-level analysis, approvals, review cards, project lexicon learning, and final gloss locking for downstream workflows. The backend audit identifies Morphology as one of the more developed analysis modules.

Current backend foundation

Project-scoped morphology documents, approval-based glossing, lexicon learning from approvals, finalized gloss export, and shared base-library selection.

Still being developed

Stronger standalone service isolation, clearer consolidation of legacy and newer base-library concepts, and richer downstream syntax payloads.

Technical credibility

The morphology layer already persists approved analyses, morpheme occurrences, and versioned grammar and lexicon state in a structured backend model.
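
The approval and final-lock behavior described for this module can be pictured with a small Python sketch. Everything here is hypothetical: the class names, fields, and error types are invented for illustration and do not describe GlossKit's real data model. The word form in the usage note is a placeholder, not data from any language.

```python
from dataclasses import dataclass, field


@dataclass
class TokenAnalysis:
    """One token with its proposed segmentation and glosses (illustrative)."""
    form: str
    segments: list
    glosses: list
    approved: bool = False


@dataclass
class GlossDocument:
    """A project-scoped document whose glosses can be locked once approved."""
    tokens: list = field(default_factory=list)
    locked: bool = False

    def approve(self, index: int) -> None:
        """Mark one token analysis as approved; refused after locking."""
        if self.locked:
            raise PermissionError("document is locked; glosses are final")
        self.tokens[index].approved = True

    def finalize(self) -> None:
        """Lock the document, but only if every token has been approved."""
        pending = [t.form for t in self.tokens if not t.approved]
        if pending:
            raise ValueError(f"unapproved tokens: {pending}")
        self.locked = True

    def export(self) -> list:
        """Hand finalized glosses to downstream modules; requires a lock."""
        if not self.locked:
            raise ValueError("finalize the document before exporting")
        return [{"form": t.form, "gloss": "-".join(t.glosses)} for t in self.tokens]
```

For example, a document holding `TokenAnalysis("kita-na", ["kita", "na"], ["see", "PST"])` cannot be finalized until that token is approved, and cannot be exported until it is finalized.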

Core Implemented

Lexicon

GlossKit supports evidence-based lexicon curation from finalized morphological analysis rather than silently auto-generating dictionary entries. It is currently implemented within the morphology backend as a deliberate downstream curation workflow.

Current backend foundation

Import from locked glosses, evidence grouping, candidate review, curated entry creation, and provenance-bearing examples.

Still being developed

Cleaner separation from older legacy lexicon paths, richer export workflows, and stronger source-metadata handling on evidence items.

Technical credibility

The current design explicitly distinguishes evidence tables from curated dictionary entries, which is a stronger foundation than simple auto-populated word lists.
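
To make the distinction between evidence and curated entries concrete, the following Python sketch models the two as separate types. The type names, fields, and grouping key are assumptions chosen for this example, not GlossKit's schema, and the forms in the usage note are invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceItem:
    """One attested morpheme taken from a locked gloss (hypothetical fields)."""
    form: str
    gloss: str
    text_id: str
    sentence: str


@dataclass
class LexiconEntry:
    """A curated dictionary entry that keeps links to its supporting evidence."""
    headword: str
    gloss: str
    examples: list = field(default_factory=list)


def group_evidence(items):
    """Group evidence by (form, gloss) so a curator can review candidates."""
    groups = defaultdict(list)
    for item in items:
        groups[(item.form, item.gloss)].append(item)
    return dict(groups)


def curate(groups, key, headword=None):
    """Create an entry only when a curator explicitly accepts a group."""
    form, gloss = key
    return LexiconEntry(headword or form, gloss, list(groups[key]))
```

Given evidence items for `("kita", "see")` from two texts and `("na", "PST")` from one, `group_evidence` yields two candidate groups, and `curate(groups, ("kita", "see"))` produces a single entry with two linked examples. Nothing becomes a dictionary entry until `curate` is called.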

In development

Syntax & UD Treebanking

GlossKit can generate editable first-pass dependency parses from finalized morphology in a UD-style structure and preserve user edits and approvals. The backend includes a working foundation for this review workflow, but the automatic first guess remains heuristic rather than high-confidence.

Current backend foundation

Morphology-to-syntax parsing, sentence persistence, token-level editing, save and approve flows, and CoNLL-U-like serialization.

Still being developed

Stronger parse heuristics, fuller preservation of morphology detail, and more complete user-facing export and review ergonomics.

Technical credibility

The syntax module already persists documents, sentences, and token annotations, and it supports safe reparsing, rather than functioning as a disposable preview layer.
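
For readers unfamiliar with the target format, this short Python sketch serializes a sentence into the standard ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The `Token` class and `to_conllu` function are hypothetical names for this example. GlossKit's output is described only as CoNLL-U-like, so its actual columns and comment lines may differ.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One syntactic word; unset fields default to the CoNLL-U underscore."""
    id: int
    form: str
    lemma: str = "_"
    upos: str = "_"
    feats: str = "_"
    head: int = 0  # 0 marks the root of the sentence
    deprel: str = "_"
    misc: str = "_"


def to_conllu(sent_id: str, text: str, tokens: list) -> str:
    """Serialize one sentence: comment lines, token lines, then a blank line."""
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for t in tokens:
        columns = [
            str(t.id), t.form, t.lemma, t.upos,
            "_",  # XPOS left unspecified in this sketch
            t.feats, str(t.head), t.deprel,
            "_",  # DEPS (enhanced dependencies) left unspecified
            t.misc,
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n\n"


example = to_conllu(
    "demo-1",
    "the dog barks",
    [
        Token(1, "the", "the", "DET", head=2, deprel="det"),
        Token(2, "dog", "dog", "NOUN", head=3, deprel="nsubj"),
        Token(3, "barks", "bark", "VERB", head=0, deprel="root"),
    ],
)
```

Because the heads and relations are plain fields, a reviewer's edit to `head` or `deprel` changes only that token's line when the sentence is serialized again.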

In development

Archival Export

GlossKit includes an archival export layer that can assemble project metadata and selected outputs from multiple modules into a structured bundle. The archival layer provides an early export-oriented foundation, though it remains less broadly tested than some core analysis modules.

Current backend foundation

Multi-service archive bundling, inclusion of syntax outputs, EAF and audio-related artifacts, curated lexicon content, and optional text-oriented files.

Still being developed

Larger-scale export handling, broader test coverage, and clearer behavior when some downstream modules are incomplete.

Technical credibility

The archival service already resolves project data across multiple backend modules and assembles structured export bundles rather than exporting isolated files only.
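
One way to picture multi-module bundling, including the case where some modules have nothing to contribute yet, is the Python sketch below. The function name, the folder layout, and the `manifest.json` structure are assumptions made for illustration, not GlossKit's actual bundle format.

```python
import io
import json
import zipfile


def build_bundle(project_id: str, module_outputs: dict) -> bytes:
    """Assemble a zip bundle from per-module outputs.

    module_outputs maps a module name to {filename: text}, or to None or an
    empty dict when that module has no output yet (hypothetical convention).
    """
    manifest = {"project_id": project_id, "included": [], "missing": []}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for module in sorted(module_outputs):
            files = module_outputs[module]
            if not files:
                # Record the gap instead of failing or silently skipping it.
                manifest["missing"].append(module)
                continue
            manifest["included"].append(module)
            for name in sorted(files):
                archive.writestr(f"{module}/{name}", files[name])
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()
```

Writing the list of missing modules into the manifest is one possible answer to the open question of how a bundle should behave when downstream modules are incomplete: whoever receives the archive can see what was left out.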

Planned

Phonology

Phonology is part of the broader product direction for GlossKit, especially where documentation workflows need closer integration with segmental analysis, alignment, and sound-focused annotation. It is not yet a verified standalone backend module in the current audit.

Current backend foundation

Product direction only.

Still being developed

Module definition, workflow scope, and implementation path.

Technical credibility

The current backend audit does not verify a separate phonology service, so this is presented as planned work rather than active functionality.

Planned

Grammar Writing

GlossKit is intended to support later-stage grammar-writing and publication-oriented workflows by carrying structured analyses forward into exportable materials. The current audit shows some text and LaTeX-oriented export capability, but not a dedicated grammar-writing module.

Current backend foundation

Early foundations through export-oriented structured data and optional text-based outputs.

Still being developed

Dedicated grammar-writing workflows, templates, and module-specific authoring support.

Technical credibility

The platform already preserves structured intermediate analysis that could support later grammar-writing workflows, but the dedicated module remains roadmap work.

Current functionality vs roadmap

The current backend already supports substantial project, analysis, and export foundations, while several higher-risk areas remain under active development.

Currently supports

  • Project and collaborator organization
  • Consultant and location metadata
  • Recording upload and queued transcription
  • Transcript editing
  • Approval-based morphology workflows
  • Finalized gloss export to downstream modules
  • Evidence-based lexicon curation
  • Editable first-pass syntax parses
  • Multi-module archive bundling

Under active development

  • Stronger segmentation and correction loops
  • Improved syntax initialization quality
  • Richer module-to-module provenance
  • Broader export coverage and publication-oriented packaging
  • Further deployment hardening and access-control boundary review

Planned

  • Phonology-oriented workflows
  • Dedicated grammar-writing support
  • Expanded publication workflows built on reviewed structured data

Technical architecture

GlossKit uses a modular service architecture rather than a monolithic pipeline. A central gateway coordinates project state and brokers requests to specialized backends for ASR, morphology, syntax, and archival export. The current architecture includes relational persistence across modules, queue-backed transcription processing, versioned morphology data, structured syntax serialization, and downstream bundle assembly for export-oriented workflows.
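
The gateway pattern described above can be sketched in a few lines of Python. This is a simplified, in-process illustration under stated assumptions: the `Gateway` class, its method names, and the `status` key are hypothetical, and a real deployment would broker requests to separate services over a network rather than call local functions.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class Gateway:
    """Routes requests to registered module backends and tracks project state."""

    def __init__(self) -> None:
        self._backends: Dict[str, Handler] = {}
        # project_id -> {module name -> last reported status}
        self.project_state: Dict[str, Dict[str, str]] = {}

    def register(self, module: str, handler: Handler) -> None:
        """Attach a backend (here just a callable) under a module name."""
        self._backends[module] = handler

    def dispatch(self, module: str, project_id: str, payload: dict) -> dict:
        """Forward a request to one backend and record the status it reports."""
        if module not in self._backends:
            raise KeyError(f"no backend registered for {module!r}")
        result = self._backends[module]({"project_id": project_id, **payload})
        status = result.get("status", "unknown")
        self.project_state.setdefault(project_id, {})[module] = status
        return result
```

Keeping cross-module state in the gateway, rather than in each backend, is what lets one module's completion status be visible to the others without the backends calling each other directly.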

Product visuals on this page are conceptual illustrations unless explicitly labelled as captured from the current application.

Project Workspace Concept

An illustrative workspace view for organizing documentation materials and related tasks.

Morphological Review Concept

A conceptual interface direction for interlinear glossing and word-level review.

Syntax Review Concept

A conceptual interface direction for dependency-oriented syntactic analysis.

Segmentation Concept

A conceptual interface direction for segmentation and carefully reviewed transcription work.

ELAN-Style Export Example

A conceptual export example showing the kind of multi-tier, time-aligned structure (phrase, word, morpheme, gloss, POS, and free translation tiers) GlossKit is intended to support for archival preparation and further review.

Why modular architecture matters

Language documentation work rarely moves in a straight line. Transcription is revised, morphological analyses are approved over time, lexicon entries are curated from evidence, syntax needs correction, and export expectations vary by project and archive. A modular architecture makes those boundaries explicit, so workflows can remain connected without pretending that every stage has the same assumptions, pace, or maturity.

See how the workflow fits together.

GlossKit is intended for researchers and collaborators who need more than isolated tools or ad hoc file exchange. Request beta access if you want to follow the platform as it develops or discuss a future use case.