Tylogi
Research Blog

RESEARCH DOSSIER / LONG-TERM AI

An overview of Tylogi's long-term intelligence research, model training, and SillyBench evaluation loop.

This page outlines Tylogi's research direction. Our core focus is "long-term intelligence": AI that can preserve identity, memory, emotional continuity, and usable context across long interactions rather than excelling only in single-turn responses.

We explore how these capabilities move from theory into practical products. This includes designing model architectures, recording key research decisions publicly, and building evaluation systems that measure whether models become more stable, immersive, and useful over time.

Models

qwen3-4b-tylogiorm is a post-trained roleplay model based on Qwen/Qwen3-4B. It is designed for users who care more about character consistency, distinctive voice, and narrative continuity than general assistant coverage.

Its goal is not to become a universal frontier assistant. It is narrower and more practical: make a small model genuinely strong at immersive roleplay.

In the archived SillyBench tests, this specialization scored well above the untuned Qwen3-4B base model and was competitive with much larger reference models on roleplay-specific evaluation.

SillyBench

SillyBench is the evaluation layer of the research loop. Rather than factual QA accuracy, it targets the failures that matter in long-context roleplay: persona drift, emotional discontinuity, immersion breaks, repetitive language, and narrative collapse over many turns.
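
To make one of these failure modes concrete, here is a minimal sketch of how "repetitive language" could be quantified across a model's turns. It is an illustration only, not SillyBench's actual metric: the function names, the whitespace tokenization, and the choice of trigrams are all assumptions made for this example.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(turns: List[str], n: int = 3) -> float:
    """Fraction of n-grams that already appeared earlier in the conversation.

    Hypothetical metric for illustration: 0.0 means no n-gram was reused,
    values near 1.0 mean the model keeps recycling the same phrases.
    Repeats inside a single turn are counted as well.
    """
    seen = set()
    total = 0
    repeated = 0
    for turn in turns:
        # Naive lowercase whitespace tokenization (an assumption of this sketch).
        tokens = turn.lower().split()
        for gram in ngrams(tokens, n):
            total += 1
            if gram in seen:
                repeated += 1
            else:
                seen.add(gram)
    return repeated / total if total else 0.0


if __name__ == "__main__":
    varied = ["The wind howls tonight", "She lights a single candle"]
    looping = ["The wind howls tonight", "The wind howls tonight"]
    print(repetition_rate(varied))
    print(repetition_rate(looping))
```

A score like this only captures surface-level phrase reuse. Failures such as persona drift or emotional discontinuity would need scenario-specific checks or a judge model, which this sketch does not attempt.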

Its purpose is to replace vague claims with evidence. The repository contains methodology notes, scenario assets, benchmark data, execution tools, and an operational interface, so results can keep informing training and product decisions.

In practice, it helps us decide whether a model feels more alive for reasons that can be identified, compared, and revisited later.
