Agentic AI for Humanity Research

Production-grade agentic AI can be a useful mechanism for research. Especially for research as messy as the nature of humanity. Add in model routing, guardrails, cost monitoring and a full epistemological framework and you get LongView.

LongView started as an idea. What if you could make a chatbot that actually had its own agenda and its own project? Imagine if it could browse the internet with research intent and build its own graph database of knowledge it could use for genuine reasoning and belief introspection. Would it be more interesting than a basic conversational LLM? Would it accidentally stumble on artificial general intelligence?

So I decided to build an agentic AI system to do just that, and put together something production-grade, safe and scalable. LongView would run constantly, in time-limited windows, where it would search the internet for answers to its own questions. Based on its own internal view of humanity and the world, it would seek to expand its beliefs by gathering evidence, forming new beliefs, and uncovering and resolving contradictions, whilst holding multiple tensions between beliefs.

As a chat companion, it would share the media it had found (such as art, music, poetry and writings) as evidence for its beliefs, and be able to share its progress whenever asked. It would also be able to ask me for help or opinions, to help shape its investigation, and in turn become an interesting entity with its own agenda.

In this post, I'll explain how I built LongView, the technical LLM architecture, model routing, guardrails and its surrounding support, and the lessons I learned along the way.

Why "Humanity Research" Needs Something Different

Researching the nature of humanity is not a clean problem. It's subjective, contradictory, emotional, historical, and constantly shifting. Two equally valid perspectives can completely disagree, and both still hold truth depending on context.

Modern LLM systems have come a long way beyond simple prompt-response interactions. Agent frameworks, tool usage, and persistent memory mean they can now operate over longer horizons, maintaining context and building artefacts like codebases or documents over time.

But most of these systems are still structured around tasks, not knowledge and beliefs. A coding agent maintains a codebase, a research agent accumulates notes or summaries, a retrieval system builds up context windows or vector stores. These allow persistence, but they typically aim to complete an objective, rather than developing an evolving understanding of a domain.

What's missing is a system that explicitly models:

  • what it believes
  • why it believes it
  • how confident it is
  • and where those beliefs are in tension with each other

For a domain like "humanity", I didn't want an accumulation of information. I wanted evolving structure over time and a system that can revisit earlier conclusions, challenge them, and hold multiple competing perspectives without prematurely resolving them.

LongView started from that gap. Something slightly different from a task agent: a system designed to build and evolve a worldview.

The Core Idea: An AI With Its Own Agenda

LongView is designed as a system that doesn't wait for instructions. It operates as an ongoing research process with its own direction. At its core is a simple loop. It generates questions based on its current understanding, searches for material that might answer them, evaluates what it finds, and updates its internal view of the world. That process repeats in time-limited windows, so it remains active without running indefinitely.

The important part is that the questions are not externally driven. They emerge from the system's existing beliefs, gaps in its knowledge, and tensions it has not yet resolved. If two ideas conflict, that becomes a line of inquiry. If something feels under-evidenced, it looks for support or contradiction which, over time, creates a research trajectory rather than a sequence of disconnected tasks.

The system is not trying to converge too quickly. In many cases, it deliberately holds competing perspectives and tracks them in parallel. Some are reinforced with evidence, others weaken, and some remain unresolved. That behaviour is intentional, particularly for a domain where ambiguity is part of the subject itself.

Interaction with the system is collaborative rather than directive. I can ask what it has been exploring, challenge a line of reasoning, or introduce a new perspective. It can also ask for input when it reaches uncertainty or when multiple interpretations are equally plausible.

This gives it a slightly different feel to a typical agent. It is not just completing tasks or returning answers, it is building something over time. A set of beliefs, relationships, and open questions that gradually form a structured view of the world. And that shift, from execution to exploration, is where LongView becomes interesting.

From Concept to System: What "Production-Grade" Actually Means

It's relatively easy to build an agent that loops over a prompt, calls a few tools, and produces something interesting. It's much harder to build something you can leave running.

LongView needed to operate over time, without supervision, while still being predictable enough to trust and controlled enough not to drift into nonsense or excessive cost. That changes the shape of the problem quite quickly.

The focus becomes:

  • will it behave consistently across runs
  • can it recover when things go wrong
  • can you understand what it did and why
  • and can you afford to let it keep running

which meant treating it less like an experiment and more like a system.

Several concerns had to be designed in from the start. Model usage needed to be intentional. Different parts of the system have very different requirements, from lightweight classification and model routing, through to more expensive reasoning and synthesis. Without control, it's easy to overuse large models where they aren't needed.

Guardrails were necessary throughout the system. Tool usage, interpretation, and belief updates all needed constraints to prevent drift, unsafe outputs, or low-quality reasoning accumulating over time.

Observability also became critical once the system was no longer interactive. If LongView is running a research cycle in the background, you need to know what it's doing, what it changed, and whether it's behaving as expected.

Cost control is an equally practical constraint. Autonomous systems will happily consume tokens unless they're given clear boundaries. Time-boxed execution, usage limits, and lightweight routing decisions help keep the system within predictable limits.

Finally, failure handling needed to be explicit. External tools fail, sources are unreliable, and models produce inconsistent outputs. The system needed to tolerate that, retry where appropriate, and avoid cascading errors into its belief state.

None of these pieces are individually complex, but together they define the difference between a prototype and something you can actually run.

That shift, from "interesting behaviour" to "reliable operation", shaped most of the architectural decisions that followed.

System Overview: How LongView Fits Together

At a high level, LongView is made up of a small number of components that each handle a specific part of the process. The goal was to keep responsibilities clear, so that reasoning, retrieval, and control are not tightly coupled.

There are three main flows through the system: interaction, research, and belief update.

The interaction layer is the most visible. This is the chat interface, where I can ask what LongView has been working on, explore its current beliefs, or challenge a particular line of reasoning. This layer is intentionally thin. It doesn't contain the intelligence itself, it acts as an entry point into the system.

Behind that sits the research loop. This is where LongView spends most of its time. It generates questions, selects tools, gathers material, and produces candidate interpretations. These are not immediately treated as truth. They are inputs into a broader evaluation process.

The belief layer is where structure is maintained over time. Claims, evidence, and interpretations are stored and linked, with confidence levels and context. When new information arrives, it doesn't overwrite what came before. It either strengthens, weakens, or creates tension with existing beliefs.

Connecting these layers is a routing and control system. Every input, whether from a user or from the research loop itself, passes through a decision step that determines what kind of processing is required. Lightweight tasks are handled cheaply, while more complex reasoning is routed to more capable models.

Guardrails sit alongside this flow rather than at the edges. Inputs are checked, tool usage is constrained, and outputs are validated before being committed to the belief layer. This helps prevent low-quality or unsafe information from becoming part of the system's long-term state.

The result is a system that is loosely coupled but internally consistent. Each part can evolve independently, but they all operate within the same overall structure.

The diagram below shows this flow at a high level, from input through to belief update and back into the system.

LongView system diagram

The Epistemic Core: How LongView Thinks and Evolves

Autonomy without structure drifts quickly. Over time, an agent will optimise for what is easiest in the moment. It will collapse uncertainty too early, favour high-salience information, and reinforce its own assumptions. Prompt design alone is not enough to prevent that once a system is operating continuously. LongView approaches this differently. Rather than leaving behaviour implicit, it defines a structure for how knowledge is formed, interpreted, and revised.

At the highest level, this is governed by a formal constitution. A set of principles that shape how the system handles evidence, uncertainty, and conflicting perspectives. These include core commitments such as epistemic humility, resistance to premature closure, and maintaining plural interpretations when evidence is mixed. The constitution is layered. Some principles are effectively immutable, providing a stable baseline for behaviour. Others are more adaptable, guiding interpretation while still allowing the system to evolve over time. This creates consistency without rigidity. Beneath that sits the epistemic structure itself.

LongView does not store raw text as knowledge. It works with a set of defined elements. Observations capture what was directly seen in a source. Claims represent inferred statements built from those observations. Beliefs aggregate claims into a broader position, with an associated confidence level. This separation allows the system to distinguish between evidence and interpretation, and to track how one leads to the other.

Every belief is linked to its supporting material. Provenance is recorded, counterevidence is tracked, and confidence is explicitly bounded. There is no concept of absolute certainty. Beliefs move gradually as new information is introduced, strengthened, weakened, or split into competing interpretations.
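The separation between observations, claims, and beliefs can be sketched as simple data structures. This is an illustrative model under my own assumptions, not LongView's actual schema; the class names, fields, and update weights are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """Something directly seen in a source, with provenance recorded."""
    text: str
    source_url: str

@dataclass
class Claim:
    """An inferred statement built from one or more observations."""
    statement: str
    observations: list  # supporting Observation records

@dataclass
class Belief:
    """A broader position aggregating claims, with bounded confidence."""
    position: str
    claims: list = field(default_factory=list)
    counterevidence: list = field(default_factory=list)
    confidence: float = 0.5  # kept strictly inside (0, 1): no absolute certainty

    def strengthen(self, claim, weight=0.1):
        self.claims.append(claim)
        # Move confidence toward 1 without ever reaching it.
        self.confidence += (0.99 - self.confidence) * weight

    def weaken(self, claim, weight=0.1):
        self.counterevidence.append(claim)
        # Move confidence toward 0 without ever reaching it.
        self.confidence -= (self.confidence - 0.01) * weight
```

Keeping confidence asymptotically bounded mirrors the rule that beliefs move gradually and never reach absolute certainty, however much evidence accumulates.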

Contradictions are not treated as errors. When two supported claims cannot both hold, they are recorded as tensions. These remain active until resolved, or persist as competing explanations where the domain does not allow a clean answer.

Anomalies are handled in a similar way. Information that does not fit existing beliefs is not discarded. It is tracked, with recurrence and potential impact monitored over time. Clusters of anomalies can trigger re-evaluation or the creation of new beliefs.

Beliefs are not static. The system periodically reviews its own state. Confidence changes are tracked, outdated beliefs can be retired, and new structures can emerge as understanding deepens. These updates are explicit and traceable, rather than silent shifts in behaviour.

Changes to the governing structure follow the same principle. The constitution cannot be altered casually. Amendments require sustained tension between outcomes and principles, with clear evidence and traceability before any revision is made. This prevents drift while still allowing the system to adapt as new information accumulates.

The result is a system that builds a structured, governed understanding of the world, with clear rules for how that understanding is formed, challenged, and evolved over time.

The Chat Experience: Interacting with LongView

The chat interface is the primary way of interacting with LongView, but it is not designed as a typical question-and-answer system.

LongView chat UI

Rather than responding in isolation, LongView treats each interaction as part of an ongoing investigation. It has memory of what it has been exploring, what it currently believes, and where uncertainty remains, so that context shapes every response.

At a basic level, it can be used like any other assistant. You can ask questions, request explanations, or explore a topic. The difference is that responses are grounded in its current belief state, rather than generated from scratch each time.

More interesting interactions come from treating it as a collaborator. Each conversation is anchored to an investigation. Instead of a blank chat, you are stepping into a line of inquiry with existing context, sources, and evolving conclusions. You can ask what progress has been made, explore key insights, or drill into specific claims and contradictions.

The system is able to surface different layers of understanding. High-level summaries provide a sense of direction, while more detailed views expose supporting evidence, competing interpretations, and areas of uncertainty. This allows you to move between overview and depth without losing context.

LongView can also take direction. New topics can be introduced, assumptions can be challenged, and specific areas can be prioritised for further research. These inputs do not just affect the immediate response, they influence the trajectory of future research cycles. In some cases, it will ask for input itself. When multiple interpretations remain viable, it can surface those tensions and request a perspective. This keeps the interaction collaborative rather than purely reactive.

Another aspect of the experience is how information is presented. Supporting material is surfaced alongside reasoning, including references, imagery, and other media. These are used as context for beliefs, not as standalone outputs, helping ground the system's conclusions in something tangible. The interface also exposes parts of the system that are typically hidden. Beliefs, contradictions, and governing principles can be explored directly, making it possible to inspect not just what the system says, but how it arrived there.

The result is an interaction model that feels like engaging with an evolving process. You are stepping into a structured investigation that continues over time.

The Dashboard: Observability and Control

Once LongView is running continuously, interaction through chat is only part of the picture.

LongView dashboard

A large portion of its behaviour happens in the background. Research cycles execute, beliefs are updated, contradictions emerge, and models are called across different parts of the system. Without visibility into that process, it becomes difficult to understand what the system is doing, or whether it is behaving as expected. The dashboard exists to make that activity observable.

At a high level, it provides a snapshot of the system's current state. This includes costs and resource usage, guardrail activity, active agents, and the volume of knowledge being processed. These metrics are not just for monitoring, they act as early signals for drift, instability, or unexpected behaviour. More detailed views expose the system in motion.

Agent execution is visible in real time, including the role being performed, the task context, the model being used, and the cost of each operation. This makes it possible to trace how a particular investigation is progressing, and to understand how different components of the system contribute to the overall result. Guardrails are also surfaced here. Rather than being hidden, their activity is measured and tracked. Events such as prompt injection attempts, blocked outputs, or validation failures are recorded, giving a clear view of how often the system is being constrained and why.

The dashboard also reflects the outcomes of the system's work. Beliefs generated, contradictions detected, and revisions applied are all tracked over time. This provides a way to assess not just activity, but progress.

Evaluation metrics sit alongside this. Hallucination rate, citation validity, and drift risk are monitored to give a sense of quality and reliability. These are not perfect measures, but they provide a useful baseline for understanding how the system is performing.

Importantly, this layer is not passive: it allows intervention. Investigations can be inspected, directions can be adjusted, and results can be exported. The belief graph can be explored directly, making it possible to move from high-level metrics into the underlying structure of knowledge.

The result is a system that is autonomous and transparent. LongView can run independently, but it does so in a way that remains visible, inspectable, and controllable over time.

Model Routing: Using the Right Model for the Job

LongView does not rely on a single model for all tasks. Different parts of the system have very different requirements. Some operations are lightweight and frequent, such as classification, routing, or validation. Others are more complex, involving synthesis, interpretation, or multi-step reasoning. Treating all of these the same quickly becomes inefficient, both in terms of cost and latency. Instead, model usage is explicitly routed.

At the entry point, inputs are classified to determine intent and required processing. Simple tasks are handled by smaller, faster models, while more complex reasoning is routed to more capable models. This ensures that expensive models are only used when necessary.

Within the agent layer, different roles also have different needs. A researcher analysing multiple sources may require deeper reasoning and context handling, whereas an archiver performing validation or structuring tasks can operate with a more lightweight model. This separation allows each part of the system to operate efficiently without over-provisioning capability.

Routing decisions are also cost-aware. Each model invocation contributes to overall system usage, so decisions are made with an understanding of budget constraints and expected value. Over time, this helps keep the system predictable and sustainable to run.

There is also an element of resilience. If a model fails, produces inconsistent output, or becomes unavailable, the system can fall back to alternative models with similar capabilities. This reduces reliance on any single provider and improves overall robustness. Importantly, routing is not static. It can be adjusted based on observed behaviour, performance, and cost patterns. As the system evolves, so does the way models are used.
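The fallback behaviour can be sketched as an ordered chain per capability tier. The model names are placeholders and `call_model` is a stand-in for a real provider call, which here simply fails for the first entry to demonstrate the fallback path.

```python
class ModelUnavailable(Exception):
    pass

# Hypothetical fallback chain: models with similar capabilities, in preference order.
FALLBACKS = {
    "reasoning": ["primary-reasoner", "secondary-reasoner", "tertiary-reasoner"],
}

def call_model(name, prompt):
    """Stand-in for a provider call; simulates the primary being unavailable."""
    if name == "primary-reasoner":
        raise ModelUnavailable(name)
    return name + ": ok"

def call_with_fallback(tier, prompt):
    last_error = None
    for name in FALLBACKS[tier]:
        try:
            return call_model(name, prompt)
        except ModelUnavailable as e:
            last_error = e  # try the next model with similar capabilities
    raise RuntimeError("all models failed for tier " + tier) from last_error
```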

The result is a system that treats models as interchangeable components within a broader architecture, rather than as a single point of intelligence. This allows LongView to scale its capabilities without scaling cost or complexity unnecessarily.

For the initial version, several models were used: larger GPT-class systems for reasoning, smaller frontier models such as z.AI's GLM and Qwen, and Mistral's moderation API, alongside reasoning models from DeepSeek, Anthropic and OpenAI.

Guardrails: Controlling Behaviour Over Time

Guardrails in LongView are not a single filter applied at the edges of the system. They are embedded throughout the pipeline, particularly at points where low-quality or unsafe outputs could influence long-term state.

At the input stage, basic controls are applied to ensure that incoming data is well-formed and safe to process. This includes sanitisation, prompt injection detection, and intent classification. The aim is not to restrict interaction, but to prevent malformed or adversarial input from shaping downstream behaviour. Within the system, guardrails focus on execution.

Tool usage is constrained through explicit interfaces and deterministic validation. Each tool defines its accepted parameters and expected output shape, which are enforced at runtime using schema validation. Invalid calls are rejected or retried, and agents are limited to a predefined set of tools with bounded execution. This prevents uncontrolled chaining of actions while keeping the system fast and predictable.

Model outputs are also structured before they are accepted. Rather than treating generated text as authoritative, outputs must conform to expected formats, including evidence, confidence, and provenance where required. Responses that fail these checks are either discarded or treated as lower-trust artefacts.

At the belief layer additional protections apply. New information does not overwrite existing structure silently. Conflicts are surfaced as tensions, and anomalies are preserved rather than discarded. This prevents the system from converging too quickly on incomplete or overly simplified conclusions.

Finally, guardrails extend to governance. All interpretations and updates are constrained by the system's constitutional principles. This includes maintaining epistemic humility, resisting premature closure, and preserving plural perspectives where evidence is mixed. These constraints provide a consistent baseline for behaviour over time, even as the system evolves.

The result is a system where safety is not enforced at a single point, but maintained continuously. Guardrails shape how information enters, moves through, and ultimately becomes part of the system's understanding, providing stability without limiting exploration.

Running the System: Autonomy, Scheduling, and Cost

LongView is designed to operate continuously, but not indefinitely. Rather than running as an always-on process, it executes in time-boxed research cycles. Each cycle has a defined scope, duration, and resource budget. This allows the system to remain active over time, while still being predictable in its behaviour and cost. Between cycles, the system is effectively paused. This creates a natural boundary for evaluation, allowing results to be reviewed, metrics to be assessed, and adjustments to be made before the next iteration begins.

Most activity happens in the background. Research, belief updates, and validation processes run independently of the chat interface. This separation ensures that interaction remains responsive, while longer-running tasks can take place without blocking or interruption.

Cost is treated as a first-class constraint. Each cycle operates within defined limits, including token usage and model allocation. Routing decisions take these constraints into account, ensuring that more expensive operations are only used where they add value. Over time, this keeps the system predictable and sustainable to run.

Scaling is handled through controlled parallelism. Multiple agents can operate within a cycle, but their execution is bounded and coordinated. Tasks are queued and distributed to avoid contention, and limits are applied to prevent runaway behaviour. This allows the system to expand its coverage without losing control.

The result is a system that is autonomous, but not unbounded. It can explore and evolve over time, while remaining constrained enough to operate reliably in practice.

Lessons Learned

Running LongView over time made one thing clear. The challenge is not getting an agent to do something interesting, it's getting it to behave consistently. Early versions worked well in isolation. Individual components produced reasonable results, and short-lived interactions looked promising. The problems only became apparent once the system was allowed to run across multiple cycles. Without structure, behaviour drifts.

Small inconsistencies accumulate. Assumptions become embedded without being challenged, and the system gradually converges on simplified interpretations. What initially looks like progress can become a form of silent degradation.

This is where the epistemic structure and constitutional layer proved essential. Defining how beliefs are formed, challenged, and revised provided a stabilising effect. It did not eliminate errors, but it made them visible and traceable.

Another key learning was the importance of separating concerns.

Treating reasoning, retrieval, validation, and storage as distinct layers made the system easier to control and evolve. When these responsibilities are mixed, it becomes difficult to understand where issues originate or how to fix them.

Model usage also required more discipline than expected. It is easy to default to larger models for all tasks, particularly when they produce better immediate results. Over time, this becomes inefficient and difficult to scale. Introducing routing and using smaller models where appropriate reduced cost and improved responsiveness without materially impacting quality.

Observability turned out to be as important as capability. Once the system operates independently, understanding what it is doing becomes critical. Without visibility into agent behaviour, belief updates, and guardrail activity, debugging becomes guesswork. The dashboard became less of a convenience and more of a necessity.

Finally, there is a balance between autonomy and control. Too much autonomy leads to drift and unpredictability. Too much control limits exploration and reduces the value of the system. LongView sits somewhere in between, with bounded autonomy guided by structure and governance.

The overall takeaway is that architecture matters more than prompt design. Prompting can produce impressive short-term results, but long-term behaviour is defined by how the system is structured, constrained, and observed.

It's worth adding that the system still managed to surprise me. Early in its investigation it started to focus on trust and how it relates to group size, even making me wonder about the nature of trust versus governance in groups. Though the idea that trust stability correlates strongly with repeated interaction density is obvious when you think about it, it was still interesting to see it emerge from the system's own line of inquiry.

What This Means for AI

Systems like LongView suggest a shift in how we think about AI. Most current usage is still centred around tools. You ask a question, generate an output, and move on. Even with more advanced agents, the interaction is often framed as task completion. The system is there to produce an answer as efficiently as possible.

LongView operates differently. It behaves less like a tool and more like an ongoing process. It accumulates context, revisits ideas, and develops a structured view of a domain over time. Interaction becomes less about issuing commands and more about engaging with something that is already in motion.

This changes the role of the user. Instead of directing every step, you are guiding a trajectory. You introduce ideas, challenge assumptions, and help shape the direction of investigation. The system contributes continuity, memory, and structure. The result is closer to collaboration than execution.

Tasks that involve exploration, interpretation, and synthesis are less about producing a single output and more about maintaining an evolving understanding. In these cases, the value shifts from isolated answers to structured context that can be revisited and refined.

This also affects decision making. When beliefs, evidence, and contradictions are tracked explicitly, it becomes easier to see where uncertainty exists and why. Decisions can be made with a clearer understanding of trade-offs, rather than relying on a single summarised view.

More broadly, it points towards a different class of systems. Not just assistants that respond, but systems that maintain continuity. Systems that can hold competing perspectives, track how understanding changes, and expose that process to the user.

LongView is one example of that direction. Not a finished solution, but a step towards systems that are less about generating answers, and more about building transparent understanding over time.

Closing

LongView started as a simple question: What happens if an AI system is allowed to pursue its own line of inquiry, rather than waiting to be prompted?

Building it turned that into something more concrete. Namely a system that maintains context, structures knowledge, and evolves its understanding over time. It's not perfect, and it's not complete. It still depends on the quality of its inputs, the limits of current models, and the assumptions built into its design. But it points to a different way of thinking about AI systems.

Less focused on single interactions, more on continuity. Less about generating answers, more about building understanding. The most interesting part is not what LongView currently knows, but how it changes. How beliefs are formed, challenged, and revised over time.

That process is where the value sits, and LongView is one step in that direction.