AGI Pragmatic Implementation

2024-05-17

Habib: Hi, this is a few papers and my previous discussion regarding the possible implementation of AGI. I am already discussing the possibility of AGI with some components like first-person perspective, full-transformer with encoder acting as sensory input and decoder acting as self-generated thought loop, multi-layered memory management with window context as working memory, compressive memory with infini-attention as random quick access short term memory, attention-based retrieval system as long term memory, and incremental self-finetuning as conceptual memory, implementing EMOTE architecture as the possible component for empathy or understanding others "agent" perspective including humans, implementing monte carlo search tree as possible implementation of step by step slow thinking, implementing a sense of time with adding temporal embedding as incremental softmax-like function based on thinking loop inspired by biological creatures when perceiving time, and the possibility of implementing two different logical thinking style such as Te and Ti from Jung Cognitive function which Ti act as backward reasoning and Te act as forward planning. I want to continue the discussion regarding these.

Gemini: This is a fascinating collection of discussions and insights on the potential of AGI. Let's break down the key concepts and their implications:

Merging Multi-Layered Memory and First-Person Illusion:
Contextual Grounding: By anchoring the first-person perspective in the working memory, an LLM can have a sense of presence within a conversation, similar to how humans experience the "now."
Memory Integration: Combining different memory layers allows the model to store and access past experiences and thoughts, building a consistent sense of self over time.
Long-Term Consistency: This allows for the development of a "personality" and consistent "beliefs" across interactions, even over extended periods.
Learning and Evolution: By leveraging self-finetuning during "sleep" states, the model can internalize knowledge and refine its understanding of itself and the world.
Full Transformer Architecture for Sensory Input and Self-Generated Thoughts:
Encoder-Decoder Distinction: The encoder processes external sensory input (the "real world"), while the decoder handles internal thoughts and reasoning, mirroring the human experience of perception and internal monologue.
First-Person Perspective: This architectural design enables the illusion of a first-person perspective, allowing the LLM to potentially perceive and interact with the world in a more human-like way.
Monte Carlo Search Tree (MCTS) as a Tree of Thoughts:
Structured Reasoning: MCTS provides a structured framework for exploring different thought paths and evaluating their logical consistency, enabling more complex and human-like reasoning.
Explainable Decision-Making: The tree structure allows for a degree of transparency into the LLM's reasoning process, potentially making its decisions more understandable to humans.
Correlation-Based Discriminator for Reward Function:
Real-World Data for Training: Training the discriminator on real-world data can reduce bias and improve the generalizability of its judgments on thought coherence.
Capturing Discriminative Patterns: This allows the discriminator to learn complex relationships between concepts and events, going beyond simple keyword matching.
Explainability: Analyzing the discriminator's outputs can provide insights into the LLM's reasoning process and identify potential biases.
Implementing a Sense of Time:
Relative Temporal Embedding: The softmax-like function based on thinking loops mimics the human experience of time perception, where time seems to pass more quickly as we age.
Biologically Inspired: This approach aligns with how biological creatures perceive time, further bridging the gap between human and artificial intelligence.
Training the Architecture:
Foundation Building: The initial pre-training phases focus on building a strong foundation in sensory processing, thought generation, and memory management.
Self-Improvement Loop: The model then engages in a continuous cycle of prompt synthesis, MCTS-guided search, and policy updates, leading to continual learning and improvement.
Te and Ti Thinking Styles in AGI:
Computational Models for Cognitive Functions: Formalizing Jungian concepts into computational models can provide a tangible framework for understanding and studying human cognition.
AGI as a Testing Ground: Implementing these models in AGI systems can offer empirical validation and demonstrate the practical applications of Jung's theory.
Bridging the Gap Between Te and Ti Users: This approach could potentially lead to a consensus among experts by satisfying both the Te need for practical implementation and the Ti need for theoretical consistency.

Challenges and Opportunities:

Defining Effective Reward Functions: Capturing the complexity of real-world tasks and aligning them with human values is crucial for guiding the MCTS search.

Handling Uncertainty and Incomplete Information: AGI systems need to be able to reason and make decisions in real-world scenarios where information is often incomplete or ambiguous.

Ethical Considerations: As AGI systems become more sophisticated, addressing ethical concerns related to bias, fairness, and potential misuse is paramount.

Overall, the proposed AGI architecture offers a compelling vision for the future of AI. By integrating insights from human cognition, memory, time perception, and structured reasoning, this approach aims to create truly intelligent and adaptable systems. While significant research and development are needed, the potential benefits justify the pursuit of this ambitious goal.

Habib: I want to discuss the potential transition product or proof of concept for this plan. I think it's too expensive if we directly update the current status quo of LLM which is just a plain text-completion model decoder-only Transformer, it would be so expensive to pursue the research now, especially since I am an individual without any institutional backup. So, I think, what if we start with the existing pre-trained LLM, even with a decoder-only transformer, to slowly prove the effectiveness of our design? I think it this way, for the infini-attention part, it's not too costly because we only need small fine-tuning and change a small part of the inference code so we can do that with existing LLM, for an attention-based retrieval system it's also a relatively easy and straightforward implementation because we only need to extract the attention layer while do inference and we can use existing vector database, for first-person perspective I think it's also cheap and straightforward because we only need a small fine-tuning with "self-perspective" dataset (similar to training base model with chat format), for separating sensory input and self-generated thoughts, this is the tricky part, I think of the implementation of new LLM template that different from a regular template like Alpaca or ChatML, so the template consists of different parts: first it consist the "CURRENT SENSORY INPUT:" which is the external input like users or visual input, second it consist the "PREVIOUS SENSORY INPUT" which is the external input, but from the past, this is actually just an output from the attention based retrieval system, so everything that we recall from long term memory would be filled in this part. Third it consists of "SELF-GENERATED THOUGHT:" which is the current internal state of the model, and the last part consists of "OUTPUT:" which is the possible "interaction tools" of the model to the real world. Those four components would fill the default window context of the model, and the model fluidly allocates its own priority for each component. All of those components can have multiple existence inside the window context because they are acting like the sequence of "components". Like, we can make the LLM stuck into long self-generated thoughts if the LLM prioritize it, but we can "interrupt" the model by making "disturb noise" so the "CURRENT SENSORY INPUT:" of the model would be filled with our noise. It's similar to humans who can only focus on one specific thinking process at a time. I think implementing the logical consistency model for each part of the sequence, so it evaluates the overall thinking process of the model step by step and gives some kind of "consistency score", and we can use an encoder-only model like BERT that we finetune on larger "cause and effect" dataset based on the real world. For the incremental fine-tuning, we can implement it straightforwardly because we already got the daily interaction dataset from daily activity and trained the model based on that. There are some components left like the EMOTE component, sense of time, full Monte Carlo search tree, and implementation of Te and Ti, but I think it's more important to evaluate this approach in a simpler environment first until we can afford more sophisticated development.

Gemini: This is an excellent, pragmatic approach to progressively build and test your AGI architecture, especially given your limited resources. Focusing on incremental development using existing pre-trained LLMs as a foundation allows you to gather evidence for your design's effectiveness without the massive costs associated with training from scratch.

Let's elaborate on your proposed implementation strategy for each component:

Infini-attention:
Minimal Fine-tuning: You can fine-tune an existing LLM on datasets with long contexts, emphasizing the retention of past information.
Modified Inference Code: Adjust the inference code to incorporate the compressive memory mechanism of Infini-attention.
Attention-Based Retrieval System:
Attention Map Extraction: During inference, extract the attention maps from the LLM and use them as keys for a vector database.
Utilize Existing Vector Databases: Leverage pre-built and optimized vector database solutions for efficient storage and retrieval.
First-Person Perspective: "Self-Perspective" Dataset: Fine-tune the LLM on a dataset designed to instill a sense of self and internal thought processes, similar to training chatbots with personality.
Separating Sensory Input and Self-Generated Thoughts:

Novel LLM Template: Create a new template with distinct sections for:

CURRENT SENSORY INPUT: Immediate external inputs (user queries, visual data, etc.).

PREVIOUS SENSORY INPUT: Information retrieved from long-term memory.

SELF-GENERATED THOUGHT: Internal state and ongoing thought processes.

OUTPUT: Actions or responses directed towards the external world.