LoGos-7B

LoGos-7B demonstrates our methodology for integrating professional-level Go capabilities with the long chain-of-thought (CoT) reasoning abilities of LLMs. After mixed cold-start training and GRPO training, the model successfully transfers the reasoning capabilities acquired from long CoT data to Go tasks.

Model Description

LoGos-7B is a specialized large language model designed for Go game reasoning and analysis. Built upon Qwen2.5-7B, it integrates professional Go knowledge with advanced chain-of-thought reasoning capabilities through a novel mixed training approach combining cold start and GRPO (Group Relative Policy Optimization) reinforcement learning.
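
For readers unfamiliar with GRPO, the "group relative" part can be illustrated in a few lines. The sketch below is not the LoGos training code; it only shows the commonly described advantage computation, in which the reward of each sampled response is normalized by the mean and standard deviation of its group. The function name `group_relative_advantages`, the `eps` term, and the example rewards are illustrative assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division defined when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled answers to one Go position.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above the group average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.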

Usage

Step 1: Board Rendering Setup

First, get the gogame tool for board rendering from our code repository.

Then run the following commands to install the basic requirements.

sudo apt update
sudo apt install nodejs npm
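
Since the commands above install Node.js and npm for the board-rendering tool, it can help to confirm that both executables are on `PATH` before running inference. This is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of gogame, and the executable name `node` is an assumption (some older distributions install it as `nodejs`).

```python
import shutil

def find_missing_tools(tools=("node", "npm")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools()
if missing:
    print("Missing executables:", ", ".join(missing))
else:
    print("Node.js and npm are available.")
```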

Step 2: Inference

Then run the following code for inference.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from gogame.python_caller import GoGamePythonInterface

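def convert_move_to_vertex(move):
    """Convert a coordinate such as 'Q16' to a zero-based [x, y] vertex."""
    letter = move[0]
    number = int(move[1:])
    letters = 'ABCDEFGHJKLMNOPQRST'  # column letters skip 'I'
    x = letters.index(letter)
    y = 19 - number  # row 19 is the top of the board, so y counts down from the top
    return [x, y]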

def get_board_render_result(board_moves):
    """Replay the moves with the gogame tool and return the resulting board."""
    # Black (sign 1) plays first; colors alternate on every move.
    board_prepare_moves = [{"sign": 1 if i % 2 == 0 else -1, "vertex": convert_move_to_vertex(move)} for i, move in enumerate(board_moves)]
    client = GoGamePythonInterface()
    result = client.quick_batch_move(board_prepare_moves)
    return result['board']

# Load model and tokenizer
model_name = "YichuanMa/LoGos-7B"  # Hugging Face repository ID of this model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

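Game_record = ['Q16', 'D4']  # moves in play order; Black plays first

# The prompts below are kept in Chinese as provided. In short, the system prompt asks the model
# to act as a professional Go player, analyze the position, and reply with a <reasoning> block
# followed by an <answer> block giving the next move's color, position, and win rate.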

system_prompt = """你是一个精通各种围棋策略、理念和围棋下法的围棋职业棋手。你现在在进行一盘棋局的对弈，你需要根据棋盘信息对接下来的下法进行合理的预测。你的回复语言风格严谨认真而不失趣味，同时你乐于和对手进行友好的互动。你的任务是根据给定的棋局记录，分析局面信息，挑选若干可能的下一步并进行分析，推演对应的后续变化，进行合理的分析与思考，总结并挑选出最好的下一步位置，并最终形成一个有趣生动和富含思考的回复。在给出的棋局中，\"X\"表示黑棋，\"O\"表示白棋。棋盘的大小为19x19，每个落子的坐标是一个字母加上一个数字的形式。字母为A-T(跳过I)，对应于棋盘上从左到右。数字为1-19，对应于棋盘上从下到上。\n你需要首先对当前局面进行合理的分析和思考，对后续的步骤进行合理的预测、推演和分析，并最后总结你的思考结果，选择出最合适的下一步。请进行严谨详细、生动自然的推理和分析，及时进行总结，并最终输出符合格式要求的结果。你的输出格式为:\n\n<reasoning>\n你的思考过程。\n</reasoning>\n\n<answer>\n\\boxed{下一步颜色:黑/白}\n\\boxed{下一步位置:落子位置}\n\\boxed{下一步胜率:胜率}\n\n</answer>\n"""

prompt_template = """以下是当前的对局记录：\n\n{moves_str}\n\n请遵循给出的格式，预测并分析下一步的落子位置。"""    

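# Build the query: numbered move list (X = Black, O = White) plus the rendered board
moves_str = "\n".join([f"{i+1}.{'X' if i % 2 == 0 else 'O'}-{move}" for i, move in enumerate(Game_record)])
board = get_board_render_result(Game_record)
# Appended text: "The current board is: {board}, where 1 is a black stone, -1 a white stone, and 0 an empty point."
moves_str += f"\n\n\n当前盘面情况为:{board}\n其中1表示黑棋，-1表示白棋，0表示空位。"
query = system_prompt + prompt_template.replace("{moves_str}", moves_str)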

# Generate response (move the inputs to the same device as the model)
inputs = tokenizer(query, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
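
The system prompt asks the model to finish with an `<answer>` block containing three `\boxed{...}` fields, so the predicted move can be extracted from the response with a small parser. The sketch below assumes the model repeats the field labels exactly as they appear in the prompt; the names `parse_answer`, `is_valid_move`, and the English keys are hypothetical, and the sample string is made up for illustration rather than taken from real model output.

```python
import re

# Field labels copied from the answer format in the system prompt:
# 下一步颜色 = next-move color, 下一步位置 = next-move position, 下一步胜率 = next-move win rate.
FIELDS = {"下一步颜色": "color", "下一步位置": "move", "下一步胜率": "win_rate"}
# Matches \boxed{label:value}; accepts both ASCII and full-width colons (assumption).
BOXED = re.compile(r"\\boxed\{([^:：{}]+)[:：]([^{}]*)\}")
COLUMNS = "ABCDEFGHJKLMNOPQRST"  # same column letters as convert_move_to_vertex (no I)

def parse_answer(response):
    """Extract the boxed fields from the last <answer> block, if any."""
    blocks = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    if not blocks:
        return {}
    parsed = {}
    for label, value in BOXED.findall(blocks[-1]):
        key = FIELDS.get(label.strip())
        if key:
            parsed[key] = value.strip()
    return parsed

def is_valid_move(move):
    """Check that a coordinate such as 'Q16' lies on a 19x19 board."""
    match = re.fullmatch(r"([A-T])(\d{1,2})", move)
    return bool(match) and match.group(1) in COLUMNS and 1 <= int(match.group(2)) <= 19

# Made-up response in the expected format, for illustration only.
sample = "<reasoning>...</reasoning>\n\n<answer>\n\\boxed{下一步颜色:黑}\n\\boxed{下一步位置:Q4}\n\\boxed{下一步胜率:48%}\n</answer>"
result = parse_answer(sample)
print(result, is_valid_move(result.get("move", "")))
```

The parser reads the last `<answer>` block because the decoded output above also contains the prompt, whose format template includes an `<answer>` block of its own. If the model produces no answer, only that template is found, and `is_valid_move` rejects its placeholder position.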

Citation

Coming Soon :)

Acknowledgments

We thank the following projects and platforms for their contributions:

verl: For reinforcement learning infrastructure
KataGo: For Go evaluation tools
Yike: For professional Go datasets

License

This model is released under the Apache-2.0 License.

Contact

For questions and support, please contact us through our GitHub repository.

Try our model online: Play with InternThinker-Go(LoGos)
