SentenceTransformer based on AI4free/JARVIS-tool-search-v1

This is a sentence-transformers model finetuned from AI4free/JARVIS-tool-search-v1. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: AI4free/JARVIS-tool-search-v1
  • Maximum Sequence Length: unbounded (reported as inf tokens)
  • Output Dimensionality: 256 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): StaticEmbedding({})
)
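
The architecture above is a single StaticEmbedding module, so there is no transformer stack. To my understanding of the Sentence Transformers implementation, a text is embedded by looking up one fixed vector per token and averaging those vectors. The sketch below illustrates that idea only; the whitespace tokenizer, the six-word vocabulary, the 4-dimensional random table, and the `embed` helper are all hypothetical stand-ins, not the model's real tokenizer or weights.

```python
import random

# Hypothetical toy vocabulary and embedding table (the real model uses its own
# tokenizer and a learned table with 256-dimensional rows).
random.seed(0)
VOCAB = ["find", "tool", "dice", "probability", "cagr", "[UNK]"]
DIM = 4
TABLE = {tok: [random.uniform(-1, 1) for _ in range(DIM)] for tok in VOCAB}


def embed(text: str) -> list[float]:
    """Mean of the per-token vectors; no positions, so any length works."""
    tokens = [t if t in TABLE else "[UNK]" for t in text.lower().split()]
    if not tokens:
        return [0.0] * DIM
    sums = [0.0] * DIM
    for tok in tokens:
        for i, value in enumerate(TABLE[tok]):
            sums[i] += value
    return [s / len(tokens) for s in sums]


short = embed("find tool dice")
reordered = embed("dice tool find")
# Averaging ignores word order, so both inputs map to (numerically) the same vector.
print(all(abs(a - b) < 1e-12 for a, b in zip(short, reordered)))
```

Because nothing in this scheme depends on token position, there is no architectural length limit, which is consistent with the "inf tokens" maximum sequence length reported above.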

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AI4free/JARVIS-tool-search-v2")
# Run inference
sentences = [
    "Find tool: If I roll 3 dice, what's the probability of getting a sum of 10? Additionally, if I invested $5000 and it grew to $7500 over 3 years, what was the CAGR?",
    'Tools: [{"name": "cagr", "description": "Calculates the Compound Annual Growth Rate (CAGR) of an investment.", "parameters": {"start_value": {"description": "The initial value of the investment.", "type": "float"}, "end_value": {"description": "The final value of the investment.", "type": "float"}, "years": {"description": "The number of years between the initial and final values.", "type": "int"}}}, {"name": "binomial_probability", "description": "Calculates the probability of getting exactly k successes in n independent trials,", "parameters": {"n": {"description": "The total number of trials.", "type": "int"}, "k": {"description": "The number of successes.", "type": "int"}, "p": {"description": "The probability of success in each trial.", "type": "float"}}}, {"name": "dice_roll_probability", "\nCalled: [{"name": "dice_roll_probability", "arguments": {"target_sum": 10, "num_dice": 3}}, {"name": "cagr", "arguments": {"start_value": 5000, "end_value": 7500, "years": 3}}]',
    "Review: <think>\nOkay, let's see the problem here. So we have a pyramid with N steps, and the bottom step (step N) has a permutation of numbers from 1 to 2N-1. The rule is that each block above is the median of the three blocks directly below it. And we need to find the number in the top block (step 1). \n\nHmm, the sample input for N=4 gives output 4, and for N=2 it's 2. Wait a second, those outputs are exactly equal to N. So maybe the answer is always N? But why?\n\nLet me think. Let's consider the process. The top block's value depends on the medians as we go down. But since every step's value is the me\n```python\nn = int(input())\na = list(map(int, input().split()))\nprint(n)\n```",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 256)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7395, 0.0061],
#         [0.7395, 1.0000, 0.0494],
#         [0.0061, 0.0494, 1.0000]])
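
For retrieval, each row of the similarity matrix can be read as a ranking of candidates for one query. The sketch below reuses the scores printed above as plain Python lists, so it runs without downloading the model, and picks the best match for the first sentence. The `rank_candidates` helper and the short labels are hypothetical and not part of Sentence Transformers.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.7395, 0.0061],
    [0.7395, 1.0000, 0.0494],
    [0.0061, 0.0494, 1.0000],
]
# Hypothetical short names for the three example sentences.
labels = ["tool query", "tool document", "code review"]


def rank_candidates(scores: list[float], query_index: int) -> list[tuple[int, float]]:
    """Return (index, score) pairs by descending score, skipping the query itself."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_index]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(similarities[0], query_index=0)
best_index, best_score = ranking[0]
print(labels[best_index], best_score)  # tool document 0.7395
```

The "Find tool" query scores far higher against its matching tool document (0.7395) than against the unrelated code review (0.0061), which is the behavior a tool-search retriever needs.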

Training Details

Training Dataset

Unnamed Dataset

  • Size: 597,350 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 100 samples:
              anchor                    positive
    type      string                    string
    modality  text                      text
    details   min: 44 characters        min: 191 characters
              mean: 557.53 characters   mean: 1470.12 characters
              max: 815 characters       max: 1909 characters
  • Samples:
    anchor / positive (each sample lists the anchor text first, then the positive text)
    Solve: Consider a rooted tree. A rooted tree has one special vertex called the root. All edges are directed from the root. Vertex u is called a child of vertex v and vertex v is called a parent of vertex u if there exists a directed edge from v to u. A vertex is called a leaf if it doesn't have children and has a parent.

    Let's call a rooted tree a spruce if its every non-leaf vertex has at least 3 leaf children. You are given a rooted tree, check whether it's a spruce.

    The definition of a rooted tree can be found here.
    Input

    The first line contains one integer n — the numb

    Okay, so I need to solve this problem where I have to determine if a given rooted tree is a spruce. Let's see the definition again. A spruce is a tree where every non-leaf vertex has at least 3 leaf children. Hmm. So each non-leaf node must have at least three leaves directly under it. Wait, but leaves are nodes with no children. So for a node to satisfy the spruce condition, it must have at least three children, and each of those children must be leaves. Or wait, no. The non-leaf vertex's children can be any number, but at least three of them need to be leaves. Oh right, the way the problem states it: every non-leaf vertex has at least 3 leaf children. So a node that is not a leaf must have three or more of its immediate children as leaves.

    So the approach is: for every node except the leaves, check that it has at least three children that are leaves. If all such nodes meet this condition, output Yes; else, No.

    How to represent the tree? Well, each node has a parent except ...
    Solve: There are N balls placed in a row. AtCoDeer the deer is painting each of these in one of the K colors of his paint cans. For aesthetic reasons, any two adjacent balls must be painted in different colors.

    Find the number of the possible ways to paint the balls.

    Constraints

    * 1≦N≦1000
    * 2≦K≦1000
    * The correct answer is at most 2^{31}-1.
    Input

    The input is given from Standard Input in the following format:


    N K


    Output

    Print the number of the possible ways to paint the balls.

    Examples

    Input

    2 2


    Output

    2


    Input

    1 10


    Output

    10

    Okay, so I need to solve this problem where I have to find the number of ways to paint N balls with K colors such that no two adjacent balls have the same color. Hmm, let's think.

    Let me start with the examples. The first example has 2 balls and 2 colors. The first ball can be painted in 2 ways. The second ball can't be the same as the first, so 1 option. So 2*1=2, which matches the output. The second example is 1 ball and 10 colors. Then obviously 10 ways since there's no adjacency to worry about.

    So for the general case, the first ball has K choices. Then each subsequent ball has (K-1) choices because it can't be the same as the previous one. So for N balls, the total should be K * (K-1)^(N-1). Oh right, that's a straightforward formula.

    Wait, but let's make sure. Like, for N=3, K=2: first ball has 2 options. Second has 1. Third can't be the same as the second, so 1 again. So total 2*1*1 = 2. Which makes sense because the colors would alternate. Like R G R or G R G.

    So th...
    Solve: Little Pandey is someone who is lazy, and when he's around his best friend GJ, he becomes super lazy. Pandey thinks that he is a Math-wizard, so he picks up a number S and asks GJ to throw him a challenge around that number.

    GJ explains Little Pandey a property called nothingness and decides to ask him Q queries based on it.
    In mathematical terms, Nothingness(A, B) is defined as the maximum M that (A%M==0 and B%M==0). (You can read about the Modulus Operator here.)
    In the i-th query GJ says a number Ai and Little Pandey must find Nothingness(S, Ai).

    But, Pandey is the laziest of all.
    He wo

    Okay, let's see. The problem is about finding the maximum M such that M divides both S and Ai for each query. Oh right, that's the greatest common divisor (GCD) of S and Ai. But then, the twist is that once a particular GCD value is used, if the same value comes up again in a subsequent query, Pandey has to output -1 instead. So I need to keep track of all the GCDs that have been already computed and check each new one against that set.

    So the steps would be:

    1. For each query Ai, compute the GCD of S and Ai.
    2. Check if this GCD has been seen before.
    3. If it's new, add it to the seen set and return the GCD.
    4. If it's a duplicate, return -1.

    But wait, the problem is to compute this efficiently, especially since Q can be up to 1e5. So we need an efficient way to compute GCDs and track the seen values.

    First, let's think about the GCD computation. Python's math.gcd function is efficient, but note that GCD is only defined for positive integers. Also, since S can be up to 1e5...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
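
With these settings the loss is, to my understanding, a cross-entropy over scaled cosine similarities: each anchor's own positive is the correct class, and the other positives in the batch act as negatives (the `query_to_doc` direction). The following dependency-free sketch shows that computation on toy 2-dimensional vectors. It is not the library's implementation, and the function names and data are illustrative assumptions.

```python
import math


def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive points the same way as its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong anchors

print(in_batch_ranking_loss(anchors, aligned))
print(in_batch_ranking_loss(anchors, swapped))
```

The aligned pairing gives a loss near zero, while the swapped pairing is penalized heavily because the scale of 20 sharpens the softmax. With a per-device batch size of 4096, as listed in the hyperparameters, each anchor would be contrasted against up to 4,095 other positives per batch on a single device.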
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,000 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 100 samples:
              anchor                    positive
    type      string                    string
    modality  text                      text
    details   min: 45 characters        min: 36 characters
              mean: 537.05 characters   mean: 1455.73 characters
              max: 815 characters       max: 1909 characters
  • Samples:
    anchor / positive (each sample lists the anchor text first, then the positive text)
    Solve: Shaass thinks a kitchen with all white floor tiles is so boring. His kitchen floor is made of n·m square tiles forming a n × m rectangle. Therefore he's decided to color some of the tiles in black so that the floor looks like a checkerboard, which is no two side-adjacent tiles should have the same color.

    Shaass wants to use a painter robot to color the tiles. In the beginning the robot is standing in a border tile (xs, ys) facing a diagonal direction (i.e. upper-left, upper-right, down-left or down-right). As the robot walks in the kitchen he paints every tile he passes even if it's painted b

    Okay, let's tackle this problem. So, the problem is about a robot painting a kitchen floor in a checkerboard pattern. The robot starts on a border tile, moves in a diagonal direction, and bounces off walls. We need to find out how much paint it uses before the entire floor becomes a checkerboard. If it's impossible, return -1.

    Hmm, first, I need to understand the movement and reflection of the robot. The robot moves diagonally, so each step changes both coordinates. For example, if it's going DR (down-right), each step would increase both x and y by 1. But when it hits a wall, it reflects. How does reflection work? Let's think: when the robot hits a vertical wall (like left or right edge), the x direction reverses? Or maybe it depends on which wall. Wait, the direction is UL, UR, DL, DR. So, for example, if the robot is moving in DR (down-right) and hits the right wall (y = m), then the direction would change to DL. Similarly, if it hits the bottom (x = n) while moving DR, the...
    Find tool: Fetch the dialog strings for users 'user14' and 'user15', who are contributors to an open-source project.
    Tools: [{"name": "get_user_dialog_string", "description": "Fetches the dialog string for a given user from the Waifu API.", "parameters": {"user_id": {"description": "A unique identifier for the user.", "type": "str", "default": "sample_user_id"}}}]
    Called: [{"name": "get_user_dialog_string", "arguments": {"user_id": "user14"}}, {"name": "get_user_dialog_string", "arguments": {"user_id": "user15"}}]
    Code (python):
    Okay, so I have this programming problem to solve. Let me read it carefully.

    The problem is about Chef Monocarp taking dishes out of the oven. Each dish has an optimal time t_i. When he takes a dish out at time T, the unpleasant value is the absolute difference between T and t_i. The goal is to minimize the total unpleasant value. The constraints are that each dish must be taken out at a distinct positive integer time, and each time can only have one dish.

    Hmm. So the problem reduces to assigning each dish a unique time T, such that all T's are distinct positive integers, and the sum of |T_i - t_i| is minimized.

    I need to find the minimal sum for each test case. The input has multiple test cases. The sample input shows that for example, when the optimal times are sorted and assi
    Review:
    Okay, let's see. The problem is about finding the minimum total unpleasant value when removing dishes from an oven. Each dish has an optimal time t_i, and you can take them out one at a time each minute. The goal is to minimize the sum of absolute differences between the time you take each dish and its optimal time.

    Hmm, the solution provided uses dynamic programming. Let me think through how that works. The approach here is probably to sort the dishes first. Because, intuitively, you want to process them in order, maybe in sorted order so that you can assign times in a way that minim
    ```python
    import sys

    def main():
    q = int(sys.stdin.readline())
    for _ in range(q):
    n = int(sys.stdin.readline())
    t = list(map(int, sys.stdin.readline().split()))
    t.sort()
    max_t = t[-1] + n
    dp = [[float('inf')] * (max_t + 2) for _ in range(n)]

    # Initialize for the first dish
    for j in range(1, max_t + 1):
    d...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4096
  • num_train_epochs: 5
  • learning_rate: 0.04
  • lr_scheduler_type: cosine
  • warmup_steps: 0.05
  • disable_tqdm: True
  • per_device_eval_batch_size: 4096
  • dataloader_drop_last: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates
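
To make these settings concrete, the sketch below derives a nominal step count and the learning rate at a few points in training. It assumes a single device, that `dataloader_drop_last` discards the final partial batch, that the fractional `warmup_steps: 0.05` is interpreted as a share of the total steps, and that the cosine scheduler has the usual linear-warmup-then-half-cosine shape. None of these assumptions is confirmed by the card, and `lr_at` is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 597_350
BATCH_SIZE = 4096
EPOCHS = 5
PEAK_LR = 0.04
WARMUP_FRACTION = 0.05  # assumption: a fractional warmup_steps value means a ratio

steps_per_epoch = TRAIN_SAMPLES // BATCH_SIZE  # drop_last=True discards the remainder
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_FRACTION)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)  # 145 725 37
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, round(lr_at(step), 5))
```

The nominal 145 steps per epoch agrees with the early Epoch fractions in the training logs (step 10 is logged as epoch 0.0690). The actual run may take fewer steps per epoch, for example if the `no_duplicates` batch sampler cannot fill every batch.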

All Hyperparameters

  • per_device_train_batch_size: 4096
  • num_train_epochs: 5
  • max_steps: -1
  • learning_rate: 0.04
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: True
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 4096
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss
0.0069  1     3.1332
0.0690  10    3.0859
0.1379  20    2.7984
0.2069  30    2.4399
0.2759  40    2.1068
0.3448  50    1.8330
0.4138  60    1.6418
0.4828  70    1.4892
0.5517  80    1.3397
0.6207  90    1.2159
0.6897  100   1.1323
0.7586  110   1.0537
0.8276  120   0.9845
0.8966  130   0.9245
0.9655  140   0.8589
0.9793  142   -
1.0552  150   0.7909
1.1241  160   0.7432
1.1931  170   0.7173
1.2621  180   0.6876
1.3310  190   0.6648
1.4000  200   0.6410
1.4690  210   0.6102
1.5379  220   0.5917
1.6069  230   0.5745
1.6759  240   0.5561
1.7448  250   0.5461
1.8138  260   0.5416
1.8828  270   0.5205
1.9517  280   0.5045
1.9793  284   -
2.0414  290   0.4867
2.1103  300   0.4625
2.1793  310   0.4571
2.2483  320   0.4541
2.3172  330   0.4423
2.3862  340   0.4436
2.4552  350   0.4323
2.5241  360   0.4263
2.5931  370   0.4258
2.6621  380   0.4155
2.7310  390   0.4083
2.8000  400   0.4131
2.8690  410   0.4059
2.9379  420   0.3996
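
The Epoch column can be cross-checked against the Step column. The sketch below encodes one reading of the numbers: the fractions advance by 1/145 per step, but each epoch appears to end after 142 steps (the rows without a loss value at steps 142 and 284), after which the counter jumps to the next whole epoch. Both constants are inferred from the table rather than stated in the card, and `logged_epoch` is a hypothetical helper.

```python
NOMINAL_STEPS = 145  # inferred from the fractions, e.g. 10 / 145 rounds to 0.0690
ACTUAL_STEPS = 142   # inferred from the loss-free rows at steps 142 and 284


def logged_epoch(step: int) -> float:
    """Epoch value as it appears in the log, under the two inferred constants."""
    done, within = divmod(step - 1, ACTUAL_STEPS)
    return done + (within + 1) / NOMINAL_STEPS


# (epoch, step) pairs copied from the log above.
rows = [
    (0.0069, 1), (0.0690, 10), (0.9655, 140), (0.9793, 142),
    (1.0552, 150), (1.4, 200), (1.9793, 284), (2.0414, 290),
    (2.8, 400), (2.9379, 420),
]

for epoch, step in rows:
    # Logged values are rounded to four decimals, so allow a small tolerance.
    assert abs(logged_epoch(step) - epoch) < 1e-4, (epoch, step)
print("all sampled rows are consistent")
```

Under this reading the last logged row (step 420) falls just short of the end of the third epoch, which would arrive at step 426.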

Training Time

  • Training: 4.5 hours
  • Evaluation: 3.3 seconds
  • Total: 4.5 hours

Framework Versions

  • Python: 3.13.13
  • Sentence Transformers: 5.5.1
  • Transformers: 5.9.0
  • PyTorch: 2.12.0+cu130
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}

Model repository: AI4free/JARVIS-tool-search-v2
Model size: 7.56M params (Safetensors, tensor type F32)