How to use AI4free/JARVIS-tool-search-v2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("AI4free/JARVIS-tool-search-v2")
sentences = [
"Solve: Once Volodya was at the museum and saw a regular chessboard as a museum piece. And there were only four chess pieces on it: two white rooks, a white king and a black king. \"Aha, blacks certainly didn't win!\", — Volodya said and was right for sure. And your task is to say whether whites had won or not.\n\nPieces on the chessboard are guaranteed to represent a correct position (every piece occupies one cell, no two pieces occupy the same cell and kings cannot take each other). Thus, your task is only to decide whether whites mate blacks. We would remind you that it means that the black king can be",
"<think>\nOkay, let's tackle this problem. So, the task is to find the capital city such that the number of road inversions needed is minimized. The roads are unidirectional, and the country forms a tree structure if we ignore the directions. Hmm.\n\nFirst, I need to understand the problem correctly. The country has n cities connected by n-1 roads, forming a tree when directions are ignored. The capital needs to be a city from which all other cities are reachable via the directed roads. The goal is to choose the capital (or capitals) that require the least number of road reversals.\n\nSo, for each possible city as the capital, I need to calculate how many roads would need to be reversed. Then, pick the minimum and all cities that achieve that minimum.\n\nBut calculating this directly for each city would be O(n^2), which is not feasible since n can be up to 2e5. So we need a smarter approach.\n\nWait, since the underlying structure is a tree, maybe there's a way to compute this efficiently with a traversal. Like, using a post-order traversal and dynamic programming.\n\nLet me think. Let's imagine that the tree is rooted at some node. For each node, the number of inversions required would be the sum of inversions needed in each subtree plus the inversions from the parent to the child. Wait, but the direction of the edges matters. For example, if the tree is rooted at u, and there's an edge from u to v, then that edge is in the correct direction (no inversion needed). But if there's an edge\n```\nimport sys\nfrom collections import deque\n\ndef main():\n n = int(sys.stdin.readline())\n adj = [[] for _ in range(n + 1)]\n for _ in range(n - 1):\n s, t = map(int, sys.stdin.readline().split())\n adj[s].append((t, 0))\n adj[t].append((s, 1))\n \n parent = [0] * (n + 1)\n direction = [0] * (n + 1)\n sum_inv = 0\n \n stack = [(1, -1, False)]\n while stack:\n \n```",
"<think>\nOkay, let's see. I need to solve this programming problem. Hmm. Let's read the problem carefully.\n\nSo, the problem is about tracking artifacts owned by Kostya and his allies. There are basic and composite artifacts. Basic ones are bought, and composite ones are made by combining certain numbers of basic artifacts. When you create a composite, the components are consumed. Each time after a purchase, we check if a composite can be made, and if so, it's done immediately. The output is the count of each artifact each ally has, sorted lexicographically.\n\nThe input gives k allies, n basic artifacts, m composite recipes, and q purchases. Each purchase is a friend (ai) buying a basic artifact. Then, after each purchase, check if the composite can be made. If yes, create it, consume the components, and add the composite to the inventory. And this process might cascade, like making a composite could allow another composite to be made, but the problem states that after the i-th purchase, no more than one opportunity arises. So we don't have to handle multiple steps here. Wait, the problem says: \"It is guaranteed that after the i-th purchase no more than one opportunity to collect the composite artifact appears.\" So after each purchase, there's at most one possible composite that can be crafted. That simplifies things. So for each purchase, after adding the artifact, check if any composite can be crafted, and if exactly one can be (as per the problem's guarantee), then do it.\n\nWa\n```\nfrom collections import defaultdict\n\nk, n, m, q = map(int, input().split())\n\nbasics = set()\nfor _ in range(n):\n basics.add(input().strip())\n\ncomposites = []\nfor _ in range(m):\n line = input().strip()\n name_part, components_part = line.split(': ')\n comp_name = name_part\n components = {}\n for item in components_part.split(', '):\n art, cnt = item.split()\n components[ar\n```",
"<think>\nOkay, so I need to determine if the black king is in checkmate based on the given positions of two rooks, the white king, and the black king. Let's break down the problem step by step.\n\nFirst, I remember that checkmate occurs when the black king is currently under attack (in check) and has no legal moves to escape. So my plan is to check two things: 1) Is the black king in check right now? 2) Does every possible move of the black king lead to a position where it's still in check?\n\nHmm. Let's think about how to model the chessboard. Each position is given as a two-character string like 'a6'. So I'll need to convert these into coordinates. Maybe represent the board as 8x8 grid with rows 1-8 and columns a-h. Let's map the letters a-h to 0-7 and numbers 1-8 to 0-7 as well? Or maybe 1-based to 8-based? Wait, let's see. For example, 'a1' would be (0,0) if we use zero-based, but that's up to how I parse it. The exact coordinates need to be handled correctly.\n\nSo the first step is to parse all the input positions into their x and y coordinates. For example, 'a6' would be x=0 (since a is first), y=5 (since 6 is the sixth row, but if rows are 1-8, maybe 1 is the first row). Wait, in chess, rows (ranks) go from 1 to 8, with 1 being the bottom for white. So when converting 'a6' to coordinates, the letter is the file (a-h, columns) and the number is the rank (1-8, rows). So for example, 'a1' is the bottom-left for white's perspective. So maybe to model this as (x, y) where x is 0 \n```\ndef pos_to_coords(s):\n x = ord(s[0]) - ord('a')\n y = int(s[1]) - 1\n return (x, y)\n\ndef is_rook_attacking(rook_pos, target_pos, occupied):\n rx, ry = rook_pos\n tx, ty = target_pos\n if rx != tx and ry != ty:\n return False\n if rx == tx:\n start = min(ry, ty)\n end = max(ry, ty)\n for y in range(start + 1, end):\n if (rx, y) in occupied:\n \n```"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from AI4free/JARVIS-tool-search-v1. It maps sentences and paragraphs to a 256-dimensional dense vector space and can be used for retrieval.
SentenceTransformer(
(0): StaticEmbedding({})
)
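As a rough illustration of what a static embedding architecture does, the sketch below looks up one fixed vector per token and averages them into a sentence vector. It assumes the StaticEmbedding module behaves like a token lookup followed by mean pooling; the vocabulary, the 3-dimensional vectors, and the `embed` helper are invented for the example and are not taken from this model.

```python
# Toy illustration of a static embedding model: one fixed vector per token,
# averaged into a sentence vector. Vocabulary and vectors are made up.
TOY_VECTORS = {
    "find": [1.0, 0.0, 0.0],
    "tool": [0.0, 1.0, 0.0],
    "dice": [0.0, 0.0, 1.0],
}
DIM = 3  # toy size; the model described here uses 256 dimensions


def embed(sentence: str) -> list[float]:
    """Average the vectors of all known tokens; unknown tokens are skipped."""
    tokens = sentence.lower().split()
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


print(embed("Find tool"))  # [0.5, 0.5, 0.0]
print(embed("tool find"))  # same vector: averaging ignores word order
```

Because there is no contextual encoder in this picture, inference reduces to lookups and an average, which is why such models tend to be fast on CPU.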
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("AI4free/JARVIS-tool-search-v2")
# Run inference
sentences = [
"Find tool: If I roll 3 dice, what's the probability of getting a sum of 10? Additionally, if I invested $5000 and it grew to $7500 over 3 years, what was the CAGR?",
'Tools: [{"name": "cagr", "description": "Calculates the Compound Annual Growth Rate (CAGR) of an investment.", "parameters": {"start_value": {"description": "The initial value of the investment.", "type": "float"}, "end_value": {"description": "The final value of the investment.", "type": "float"}, "years": {"description": "The number of years between the initial and final values.", "type": "int"}}}, {"name": "binomial_probability", "description": "Calculates the probability of getting exactly k successes in n independent trials,", "parameters": {"n": {"description": "The total number of trials.", "type": "int"}, "k": {"description": "The number of successes.", "type": "int"}, "p": {"description": "The probability of success in each trial.", "type": "float"}}}, {"name": "dice_roll_probability", "\nCalled: [{"name": "dice_roll_probability", "arguments": {"target_sum": 10, "num_dice": 3}}, {"name": "cagr", "arguments": {"start_value": 5000, "end_value": 7500, "years": 3}}]',
"Review: <think>\nOkay, let's see the problem here. So we have a pyramid with N steps, and the bottom step (step N) has a permutation of numbers from 1 to 2N-1. The rule is that each block above is the median of the three blocks directly below it. And we need to find the number in the top block (step 1). \n\nHmm, the sample input for N=4 gives output 4, and for N=2 it's 2. Wait a second, those outputs are exactly equal to N. So maybe the answer is always N? But why?\n\nLet me think. Let's consider the process. The top block's value depends on the medians as we go down. But since every step's value is the me\n```python\nn = int(input())\na = list(map(int, input().split()))\nprint(n)\n```",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 256]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7395, 0.0061],
# [0.7395, 1.0000, 0.0494],
# [0.0061, 0.0494, 1.0000]])
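The similarity matrix printed above can be read as a retrieval ranking: for a given query row, the other texts are sorted by score. The sketch below copies the three rows from the example output and uses a hypothetical helper, `rank_candidates`, to show that the "Find tool" query (index 0) matches the "Tools" text (index 1) far more strongly than the unrelated "Review" text (index 2).

```python
# Similarity matrix copied from the example output above.
similarities = [
    [1.0000, 0.7395, 0.0061],
    [0.7395, 1.0000, 0.0494],
    [0.0061, 0.0494, 1.0000],
]


def rank_candidates(scores, query_index):
    """Return (index, score) pairs for all other texts, best match first."""
    row = scores[query_index]
    others = [(i, s) for i, s in enumerate(row) if i != query_index]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(similarities, query_index=0)
print(ranking)  # [(1, 0.7395), (2, 0.0061)]
```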
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| modality | text | text |
| details | | |

| anchor | positive |
|---|---|
| Solve: Consider a rooted tree. A rooted tree has one special vertex called the root. All edges are directed from the root. Vertex u is called a child of vertex v and vertex v is called a parent of vertex u if there exists a directed edge from v to u. A vertex is called a leaf if it doesn't have children and has a parent. | |
| Solve: There are N balls placed in a row. AtCoDeer the deer is painting each of these in one of the K colors of his paint cans. For aesthetic reasons, any two adjacent balls must be painted in different colors. | |
| Solve: Little Pandey is someone who is lazy, and when he's around his best friend GJ, he becomes super lazy. Pandey thinks that he is a Math-wizard, so he picks up a number S and asks GJ to throw him a challenge around that number. | |

MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc"
],
"partition_mode": "joint",
"hardness_mode": null,
"hardness_strength": 0.0
}
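To make the `scale` and `cos_sim` settings concrete, here is a minimal pure-Python sketch of an in-batch ranking loss of this kind: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy pushes the matching positive to the top. It is an illustration under simplifying assumptions, not the library's implementation; the helper names are hypothetical, only the anchor-to-positive direction is modelled, and the remaining parameters (`partition_mode`, `hardness_mode`, and so on) are ignored.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over anchors; positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Every positive in the batch is a candidate; all but the i-th act as negatives.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each anchor matches the wrong positive

print(mnr_loss(anchors, aligned))  # close to 0
print(mnr_loss(anchors, swapped))  # close to 20
```

With `scale` at 20, a cosine gap of 1.0 between the right and wrong candidate becomes a logit gap of 20, so the loss is near zero when pairs are aligned and near 20 when they are swapped.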
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| modality | text | text |
| details | | |

| anchor | positive |
|---|---|
| Solve: Shaass thinks a kitchen with all white floor tiles is so boring. His kitchen floor is made of n·m square tiles forming a n × m rectangle. Therefore he's decided to color some of the tiles in black so that the floor looks like a checkerboard, which is no two side-adjacent tiles should have the same color. | |
| Find tool: Fetch the dialog strings for users 'user14' and 'user15', who are contributors to an open-source project. | Tools: [{"name": "get_user_dialog_string", "description": "Fetches the dialog string for a given user from the Waifu API.", "parameters": {"user_id": {"description": "A unique identifier for the user.", "type": "str", "default": "sample_user_id"}}}] |
| Code (python): | Review: |

MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc"
],
"partition_mode": "joint",
"hardness_mode": null,
"hardness_strength": 0.0
}
Non-default hyperparameters:
- per_device_train_batch_size: 4096
- num_train_epochs: 5
- learning_rate: 0.04
- lr_scheduler_type: cosine
- warmup_steps: 0.05
- disable_tqdm: True
- per_device_eval_batch_size: 4096
- dataloader_drop_last: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates

All hyperparameters:
- per_device_train_batch_size: 4096
- num_train_epochs: 5
- max_steps: -1
- learning_rate: 0.04
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: None
- warmup_steps: 0.05
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: True
- project: huggingface
- trackio_space_id: None
- trackio_bucket_id: None
- trackio_static_space_id: None
- per_device_eval_batch_size: 4096
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: True
- dataloader_num_workers: 4
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_static_graph: None
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0069 | 1 | 3.1332 |
| 0.0690 | 10 | 3.0859 |
| 0.1379 | 20 | 2.7984 |
| 0.2069 | 30 | 2.4399 |
| 0.2759 | 40 | 2.1068 |
| 0.3448 | 50 | 1.8330 |
| 0.4138 | 60 | 1.6418 |
| 0.4828 | 70 | 1.4892 |
| 0.5517 | 80 | 1.3397 |
| 0.6207 | 90 | 1.2159 |
| 0.6897 | 100 | 1.1323 |
| 0.7586 | 110 | 1.0537 |
| 0.8276 | 120 | 0.9845 |
| 0.8966 | 130 | 0.9245 |
| 0.9655 | 140 | 0.8589 |
| 0.9793 | 142 | - |
| 1.0552 | 150 | 0.7909 |
| 1.1241 | 160 | 0.7432 |
| 1.1931 | 170 | 0.7173 |
| 1.2621 | 180 | 0.6876 |
| 1.3310 | 190 | 0.6648 |
| 1.4 | 200 | 0.6410 |
| 1.4690 | 210 | 0.6102 |
| 1.5379 | 220 | 0.5917 |
| 1.6069 | 230 | 0.5745 |
| 1.6759 | 240 | 0.5561 |
| 1.7448 | 250 | 0.5461 |
| 1.8138 | 260 | 0.5416 |
| 1.8828 | 270 | 0.5205 |
| 1.9517 | 280 | 0.5045 |
| 1.9793 | 284 | - |
| 2.0414 | 290 | 0.4867 |
| 2.1103 | 300 | 0.4625 |
| 2.1793 | 310 | 0.4571 |
| 2.2483 | 320 | 0.4541 |
| 2.3172 | 330 | 0.4423 |
| 2.3862 | 340 | 0.4436 |
| 2.4552 | 350 | 0.4323 |
| 2.5241 | 360 | 0.4263 |
| 2.5931 | 370 | 0.4258 |
| 2.6621 | 380 | 0.4155 |
| 2.7310 | 390 | 0.4083 |
| 2.8 | 400 | 0.4131 |
| 2.8690 | 410 | 0.4059 |
| 2.9379 | 420 | 0.3996 |
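The hyperparameters list a cosine learning-rate schedule with a peak of 0.04 and a `warmup_steps` value of 0.05. The sketch below shows the general shape such a schedule takes: a linear ramp followed by a half-cosine decay to zero. Two things are assumptions rather than facts from the source: the total of 725 steps is inferred from the log (about 145 steps per epoch over 5 epochs), and the fractional `warmup_steps` is read as a fraction of total steps. The function `lr_at` is a hypothetical helper, not the trainer's scheduler.

```python
import math

BASE_LR = 0.04          # learning_rate from the hyperparameters
TOTAL_STEPS = 725       # assumption: 5 epochs x ~145 steps, inferred from the log
WARMUP_FRACTION = 0.05  # assumption: warmup_steps=0.05 read as a fraction of total steps


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup=WARMUP_FRACTION):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 18, 36, 380, 725):
    print(step, round(lr_at(step), 5))
```

Under these assumptions the rate peaks at step 36 and falls smoothly afterwards, which is consistent with the fast early drop and the later flattening of the training loss in the table.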
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
Base model
minishlab/potion-base-8M