Instructions for using arnavj007/gemma-js-instruct-finetune with libraries, inference servers, and local apps.

Libraries

How to use arnavj007/gemma-js-instruct-finetune with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="arnavj007/gemma-js-instruct-finetune")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arnavj007/gemma-js-instruct-finetune")
model = AutoModelForCausalLM.from_pretrained("arnavj007/gemma-js-instruct-finetune")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use arnavj007/gemma-js-instruct-finetune with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "arnavj007/gemma-js-instruct-finetune"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arnavj007/gemma-js-instruct-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
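
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default URL assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "arnavj007/gemma-js-instruct-finetune"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` prints the reply. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.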

Use Docker

docker model run hf.co/arnavj007/gemma-js-instruct-finetune

SGLang

How to use arnavj007/gemma-js-instruct-finetune with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "arnavj007/gemma-js-instruct-finetune" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arnavj007/gemma-js-instruct-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "arnavj007/gemma-js-instruct-finetune" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arnavj007/gemma-js-instruct-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use arnavj007/gemma-js-instruct-finetune with Docker Model Runner:
```
docker model run hf.co/arnavj007/gemma-js-instruct-finetune
```

Model Card for gemma-js-instruct-finetune

Model Details

Model Description

gemma-js-instruct-finetune is a fine-tuned version of the gemma-2b-it model, trained to generate better long-form, structured responses to JavaScript-related instructional tasks. Fine-tuning used QLoRA (Quantized Low-Rank Adaptation), which trains small low-rank adapter weights on top of a quantized base model and so fits on limited hardware.

Developed by: Arnav Jain and collaborators
Shared by: Arnav Jain
Model type: Decoder-only causal language model
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: gemma-2b-it

Model Sources

Repository: gemma-js-instruct-finetune
Dataset: Evol-Instruct-JS-Code-500-v1
Demo: Weights & Biases Run

Uses

Direct Use

The model can be directly used for generating solutions to JavaScript programming tasks, creating instructional code snippets, and answering technical questions related to JavaScript programming.

Downstream Use

This model can be further fine-tuned for specific programming domains, other languages, or instructional content generation tasks.

Out-of-Scope Use

This model is not suitable for:

Non-technical, general-purpose text generation
Applications requiring real-time interaction with external systems
Generating solutions for non-JavaScript programming tasks without additional fine-tuning

Bias, Risks, and Limitations

Recommendations

Users should validate generated code for correctness and security.
Be cautious of potential biases or inaccuracies in the dataset that could propagate into model outputs.
Avoid using the model for sensitive or critical applications without thorough testing.
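
As a first step toward the validation recommended above, generated code can be pulled out of a response and syntax-checked before anyone runs it. This is a minimal sketch, not part of the model's tooling: the helper names are hypothetical, it assumes the model wraps code in Markdown fences, and the syntax check only runs if a `node` binary is on the PATH. A passing syntax check says nothing about correctness or security.

```python
import os
import re
import shutil
import subprocess
import tempfile

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences
BLOCK_RE = re.compile(
    FENCE + r"(?:javascript|js)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_js_blocks(response: str) -> list:
    """Return the contents of fenced code blocks found in a model response."""
    return [block.strip() for block in BLOCK_RE.findall(response)]


def syntax_ok(code: str):
    """Syntax-check JavaScript with `node --check`; None if node is missing."""
    node = shutil.which("node")
    if node is None:
        return None
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([node, "--check", path], capture_output=True)
        return result.returncode == 0
    finally:
        os.unlink(path)
```

Each extracted block still needs review and tests of its own before it is used.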

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "arnavj007/gemma-js-instruct-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def get_completion(query: str) -> str:
    # Wrap the query in Gemma's turn markers and open the model's turn.
    prompt = f"<start_of_turn>user {query}<end_of_turn>\n<start_of_turn>model"
    # Keep the input tensors on the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1000)
    # The decoded text includes the prompt as well as the generated answer.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

response = get_completion("Create a function in JavaScript to calculate the factorial of a number.")
print(response)
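
Because `generate` returns the prompt tokens followed by the new tokens, the string returned by `get_completion` repeats the question. One way to keep only the answer is to slice off the prompt tokens before decoding. The helpers below are a hypothetical sketch: `build_prompt` reproduces the prompt format used above, and `strip_prompt_tokens` works on any sliceable sequence of token ids.

```python
def build_prompt(query: str) -> str:
    """Single-turn prompt in the same format as get_completion."""
    return f"<start_of_turn>user {query}<end_of_turn>\n<start_of_turn>model"


def strip_prompt_tokens(output_ids, prompt_length: int):
    """Drop the first prompt_length ids, keeping only the generated ones."""
    return output_ids[prompt_length:]
```

Inside `get_completion`, this would be used as `tokenizer.decode(strip_prompt_tokens(outputs[0], inputs["input_ids"].shape[-1]), skip_special_tokens=True)`.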

Training Details

Training Data

The training dataset consisted of 500 JavaScript instructions, each paired with a reference output. The instructions covered tasks such as writing code snippets, implementing algorithms, and handling errors.

Dataset: Evol-Instruct-JS-Code-500-v1

Training Procedure

Preprocessing

Instructions and outputs were formatted using a standardized prompt-response template.
Data was tokenized using the Hugging Face tokenizer for gemma-2b-it.
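
The exact template is not given here. As an illustration only, the sketch below formats one instruction/output pair with the same Gemma turn markers used in the getting-started prompt. The function name, the argument names, and the template itself are assumptions, not the author's confirmed preprocessing.

```python
def format_example(instruction: str, output: str) -> str:
    """Join one instruction/output pair into a single training string.

    Assumed template: a user turn holding the instruction, followed by a
    model turn holding the reference output.
    """
    return (
        f"<start_of_turn>user {instruction}<end_of_turn>\n"
        f"<start_of_turn>model {output}<end_of_turn>"
    )


example = format_example(
    "Write a function that doubles a number.",
    "const double = (n) => n * 2;",
)
print(example)
```

Each formatted string would then be passed to the gemma-2b-it tokenizer.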

Training Hyperparameters

Training regime: QLoRA (Quantized Low-Rank Adaptation)
Batch size: 1 per device
Gradient accumulation steps: 4
Learning rate: 2e-4
Training steps: 100
Optimizer: Paged AdamW (8-bit)
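
These settings imply how much data the run actually saw. The arithmetic below is a sketch under two assumptions that are not stated above: training used a single GPU, and all 500 dataset examples were available for training.

```python
# Values taken from the hyperparameter list above.
per_device_batch_size = 1
gradient_accumulation_steps = 4
training_steps = 100

# Assumption: a single GPU, matching the one T4 listed under hardware.
num_devices = 1
# Assumption: all 500 dataset examples were available for training.
num_training_examples = 500

# Examples contributing to each optimizer update.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
# Total examples processed over the whole run.
examples_seen = effective_batch_size * training_steps
epochs = examples_seen / num_training_examples

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen}")
print(f"approximate epochs: {epochs:.2f}")
```

If the 100 test instructions were held out of the 500, leaving 400 for training, the same 100 steps would amount to one pass over the training data.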

Speeds, Sizes, Times

Training runtime: ~1435 seconds
Trainable parameters: 3% of the model (~78M)
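
The figures above can be cross-checked with a little arithmetic. This sketch assumes the whole reported runtime was spent on the 100 training steps, which ignores model loading and any evaluation time.

```python
training_runtime_s = 1435    # "~1435 seconds"
training_steps = 100         # from the hyperparameter list
trainable_params = 78e6      # "~78M"
trainable_fraction = 0.03    # "3% of the model"

seconds_per_step = training_runtime_s / training_steps
runtime_hours = training_runtime_s / 3600
# Total parameter count implied by the two reported numbers.
implied_total_params = trainable_params / trainable_fraction

print(f"seconds per optimizer step: {seconds_per_step:.2f}")
print(f"runtime in hours: {runtime_hours:.2f}")
print(f"implied total parameters: {implied_total_params / 1e9:.1f}B")
```

The runtime works out to about 0.4 hours, and the implied total of roughly 2.6B parameters is in line with a 2B-class base model.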

Evaluation

Testing Data, Factors & Metrics

Testing Data

The test dataset consisted of 100 JavaScript instructions held out from the training set.

Metrics

Quality of generated code snippets
Ability to handle complex prompts with multiple sub-tasks

Results

Judged on the qualitative criteria above, the fine-tuned model improved at handling long prompts and generating structured code. It produced complete solutions for tasks such as creating an API with advanced features (e.g., caching, error handling).

Summary

Fine-tuning with QLoRA improved the model's ability to generate detailed instructional responses.

Environmental Impact

Hardware Type: NVIDIA Tesla T4 GPU (free-tier Colab)
Hours used: ~0.4 hours
Carbon Emitted: Minimal (estimated using ML Impact Calculator)
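
A rough order-of-magnitude estimate follows from energy used times grid carbon intensity. The sketch below is not the ML Impact Calculator's own computation. It assumes the T4 drew its full 70 W board power for the whole run, and the carbon intensity is a hypothetical placeholder that depends on the data center region.

```python
hours_used = 0.4          # "~0.4 hours"
gpu_power_kw = 0.070      # assumption: T4 at its 70 W board power
carbon_intensity = 0.4    # assumption: kg CO2e per kWh (hypothetical value)

energy_kwh = gpu_power_kw * hours_used
emissions_kg = energy_kwh * carbon_intensity

print(f"energy: {energy_kwh:.3f} kWh")
print(f"emissions: {emissions_kg * 1000:.1f} g CO2e")
```

Under these assumptions the run used a few hundredths of a kilowatt-hour, which is consistent with the "Minimal" figure reported.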

Technical Specifications

Model Architecture and Objective

The model uses a decoder-only architecture optimized for causal language modeling tasks.

Compute Infrastructure

Hardware: NVIDIA Tesla T4
Software:
- Transformers: 4.38.2
- PEFT: 0.8.2
- Accelerate: 0.27.1
- BitsAndBytes: 0.42.0

Citation

BibTeX:

@misc{Jain2024gemmajs,
  author = {Arnav Jain and collaborators},
  title = {gemma-js-instruct-finetune},
  year = {2024},
  howpublished = {\url{https://huggingface.co/arnavj007/gemma-js-instruct-finetune}}
}

More Information

For questions or feedback, contact Arnav Jain.


Safetensors

Model size: 3B params
Tensor type: F16