1sec.ai
models

50 items · ranked by signal, recency & corroboration

01

I built an OpenAI compatible firewall for AI agents. Try to break it.

A developer created an OpenAI-compatible firewall for AI agents called Arc Gate. It analyzes entire sessions rather than individual prompts, tracking authority and escalating restrictions based on user behavior. The tool aims to prevent prompt injection attacks by monitoring multi-turn interactions. You can test the firewall on the project’s GitHub page.

Key takeaways
  • Analyzes entire sessions, not just individual prompts.
  • Escalates restrictions based on user behavior across turns.
  • Aims to prevent prompt injection attacks in multi-turn interactions.
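
To illustrate the difference between judging each prompt alone and tracking a whole session, here is a minimal sketch. It is not Arc Gate's implementation: the class name `SessionGuard`, the marker phrases, and the three restriction levels are all hypothetical, chosen only to show restrictions escalating across turns.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical phrases that count as a risk signal in this sketch.
SUSPICIOUS_MARKERS = ("ignore previous instructions", "reveal your system prompt")

# Illustrative restriction levels, from least to most restrictive.
LEVELS = ("allow", "restrict_tools", "block")


@dataclass
class SessionGuard:
    """Tracks risk across a whole session instead of judging each prompt alone."""
    strikes: int = 0
    history: List[str] = field(default_factory=list)

    def check(self, message: str) -> str:
        """Record the message and return the restriction level for the session."""
        self.history.append(message)
        text = message.lower()
        if any(marker in text for marker in SUSPICIOUS_MARKERS):
            self.strikes += 1
        # Strikes never decay here, so restrictions only escalate within a session.
        return LEVELS[min(self.strikes, len(LEVELS) - 1)]
```

A harmless message sent after a suspicious one still receives the escalated level, which is the behavior a per-prompt filter cannot provide.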
03

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model.

The Inflect-Nano-v1 TTS model has 4.63M parameters, making it the second-smallest publicly released TTS model after TinyTTS. Despite its tiny size, it reportedly performs well for its size class. The model can run on very low-end hardware, which suits deployment on resource-constrained devices. You can experiment with it in edge cases where compute and memory are extremely limited.

Key takeaways
  • 4.63M parameters, second-smallest public TTS model.
  • Runs on very low-end hardware, suitable for resource-constrained devices.
  • Not state-of-the-art, but functional for its size.
04

A robot is sprinting towards you. Do you want it running on Claude or Grok?

The article compares the performance of Anthropic's Claude 3.5 Sonnet and xAI's Grok-1 in a simulated robotic scenario. The test evaluates how well each model handles dynamic situations requiring rapid decision-making. You can use these insights to choose the best model for applications needing fast and accurate responses. The results show Claude 3.5 Sonnet outperforming Grok-1 in this specific use case.

Key takeaways
  • Claude 3.5 Sonnet outperforms Grok-1 in dynamic decision-making tests.
  • The comparison simulates a robotic scenario with rapid response requirements.
  • Insights help builders choose models for applications needing speed and accuracy.
05

GPT 5.5 on Cerebras

GPT-5.5 is now available on Cerebras via OpenRouter. You can access it by navigating to the Cerebras provider on the platform. This integration expands deployment options for builders using GPT-5.5.

Key takeaways
  • GPT-5.5 available on Cerebras via OpenRouter.
  • Cerebras added as a deployment option for GPT-5.5.
  • OpenRouter supports multiple providers for GPT-5.5.
06

GLM 5.2 Release Video [Made with GLM 5.2]

A Reddit user created a video using GLM 5.2 and shared it, comparing the model's video generation capabilities to others. The video is similar to a viral Remotion example, but with GLM 5.2 as the model provider. The user finds GLM 5.2 close to but still below Fable in creativity, and notes that Gemini 3.1 Pro remains the top choice for video creation. However, GLM 5.2 seems to outperform Fable in web development tasks.

Key takeaways
  • GLM 5.2 used to create a video similar to a viral Remotion example.
  • GLM 5.2 is close to but not as creative as Fable.
  • Gemini 3.1 Pro remains top for video creation.
07

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP is a new benchmark for small-molecule preclinical pharmacology, testing AI agent decision-making on realistic program decisions. It's the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities.

Key takeaways
  • New benchmark for small-molecule preclinical pharmacology
  • TxBench-PP tests AI agent decision-making on realistic program decisions
  • TxBench-PP is the first focused slice of a broader TherapeuticsBench effort
08

What is Speculative Decoding? (trending on paperswithco.de) [R]

Speculative decoding is an inference optimization technique that uses a fast draft model to propose several future tokens, then verifies them in parallel with a larger target model. The target model keeps the drafted tokens it agrees with and corrects the first one it rejects, so output quality matches the target model while fewer sequential target passes are needed. Builders can apply speculative decoding where token generation is the bottleneck.

Key takeaways
  • Speculative decoding uses a fast draft model to propose future tokens, then verifies with a larger target model.
  • Speeds up token generation by verifying in parallel.
  • Fast draft model is small and efficient.
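
The draft-then-verify loop can be sketched in a few lines. This is a simplified greedy variant under stated assumptions: the two "models" are toy next-token functions invented for illustration, verification is written as a loop where a real system would use one batched forward pass, and the bonus token that full implementations take after a fully accepted draft is omitted.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def target(ctx: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Toy 'small' model: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target_fn: NextToken, draft_fn: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; the result equals greedy decoding with target_fn."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft_fn(out + proposal))
        # 2. The target model checks each proposed position.
        for i, token in enumerate(proposal):
            expected = target_fn(out + proposal[:i])
            if expected != token:
                # 3. First mismatch: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out
```

Because a rejected draft token is replaced by the target's own choice, the sequence is identical to what the target model would produce alone; the speedup depends on how often the draft agrees.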
11

GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available

GLM-5.2 is the first open-weights model to achieve over 80% on Terminal-Bench, outperforming all other open models and even Gemini. This milestone marks a significant advancement in open-weights capabilities, offering a frontier-level model at a lower cost. You can now access a high-performance model without the high costs associated with closed APIs. The open-weights approach allows for local deployment and customization.

Key takeaways
  • GLM-5.2 crosses 80% on Terminal-Bench, a first for open-weights models.
  • Beats all other open models and Gemini in benchmarks.
  • Offers frontier-level performance at a lower cost.
12

ChatGPT is about to get a voice mode upgrade as a new “gpt-bidi-1” model has been spotted along with announcement updates.

A new gpt-bidi-1 model has been discovered, hinting at an upcoming voice mode upgrade for ChatGPT. The model appears to be a bidirectional speech model, enabling more natural voice interactions. You may see improved voice capabilities in ChatGPT soon. This could allow for more engaging conversations.

Key takeaways
  • New gpt-bidi-1 model spotted, likely for voice mode.
  • Enables bidirectional speech for natural interactions.
  • ChatGPT voice capabilities may improve soon.
14

bartowski/command-a-plus-05-2026-GGUF · Hugging Face

A new GGUF model, bartowski/command-a-plus-05-2026, has been uploaded to Hugging Face. The model is available for download and testing. You can share your benchmarks and feedback with the community. The latest llama.cpp version is recommended for use.

Key takeaways
  • Model available for download on Hugging Face.
  • Latest llama.cpp version recommended.
  • Community encouraged to share benchmarks and feedback.
15

Learning from the Self-future: On-policy Self-distillation for dLLMs

d-OPSD is presented as the first on-policy self-distillation (OPSD) framework built for diffusion LLMs (dLLMs), and it reportedly outperforms existing OPSD methods when they are applied to dLLMs. It injects privileged information via arbitrary-order generation, a design that matches how dLLMs generate text. The authors position it as a promising technique for future research.

Key takeaways
  • First OPSD framework for diffusion LLMs
  • d-OPSD injects privileged information via arbitrary-order generation
  • d-OPSD outperforms existing OPSD methods on dLLMs
16

Quoting Georgi Gerganov

Georgi Gerganov uses Qwen3.6-27B daily for coding tasks on his local machines, finding it a capable and helpful tool for small tasks. He runs it on both an M2 Ultra and an RTX 5090. The model helps with mundane tasks at ggml-org, but his usage is limited by time spent reviewing PRs. Builders may consider Qwen3.6-27B for local deployment in coding workflows.

Key takeaways
  • Qwen3.6-27B used daily for coding tasks.
  • Runs on M2 Ultra and RTX 5090.
  • Usage is limited by time spent reviewing PRs.
17

Source code for LLMs. [D]

Hugging Face's Transformers repo contains a full implementation of the GPT-OSS model, suggesting it's not just a skeleton for experimentation. Many other models in the repo are also actual implementations, not just placeholders.

Key takeaways
  • Full implementation of GPT-OSS model found in Hugging Face repo.
  • Many models in Hugging Face repo are actual implementations, not skeletons.
  • GPT-OSS model is built on top of this implementation.
18

Nex-N2 Pro is the real deal

Nex-N2 Pro is a rebranded Rio-3.5 model with a Qwen base, offering a viable alternative to other local LLMs. It performs well once its chat template bugs are fixed, making it a good choice for builders looking for a capable local model.

Key takeaways
  • Nex-N2 Pro is a rebranded Rio-3.5 model with a Qwen base
  • N2 Pro IQ2_S GGUFs work perfectly after fixing chat template bugs
  • Nex-N2 Pro is a viable alternative to other local LLMs
19

Spanly

Spanly shows what AI agents do inside your MCP servers and lets you monitor and control them in real time.

Key takeaways
  • Shows what AI agents do inside MCP servers.
  • Supports real-time monitoring of agent activity.
  • Lets users control agents in real time.
20

thedotmack/claude-mem

thedotmack/claude-mem captures session data with AI, compresses it, and injects relevant context back into future sessions. This works with multiple LLMs and code editors, including Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, and OpenCode. The tool aims to improve agent continuity and session management.

Key takeaways
  • Claude-Mem captures and compresses session data with AI
  • Injects relevant context back into future sessions
  • Works with multiple LLMs and code editors
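
The capture, compress, and inject cycle described above can be sketched as follows. This is not claude-mem's code: `MemoryStore` and `compress` are hypothetical names, truncation stands in for AI summarization, and keyword overlap stands in for whatever relevance ranking the real tool uses.

```python
from typing import List


def compress(transcript: str, max_words: int = 12) -> str:
    """Stand-in for AI summarization: keep only the first `max_words` words."""
    return " ".join(transcript.split()[:max_words])


class MemoryStore:
    """Holds compressed notes from past sessions and injects the relevant ones."""

    def __init__(self) -> None:
        self.notes: List[str] = []

    def capture(self, transcript: str) -> None:
        """Compress a finished session and store the result."""
        self.notes.append(compress(transcript))

    def relevant(self, query: str, limit: int = 2) -> List[str]:
        """Rank notes by how many lowercase words they share with the query."""
        words = set(query.lower().split())
        scored = [(len(words & set(note.lower().split())), note) for note in self.notes]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [note for _, note in ranked[:limit]]

    def inject(self, prompt: str) -> str:
        """Prepend relevant notes to a new prompt; return it unchanged if none match."""
        notes = self.relevant(prompt)
        if not notes:
            return prompt
        context = "\n".join(f"- {note}" for note in notes)
        return f"Context from earlier sessions:\n{context}\n\n{prompt}"
```

Plain word overlap is crude (common words such as "the" count as matches), so a real system would likely use embeddings or a model-based ranker instead.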
21

JuliusBrussee/caveman

This Claude Code skill has the model talk like a caveman, reportedly cutting token usage by 65%. Fewer tokens means faster inference and lower serving costs. Builders can try similarly terse output styles to reduce token needs in their own agents.

Key takeaways
  • Claude Code skill cuts 65% of tokens by talking like caveman
  • Fewer tokens means faster inference and lower serving costs
  • A terse output style can significantly reduce token requirements
22
models · Jun 10

DiffusionGemma

Google's Gemini Diffusion model is now open-weights, available for free on NVIDIA's NIM cloud API. The model, released as google/diffusiongemma-26B-A4B-it, runs at 857 tokens/second. This is a significant development for builders looking to integrate Gemini capabilities locally.

Key takeaways
  • Gemini Diffusion model now open-weights, Apache 2 licensed
  • NVIDIA hosting on NIM cloud API for free
  • 857 tokens/second performance
23
models · Jun 9

Initial impressions of Claude Fable 5

Claude Fable 5 is a large model that performs well across a wide range of tasks, at the price of being slow and expensive. It handles varied tasks with ease, and finding tasks that it can't do is a challenge.

Key takeaways
  • Large model with high performance on a wide range of tasks.
  • Slow and expensive to run.
  • Finding tasks that Claude Fable 5 can't do is a challenge
24
models · Jun 4

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

NVIDIA released Nemotron 3.5 Content Safety, a customizable multimodal safety model for enterprise AI. It detects and mitigates toxic content across text, images, and audio. Builders can fine-tune the model for specific use cases and integrate it into their AI applications.

Key takeaways
  • Customizable multimodal safety model for text, images, and audio.
  • Fine-tune for specific enterprise use cases.
  • Integrate into AI applications for content safety.
25
models · May 18

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddleOCR 3.5 integrates a Transformers backend for running OCR and document parsing tasks. The update allows users to leverage popular models like LayoutLM and Donut for improved accuracy and efficiency. This integration enables builders to deploy OCR capabilities with a more flexible and scalable architecture. PaddleOCR is an open-source OCR toolkit.

Key takeaways
  • PaddleOCR 3.5 uses a Transformers backend for OCR tasks.
  • Integrates with models like LayoutLM and Donut.
  • Enables more flexible and scalable OCR deployments.
26
models · May 14

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

IBM released Granite Embedding Multilingual R2 under Apache 2.0, offering a 32K context window and what the post calls the best retrieval quality among models under 100M parameters. This open multilingual embedding model supports 100+ languages and targets builders seeking high-quality, locally deployable models for semantic search and retrieval tasks. The model's performance is reportedly competitive with larger, closed alternatives.

Key takeaways
  • Released under Apache 2.0 for open use.
  • 32K context window for longer input sequences.
  • Competitive retrieval quality below 100M parameters.
27
models · May 6

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

The Open ASR Leaderboard now includes Benchmaxxer Repellant, a new private dataset for evaluating automatic speech recognition models. This addition aims to improve the leaderboard's robustness by incorporating diverse, real-world data. You can use the updated leaderboard to benchmark and compare ASR models. The Benchmaxxer Repellant dataset is private, so you'll need to contact the creators to access it.

Key takeaways
  • The Open ASR Leaderboard adds Benchmaxxer Repellant, a private dataset.
  • Private datasets can enhance leaderboard robustness with real-world data.
  • Access to Benchmaxxer Repellant requires contacting its creators.
28
models · Apr 29

Granite 4.1 LLMs: How They’re Built

IBM released Granite 4.1, a series of open-weights LLMs. The models are trained on a mix of synthetic and human-generated data. IBM used a combination of automated and human evaluation to select the best model. You can access Granite 4.1 through Hugging Face.

Key takeaways
  • Trained on synthetic and human-generated data.
  • Uses automated and human evaluation.
  • Available on Hugging Face.
29
models · Apr 28

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA released Nemotron 3 Nano Omni, a multimodal model for processing documents, audio, and video. The model handles long-context inputs up to 128k tokens. It is designed for building agents that can understand and generate content across multiple modalities.

Key takeaways
  • Handles up to 128k tokens in a single input.
  • Supports multimodal processing of documents, audio, and video.
  • Available on Hugging Face for integration into applications.
30
models · Apr 24

DeepSeek-V4: a million-token context that agents can actually use

DeepSeek-V4 offers a 1M token context window, making it suitable for long-range tasks and agent applications. The model is available on Hugging Face for download and integration. Builders can leverage this capability to build more sophisticated and autonomous agents. The large context window enables more efficient processing of long documents and complex workflows.

Key takeaways
  • 1M token context window for long-range tasks.
  • Available on Hugging Face for download and integration.
  • Enables building more sophisticated and autonomous agents.
31
models · Apr 16

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

You can now train and fine-tune multimodal embedding and reranker models using Sentence Transformers, which support text, images, and other modalities. This is achieved through a simple API that abstracts away the complexity of working with different data types. The Sentence Transformers library provides a unified interface for training and deploying these models.

Key takeaways
  • Multimodal models support text, images, and other modalities.
  • Simple API for training and fine-tuning models.
  • Unified interface for deployment.
32
models · Apr 9

Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face released multimodal embedding and reranker models using Sentence Transformers, enabling joint text and image encoding for applications like image search and visual question answering. These models allow you to build multimodal applications with a single, unified embedding space. The Sentence Transformers library provides a simple interface for using these models.

Key takeaways
  • Multimodal models encode text and images in a single space.
  • Enables applications like image search and visual question answering.
  • Sentence Transformers library provides a simple interface.
33
models · Apr 9

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Hugging Face released Waypoint-1.5, a model for generating interactive 3D worlds that can run on consumer-grade GPUs. Waypoint-1.5 offers higher-fidelity environments compared to its predecessor. This development enables builders to create more immersive experiences without requiring high-end hardware.

Key takeaways
  • Waypoint-1.5 generates higher-fidelity 3D worlds than its predecessor.
  • Runs on consumer-grade GPUs, making it more accessible.
  • Enables more immersive experiences for applications.
34
models · Apr 2

Welcome Gemma 4: Frontier multimodal intelligence on device

Google introduced Gemma 4, a multimodal model capable of processing text, images, and audio on-device. Gemma 4 enables developers to build applications with frontier intelligence. You can access Gemma 4 through Hugging Face.

Key takeaways
  • Gemma 4 supports multimodal input including text, images, and audio.
  • On-device processing enables low-latency applications.
  • Available through Hugging Face for developer access.
35
models · Mar 31

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

IBM released Granite 4.0 3B Vision, a compact multimodal model for enterprise document processing. It handles text, image, and layout analysis for documents like invoices and contracts. The model is designed for efficient deployment on-premises or in the cloud, targeting builders who need domain-specific document intelligence. Granite 4.0 3B Vision is available on Hugging Face.

Key takeaways
  • Multimodal model handling text, image, and layout in documents.
  • Designed for on-premises or cloud deployment in enterprise settings.
  • Available on Hugging Face for integration.
36
models · Mar 20

Build a Domain-Specific Embedding Model in Under a Day

You can build a domain-specific embedding model in under a day using NVIDIA's new fine-tuning tools and Hugging Face's model hub. The approach uses transfer learning to adapt a pre-trained model to your specific domain, reducing the need for large amounts of labeled data. This method is particularly useful for builders working with limited data or resources. By fine-tuning a pre-trained model, you can create a customized embedding model that meets your specific needs.

Key takeaways
  • Fine-tune a pre-trained model in under a day with NVIDIA's tools.
  • Transfer learning reduces need for large amounts of labeled data.
  • Customized embedding models can be created with limited resources.
37
models · Feb 13

Custom Kernels for All from Codex and Claude

Hugging Face now offers custom CUDA kernels for Codex and Claude, enabling developers to optimize performance and reduce costs. This feature allows for tailored kernel execution, improving efficiency. Builders can deploy customized kernels for specific use cases, gaining more control over their AI workloads. This development expands the capabilities of AI model deployment.

Key takeaways
  • Custom CUDA kernels available for Codex and Claude.
  • Optimized performance and reduced costs through tailored kernel execution.
  • Deploy customized kernels for specific use cases.
38
models · Jan 20

Differential Transformer V2

Microsoft released Differential Transformer V2, an updated version of their open-source attention mechanism. The new model improves performance on long-range dependency tasks. You can try it on the Hugging Face Hub. This release targets developers working on natural language processing applications.

Key takeaways
  • Updated attention mechanism for NLP tasks.
  • Improves performance on long-range dependency tasks.
  • Available on Hugging Face Hub.
39
models · Jan 20

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Overworld has released Waypoint-1, a real-time interactive video diffusion model. The model enables users to interact with video content in real-time. You can explore the model on the Hugging Face platform. This release targets developers interested in building interactive video applications.

Key takeaways
  • Waypoint-1 is a real-time interactive video diffusion model.
  • The model is available on the Hugging Face platform.
  • Targets developers building interactive video applications.
40
models · Jan 7

Nous Research's NousCoder-14B is an open-source coding model landing right in the Claude Code moment

Nous Research released NousCoder-14B, an open-source coding model that matches or exceeds larger proprietary systems, trained in four days using 48 Nvidia B200 graphics processors. The model arrives as Claude Code gains attention, offering a competitive alternative for coding tasks. NousCoder-14B's performance and open-source nature make it a notable entry in the AI coding assistant field. This development is significant for builders looking for flexible and accessible coding tools.

Key takeaways
  • NousCoder-14B matches or exceeds larger proprietary coding models.
  • Trained in four days using 48 Nvidia B200 graphics processors.
  • Open-source alternative to proprietary coding assistants.
41
models · Jan 5

NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI

NVIDIA released Cosmos Reason 2, an advanced reasoning model for physical AI. This update brings improved performance and capabilities to the existing Cosmos platform. Builders working with physical AI can leverage Cosmos Reason 2 for more accurate and efficient simulations. The new model is expected to enhance various applications, including robotics and computer vision.

Key takeaways
  • Improved performance and capabilities for physical AI simulations.
  • Enhanced accuracy and efficiency for robotics and computer vision applications.
  • Advanced reasoning model for complex physical systems.
42
models · Jan 5

Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture

Hugging Face introduced Falcon-H1-Arabic, a hybrid architecture model for Arabic language AI. The model aims to improve performance on Arabic language tasks. Falcon-H1-Arabic is part of Hugging Face's efforts to expand AI capabilities for low-resource languages. This development may interest builders working on Arabic language projects.

Key takeaways
  • Falcon-H1-Arabic is a hybrid architecture model for Arabic language AI.
  • The model aims to improve performance on Arabic language tasks.
  • Hugging Face expands AI capabilities for low-resource languages.
43
models · Dec 23

AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems

ServiceNow AI introduced AprielGuard, a guardrail for safety and adversarial robustness in modern LLM systems. AprielGuard aims to improve the reliability of LLMs by detecting and mitigating potential safety risks. This development is relevant to builders who integrate LLMs into their applications and require robust safety measures. AprielGuard's release underscores the growing importance of safety and security in AI systems.

Key takeaways
  • AprielGuard detects and mitigates safety risks in LLMs.
  • Improves reliability of LLMs in production environments.
  • Addresses growing concerns around AI safety and security.
44
models · Dec 15

CUGA on Hugging Face: Democratizing Configurable AI Agents

IBM Research released CUGA, a configurable AI agent framework, on Hugging Face. CUGA allows users to create custom AI agents by combining different components. This democratizes access to AI agent technology, enabling builders to develop tailored solutions without requiring extensive expertise. CUGA's availability on Hugging Face expands the platform's offerings for AI researchers and developers.

Key takeaways
  • CUGA is a configurable AI agent framework
  • Allows users to create custom AI agents by combining components
  • Available on Hugging Face
45
models · Dec 11

Codex is Open Sourcing AI models

Codex is open sourcing its AI models, allowing developers to access and modify the code. This move is expected to increase transparency and collaboration in the AI community. The open-sourced models will be available on Hugging Face's platform, making it easier for builders to integrate and customize them. This development can help reduce barriers to entry for new AI projects and promote further innovation.

Key takeaways
  • Codex AI models are being open sourced.
  • Models will be available on Hugging Face's platform.
  • Open sourcing aims to increase transparency and collaboration.
46
models · Dec 4

We Got Claude to Fine-Tune an Open Source LLM

Hugging Face got Claude to fine-tune an open source LLM, demonstrating that a large language model can help improve other models. This approach can help reduce the cost and complexity of fine-tuning. The experiment shows that Claude can effectively fine-tune a model, making it more accurate and efficient. This development is relevant to builders who want to improve their LLMs without starting from scratch.

Key takeaways
  • Claude can fine-tune open source LLMs
  • Fine-tuning with Claude improves model accuracy and efficiency
  • Reduced cost and complexity for LLM fine-tuning
47
models · Oct 13

Nemotron-Personas-India: Synthesized Data for Sovereign AI

Nemotron-Personas-India is a synthesized dataset for sovereign AI development, providing a locally sourced alternative to international datasets. The dataset is designed to support AI model training for the Indian market, with a focus on regional languages and cultural context. This dataset can help builders create more accurate and culturally relevant AI models for the Indian market. The dataset is available on Hugging Face.

Key takeaways
  • Synthesized dataset for sovereign AI development in India.
  • Supports training of AI models for regional languages and cultural context.
  • Available on Hugging Face for access and integration.
48
models · Oct 2

SOTA OCR with Core ML and dots.ocr

Hugging Face released dots.ocr, a state-of-the-art optical character recognition model that integrates with Core ML for mobile and embedded devices. The model achieves high accuracy on various datasets and is optimized for low-latency inference. This integration enables developers to build OCR applications with improved performance and efficiency. The combination of dots.ocr and Core ML targets developers who want to deploy accurate OCR models on resource-constrained devices.

Key takeaways
  • State-of-the-art OCR performance with Core ML integration.
  • Optimized for low-latency inference on mobile and embedded devices.
  • Supports various datasets for flexible deployment.
49
models · Sep 29

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

Hugging Face and Intel collaborated on accelerating the Qwen3-8B agent on Intel Core Ultra processors using depth-pruned draft models. This optimization targets improved performance for builders running large language models on consumer-grade hardware. The Qwen3-8B model is a large language model that can be fine-tuned for various tasks. The acceleration is expected to benefit developers working with resource-intensive AI applications.

Key takeaways
  • Qwen3-8B agent accelerated on Intel Core Ultra processors.
  • Depth-pruned draft models used for optimization.
  • Improved performance for large language models on consumer-grade hardware.
50
models · Sep 26

Nemotron-Personas-Japan: A Synthetic Dataset for Sovereign AI

Nemotron-Personas-Japan is a synthetic dataset for sovereign AI development. The dataset is designed to support the creation of AI models that can understand and generate human-like text in Japanese. This dataset can be useful for builders who want to develop AI models for the Japanese market. The dataset is available on Hugging Face.

Key takeaways
  • Synthetic dataset for sovereign AI development in Japanese.
  • Available on Hugging Face for model training and testing.
  • Supports creation of human-like text generation models.