Latest News

This AI Paper Investigates Test-Time Scaling of English-Centric RLMs for Enhanced Multilingual Reasoning and Domain Generalization

Reasoning language models, or RLMs, are increasingly used to simulate step-by-step problem-solving by generating long, structured reasoning chains. These models break down complex questions into simpler parts and build logical steps to reach answers. This chain-of-thought (CoT) approach has proven…

Rethinking Toxic Data in LLM Pretraining: A Co-Design Approach for Improved Steerability and Detoxification

In the pretraining of LLMs, the quality of training data is crucial in determining model performance. A common strategy involves filtering out toxic content from the training corpus to minimize harmful outputs. While this approach aligns with the principle that…

PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise

In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what it defines as the next pivotal evolution in enterprise automation: Agentic Artificial Intelligence. These systems, capable of autonomous decision-making and…

MIT Department of Economics to launch James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work

Starting in July, MIT’s Shaping the Future of Work Initiative in the Department of Economics will usher in a significant new era of research, policy, and education of the next generation of scholars, made possible by a gift from the…

Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization

Equipping LLMs with external tools or functions has become popular, showing great performance across diverse domains. Existing research depends on synthesizing large volumes of tool-use trajectories through advanced language models and SFT to enhance LLMs’ tool-calling capability. The critical limitation…

A Step-by-Step Guide to Deploy a Fully Integrated Firecrawl-Powered MCP Server on Claude Desktop with Smithery and VeryaX

In this tutorial, we will learn how to deploy a fully functional Model Context Protocol (MCP) server using Smithery as the configuration framework and VeryaX as the runtime orchestrator. We’ll walk through installing and configuring…

RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning

LLMs have gained outstanding reasoning capabilities through reinforcement learning (RL) on correctness rewards. Modern RL algorithms for LLMs, including GRPO, VinePPO, and Leave-one-out PPO, have moved away from traditional PPO approaches by eliminating the learned value function network in favor…

OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare

OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in collaboration with 262 physicians across 60 countries and 26 medical specialties, HealthBench addresses the…
