Latest News

ETH and Stanford Researchers Introduce MIRIAD: A 5.8M Pair Dataset to Improve LLM Accuracy in Medical AI

Challenges of LLMs in Medical Decision-Making: Addressing Hallucinations via Knowledge Retrieval

LLMs are set to revolutionize healthcare through intelligent decision support and adaptable chat-based assistants. However, a major challenge is their tendency to produce factually incorrect medical information. To address…


ByteDance Researchers Introduce VGR: A Novel Reasoning Multimodal Large Language Model (MLLM) with Enhanced Fine-Grained Visual Perception Capabilities

Why Multimodal Reasoning Matters for Vision-Language Tasks

Multimodal reasoning enables models to make informed decisions and answer questions by combining both visual and textual information. This type of reasoning plays a central role in interpreting charts, answering image-based questions, and…


Google DeepMind Releases Gemini Robotics On-Device: Local AI Model for Real-Time Robotic Dexterity

Google DeepMind has unveiled Gemini Robotics On-Device, a compact, local version of its powerful vision-language-action (VLA) model, bringing advanced robotic intelligence directly onto devices. This marks a key step forward in the field of embodied AI by eliminating the need…


ByteDance Researchers Introduce Seed-Coder: A Model-Centric Code LLM Trained on 6 Trillion Tokens

Reframing Code LLM Training through Scalable, Automated Data Pipelines

Code data plays a key role in training LLMs, benefiting not just coding tasks but also broader reasoning abilities. While many open-source models rely on manual filtering and expert-crafted rules to…


New from Chinese Academy of Sciences: Stream-Omni, an LLM for Cross-Modal Real-Time AI

Understanding the Limitations of Current Omni-Modal Architectures

Large multimodal models (LMMs) have shown outstanding omni-capabilities across text, vision, and speech modalities, creating vast potential for diverse applications. While vision-oriented LMMs have shown success, omni-modal LMMs that support speech interaction based…


BAAI Launches OmniGen2: A Unified Diffusion and Transformer Model for Multimodal AI

Beijing Academy of Artificial Intelligence (BAAI) introduces OmniGen2, a next-generation, open-source multimodal generative model. Expanding on its predecessor OmniGen, the new architecture unifies text-to-image generation, image editing, and subject-driven generation within a single transformer framework. It innovates by decoupling the…
