Research Lab

Recent music-AI and MIDI infrastructure work translated into product decisions.

Last verified 2026-05-28. This page stays deliberately concrete: each signal includes the source, the publication date, and the exact move it should force in the product.

Verified window: May 28, 2026
Signals tracked: 21
Evidence mix: Official docs / Peer reviewed / Preprint / Spec track

Latency

4 active signals

These items are recent enough to affect roadmap truth, product copy, or the operator loop today.

MIDI Association / Spec track / NAMM 2026 transport update, verified May 28, 2026

MIDI 2.0 transport work brings Web MIDI into the standards conversation

The MIDI Association Transport Working Group is tracking MIDI 2.0 transport work across Bluetooth, Web MIDI, and network remote management, which makes browser transport readiness a roadmap-level constraint.

Product move

Keep transport posture visible in-product. Because MIDI 2.0 transport work now spans Web MIDI, BLE, and managed network transports, the browser forge should expose whether a capture is legacy MIDI 1.0, UMP-normalized, or profile-ready.

Verified 2026-05-28

Windows Experience Blog / Official docs / Feb 17, 2026

Making music with MIDI just got a real boost in Windows 11

Microsoft's Windows 11 rollout puts multi-client MIDI, MIDI 2.0 support, and newer transport plumbing into the mainstream desktop stack, with active issue tracking and workarounds published through April 30, 2026.

Product move

Treat multi-client MIDI and UMP-native operating-system support as baseline assumptions. Browser capture should stay resilient when users run multiple MIDI-aware tools at once, and release notes should call out OS-level caveats explicitly.

Verified 2026-05-28

Microsoft MIDI Docs / Spec track / 2026 docs, verified May 28, 2026

Windows MIDI Services moves MIDI 2.0 into the OS routing layer

Windows MIDI Services documents the new service and SDK model: a mediator service supports multi-client access, MIDI 1.0/2.0 UMP transports, loopbacks, app-to-app MIDI 2.0, message translation, and timestamp-aware scheduling.

Product move

Keep the browser engine UMP-first internally and store translation metadata. Windows MIDI Services now makes multi-client routing, MIDI 1.0-to-UMP translation, loopbacks, and scheduled messages normal platform expectations rather than niche studio features.
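
As a minimal sketch of what UMP-first storage with translation metadata could look like, the snippet below packs a MIDI 1.0 channel voice message into a single 32-bit UMP word (message type 0x2) and records that it was translated on ingest. The names `UmpEvent` and `midi1_to_ump` and the metadata fields are illustrative assumptions, not part of Windows MIDI Services or of the product's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmpEvent:
    word: int              # 32-bit UMP word
    source_protocol: str   # hypothetical metadata field: "midi1" or "midi2"
    translated: bool       # hypothetical metadata field: True if converted on ingest


def midi1_to_ump(status: int, data1: int, data2: int = 0, group: int = 0) -> UmpEvent:
    """Pack a MIDI 1.0 channel voice message into a UMP word (message type 0x2)."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("expected a channel voice status byte (0x80-0xEF)")
    if not (0 <= data1 <= 0x7F and 0 <= data2 <= 0x7F):
        raise ValueError("data bytes must be 7-bit values")
    if not 0 <= group <= 0xF:
        raise ValueError("group must be 0-15")
    # Layout: [message type: 4 bits][group: 4 bits][status: 8][data1: 8][data2: 8]
    word = (0x2 << 28) | (group << 24) | (status << 16) | (data1 << 8) | data2
    return UmpEvent(word=word, source_protocol="midi1", translated=True)


event = midi1_to_ump(0x90, 60, 100)   # Note On, channel 1, middle C, velocity 100
print(hex(event.word))                # 0x20903c64
```

Keeping the source protocol next to the word means a later export can tell whether a value was captured natively or produced by translation.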

Verified 2026-05-28

MIDI Association / Spec track / 2025 spec overview, verified May 28, 2026

Network MIDI 2.0 makes UMP transport and jitter telemetry first-class

Network MIDI 2.0 standardizes UDP transport for MIDI 1.0 and MIDI 2.0 data as UMP, adds device capability identification, and compares UMP-layer jitter-reduction timestamps against RTP-MIDI's weaker timestamp support.

Product move

Treat lookahead scheduling and transport telemetry as core player features. Network MIDI 2.0 pushes UMP and jitter-reduction timestamps into the transport layer, so the browser player should expose queue health instead of hiding timing behavior.
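
One way to picture queue health is a lookahead poll that reports how many events are due and how many are already late. The sketch below is an assumption about how such telemetry might be shaped; `QueueHealth`, `poll_queue`, and the 100 ms default window are hypothetical and not taken from the Network MIDI 2.0 specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueHealth:
    due: int              # events inside the lookahead window, ready to dispatch
    late: int             # due events whose timestamp is already in the past
    worst_late_ms: float  # largest lateness among the late events


def poll_queue(event_times_ms, now_ms, lookahead_ms=100.0):
    """Split pending events into dispatch-now and keep-waiting, and report health."""
    horizon = now_ms + lookahead_ms
    due = [t for t in event_times_ms if t <= horizon]
    pending = [t for t in event_times_ms if t > horizon]
    lateness = [now_ms - t for t in due if t < now_ms]
    health = QueueHealth(
        due=len(due),
        late=len(lateness),
        worst_late_ms=max(lateness, default=0.0),
    )
    return due, pending, health


due, pending, health = poll_queue([990.0, 1020.0, 1080.0, 1250.0], now_ms=1000.0)
print(health)  # QueueHealth(due=3, late=1, worst_late_ms=10.0)
```

Surfacing `late` and `worst_late_ms` in the player UI would make timing problems visible instead of leaving them buried in the transport.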

Verified 2026-05-28

Generation

5 active signals

Google DeepMind / Official docs / Mar 25, 2026

Lyria 3 Pro moves AI music generation toward structured song control

Lyria 3 developer preview separates clip and pro variants, adds longer structured songs, and documents tempo, lyric timing, mood, and image-to-music controls with SynthID watermarking.

Product move

AI music creation needs explicit model-mode choices. The generation surface should distinguish fast clip prototyping from longer structured songs, and should expose tempo, lyrics, mood, and multimodal inputs only when the configured provider supports them.
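
A rough sketch of capability gating follows: the generation surface asks which controls the configured provider declares and shows only those. The provider names and the capability table are hypothetical placeholders, not the Lyria API or any real provider's feature list.

```python
# Hypothetical capability table; a real build would load this from provider config.
PROVIDER_CAPABILITIES = {
    "clip-provider": {"tempo", "mood"},
    "song-provider": {"tempo", "lyrics", "mood", "image"},
}


def visible_controls(provider, requested=("tempo", "lyrics", "mood", "image")):
    """Return only the controls the configured provider declares support for."""
    supported = PROVIDER_CAPABILITIES.get(provider, set())
    return [control for control in requested if control in supported]


print(visible_controls("clip-provider"))   # ['tempo', 'mood']
print(visible_controls("unknown"))         # []
```

An unknown provider yields no controls, so the UI fails closed instead of offering inputs that would be silently ignored.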

Verified 2026-05-28

AAAI / Peer reviewed / Mar 14, 2026

MIDILM improves controllable text-to-MIDI alignment

MIDILM separates text and musical decoding paths while sharing masked self-attention, improving semantic alignment and structural metrics on MidiCaps for controllable text-to-MIDI generation.

Product move

Treat text-to-MIDI as controlled co-creation, not a prompt box. Product prompts should preserve key, meter, tempo, and structural constraints so generated clips remain editable and musically coherent.

Verified 2026-05-28

Scientific Reports / Peer reviewed / Apr 14, 2026

CAST skeleton-to-texture generation improves long-form symbolic structure

CAST uses explicit skeleton guidance for long-range symbolic generation, splitting macro-harmonic planning from micro-texture filling and reducing long-sequence structural error versus a MuseFormer baseline.

Product move

Model long-form music as macro skeleton plus texture. The player now emits a structure brief so generation, loops, and accompaniment can anchor to sections before filling ornamentation.

Verified 2026-05-28

Google AI Docs / Official docs / Dec 18, 2025 docs, verified May 28, 2026

Music generation using Lyria RealTime

The Gemini music-generation docs center on Lyria RealTime: low-latency WebSocket sessions, weighted prompts, live config updates, and 48 kHz stereo PCM streaming for interactive music control.

Product move

Keep MidiverseForge explicit about realtime session control: weighted prompts, BPM steering, pause/play/reset behavior, and audio buffering should be first-class UX instead of hidden transport details.

Verified 2026-05-28

arXiv / Peer reviewed / Apr 21, 2026

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

BEAT proposes a uniform temporal-step tokenization for symbolic music and reports stronger structural coherence and efficiency than mainstream event-based tokenizations.

Product move

Preserve beat-aware timing structure in the product model. Practice, continuation, and accompaniment features should retain explicit temporal grouping instead of flattening everything into event streams; the forge now computes a beat-grid stability signal on each capture.
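
The page does not define the beat-grid stability signal, so the following is only one plausible formulation: measure how far each onset sits from the nearest grid line and scale that to a 0..1 score. The function name, the sixteenth-note default grid, and the scoring formula are all assumptions.

```python
def beat_grid_stability(onsets_s, bpm, subdivisions=4):
    """Return a 0..1 score: 1.0 means every onset sits exactly on the grid."""
    if not onsets_s:
        return 1.0
    step = 60.0 / bpm / subdivisions            # grid spacing in seconds
    total = 0.0
    for t in onsets_s:
        nearest = round(t / step) * step        # closest grid line
        total += abs(t - nearest) / (step / 2)  # 0 on the grid, 1 halfway between lines
    return 1.0 - total / len(onsets_s)


print(beat_grid_stability([0.0, 0.5, 1.0], bpm=120))   # 1.0
print(beat_grid_stability([0.03125], bpm=120))         # 0.5
```

Because the score is computed from explicit grid positions, it keeps the temporal grouping that a flat event stream would lose.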

Verified 2026-05-28

Workflow

4 active signals

arXiv / Preprint / May 12, 2026

STRUM makes playable chart generation an end-to-end transcription benchmark

STRUM combines spectral transcription, onset detection, pitch tracking, ASR, and ensemble classifiers to turn raw recordings into playable multi-instrument chart data.

Product move

Treat rhythm-game style playability as an import/export constraint. MidiverseForge should keep per-lane onset evidence and readable timing windows so practice captures can later become playable charts instead of static analytics snapshots.

Verified 2026-05-28

MIDI Association / Spec track / Profile update, verified May 28, 2026

MIDI-CI Profiles turn interoperability into explicit behavior contracts

Recent MIDI Association profile updates emphasize negotiated behavior contracts, including default control mapping, GM function blocks, MPE, drawbar organ, rotary speaker, and Note On orchestral articulation profiles.

Product move

Surface profile targets before export or playback. The player now reports inferred readiness for Piano, MPE, drum note-map, default CC, orchestral articulation, and GM function-block profiles instead of treating every MIDI file as a flat note stream.
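
Real MIDI-CI profiles are negotiated between devices, so any readiness reported from a file alone is a guess. The sketch below shows the kind of heuristic that could back such a guess. The event shape, the function name, and the thresholds are hypothetical; the only fixed convention it relies on is that General MIDI places drums on channel 10 (index 9).

```python
def infer_profile_readiness(events):
    """events: list of (channel, kind) tuples, channel 0-15, kind such as 'note'."""
    note_channels = {ch for ch, kind in events if kind == "note"}
    bend_channels = {ch for ch, kind in events if kind == "pitch_bend"}
    return {
        # General MIDI convention: channel 10 (index 9) carries drums.
        "drum_note_map": 9 in note_channels,
        # Assumed heuristic: notes with their own pitch bend on two or more channels.
        "mpe": len(note_channels & bend_channels) >= 2,
        # Assumed heuristic: any control change suggests default CC mapping matters.
        "default_cc": any(kind == "control_change" for _, kind in events),
    }


capture = [(9, "note"), (1, "note"), (1, "pitch_bend"), (2, "note"), (2, "pitch_bend")]
print(infer_profile_readiness(capture))
# {'drum_note_map': True, 'mpe': True, 'default_cc': False}
```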

Verified 2026-05-28

arXiv / Peer reviewed / Apr 17, 2026

TinyMU: A Compact Audio-Language Model for Music Understanding

TinyMU shows that a 229M-parameter music-language model can retain much of large-model music reasoning quality while being far more deployable on constrained product surfaces.

Product move

Prefer compact music-understanding models for responsive product surfaces. Short-form coaching, tagging, and assistive analysis should stay cheap enough to run often instead of batching everything behind heavyweight jobs.

Verified 2026-05-28

OpenReview / Peer reviewed / Jan 26, 2026, updated Apr 11, 2026

LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR

LEGATO is a large-scale end-to-end OMR model that recognizes full-page and multi-page typeset scores and outputs ABC notation with state-of-the-art performance.

Product move

Keep sheet ingestion grounded in robust document understanding instead of one-off heuristics. When MidiverseForge scans notation, the output should stay editable and structurally meaningful.

Verified 2026-05-28

Interaction

2 active signals

Evaluation

6 active signals

arXiv / Preprint / May 25, 2026

Score-agnostic structure analysis reframes large MIDI performance evaluation

Score-agnostic performance analysis groups transcribed performances by structural realization, using alignment and clustering when ground-truth score or audio is unavailable.

Product move

Judge imported and captured performances by structural coherence, not just truth-label matching. The bug hunt now treats saved captures as evidence that can be grouped, exported, and re-evaluated as the score-performance model improves.

Verified 2026-05-28

arXiv / Preprint / May 17, 2026

Optimal-transport piano transcription points to perceptual timing diagnostics

A piano transcription approach casts note-event prediction as optimal transport distribution matching, making temporal misalignment part of the optimization target.

Product move

Keep timing feedback tolerant to perceptual misalignment. The capture pipeline now preserves signed offsets, median latency, drift, and beat-grid stability so future audio transcription can optimize around human-perceived timing instead of frame-perfect labels only.
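
The quantities named above can be sketched with plain arithmetic. This example assumes that expected and played onsets are already paired one to one, and it defines drift as the least-squares slope of offset against time. The pairing, the drift definition, and the field names are assumptions; none of them is taken from the cited paper or from the product's pipeline.

```python
from statistics import median


def timing_diagnostics(expected_s, played_s):
    """Signed offsets (played minus expected), median latency, and drift."""
    if len(expected_s) != len(played_s) or len(expected_s) < 2:
        raise ValueError("need two or more paired onsets")
    offsets = [p - e for e, p in zip(expected_s, played_s)]
    mean_t = sum(expected_s) / len(expected_s)
    mean_o = sum(offsets) / len(offsets)
    # Least-squares slope of offset against expected time: seconds of drift per second.
    var_t = sum((t - mean_t) ** 2 for t in expected_s)
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(expected_s, offsets))
    drift = cov / var_t if var_t else 0.0
    return {
        "offsets_s": offsets,                  # sign kept: positive means late
        "median_latency_s": median(offsets),
        "drift_s_per_s": drift,
    }


report = timing_diagnostics([0.0, 1.0, 2.0, 3.0], [0.01, 1.02, 2.03, 3.04])
print(round(report["median_latency_s"], 3), round(report["drift_s_per_s"], 3))  # 0.025 0.01
```

Keeping the sign on each offset lets later analysis tell rushing from dragging, which an absolute error would hide.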

Verified 2026-05-28

arXiv / Peer reviewed / May 7, 2026

PianoCoRe: Combined and Refined Piano MIDI Dataset

PianoCoRe unifies and refines major open piano MIDI corpora into a large score-performance dataset with note-level alignment, quality filtering, and improved robustness for expressive rendering models.

Product move

Keep practice captures durable and exportable. The product should store enough aligned symbolic detail to support future expressive-rendering training, benchmarking, and score-to-performance workflows.

Verified 2026-05-28

arXiv / Peer reviewed / May 3, 2026

RenCon 2025: Revival of the Expressive Performance Rendering Competition

RenCon 2025 documents the revived expressive performance rendering competition and shows clear progress in rendering systems while still highlighting the gap to human-level musical expression.

Product move

Treat expression as a distinct product dimension. Feedback, scoring, and AI coaching should separate note correctness from expressive rendering quality and avoid inflated claims of human-level musicality.

Verified 2026-05-28

Scientific Reports / Peer reviewed / Mar 26, 2026

Multimodal expressiveness modeling points beyond single-score MIDI metrics

Recent multimodal piano expressiveness work highlights tempo curves, loudness profiles, pedaling patterns, phrase-level dynamic contours, and the limits of MIDI-only onset/velocity data.

Product move

Score expression at phrase and section level, not only as aggregate velocity statistics. Conductor Map and future coaching should separate timing, dynamic contour, articulation, and pedaling so practice feedback remains actionable.
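
To illustrate the difference between aggregate and phrase-level scoring, the sketch below summarizes velocity per phrase instead of per file. It assumes notes are sorted by onset and that phrase boundaries are already known. The function name and the output fields are hypothetical, and velocity stands in for only one of the expressive dimensions named above.

```python
def phrase_dynamics(notes, phrase_bounds_s):
    """notes: list of (onset_s, velocity) sorted by onset.
    phrase_bounds_s: sorted phrase start times followed by the final end time."""
    contours = []
    for start, end in zip(phrase_bounds_s, phrase_bounds_s[1:]):
        velocities = [v for t, v in notes if start <= t < end]
        if not velocities:
            contours.append(None)   # empty phrase: nothing to score
            continue
        contours.append({
            "mean_velocity": sum(velocities) / len(velocities),
            "velocity_range": max(velocities) - min(velocities),
            "trend": velocities[-1] - velocities[0],   # > 0 suggests a crescendo
        })
    return contours


notes = [(0.0, 40), (1.0, 60), (2.0, 80), (4.0, 90), (5.0, 50)]
for contour in phrase_dynamics(notes, [0.0, 4.0, 8.0]):
    print(contour)
```

In this example the whole-file mean velocity is 64, which hides that the first phrase rises by 40 and the second falls by 40.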

Verified 2026-05-28

arXiv / Peer reviewed / Jan 26, 2026

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation

The paper benchmarks piano performance evaluation and finds that audio foundation models outperform symbolic representations across all 19 perceptual dimensions tested.

Product move

Keep symbolic scoring for instant browser feedback, but leave an explicit seam for audio-grade evaluation. The roadmap should not pretend MIDI-only metrics capture expressive quality completely.
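
An explicit seam could be as simple as a shared evaluator interface with an optional audio implementation. In this sketch, `PerformanceEvaluator`, `SymbolicEvaluator`, `score`, and the capture dictionary shape are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class PerformanceEvaluator(Protocol):
    def evaluate(self, capture: dict) -> dict: ...


class SymbolicEvaluator:
    """Instant, MIDI-only scoring that can run in the browser loop."""

    def evaluate(self, capture: dict) -> dict:
        notes = capture["notes"]
        hits = sum(1 for n in notes if n["correct"])
        return {"source": "symbolic", "note_accuracy": hits / len(notes) if notes else 0.0}


def score(capture, symbolic, audio=None):
    """Always score symbolically; add expression only when an audio evaluator exists."""
    result = symbolic.evaluate(capture)
    # The seam: expression stays None unless an audio evaluator and audio are present.
    result["expression"] = audio.evaluate(capture) if audio and capture.get("audio") else None
    return result


capture = {"notes": [{"correct": True}, {"correct": False}]}
print(score(capture, SymbolicEvaluator()))
# {'source': 'symbolic', 'note_accuracy': 0.5, 'expression': None}
```

Reporting `expression` as `None` instead of a symbolic proxy keeps the product from implying that MIDI-only metrics measure expressive quality.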

Verified 2026-05-28