
QCon SF 2009: Don Box & Amanda Laucher, Codename "M": Language, Data, and Modeling, Oh My!

Stefan Tilkov,

These are my unedited notes from Don Box & Amanda Laucher's talk about Codename "M": Language, Data, and Modeling, Oh My!

  • [Don has promised not to tweet. That's a good start.]
  • Interesting similarities between M and MPS
  • M is a language for data; one of the most interesting places to find data is text, i.e. a sequence of Unicode code points
  • Great support for text processing perceived as critical
  • Example: extract some data from text - tweets
  • Intellipad - default environment for writing grammar files, this is where tool support shows up first, then in Visual Studio
  • M code for a language definition (a function from text to something else, declaratively specified):

    module QCon
    {
        language Simple
        {
            syntax Main = empty;
        }
    }
    
  • Main is the rule that is the entry point

  • Open file in three-pane (or rather, four-pane) mode
  • Show source file, grammar, show output - empty file yields Main []
  • Amanda makes a great assistant ;-)
  • Non-empty file produces errors
  • Change "empty" to "any" - file with a single character works, more than that produces errors; change to any+, validates again
  • Simple language for interpreting tweets:

    module QCon
    {
        language Simple
        {
            syntax Main = Tweet;
            syntax Tweet = Content*;
            syntax Content
                = RawText
                | Handle
                | HashTag;
            token RawText = (any - ("#"|"@"))+;
            token Handle = "@" Name;
            token HashTag = "#" Name;
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • regular expressions at the token layer, context-free grammar at the syntax level
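
The split between the two layers can be illustrated outside of M. The following is a minimal Python sketch, not M and not anything shown in the talk: each token rule becomes a regular expression, and the syntax rule `Tweet = Content*` becomes a loop that tries each alternative in turn. The names `TOKEN_RULES` and `parse_tweet` and the sample tweet are made up for illustration.

```python
import re

# Token layer: one regular expression per token rule (names mirror the grammar).
NAME = r"[^@# ]+"  # Name = (any - ("@"|"#"|" "))+
TOKEN_RULES = [
    ("Handle", re.compile("@" + NAME)),    # Handle = "@" Name
    ("HashTag", re.compile("#" + NAME)),   # HashTag = "#" Name
    ("RawText", re.compile(r"[^#@]+")),    # RawText = (any - ("#"|"@"))+
]


def parse_tweet(text):
    """Syntax layer: Tweet = Content*, where Content is any one token rule."""
    pos, contents = 0, []
    while pos < len(text):
        for rule, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                contents.append((rule, match.group()))
                pos = match.end()
                break
        else:
            # No alternative of Content matches here, so the input is rejected.
            raise ValueError(f"no rule matches at offset {pos}")
    return contents


parsed = parse_tweet("Watching @donbox demo M at #qcon today")
for rule, matched in parsed:
    print(rule, repr(matched))
```

Unlike the M toolchain, this sketch simply takes the first alternative that matches, so it says nothing about how M resolves ambiguous grammars.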

  • Amanda: "Only crap languages make you define something before you have to use it"
  • Discussion between Josh, Amanda, Don about whether or not the grammar is correct
  • Good point about interactive grammar development using the three-pane editor
  • Pattern names used to display data in a structured way
  • Adding @Classification["Keyword"] syntax-colors the source
  • M is structurally typed
  • M consists of lists and records
  • Generating the right-hand side:
  • Intellipad crashes! [Boom!] :-)
  • The spec for M is licensed under the Microsoft OSP (which makes people as happy as they can be working with Microsoft)
  • JavaScript-based implementation of parts of the language; subset of the three-pane mode
  • Toolchain is written in C#, parser written in M, more and more compilers written in M
  • Intellipad crashes! Again! [Boom!] :-)

    module QCon
    {
        language Simple
        {
            syntax Main = v:Tweet => v;
            syntax Tweet = v:Content* => v;
            syntax Content
                = v:RawText => v
                | v:Handle => v
                | v:HashTag => v;
            token RawText = (any - ("#"|"@"))+;
            token Handle = "@" v:Name => v;
            token HashTag = "#" v:Name => v;
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • bubbles up the actual content, result is just a list of the strings extracted

    module QCon
    {
        language Simple
        {
            syntax Main = v:Tweet => v;
            syntax Tweet = v:Content* => v;
            syntax Content
                = v:RawText => v
                | v:Handle => v
                | v:HashTag => v;
            token RawText = v:(any - ("#"|"@"))+ => { Kind => "RawText", Text => v };
            token Handle = "@" v:Name => { Kind => "Handle", Name => v };
            token HashTag = "#" v:Name => { Kind => "HashTag", Topic => v };
            token Name = (any - ("@"|"#"|" "))+;
        }
    }
    
  • Question from Josh: Isn't this mixing lexing and production rules? Don: Goal is to have no difference, but it can be pulled out
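
The effect of the `=> { ... }` projections can be sketched separately from the matching itself. This Python analogue is an illustration only, not M: it assumes the matcher has already produced (rule, matched text) pairs, and the `PROJECTIONS` table and `project` function are hypothetical names. Each rule maps its matched text to a record with a `Kind` field, dropping the leading "@" or "#" the way `v:Name` binds only the name.

```python
# Hypothetical projection table: rule name -> function building the output record.
PROJECTIONS = {
    "RawText": lambda text: {"Kind": "RawText", "Text": text},
    "Handle": lambda text: {"Kind": "Handle", "Name": text[1:]},     # drop "@"
    "HashTag": lambda text: {"Kind": "HashTag", "Topic": text[1:]},  # drop "#"
}


def project(matches):
    """Turn (rule, matched_text) pairs into records, like the `=> { ... }` clauses."""
    return [PROJECTIONS[rule](text) for rule, text in matches]


# Assumed matcher output for the input "Hello @donbox#qcon".
matches = [("RawText", "Hello "), ("Handle", "@donbox"), ("HashTag", "#qcon")]
records = project(matches)
for record in records:
    print(record)
```

Keeping the projection in its own table mirrors Don's remark that the two concerns can be pulled apart if desired.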

  • Next: Consuming stuff
  • Using TDD to consume Don's tweets
  • VS project includes language grammar defined earlier
  • Amanda: Can one debug a grammar? Don: Answered later
  • M runtime can be hosted inside a C# program
  • The team recently stopped using .NET 3.5/VS 2008 internally; now exclusively on .NET 4 and VS 2010
  • Language definition is included in the program binary
  • Showing off some

    var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
    dynamic result = language.ParseString(input);
    bool hasHashTag = false;
    foreach (var content in result)
    {
        // stop at the first content item that is a hash tag
        hasHashTag = content.Kind == "HashTag";
        if (hasHashTag)
            break;
    }
    AssertEqual(true, hasHashTag);
    
  • Demo is actually working
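
What the test checks can be restated compactly. The following Python sketch is an analogue for illustration, not the M runtime API: it assumes the parse result is a list of records shaped like the grammar's projections, and the sample data and the `has_hash_tag` helper are invented here.

```python
# Assumed parse result: a list of records shaped like the grammar's projections.
result = [
    {"Kind": "RawText", "Text": "Hello "},
    {"Kind": "Handle", "Name": "donbox"},
    {"Kind": "HashTag", "Topic": "qcon"},
]


def has_hash_tag(contents):
    """Return True as soon as one content record is a hash tag."""
    return any(content["Kind"] == "HashTag" for content in contents)


found = has_hash_tag(result)
print(found)
```

Like the `break` in the C# loop, `any` stops at the first matching record.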

  • Change AssertEqual to Assert.IsTrue to get around exception thrown
  • Question from Ian Robinson: Q. Can LINQ be used to walk the result? A. Not yet, as dynamic and LINQ don't mix yet
  • Not going to write any more C# types, has written all that are in him ;-)
  • "This will fail!"

    // needs: using System.Collections; using System.Linq;
    var language = Language.load(/* MImage */ typeof(Program).Assembly, "QCon", "Simple"); // yields runtime that can parse a program
    dynamic result = language.ParseString(input);
    var query = (from content in ((IEnumerable)result).OfType<dynamic>()
                 where content.Kind == "HashTag"
                 select content).Any();
    Assert.IsTrue(query);
    
  • It failed indeed.

  • "This talk is not about integration of two features I don't work on"
  • M is optionally typed, structurally typed
  • optional typing: syntax Bob = any* : Integer32 => 42;
  • only partially plumbed in the current version
  • Example for structural typing (not really working yet):

    module Ola
    {
        type HashTagRec =
        {
            Kind : Text where value == "HashTag";
            Topic : Text;
        }
    }
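
The idea behind a structural type like `HashTagRec` is that a value conforms because of its shape, not because of a declared class. A rough Python analogue, offered only as an illustration (the function name `is_hash_tag_rec` is hypothetical, and nothing here reflects M's actual type-checking rules):

```python
def is_hash_tag_rec(value):
    """Structural check: a value conforms if its fields fit, whatever built it."""
    return (
        isinstance(value, dict)
        and value.get("Kind") == "HashTag"        # Kind : Text where value == "HashTag"
        and isinstance(value.get("Topic"), str)   # Topic : Text
    )


print(is_hash_tag_rec({"Kind": "HashTag", "Topic": "qcon"}))
print(is_hash_tag_rec({"Kind": "Handle", "Name": "donbox"}))
```

Whether M accepts records carrying extra fields is not covered in these notes, so the sketch leaves that question open and only checks the two declared fields.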
    
  • Rudimentary grammar debugging: set breakpoints in input source text; 4th pane shows up, shows matching stack

  • Syntactic and semantic editing support
  • Semantics relies on hooks that are not there yet
  • M is built using an M Grammar
  • Language completion for M is built using C#
  • Ambiguators for GLR need to be written in C#
  • One of the metrics used to evaluate the language: XML. There is a grammar for XML, briefly demoed.
  • MPS guys have a different religion – Microsoft believes people have a text editor they love
  • <?magnum PI?> :-)
  • Comment from Martin Fowler: Main difference from classic tools such as ANTLR is the dynamic nature: no code generation needed
  • Comment from Ola Bini: Didn't see the type-checking and debugging before – now he's impressed