
Howto - Create crawlable Mendix apps

Work in progress!

This post provides a short outline of how you can use Mendix to create an application that can be found using search engines like Google.

The solution described here is based on a few assumptions:

Implementation steps:

  1. Use the Deeplink module to configure deep links for all your pages
  2. Create a sitemap.xml page listing all deeplinks
  3. Detect if a webcrawler or bot is requesting a page
  4. Return a page optimized for webcrawlers
  5. Provide social sharing buttons so users can easily create external urls to your pages
  6. Generate an Atom/RSS feed

Create deeplinks for all crawlable content

You can use the App Store Deeplink module to create urls for all the pages Google needs to index. Google indexes content per url and returns that url to the user in the search results. Without deeplinks into your application, Google cannot direct users to the correct page.

Deeplinks are also essential for your ranking in a search engine, as they enable external sites to link to specific pages. Google uses these links to determine how popular your pages are and which words to associate with them.

  1. Download the Deeplink module from the App Store.

    Deeplink module in Mendix project

  2. Initialize the Deeplink module by calling the StartDeeplink microflow. The following microflow also includes calls that configure the deeplinks programmatically; alternatively, you can configure the deeplinks using an administration page in your application.

    Initialise deeplink

    You need to add a deeplink location to your web server, so that deeplinks are forwarded to the Mendix runtime. Here's an example of how you can do this for nginx:

    location /link/ {
        proxy_pass http://127.0.0.1:8000/link/;
    }
    
  3. Configure the deeplinks. The following shows how the deeplink for a single post is configured.

    Configure deeplink

    This configuration specifies which microflow should be called when a deeplink is opened in a browser, in this case FrontEnd.DL_ShowPost. An example request is shown below the microflow.

ShowPost Microflow
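For example, a request to http://www.mxblog.eu/link/post/howto-calling-a-rest-service-from-a-mendix-microflow would invoke FrontEnd.DL_ShowPost. Assuming the deeplink is configured to pass the trailing path segment as a string parameter (which the Deeplink module supports), the microflow receives the post's permalink and uses it to retrieve and show the post.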

Generate a sitemap listing all deeplinks

The next step is to provide the search engine crawler with a list of urls that need to be indexed. You can do this using a sitemap.xml document, which looks like this:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
       <loc>http://www.mxblog.eu:80/link/post/can-i-make-a-mendix-application-crawlable</loc>
       <lastmod>2014-02-01</lastmod>
    </url>
    <url>
       <loc>http://www.mxblog.eu:80/link/post/howto-calling-a-rest-service-from-a-mendix-microflow</loc>
       <lastmod>2014-02-01</lastmod>
    </url>
    ...
</urlset>
  1. The following SitemapRequestHandler generates the contents of the sitemap.xml file:

    public class SitemapRequestHandler extends RequestHandler {
        private IContext context;
        public SitemapRequestHandler(IContext context) {
            this.context = context;
        }
        @Override
        public void processRequest(IMxRuntimeRequest iMxRuntimeRequest, IMxRuntimeResponse iMxRuntimeResponse, String s) throws Exception {
            Core.getLogger("SitemapRequestHandler").info("process request: " + s + ", " 
                + iMxRuntimeRequest.getRequestString());
            HttpServletRequest request = (HttpServletRequest) iMxRuntimeRequest.getOriginalRequest();
            // Reconstruct the external base url for the deeplinks listed in the sitemap
            String protocolHost = request.getScheme() + "://" 
                + request.getServerName() + ":" + request.getServerPort() + "/link/post/";
            String response = getXmlSitemap(protocolHost);
            // Set the content type and status before writing the body
            iMxRuntimeResponse.setContentType("application/xml");
            iMxRuntimeResponse.setStatus(HttpServletResponse.SC_OK);
            iMxRuntimeResponse.getWriter().write(response);
            iMxRuntimeResponse.getWriter().flush();
        }
        private String getXmlSitemap(String protocolHost) throws CoreException {
            String urlsXml = "";
            List<IMendixObject> list = Core.retrieveXPathQuery(
                this.context, "//" + Post.entityName + "[Permalink != NULL]");
            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
            for (IMendixObject mxObject : list) {
                Post post = Post.load(this.context, mxObject.getId());
                urlsXml += "<url><loc>" + protocolHost + post.getPermalink() + "</loc>\n";
                if (post.getMendixObject().getChangedDate(this.context) != null) {
                    urlsXml += "<lastmod>" 
                        + dateFormat.format(post.getMendixObject().getChangedDate(this.context)) 
                        + "</lastmod>\n";
                }
                urlsXml += "</url>\n";
            }
            return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                    "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
                    urlsXml + "</urlset>";
        }
    } 
    
  2. The initialization microflow discussed earlier includes a Java action that calls the following code. It registers request handlers for a few resources needed to improve SEO, one of them being the request handler that generates the sitemap.xml resource.

    public class RegisterSeoHandlers extends CustomJavaAction<Boolean>
    {
        public RegisterSeoHandlers(IContext context)
        {
           super(context);
        }
        @Override
        public Boolean executeAction() throws Exception
        {
           // BEGIN USER CODE
            Core.addRequestHandler("feed/",new AtomRequestHandler(this.getContext()));
            Core.addRequestHandler("sitemap.xml",new SitemapRequestHandler(this.getContext()));
            Core.addRequestHandler("image/", new ImageRequestHandler(this.getContext()));
            return true;
           // END USER CODE
        }
        /**
         * Returns a string representation of this action
         */
        @Override
        public String toString()
        {
           return "RegisterSeoHandlers";
        }
        // BEGIN EXTRA CODE
        // END EXTRA CODE
    }
    
  3. Finally, you need to forward the sitemap.xml path from nginx to the Mendix runtime:

    location /sitemap.xml {
        proxy_pass http://127.0.0.1:8000/sitemap.xml;
    }
    

You can register the sitemap with Google using Google Webmaster Tools, to ensure that Google will read all the resources listed in it.

Fetch as Google - Webmaster tools
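You can also advertise the sitemap's location to all crawlers, not just Google, by adding a Sitemap directive (part of the sitemaps.org protocol) to your robots.txt:

Sitemap: http://www.mxblog.eu/sitemap.xml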

Detect crawlers and bots

Now that Google knows which urls to crawl, we need to provide it with indexable content. By default, a Mendix application consists of a single html page that is dynamically updated using javascript. Google will not execute this javascript code, so it will not see the final content displayed on the page.

The first step towards serving content that is useful to a search engine is to detect who is requesting a deeplink. The following example code uses a simple, and rather naive, way to detect the type of visitor: it inspects the User-Agent header. It also treats any request containing the _escaped_fragment_ query parameter as a crawler request, following Google's AJAX crawling scheme.

public static boolean isCrawlerRequest(IMxRuntimeRequest iMxRuntimeRequest) {
    boolean isCrawler = false;
    HttpServletRequest request = (HttpServletRequest) iMxRuntimeRequest.getOriginalRequest();
    String userAgent = request.getHeader("User-Agent");
    Core.getLogger("CrawlerSupport").info("isCrawlerRequest: user_agent = " + userAgent);
    isCrawler = (request.getQueryString() != null ? request.getQueryString().contains("_escaped_fragment_") : false) ||
            userAgent == null ||
            userAgent.equals("") ||
            userAgent.matches(".*Googlebot.*") ||
            userAgent.matches(".*Bingbot.*") ||
            userAgent.matches(".*Baiduspider.*") ||
            userAgent.matches(".*iaskspider.*") ||
            userAgent.matches(".*LinkedinBot.*") ||
            userAgent.matches(".*LinkedInBot.*") ||
            userAgent.matches(".*facebookexternalhit.*") ||
            userAgent.matches(".*Twitterbot.*") ||
            userAgent.matches(".*FlipboardProxy.*") ||
            userAgent.matches(".*Yahoo.*") ||
            userAgent.matches(".*MetaURI.*") ||
            userAgent.matches(".*Crowsnest.*") ||
            userAgent.matches(".*TweetmemeBot.*") ||
            userAgent.matches(".*getprismatic.com.*") ||
            userAgent.matches(".*NING.*") ||
            userAgent.matches(".*HTTP_Request2.*") ||
            userAgent.matches(".*Google.HTTP.Java.Client.*") ||
            userAgent.matches(".*ShowyouBot.*") ||
            userAgent.matches(".*JS.Kit URL Resolver.*") ||
            userAgent.matches(".*web.snippet.*") ||
            userAgent.matches(".*PaperLiBot.*") ||
            userAgent.matches(".*Slurp.*") ||
            userAgent.matches(".*Bot.*")
    ;
    Core.getLogger("CrawlerSupport").info("is crawler = " + isCrawler);
    return isCrawler;
}

This code is called from the deeplink request handler, StartDeeplinkJava.processRequest():

            if ( session == null )
                if(...){
                    //
                   // ....
                   //
                } else { // try to serve the link as guest
                    StartDeeplinkJava.logger.debug("No session found for deeplink: " 
                        + request.getResourcePath() + ", attempting to serve link as guest.");
                    if (CrawlerSupport.isCrawlerRequest(request)) {
                        CrawlerSupport.returnHtmlPage(args, request, response, null);
                    } else {
                        serveDeeplink(args, request, response, session);
                    }
                }

Return SEO-optimized content

The next step is to generate the content that we return to the crawler. Even if you could return the default html that Mendix generates with javascript, this would not be the best content to serve a search engine. Search engines, but also social media sites like LinkedIn, Twitter and Facebook, look for specific meta-content that is missing from default Mendix pages.

By generating a separate, static html version of the page, we can provide the correct metadata.

This part is rather application specific, so you'll need to decide for yourself what information should be available on a page, and which tags and metadata make sense.

The generated html doesn't attempt to include any of the functionality found on your normal Mendix page, as this is not relevant to a search engine; the search engine just needs to know what content a visitor will see when visiting your page.

The following code is just an example; it mostly uses the Post entity as input for what should be included in the html page:

public class StaticHtmlPageRenderer {
    private static final ILogNode logger = Core.getLogger("StaticHtmlPageRenderer");
    public StaticHtmlPageRenderer() {
    }
    /**
     * Render static html page for request
     *
     * @param args
     * @param request
     * @param response
     * @param existingsession
     * @throws java.io.IOException
     */
    public void render(String[] args, IMxRuntimeRequest request, IMxRuntimeResponse response, ISession existingsession) throws IOException, CoreException {
        logger.info("render");
        int responseCode = HttpServletResponse.SC_OK;
        IContext context = Core.createSystemContext();
        String html = "<!DOCTYPE html>\n<html>";
        HttpServletRequest req = (HttpServletRequest) request.getOriginalRequest();
        String urlPort = ((req.getServerPort() == 80 || req.getServerPort() == 443) ? "" : ":" + req.getServerPort());
        String protocolHost = req.getScheme() + "://" + req.getServerName() + urlPort + "/link/post/";
        Post post = getPost(context, protocolHost, request);
        if (post == null) {
            // Unknown permalink: return a 404 instead of failing on a null post
            response.setStatus(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        String postTitle = post.getTitle().replace("'", "").replace("\"", "");
        String siteName = "MxBlog";
        String url = protocolHost + post.getPermalink();
        Image firstImage = getPostFirstImage(context, post);
        Account author = post.getPost_Account();
        String imageUrl = req.getScheme() + "://" + req.getServerName() + urlPort + (firstImage != null ? firstImage.getPermalink() : "/bloglogo.png");
        String description = post.getDescription();
        String keywords = post.getKeywords();
        String tags = "";
        Iterator<Tag> tagIter = post.getPost_Tag().iterator();
        while (tagIter.hasNext()) {
            Tag t = tagIter.next();
            // Separate tag names with commas so they fit the keywords meta tag
            tags += (tags.isEmpty() ? "" : ", ") + t.getName();
        }
        String authorFullname = (author != null ? author.getFullName() : "");
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        String postDate = sdf.format(post.getPostDate());
        html += "<head>";
        html += "<title>" + post.getTitle() + " - " + siteName + "</title>\n";
        html += "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>\n";
        html += "<meta name='copyright' content='" + siteName + "'>\n";
        html += "<meta name='description' content='" + description + "'>\n";
        html += "<meta name='keywords' content='" + keywords + ", " + tags + "'>\n";
        html += "<meta name='robots' content='index,follow'>\n";
        html += "<meta name='author' content='" + authorFullname + "' />\n";
        html += "<meta name='DC.title' content='" + postTitle + "'>\n";
        html += "<meta name='DC.creator' content='" + authorFullname + "' />\n";
        html += "<meta name='DC.description' content='" + description + "'>\n";
        html += "<meta name='DC.Date' content='" + postDate + "'>\n";
        html += "<meta property='og:description' content='" + description + "' />\n";
        html += "<meta property='og:site_name' content='" + siteName + "' />\n";
        html += "<meta property='og:title' content='" + postTitle + "' />\n";
        html += "<meta property='og:url' content='" + url + "' />\n";
        html += "<meta property='og:type' content='blog' />\n";
        if (author != null && author.getFacebookAccount() != null) {
            html += "<meta name='author' content='" + author.getFacebookAccount() + "' />\n";
        } else {
            html += "<meta name='author' content='" + authorFullname + "' />\n";
        }
        html += "<meta name='twitter:card' content='summary' />\n";
        html += "<meta name='twitter:url' content='" + url + "' />\n";
        html += "<meta name='twitter:site' content='" + (author != null && author.getTwitterAccount() != null ? author.getTwitterAccount() : "") + "' />\n";
        html += "<meta name='twitter:author' content='" + (author != null && author.getTwitterAccount() != null ? author.getTwitterAccount() : "") + "' />\n";
        html += "<meta name='twitter:title' content='" + postTitle + "' />\n";
        html += "<meta name='twitter:description' content='" + description + "' />\n";
        if (firstImage != null) {
            html += "<meta property='og:image' content='" + imageUrl + "' />\n";
            html += "<meta itemprop='image' content='" + imageUrl + "'>";
            html += "<meta name='twitter:image' content='" + imageUrl + "' />\n";
        }
        if (author != null && author.getGooglePlusAccount() != null) {
            html += "<link rel='author' href='https://plus.google.com/" + author.getGooglePlusAccount() + "/posts'/>";
        }
        html += "</head>";
        html += "<body>";
        html += "<header>" + siteName + "</header>\n";
        html += "<article itemscope='' itemtype='http://schema.org/BlogPosting'>"
                + "<h1><a href='" + url + "' itemprop='name headline'>" + post.getTitle() + "</a></h1>"
                + "<div itemprop='articleBody'>" + post.getContent() + "</div></article>\n";
        html += "<nav>" + getSiteMenu(context, protocolHost) + "</nav>";
        html += "</body></html>";
        // Set the status before writing the body
        response.setStatus(responseCode);
        response.getWriter().write(html);
        response.getWriter().flush();
    }
    /**
     * 
     * Returns the Post specified in the deeplink
     *
     */
    private Post getPost(IContext context, String protocolHost, IMxRuntimeRequest request) throws CoreException {
        logger.info("getPost");
        Post post = null;
        String[] requestPath = request.getResourcePath().substring(1).split("/");
        String deeplinkPath = requestPath[requestPath.length - 1];
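        // Note: deeplinkPath comes straight from the url; in a real application,
        // escape it before embedding it in an XPath query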
        List<IMendixObject> list = Core.retrieveXPathQuery(context, "//" 
                                 + Post.entityName + "[Permalink = '" + deeplinkPath + "']");
        if (list.size() > 0) {
            post = Post.load(context, ((IMendixObject) list.get(0)).getId());
        }
        return post;
    }
    /**
     *
     * Returns the first image of the post. This will be used in some meta-data, so social sites 
     * can display an image to represent the post.
     *
     */
    private Image getPostFirstImage(IContext context, Post post) throws CoreException {
        Image image = null;
        String xpath = "//CMS.Post" + "[Permalink = '" + post.getPermalink() 
                     + "']/CMS.Image_Post/" + Image.entityName;
        logger.info("querying image: " + xpath);
        List<IMendixObject> list = Core.retrieveXPathQuery(context, xpath);
        if (list.size() > 0) {
            logger.info("first image: " + list.get(0).getClass().getName());
            image = Image.load(context, ((IMendixObject) list.get(0)).getId());
        }
        return image;
    }
    /**
     *
     * Returns an html unordered list of links to all crawlable pages on this site
     *
     */
    private String getSiteMenu(IContext context, String protocolHost) throws CoreException {
        String menuHtml = "";
        Map<String, String> sort = new HashMap<String, String>();
        sort.put(Post.MemberNames.Permalink.toString(), "ASC");
        List<IMendixObject> list = Core.retrieveXPathQuery(
            context, "//" + Post.entityName + "[Permalink != NULL]", -1, -1, sort);
        for (IMendixObject mxObject : list) {
            Post post = Post.load(context, mxObject.getId());
            menuHtml += "<li><a href='" + protocolHost + post.getPermalink() + "'>" 
                     + post.getTitle() + "</a></li>\n";
        }
        return "<ul>" + menuHtml + "</ul>";
    }
}

Create an easy way to share deeplinks

To make a post easily shareable, a number of social buttons are displayed below every blog post. The important part is being able to determine the deeplink for every post: the permalink is derived from the title when a blog post is saved, and stored with the post object (a sketch of such a derivation follows the domain model below).

Blog domain model
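The actual derivation code isn't shown in this post, but a minimal sketch could look like the following (toPermalink is a hypothetical helper, called for example from a custom Java action when the post is committed):

public static String toPermalink(String title) {
    // Derive a url-friendly permalink from the post title
    return title.toLowerCase()
            .replaceAll("[^a-z0-9]+", "-")  // collapse runs of non-alphanumeric characters to a hyphen
            .replaceAll("(^-|-$)", "");     // strip leading and trailing hyphens
}

A title like "Can I make a Mendix application crawlable?" then yields can-i-make-a-mendix-application-crawlable, matching the urls in the sitemap shown earlier.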

The blogpost page itself has a number of widgets, one of which generates the social buttons. The widget has one parameter: the deeplink url as stored with the post object.

Blog post page

The social buttons widget creates a number of html links based on the deeplink url, as sketched below.

Social buttons widget
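The widget source isn't included here, but the generated markup could look something like this (a sketch using each network's standard share endpoint; my-post is a placeholder permalink, and the ids match the css rules below):

<div class="SocialShare">
    <a id="sharePermalink" href="http://www.mxblog.eu/link/post/my-post">Permalink</a>
    <a id="shareTwitter" href="https://twitter.com/intent/tweet?url=http%3A%2F%2Fwww.mxblog.eu%2Flink%2Fpost%2Fmy-post">Twitter</a>
    <a id="shareFacebook" href="https://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2Fwww.mxblog.eu%2Flink%2Fpost%2Fmy-post">Facebook</a>
    <a id="shareLinkedin" href="https://www.linkedin.com/shareArticle?mini=true&amp;url=http%3A%2F%2Fwww.mxblog.eu%2Flink%2Fpost%2Fmy-post">LinkedIn</a>
    <a id="shareGooglePlus" href="https://plus.google.com/share?url=http%3A%2F%2Fwww.mxblog.eu%2Flink%2Fpost%2Fmy-post">Google+</a>
</div>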

You can style these links with css to resemble buttons:

div.SocialShare a {
    position: relative;
    padding-left: 44px;
    text-align: left;
    white-space: nowrap;
    overflow: hidden;
    text-overflow: ellipsis;
    margin: 3px;
    padding: 3px;
    border: 1px solid black;
    border-color: rgba(0,0,0,0.2);
}
div.SocialShare a { padding: 3px; color: white; }
div.SocialShare a#sharePermalink { background-color: #2c4762; }
div.SocialShare a#shareTwitter { background-color: #2ba9e1; }
div.SocialShare a#shareFacebook { background-color: #3b5998; }
div.SocialShare a#shareLinkedin { background-color: #007bb6; }
div.SocialShare a#shareGooglePlus { background-color: #dd4b39; }
div.SocialShare a li { padding: 3px; }
div.SocialShare i.fa { padding: 3px; }