Der Mundo


The End Of The Language Barrier

Brian McConnell, Founder, Worldwide Lexicon Project

NOTE: if you would like to view, create or edit translations for this essay, download our free Firefox Translator. With it, you can view and edit translations for any website in over 50 languages. If you would like to donate to support ongoing development, consider joining our fundraising campaign. If you are a developer, and would like to read about this in more detail, read this white paper.

While the Internet and the worldwide web have grown to become global in reach, rendering time and distance moot, they are still fragmented by language. The web is not really a single entity, but rather many networks, each relatively isolated from the others. Information now travels quite freely within a language, with few remaining economic or editorial controls over who can publish, and what people can read. Information does not travel so freely between languages. Important news stories and especially interesting pages may be picked up for translation, but overall, the flow of information across languages is inefficient, slow, and unpredictable. We need a multilingual web. What might this look like?

For ten years, I have worked on the Worldwide Lexicon, an open source project to create translation platforms. In recent months, we have made significant progress, and have begun testing tools that offer a glimpse of what the multilingual web could be. Like most first generation tools, ours are crude and can be much improved, but they are a working example of what to aim for.

From a product or system design standpoint, what should we want from a multilingual web? From a user's standpoint, this is easy. A user should be able to open a page, and if it is in a foreign language for them, the web server or their browser should make a best effort attempt to translate it using the best available human and machine translations without intervention. You can see an example of this with our Firefox Translator. It does this using a combination of human edited translations submitted by other users, along with machine translations from several sources. This tool is an addon, and performance issues aside, is a good example of what a multilingual web browser would do. When this capability is fully developed, it should become invisible to most users. When it is embedded in the web browser, every user will have this capability. Translation will become an ambient, automatic service.

The tricky aspect of this is that human language is impossible for computers to comprehend in any meaningful way. There is no single tool or algorithm that can be used to translate human language. Machine translation is good for quickly generating approximate translations, especially where grammar or style don't matter so much. Humans are much better at understanding context, and when they are translating into their native language, are the best at capturing the style and feel of a language. Barring a radical advance in artificial intelligence, this is likely to remain the case for a long time. To build a multilingual web, we need to use a variety of tools, including machine translation systems (there are several major types), translation memories (which store large volumes of human edited translations), and other tools such as dictionaries. In combination, you can build a pretty decent system that draws from different sources where they perform best.
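The idea of drawing from different sources where they perform best can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Translation`, `best_effort_translate`, `human_memory` and `toy_machine` are hypothetical, the sources are in-memory stand-ins rather than real services, and the priority order (human edits first, machine output second) is one possible policy, not a description of how WWL is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Translation:
    text: str
    source: str  # "human", "machine", or "untranslated"


# A source takes (text, source_lang, target_lang) and returns a hit or None.
Source = Callable[[str, str, str], Optional[Translation]]


def best_effort_translate(text: str, sl: str, tl: str,
                          sources: List[Source]) -> Translation:
    """Ask each source in priority order and return the first hit."""
    for source in sources:
        result = source(text, sl, tl)
        if result is not None:
            return result
    # Nothing available: fall back to the original text.
    return Translation(text=text, source="untranslated")


# Hypothetical stand-in for a memory of human-edited translations.
HUMAN_MEMORY = {("bonjour", "fr", "en"): "hello"}


def human_memory(text: str, sl: str, tl: str) -> Optional[Translation]:
    hit = HUMAN_MEMORY.get((text.lower(), sl, tl))
    return Translation(hit, "human") if hit else None


# Hypothetical stand-in for a machine translation engine: always answers,
# but only with a rough placeholder.
def toy_machine(text: str, sl: str, tl: str) -> Optional[Translation]:
    return Translation(f"[{sl}->{tl}] {text}", "machine")


if __name__ == "__main__":
    pipeline = [human_memory, toy_machine]
    print(best_effort_translate("Bonjour", "fr", "en", pipeline))
    print(best_effort_translate("Merci", "fr", "en", pipeline))
```

Putting the human-edited source first means a machine translation is only used when no human translation exists, which matches the division of labor described above.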

Our vision at WWL is to make human/machine translation an embedded service, part of the collection of open services and protocols that comprise the Internet. This will happen within 2 to 3 years. The Firefox addon is a working example of what this will look like to ordinary users. Once it is improved, made faster, and embedded in systems like web servers, this functionality will become part of the standard web services "stack", in the parlance of software engineers. When we reach that point, which could arrive sooner than most people realize, the web will be transparent to most languages. The multilingual web will be what's known as a "best effort" system. Billions of people will use these services, which will call out to retrieve the best available professional, volunteer and machine translations on demand, but they will be mostly invisible to their users. They will become part of the web, and soon, a user will be able to open a page and read it in their own language, without needing to do anything or know how this is being done.

Building the Multilingual Web: Four Easy Pieces

So how, specifically, do we build the multilingual web? The idea of eliminating the language barrier for every user and website sounds ambitious, but by breaking this into several smaller tasks, we see that not only is this possible, but that all of the pieces required to build this already exist. To make this a reality, we simply need to improve on what has already been built, and to embed it in as many different systems and services as possible. Then, in just a few years, billions of people will be using these tools, most of them without realizing it.

The multilingual web will consist of four pieces which work together to make the entire web translatable:

  1. Web browser extensions
  2. Web server extensions
  3. Global translation memory
  4. Language service providers

Multilingual Browsers

A multilingual web browser will look and act just like today's browser. The only difference is that it has been extended to include translation features. From a user's viewpoint, this is an invisible tool. The user opens a web page, and if it is in a foreign language, the translation software will activate itself and call a human/machine translation server to request the best available human or machine translations, and then redraw the web page with the translations.

Our Firefox Translator provides a good example of what this will look like. It automatically detects if a page needs to be translated, and if so, calls WWL and machine translation services to request the best available translations. Users can edit and score translations in the browser, simply by mousing over a translation to display a popup editor which saves the edits back to the translation memory, where they become available to other users. It also provides numerous options for displaying and color coding translations by type and quality.
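To make the edit-and-score loop concrete, here is a small sketch of how a client might choose which translation to display once users have scored several candidates. It rests on assumptions: `Candidate` and `pick_best` are hypothetical names, and the ranking rule (highest average score, with human edits winning ties) is an illustrative policy rather than the Firefox Translator's actual behavior.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    kind: str  # "human" or "machine"
    scores: List[int] = field(default_factory=list)  # user ratings

    def average_score(self) -> float:
        # Unscored candidates count as 0 so they rank below rated ones.
        return mean(self.scores) if self.scores else 0.0


def pick_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-rated candidate; human edits win ties."""
    return max(
        candidates,
        key=lambda c: (c.average_score(), c.kind == "human"),
        default=None,
    )


if __name__ == "__main__":
    options = [
        Candidate("machine draft", "machine"),
        Candidate("careful human edit", "human", scores=[4, 5]),
        Candidate("hasty human edit", "human", scores=[2]),
    ]
    print(pick_best(options).text)
```

The same key could also drive color coding by type and quality, since each candidate carries both its kind and its ratings.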

The only thing we need to do now is to embed this technology in the browser, so it is built into every copy of Firefox and other popular browsers. Until that happens, users can download addons, such as the ones we are developing for Firefox (and soon for other popular web browsers). This piece already exists today, but because users must take the extra step to find and install an addon, it is still unknown to most people. It just needs to be improved slightly, to make it faster, and to be embedded in popular browsers by default.

Web Server Extensions

By embedding translation software within popular web servers, such as the Apache server, we can make human/machine translation part of the web services "stack", or LAMP, as it's known in the parlance of software engineers. This will enable a webmaster to install a module, edit a configuration file, and translate an entire website on the fly. Any application or documents hosted on that web server will be translated as they are sent over the wire, again using the best combination of human and machine translations, per the owner's policies. (Embedding WWL-type functionality in proxy servers is another way to do this.)

Anyone who visits that site, even if they are using an older web browser or mobile browser without translation built in, will see the pages translated into their language. The web server software also uses this best effort approach to translate pages using the best available human and machine translations as the documents are being transmitted to users.
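One way a server-side module could decide which language to translate into is to read the standard HTTP `Accept-Language` header that browsers already send. The sketch below is a simplified illustration, not the design of the planned Apache module: the function names are hypothetical, the parser handles only the common `tag;q=value` form, and the set of supported target languages is an assumed input.

```python
from typing import List, Optional, Set


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_target_language(header: str, page_language: str,
                           supported: Set[str]) -> Optional[str]:
    """Pick a translation target, or None to send the page unchanged."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]
        if primary == "*":
            continue
        if primary == page_language:
            return None  # the reader already understands this page
        if primary in supported:
            return primary
    return None


if __name__ == "__main__":
    header = "pt-BR,pt;q=0.9,en;q=0.8"
    print(choose_target_language(header, "en", {"pt", "ja"}))
```

Because the decision is made on the server, it works the same way for older or mobile browsers that have no translation features of their own.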

Some of this already exists, in the form of a simple web API, on which you can build server-based translation scripts that can call out to request human and machine translations as needed. This works pretty well, and we are developing a high performance library called TransKit that will enable web server and web application developers to embed human/machine translation in almost any system. We hope to release an experimental version of this in October, along with an Apache module as a reference implementation. This module is being written in C and is designed for speed so that real-time translation does not noticeably affect page load times and other measures of performance. This will also be open source, so that developers can embed this library in a wide range of web servers, web applications and embedded systems.

Global Translation Memory

The third key piece is a global, web scale translation memory. A translation memory is a database of texts, their translations, and a revision history of translations to various languages. It records volunteer and professional translations from users and paid translators. The translation memory is accessed by web servers, browsers and other applications that need translation services via a simple web API. This component already exists, and is in production use. The Worldwide Lexicon translation server, based on open source software written in Python, and hosted on Google's grid computing platform App Engine, serves as a global translation memory. Developers can use the public translation memory, available at www.worldwidelexicon.org, or can download the source code and deploy their own instance (for a private in-company translation memory for example). Organizations such as TAUS are also developing shared public translation memories and APIs.
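The definition above (texts, their translations, and a revision history) maps naturally onto a small data structure. The following is a toy, in-memory sketch for illustration only: the class and method names are hypothetical and are not taken from the WWL source code, which runs as a web service rather than as a local object.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    text: str
    author: str
    professional: bool = False  # paid translator vs. volunteer


class TranslationMemory:
    """Stores every submitted translation, keyed by text and language pair."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str, str], List[Revision]] = defaultdict(list)

    def submit(self, source_text: str, sl: str, tl: str, translation: str,
               author: str, professional: bool = False) -> None:
        # Append rather than overwrite, so the revision history is preserved.
        key = (source_text, sl, tl)
        self._history[key].append(Revision(translation, author, professional))

    def latest(self, source_text: str, sl: str, tl: str) -> Optional[Revision]:
        revisions = self._history.get((source_text, sl, tl))
        return revisions[-1] if revisions else None

    def history(self, source_text: str, sl: str, tl: str) -> List[Revision]:
        return list(self._history.get((source_text, sl, tl), []))


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.submit("Bonjour le monde", "fr", "en", "Hello the world", "volunteer1")
    tm.submit("Bonjour le monde", "fr", "en", "Hello, world", "volunteer2")
    print(tm.latest("Bonjour le monde", "fr", "en").text)
    print(len(tm.history("Bonjour le monde", "fr", "en")))
```

A production system would add persistence, scoring and access control, but the key point is the same: edits accumulate as revisions instead of replacing one another.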

Our vision for the WWL translation memory is for it to be a global, open content corpus of translations. The system functions as a network of translation servers, so translations submitted to one translation server can be automatically shared with a global network of translation memories. Over time this will grow to encompass billions of texts and their translations. This is important because the translation memory, like Wikipedia, will be open content. Most translation memories today are proprietary systems that are hidden behind corporate firewalls. Translation corpora are difficult and expensive to create, and so most corporations are reluctant to share theirs. WWL is an open system, and will provide translation users, and researchers, with a large and growing source of high quality translations.

Language Service Providers

Language service providers offer real-time or on demand translation, and are an important component of this ecosystem. LSPs fall into two broad categories: machine translation services, and professional translation service bureaus. Machine translation services such as Babelfish, Google Translate, Apertium and Moses enable users to quickly obtain approximate translations to and from about 50 languages (enough to cover more than 95% of the Internet population). While these translations often contain errors, users can generally understand the source material with some effort. Professional translation services offer the option to pay for professional translators. Some of them have built highly automated web interfaces that enable systems like WWL to request professional translations on the fly. ProZ.com (one of the services we have integrated with) has created something similar to Amazon's Mechanical Turk service, where you can request a translation for a block of text via a quick web API call, and then receive the translation, often within minutes.

While there is no uniform standard (a protocol, in engineering jargon) for communicating with these services, they are all relatively easy to communicate with in their current state, and therefore to incorporate into software such as WWL, browser translators, etc. So, this component of the multilingual web also exists and is already at a pretty mature stage of technological and market development.
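Since the providers do not share a protocol, client software typically wraps each one behind a common interface. The sketch below shows that adapter pattern with two toy providers. Everything here is hypothetical: the class names, the `translate` signature, and the `deliver` hook that simulates a human translator finishing a job. No real provider API is being described.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class LanguageServiceProvider(ABC):
    """Common interface; each adapter hides one provider's own protocol."""

    name: str

    @abstractmethod
    def translate(self, text: str, sl: str, tl: str) -> Optional[str]:
        """Return a translation, or None if it is not available yet."""


class ToyMachineProvider(LanguageServiceProvider):
    name = "toy-machine"

    def translate(self, text: str, sl: str, tl: str) -> Optional[str]:
        # Machine services answer immediately, if only approximately.
        return f"[{sl}->{tl}] {text}"


class ToyProfessionalProvider(LanguageServiceProvider):
    name = "toy-professional"

    def __init__(self) -> None:
        self._pending: List[Tuple[str, str, str]] = []
        self._done: Dict[Tuple[str, str, str], str] = {}

    def translate(self, text: str, sl: str, tl: str) -> Optional[str]:
        key = (text, sl, tl)
        if key in self._done:
            return self._done[key]
        if key not in self._pending:
            self._pending.append(key)  # queue the job for a human translator
        return None

    def deliver(self, text: str, sl: str, tl: str, translation: str) -> None:
        # Simulates the translator returning finished work some minutes later.
        key = (text, sl, tl)
        if key in self._pending:
            self._pending.remove(key)
        self._done[key] = translation


def request_all(providers: List[LanguageServiceProvider],
                text: str, sl: str, tl: str) -> Dict[str, Optional[str]]:
    return {p.name: p.translate(text, sl, tl) for p in providers}


if __name__ == "__main__":
    pro = ToyProfessionalProvider()
    providers = [ToyMachineProvider(), pro]
    print(request_all(providers, "Bonjour", "fr", "en"))
    pro.deliver("Bonjour", "fr", "en", "Hello")
    print(request_all(providers, "Bonjour", "fr", "en"))
```

The caller can show the machine result right away and swap in the professional translation when it arrives, without knowing how either provider is contacted.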

The End of the Language Barrier?

Two years ago, I predicted that the language barrier would cease to exist on the web by around 2010, as people began using embedded translation tools, and began editing translations en masse. The foundation for the multilingual web now exists, and what remains to be done is mostly a matter of improving on the tools already built, making them work together, and embedding them in other systems so that this technology becomes widely accessible. While that is a big goal and a big prediction, we are on the verge of that today. The Firefox Translator, just now entering its 1.0 release as an open source Firefox extension, offers a preview of what should become a standard feature in all web browsers in some form in the upcoming years.

Numerous online communities have emerged in recent years that focus on translation, and have proven that volunteer based models can work very well. Among the leaders are: Global Voices/Lingua, which translates blogs around the world, Meadan, which translates English/Arabic news and commentary, and YeeYan, which translates English news and commentary into Chinese, while Wikipedia has built a huge translation community that actively creates and translates content in dozens of languages.

I believe we are entering a transition where the language barrier will fade away rather quickly for web users. While it won't happen instantly, and the web won't be translated perfectly, accessing foreign language websites and services will become effortless and automatic, and as more humans get involved to edit and correct translations, the quality will improve. Already this is a reality for thousands of early adopters, and as these tools and others like them become ubiquitous, the web as a whole will become transparent across languages.

When that happens, billions of people will be using the multilingual web, although the underlying technology will be, like other Internet infrastructure, invisible and free.

Also Recommended

The Polyglot Internet, by Ethan Zuckerman of Global Voices
