Language expert: Machine Translation is misunderstood; it is an opportunity for the relaunch of the EU
February 13, 2017
Jochen Hummel is Chairman of LT-Innovate, the European Association of the Language Technology Industry, and CEO of ESTeam and Coreon. Speaking with Christophe Leclercq, EurActiv's Founder, as part of the #Media4EU series, he advocated a more multilingual Europe and explained why entrepreneurs in the industry who want to turn a profit should make more use of Language Technologies.
From other interviews in the #Media4EU series, I have noticed that the barriers between professional communities are significant. There are translators who gather at events like #TranslatingEurope, language technology companies like the one you represent, EU officials working on internal tools such as MT@EC, and media people in yet other circles. How could they finally 'gel together'? Has there been progress since the Riga Summit in May 2015?
Language remains hard to compute. Only a few people understand what language technology can deliver and how to implement it productively. This leaves a lot of room for overselling what the technologies can actually do, for irrational fears, for investing in the wrong ones and for missing cost-saving opportunities. Moreover, language can be politically sensitive. Particularly in the public sector, decision makers and project managers are often happy to ignore the problem or pass it on, for example to those working on the MT@EC project.
There hasn't been much progress, in spite of big efforts to raise awareness of this crucial barrier to the Digital Single Market. In the end some disruptor, probably an outsider, will manage to 'gel it together'. In the US, Natural Language Processing [Ed. The field of study that focuses on the interactions between human language and computers] is making a huge comeback thanks to the rise of Artificial Intelligence (AI). So maybe IBM Watson will fix it for Europe, or maybe Google, perhaps even for free, as some officials still believe.
LT-Innovate, the language technology association you chair, held the conference LT-Accelerate on November 21 and 22. There was a panel on ‘Speech and Language Technologies for Media Processing’ involving the BBC, Deutsche Welle and technology experts. What are the trends today?
Speech recognition works pretty well by now, but of course if the content is already in textual form you save a step. I am more interested in what happens next: how to extract knowledge from textual information. The last US election showed the incredible power of social media and how traditional methods like polls are failing. Whoever squeezes a bit more knowledge out of the textual information available to everybody runs the show.
EurActiv recently interviewed Robert Madelin, who addressed your 'Summit' when he was still leading the European Commission's DG CONNECT. Regarding media strategies, he stated 'There is potential for bigger consortia and syndicates, it has to be in a different model that enables the local to stay local, otherwise the global will trump the continental.' And then he was more specific: 'You have to decide […] technologies that could completely transform your business costs between your current countries and language regimes and other countries and languages. For example, when it comes to automated translation, we ain't seen nothing yet.' What is your vision for the media sector in 10 or 20 years?
As somebody who spends too much time consuming social and traditional media, I've noticed how much content these days is simply cut-and-pasted without any further investigation. So the media sector could evolve in one of two ways. First, this trend could continue, and language technology could be used to repurpose a piece of (sometimes questionable) content for different languages, styles, audiences and markets. Alternatively, media companies could finally think about appropriate business models for the 21st century and use language technology to support proper investigative journalism and efficient content creation.
Poor translation processes may hinder cross-border cooperation projects like Europa, LENA, Euranet Plus and CPN, while others like Voxeurop, CafeBabel and Euronews (and of course EurActiv) make do but could improve with technology. What would be your recommendations? Based on your experience in other sectors, what cost reduction and time gain, or, in other words, what volume increase could be generated through machine translation combined with post-editing and rewriting where needed?
I cannot give you percentages because translation is too complex a process. What I can say is that on the investigative side, i.e. analysing multilingual text, a lot can be done without any translation. On the writing front, it depends on format and audience. In the end, smart media companies will deploy all kinds of Language Technology (LT) solutions: from fully automated cross-language fact extraction and natural language generation of, let's say, sports articles, to mainly human translation of political essays.
Are we still stuck with the natural reluctance of translators vis-à-vis machine translation? And with the inability of language technologists to integrate human post-editing and even journalistic rewriting? How do we change the mindset and move forward? Is this more about retraining than about technologies? In EU jargon, is it a DG EAC [Ed. Directorate-General for Education and Culture] issue, and not just one for CONNECT [Ed. Directorate-General for Communications Networks, Content and Technology] and TRAD [Ed. Directorate-General for Translation]?
Machine Translation is one of the most misunderstood technologies, and the industry itself is largely to blame for that. Since the 1960s (!) IT companies have been boasting about putting translators out of work, most recently Google with its hyped Neural Machine Translation (NMT) [Ed. approach to machine translation in which a large neural network is trained to maximize translation performance]. Everyone who studied Latin at school remembers painfully that it's impossible to translate a text properly without understanding it. The day computers really understand language, we'll have a different kind of tech revolution, in which MT will only be a footnote. Until then, smart people will make money by using language technology for what it does well, while those driven by fear will quickly become outdated and be pushed aside. And those who exaggerate the technology's abilities will have to deal with failed projects.
For reasons related both to terminology and to industry change management, sector-specific approaches are often more promising than 'preaching around'. If there is an 'add-on' to the Digital Single Market for the second half of this EU mandate, what should it entail regarding the media sector and languages? Your association's members are keen on projects under Horizon 2020 [Ed. The EU Framework Programme for Research and Innovation] and CEF [Ed. Connecting Europe Facility, an EU funding instrument to promote growth, jobs and competitiveness]: anything beyond that?
First, we need a European Language Cloud, meaning a basic infrastructure for Natural Language Processing (NLP) for all EU languages and those of our trading partners. Without this infrastructure, all the great new technologies mentioned before will be limited to speakers of large language groups. For the media, and many other sectors, it would be very helpful to have access to the multilingual meaning and knowledge assets of governmental organisations. This is a prerequisite for knowledge extraction, translation, language generation, etc.
Leaders such as Presidents Juncker and Tusk wish to work with countries and civil society to relaunch the European project at the EU anniversary in Rome in March 2017. Will the topics we just discussed remain 'technical', or could they belong to a vision for a more multilingual Europe?
Let me put it this way: if Brexit really happens, EU citizens will be forced to use a language which isn't even official anymore [Ed. English is also an official language in Ireland and Malta]. Dreaming of a Digital Single Market while ignoring Europe's multilingualism is, at best, pretty naïve. If the European project is to succeed, we must work much more closely together. In EU jargon: our institutions need to be interoperable. Today, interoperability ends when textual data has to cross language borders.
If the Commission's answer to this fundamental European challenge is its home-grown MT@EC and a small fraction of the H2020 R&D funding, there won't be much of a Rome relaunch. On the other hand, if the Commission were to develop a vision of a multilingual eGov and Internal Market, it could not only save the Union but also make European industry the fittest for the global non-English market.