Joshua MT Tool

Joshua MT Tool is an open-source, parsing-based toolkit for statistical machine translation. The toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.

History

Joshua is written in Java and implements all the essential algorithms for parsing-based translation: chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability, and considerable effort has gone into making it easy to use and to extend. To support research in syntax-based machine translation, the toolkit has been used to translate roughly a million sentences of a parallel corpus for large-scale discriminative training experiments.

Goals

Joshua is designed to achieve three major goals:

Extensibility: The Joshua code is organized into separate packages for each major aspect of functionality. This makes it clear which files contribute to a given function, so researchers can focus on a single package without worrying about the rest of the system. All extensible components are defined by Java interfaces.

End-to-end cohesion: The components of a machine translation pipeline are often designed by separate groups and have different file formats and interaction requirements. To address this, the Joshua toolkit integrates the most critical components of the pipeline. Each component can also be used as a stand-alone tool that does not rely on the rest of the toolkit.

Scalability: The decoder is designed to scale to large models and data sets. The techniques that contribute to scalability include parsing and pruning algorithms implemented with dynamic programming strategies and efficient data structures, suffix-array grammar extraction, parallel and distributed decoding, and Bloom filter language models.
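
To illustrate why a Bloom filter helps a language model fit in memory, the following is a minimal sketch of a Bloom filter that records which n-grams have been seen. It is not Joshua's implementation: the class name NgramBloomFilter, the hashing scheme and the parameters are assumptions chosen for illustration, and a real Bloom filter language model also has to encode counts or probabilities, which this sketch omits.

```java
import java.util.BitSet;

// Illustrative only: a membership-only Bloom filter over n-gram strings.
// The class name and hashing scheme are assumptions, not Joshua's code.
class NgramBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    NgramBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String ngram, int i) {
        int h1 = ngram.hashCode();
        int h2 = 0;
        for (int k = 0; k < ngram.length(); k++) {
            h2 = 131 * h2 + ngram.charAt(k);
        }
        // Forcing h2 to be odd keeps the step from being zero.
        return Math.floorMod(h1 + i * (h2 | 1), size);
    }

    void add(String ngram) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(ngram, i));
        }
    }

    // May report a false positive, but never a false negative.
    boolean mightContain(String ngram) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(ngram, i))) {
                return false;
            }
        }
        return true;
    }
}
```

The filter stores only a fixed number of bits regardless of how long the n-grams are, which is the source of the memory savings; the price is a small, tunable false-positive rate.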

Main functions implemented in the Joshua toolkit

Training Corpus Sub-sampling: Joshua uses a method proposed by Kishore Papineni to select the subset of the training data whose sentences are useful for inducing a grammar to translate a particular test set.

Suffix-array Grammar Extraction: It uses a source language suffix array to extract only those rules which will actually be used in translating a particular set of test sentences. This results in a vastly smaller rule set than techniques which extract all rules from the training set.
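
The lookup step behind this idea can be sketched as follows, assuming a tokenized source corpus held in memory. The class TokenSuffixArray and its methods are hypothetical names for illustration and do not come from the Joshua code base. The sketch only finds where a test-set phrase occurs in the source side of the training data; extracting rules from those occurrences additionally needs word alignments, which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a token-level suffix array with phrase lookup.
class TokenSuffixArray {
    private final String[] corpus;
    private final Integer[] suffixes; // start positions, sorted by the suffix they begin

    TokenSuffixArray(String[] corpus) {
        this.corpus = corpus;
        this.suffixes = new Integer[corpus.length];
        for (int i = 0; i < corpus.length; i++) {
            suffixes[i] = i;
        }
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    // Token-by-token comparison of two suffixes; a suffix that runs out first sorts first.
    private int compareSuffixes(int a, int b) {
        while (a < corpus.length && b < corpus.length) {
            int c = corpus[a].compareTo(corpus[b]);
            if (c != 0) {
                return c;
            }
            a++;
            b++;
        }
        return Integer.compare(b, a);
    }

    // Compares the phrase with the start of the suffix at pos; 0 means the phrase is a prefix.
    private int comparePhrase(String[] phrase, int pos) {
        for (int k = 0; k < phrase.length; k++) {
            if (pos + k >= corpus.length) {
                return 1; // suffix ended early, so the phrase sorts after it
            }
            int c = phrase[k].compareTo(corpus[pos + k]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Binary search for the first match, then collect the contiguous run of matches.
    List<Integer> find(String[] phrase) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePhrase(phrase, suffixes[mid]) > 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = lo; i < suffixes.length && comparePhrase(phrase, suffixes[i]) == 0; i++) {
            hits.add(suffixes[i]);
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Because all occurrences of a phrase are adjacent in the sorted array, only phrases that actually appear in the test sentences need to be looked up, which is what keeps the extracted rule set small.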

Decoding Algorithms: The decoder assumes a probabilistic synchronous context-free grammar (SCFG).
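
As a rough illustration of what a synchronous rule does, the sketch below pairs a source side with a target side and fills the co-indexed nonterminals on the target side with already translated sub-phrases. The class SyncRule, the bracketed nonterminal notation and the example rule are assumptions made for this example; rule probabilities and the chart-parsing search that chooses among rules are omitted.

```java
import java.util.StringJoiner;

// Illustrative only: one synchronous rule with co-indexed nonterminals.
class SyncRule {
    final String[] source;
    final String[] target;

    SyncRule(String source, String target) {
        this.source = source.split(" ");
        this.target = target.split(" ");
    }

    // Returns the 1-based index of a nonterminal token such as "[X,2]", or 0 for a terminal.
    static int nonterminalIndex(String token) {
        if (token.startsWith("[X,") && token.endsWith("]")) {
            return Integer.parseInt(token.substring(3, token.length() - 1));
        }
        return 0;
    }

    // Builds the target string, replacing each nonterminal with the
    // translation of the sub-span that carries the same index.
    String realizeTarget(String[] subTranslations) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : target) {
            int idx = nonterminalIndex(token);
            out.add(idx == 0 ? token : subTranslations[idx - 1]);
        }
        return out.toString();
    }
}
```

For example, a rule with source side "[X,1] de [X,2]" and target side "[X,2] of [X,1]" reorders its two sub-phrases: given the sub-translations "Australia" and "the capital", realizeTarget produces "the capital of Australia".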

Language Models: Joshua implements three local n-gram language models.

Minimum Error Rate Training (MERT): Joshua's MERT optimizes parameter weights to maximize performance on a development set as measured by an automatic evaluation metric.
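
The objective can be sketched with a deliberately simplified tuner. This is not Joshua's MERT: it tunes a single weight by plain grid search instead of the exact line search used in minimum error rate training, and it sums a per-sentence metric, whereas metrics such as BLEU are computed over the whole development set. The class WeightTuner and the data layout are assumptions for illustration. Each candidate translation has two feature values, and its model score is the first feature plus w times the second.

```java
// Illustrative only: grid search over one weight on fixed k-best lists.
class WeightTuner {
    // Index of the candidate with the highest model score f[0] + w * f[1];
    // ties go to the earlier candidate.
    static int best(double[][] candidates, double w) {
        int argmax = 0;
        for (int i = 1; i < candidates.length; i++) {
            double s = candidates[i][0] + w * candidates[i][1];
            double sBest = candidates[argmax][0] + w * candidates[argmax][1];
            if (s > sBest) {
                argmax = i;
            }
        }
        return argmax;
    }

    // Sum of the metric values of the candidates the model would pick under weight w.
    static double devScore(double[][][] feats, double[][] metric, double w) {
        double total = 0.0;
        for (int s = 0; s < feats.length; s++) {
            total += metric[s][best(feats[s], w)];
        }
        return total;
    }

    // Tries steps + 1 evenly spaced weights in [lo, hi] and keeps the first best one.
    static double tune(double[][][] feats, double[][] metric, double lo, double hi, int steps) {
        double bestW = lo;
        double bestScore = devScore(feats, metric, lo);
        for (int i = 1; i <= steps; i++) {
            double w = lo + (hi - lo) * i / steps;
            double score = devScore(feats, metric, w);
            if (score > bestScore) {
                bestScore = score;
                bestW = w;
            }
        }
        return bestW;
    }
}
```

The key point the sketch preserves is that the weight is chosen by the quality of the translations it causes the model to select on the development set, not by the model scores themselves.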

References

  • Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese and Omar Zaidan, 2009. Joshua: An Open Source Toolkit for Parsing-based Machine Translation. In Proceedings of the Workshop on Statistical Machine Translation (WMT09).

See also

  • Machine Translation
  • Comparison of machine translation applications

Wikimedia foundation. 2010.