Joshua MT Tool

Joshua MT Tool is an open-source, parsing-based toolkit for statistical machine translation. It achieves state-of-the-art translation performance on the French-English translation task.

Contents

1 History

2 Goals

3 Main functions implemented in Joshua toolkit

4 References

5 External links

6 See also

History

Joshua is written in Java and implements all the essential algorithms for parsing-based translation: chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training, and it uses parallel and distributed computing techniques for scalability. Considerable effort went into making the toolkit easy to use and to extend. It has been used to translate roughly a million sentences of a parallel corpus for large-scale discriminative training experiments in syntax-based machine translation research.

Goals

The design of Joshua aims at three major goals:

Extensibility: The Joshua code is organized into separate packages, one for each major aspect of functionality. This makes it clear which files contribute to a given functionality, and researchers can work on a single package without worrying about the rest of the system. All extensible components are defined by Java interfaces.

End-to-end cohesion: The components of a machine translation pipeline are often designed by separate groups and have different file format and interaction requirements. To address this, the Joshua toolkit integrates the most critical components of the pipeline. At the same time, each component can be used as a standalone tool that does not rely on the rest of the toolkit.

Scalability: The decoder is designed to scale to large models and data sets. Techniques contributing to scalability include parsing and pruning algorithms implemented with dynamic programming and efficient data structures, suffix-array grammar extraction, parallel and distributed decoding, and Bloom filter language models.
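To illustrate one of these techniques: a Bloom filter stores a set in a fixed-size bit array using several hash functions. Lookups can return false positives but never false negatives, which lets a large n-gram set fit in little memory. The Python sketch below is a minimal illustration, not Joshua's implementation. The class name, the salted SHA-256 hashing, and the toy n-grams are all assumptions. It only tests n-gram membership, whereas a Bloom filter language model must also encode counts or probabilities.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Present only if every one of the k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Store the n-grams seen in training (hypothetical toy data).
ngrams = BloomFilter(num_bits=1 << 16, num_hashes=4)
for gram in ["the cat", "cat sat", "sat on", "on the", "the mat"]:
    ngrams.add(gram)

print("the cat" in ngrams)  # True: inserted items are always found
```

Because the filter never stores the n-grams themselves, memory use is fixed by `num_bits`. The false-positive rate then grows as more n-grams are inserted.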

Main functions implemented in Joshua toolkit

Training Corpus Sub-sampling: Joshua makes use of a method proposed by Kishore Papineni to select the subset of the training data consisting of sentences useful for inducing a grammar to translate a particular test set.
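The details of Papineni's selection method are not given here. The following Python sketch therefore substitutes a simple greedy heuristic for illustration: it keeps a training sentence if that sentence contains a test-set n-gram the selection does not yet cover `max_count` times. The function names, parameters, and toy data are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """All n-grams of length 1..max_n, as tuples."""
    return [tuple(tokens[i:i + n]) for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def subsample(train_sources, test_sentences, max_n=3, max_count=2):
    """Greedy sketch: keep a training sentence if it contains a test-set n-gram
    that the sentences selected so far contain fewer than max_count times."""
    needed = set()
    for sent in test_sentences:
        needed.update(ngrams(sent.split(), max_n))
    coverage = Counter()
    selected = []
    for idx, sent in enumerate(train_sources):
        grams = [g for g in ngrams(sent.split(), max_n) if g in needed]
        if any(coverage[g] < max_count for g in grams):
            selected.append(idx)
            coverage.update(grams)
    return selected


train = ["the house is small", "a dog barked", "the house is red", "the house is big"]
test = ["the house is green"]
print(subsample(train, test, max_n=2, max_count=2))  # [0, 2]
```

On the toy data, sentence 1 shares nothing with the test set, and sentence 3 adds no coverage beyond what sentences 0 and 2 already provide. Only sentences 0 and 2 are kept.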

Suffix-array Grammar Extraction: Joshua uses a source-language suffix array to extract only those rules that will actually be used in translating a particular set of test sentences. This yields a far smaller rule set than techniques that extract all rules from the training set.
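A suffix array lists the starting positions of all suffixes of a corpus in sorted order. Every occurrence of a source phrase therefore lies in one contiguous range of the array, and two binary searches can find that range. The Python sketch below shows only this lookup step, using hypothetical function names and a naive construction. A real extractor would build the array more efficiently and would then consult word alignments at each returned position to extract rules.

```python
def build_suffix_array(tokens):
    """Naive construction (quadratic memory): sort suffix start positions lexicographically."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, sa, phrase):
    """Return the sorted corpus positions where `phrase` (a list of tokens) starts."""
    m = len(phrase)
    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


tokens = "the cat sat on the mat . the cat ran".split()
sa = build_suffix_array(tokens)
print(find_occurrences(tokens, sa, ["the", "cat"]))  # [0, 7]
```

Each lookup costs O(m log n) token comparisons. This is why rules can be extracted on demand for the phrases in a test set, instead of enumerating every phrase in the training data.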

Decoding Algorithms: The decoder assumes a probabilistic synchronous context-free grammar (SCFG).
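In an SCFG, each rule rewrites a nonterminal into a source string and a target string simultaneously, with linked nonterminals on the two sides. Word reordering therefore falls out of the rule structure. The Python sketch below shows only how a finished derivation yields a synchronized source/target pair and its probability. It omits the chart parsing, pruning, and language model integration a decoder performs. The class names, rules, and probabilities are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Synchronous rule X -> <source, target>; an int marks a linked nonterminal slot."""
    source: list
    target: list
    prob: float


@dataclass
class Derivation:
    rule: Rule
    children: list = field(default_factory=list)  # children[i] rewrites slot i


def yield_pair(d):
    """Expand a derivation into (source words, target words, probability)."""
    results = [yield_pair(child) for child in d.children]
    prob = d.rule.prob
    for _, _, child_prob in results:
        prob *= child_prob  # derivation probability is the product of its rule probabilities
    src, tgt = [], []
    for sym in d.rule.source:
        src.extend(results[sym][0] if isinstance(sym, int) else [sym])
    for sym in d.rule.target:
        tgt.extend(results[sym][1] if isinstance(sym, int) else [sym])
    return src, tgt, prob


# Hypothetical rules: the adjective follows the noun in French but precedes it in English.
np_rule = Rule(["la", 0, "rouge"], ["the", "red", 0], 0.5)
noun_rule = Rule(["maison"], ["house"], 0.8)
src, tgt, p = yield_pair(Derivation(np_rule, [Derivation(noun_rule)]))
print(" ".join(src), "|", " ".join(tgt), "|", p)  # la maison rouge | the red house | 0.4
```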

Language Models: Joshua implements three local n-gram language models.
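The three implementations are not described further here. As a generic illustration of n-gram scoring, the Python sketch below scores a sentence with a toy bigram model. It uses the standard backoff scheme of ARPA-format models: when a bigram is stored, its log-probability is used directly; otherwise the history's backoff weight is added to the unigram log-probability. All probabilities and names are hypothetical, and this is not Joshua's language model code.

```python
# Hypothetical toy bigram model: word -> (log10 probability, log10 backoff weight).
UNIGRAMS = {
    "<s>": (-99.0, -0.30),  # sentence start is only used as a history, never predicted
    "</s>": (-1.2, 0.0),
    "<unk>": (-3.0, 0.0),
    "the": (-1.0, -0.20),
    "house": (-1.5, -0.25),
}
# (history, word) -> log10 probability.
BIGRAMS = {("<s>", "the"): -0.3, ("the", "house"): -0.7, ("house", "</s>"): -0.4}


def log10_prob(prev, word):
    """Bigram log-probability with backoff to the unigram distribution."""
    if (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]
    # Unseen bigram: pay the history's backoff weight, then use the unigram estimate.
    return UNIGRAMS[prev][1] + UNIGRAMS[word][0]


def score(sentence):
    """Total log10 probability of a sentence, mapping unknown words to <unk>."""
    words = [w if w in UNIGRAMS else "<unk>" for w in sentence.split()]
    words = ["<s>"] + words + ["</s>"]
    return sum(log10_prob(p, w) for p, w in zip(words, words[1:]))


print(score("the house"), score("house the"))
```

In a decoder, this score is computed incrementally as hypotheses are built. Integrating it is what makes SCFG decoding with a language model expensive.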

Minimum Error Rate Training (MERT): Joshua's MERT optimizes parameter weights to maximize performance on a development set as measured by an automatic evaluation metric.
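Standard MERT performs an exact line search along each search direction, finding every point where the best candidate translation changes. The Python sketch below replaces that line search with a coarse grid search over one weight at a time. It also treats the corpus metric as a sum of precomputed per-sentence scores, which real BLEU is not. The sketch only shows the core idea: re-rank fixed n-best lists under trial weights and keep the weights that maximize the metric on a development set. All names and data are hypothetical.

```python
def rerank(nbest, weights):
    """For each sentence, pick the candidate with the highest weighted feature score."""
    return [max(cands, key=lambda c: sum(w * f for w, f in zip(weights, c["features"])))
            for cands in nbest]


def corpus_metric(chosen):
    # Simplification: sum of per-candidate metric scores (real BLEU is not additive).
    return sum(c["metric"] for c in chosen)


def coordinate_search(nbest, weights, grid, rounds=3):
    """Tune one weight at a time over `grid`, keeping any value that improves the metric."""
    weights = list(weights)
    best = corpus_metric(rerank(nbest, weights))
    for _ in range(rounds):
        for k in range(len(weights)):
            for value in grid:
                trial = weights[:k] + [value] + weights[k + 1:]
                score = corpus_metric(rerank(nbest, trial))
                if score > best:
                    best, weights = score, trial
    return weights, best


# Hypothetical n-best lists: two sentences, two candidates each, two features.
nbest = [
    [{"features": [1.0, 0.0], "metric": 0.2}, {"features": [0.0, 1.0], "metric": 0.9}],
    [{"features": [0.8, 0.1], "metric": 0.3}, {"features": [0.1, 0.9], "metric": 0.7}],
]
weights, best = coordinate_search(nbest, [1.0, 0.0], grid=[0.0, 0.5, 1.0])
print(weights, best)
```

On this toy data, the search ends with weights `[0.0, 1.0]`. Those weights select the second candidate for both sentences, which is the combination with the highest metric total.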

References

Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren Thornton, Jonathan Weese and Omar Zaidan, 2009. Joshua: An Open Source Toolkit for Parsing-based Machine Translation. In Proceedings of the Workshop on Statistical Machine Translation (WMT09).

External links

Joshua home: http://joshua.sourceforge.net/Joshua/Welcome.html

See also

Machine Translation

Comparison of machine translation applications

