I'm a Natural Language Processing researcher in the Debating Technologies group at IBM Research.
In 2011, I completed my PhD studies at the NLP lab at Bar-Ilan University. My PhD thesis, in statistical Natural Language Processing, addressed topics in textual entailment and was done under the instruction of Prof. Ido Dagan. Following graduation I worked as a postdoctoral resercher at Xerox Research in France and at the University of Grenoble, mostly on Statistical Machine Translation (SMT).
Computational argumentation and debating
(Can't publically share anything about my work yet. See the project's page for general details).
Personalized Machine Translation (PMT)
Machine Translation has advanced in recent years to produce better translations for clients’ specific domains, and sophisticated tools allow translators to obtain translations according to their prior edits. We suggest that MT should be further personalized to the end-user level – the receiver or the author of the text – as done in other applications. Language use is known to be influenced by personality traits as well as by socio-demographic characteristics such as age or mother tongue. As a result, it is possible to automatically identify these traits of the author from her texts. To provide themost faithful translation and to allow user modeling based on translations, we posit that machine translation should be personalized. PMT for the readers of the translations can take into account the reader's translational preferences, as refelected e.g. in complexity or style.
Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia and Shuly Wintner. Personalized Machine Translation Preserving Original Author Traits. 2016. [To be published at EACL 2017]
Shachar Mirkin and Jean-Luc Meunier. Personalized machine translation: Predicting translational preferences. EMNLP 2015.
Shachar Mirkin, Scott Nowson, Caroline Brun and Julien Perez. Motivating Personality-aware Machine Translation. EMNLP 2015.
Model-aware improvement of source translatability
Some source texts are more difficult to translate than others. One way to handle such texts is to modify them prior to translation (aka pre-editing). A prominent factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. Our research aims to improve source translatability either automatically, or through interactive tools which enable monolingual speakers of the source language to obtain better translation.
Shachar Mirkin, Sriram Venkatapathy and Marc Dymetman. 2013. Confidence-driven Rewriting for Improved Translation. In Proceedings of MT Summit.
Sriram Venkatapathy and Shachar Mirkin. An SMT-driven Authoring Tool. COLING 2012.
Shachar Mirkin, Lucia Specia, Nicola Cancedda, Ido Dagan, Marc Dymetman and Idan Szpektor. Source-Language Entailment Modeling for Translating Unknown Terms. ACL-IJCNLP 2009.
Semantic inference / Textual entailment
Textual Entailment (TE) is a popular paradigm for modeling semantic inference. The core TE task, Textual Entailment recognition, is to determine whether the meaning of one text can be inferred (or entailed) from another. My textual entailment research mostly focused around understanding entailment in context, to deal with either lexical ambiguity or discourse-based interpretation, but also addressed acquisition of lexical entailment relationships and the application of TE to different applications (e.g. SMT, as in several of the above works).
Shachar Mirkin, Jonathan Berant, Ido Dagan and Eyal Shnarch. Recognising Entailment within Discourse. COLING 2010.
Shachar Mirkin, Ido Dagan, Lili Kotlerman and Idan Szpektor. Classification-based Contextual Preferences. TextInfer 2011.
Shachar Mirkin, Ido Dagan and Sebastian Padó. Assessing the Role of Discourse References in Entailment Inference. ACL 2010.
Shachar Mirkin, Ido Dagan, Maayan Geffet. 2006. Integrating Pattern-Based and Distributional Similarity Methods for Lexical Entailment Acquisition. COLING-ACL 2006.
SMT domain adaptation
Data selection is a common technique for adapting statistical translation models for a specific domain, which has been shown to both improve translation quality and to reduce model size. Selection often relies on in-domain data, of the same domain of the texts expected to be translated, selecting the sentence-pairs that are most similar to the in-domain data from a pool of parallel texts; yet, this approach holds the risk of resulting in a limited coverage, when necessary n-grams that do appear in the pool are less similar to in-domain data that is available in advance. Our research aims to find ways to bridge these two potentially contradicting considerations, while producing compact translation models.
Shachar Mirkin and Laurent Besacier. Data Selection for Compact Adapted SMT Models. AMTA 2014. [See Section 6 for a simple and very effective method for data selection / domain adaptation for machine translation]
Program Committee Member / Reviewer
Journal of Natural Language Engineering (JNLE) 2016 // COLING 2016 // LREC 2016 // EMNLP 2016 // *SEM 2016 // EMNLP 2015 // *SEM 2015 // CICLING 2015 // Journal of Language Resources and Evaluation (LREV) 2014 // EMNLP 2014 // COLING 2014 // WMT 2014 // LREC 2014 // WMT 2013 // Journal of Language Resources and Evaluation (LREV) 2013 // IJCNLP 2013 // *SEM 2013 // Journal of Computer Science and Technology (JCST) 2013 // WMT 2012 // EACL 2012 // LREC 2012 // ACM TIST Journal, Special Issue on Paraphrasing 2011 // EMNLP 2011 // TextInfer 2011 // COLING 2010 // EMNLP 2009 // AAAI 2008