PAPERS

Paul Rodrigues, Valerie Novak, C. Anton Rytting, Julie Yelle, Jennifer Boutz. (2018). Arabic Data Science Toolkit: An API for Arabic Language Feature Extraction. International Conference on Language Resources and Evaluation. LREC 2018. Miyazaki, Japan.
C. Anton Rytting, Paul Rodrigues, Tim Buckwalter, Valerie Novak, Aric Bills, Noah H. Silbert, Mohini Madgavkar. (2014) ArCADE: An Arabic Corpus of Auditory Dictation Errors. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications. ACL 2014. Baltimore, Maryland. pp109–115.
Levi King, Eric Baucom, Timur Gilmanov, Sandra Kuebler, Daniel Whyatt, Wolfgang Maier, Paul Rodrigues. (2014) The IUCL+ System: Word-Level Language Identification via Extended Markov Models. Proceedings of the First Workshop on Computational Approaches to Code Switching. EMNLP 2014. Doha, Qatar. pp102-106
Paul Rodrigues and Sandra Kuebler. (2013) “Part of Speech Tagging Bilingual Speech Transcripts with Intrasentential Model Switching.” Papers from the 2013 Association for Advancement of Artificial Intelligence (AAAI) Spring Symposium. Palo Alto, CA. pp56-63.
Michael Bloodgood, Peng Ye, Paul Rodrigues, David Doermann and David Zajic. (2012) “A random forest system combination approach for error detection in digital dictionaries.” European Chapter of the Association for Computational Linguistics (EACL) 2012 Workshop on Innovative hybrid approaches to the processing of textual data. pp78-86.
Paul Rodrigues and C. Anton Rytting. (2012) “Typing Race Games as a Method to Create Spelling Error Corpora.” The 8th international conference on Language Resources and Evaluation (LREC). Istanbul, Turkey: European Language Resources Association (ELDA). pp3019-3024
Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, and Peng Ye. (2011) “Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling.” Electronic Lexicography in the 21st Century. pp227-232.
David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, and Michael Bloodgood. (2011) “Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language.” Electronic Lexicography in the 21st Century. pp297-301.
C. Anton Rytting, Paul Rodrigues, Tim Buckwalter, David Zajic, Bridget Hirsch, Jeff Carnes, Nathanael Lynn, Sarah Wayland, Chris Taylor, Charles Blake III, Evelyn Browne, Corey Miller and Tristan Purvis. (2010) “Error Correction for Arabic Dictionary Lookup.” In Proceedings of international conference on Language Resources and Evaluation (LREC). Valletta, Malta: European Language Resources Association (ELDA). May 2010. pp263-268.
Paul Rodrigues. (2009) “Determining L1 & L1 Degree of Phonological Accent From Phonetic Transcription after Categorical Filtration of Speech.” 7th Annual Conference on Technology for Second Language Learning. Ames, Iowa. pp40-55.
Paul Rodrigues, Damir Ćavar. (2008) “Learning Arabic Morphology Using Information Theory.” In Proceedings of the Chicago Linguistics Society (CLS). Vol 41. Chicago: University of Chicago. pp49-58.
Paul Rodrigues, Damir Ćavar. (2007) “Learning Arabic Morphology Using Statistical Constraint Satisfaction Models.” In Elabbas Benmamoun (Ed.), Perspectives on Arabic Linguistics XIX: Proceedings of the 19th Arabic Linguistics Symposium (ALS). Urbana, IL, USA. pp63-75.
Damir Ćavar, Joshua Herring, Toshikazu Ikuta, Paul Rodrigues, Giancarlo Schrementi. (2006) “On Unsupervised Grammar Induction From Untagged Corpora.” In: P. Kaszubski (ed.) PSiCL: Poznań Studies in Contemporary Linguistics. 41, Adam Mickiewicz University, Poznań, Poland. pp57-71.
Damir Ćavar, Paul Rodrigues, Giancarlo Schrementi. (2006) “Unsupervised Morphology Induction for Part-of-Speech Tagging.” In Aviad Eilam, Tatjana Scheffler and Joshua Tauberer (eds). U. Penn Working Papers in Linguistics: Proceedings of the 29th Annual Penn Linguistics Colloquium (PLC). Volume 12.1, 2006. Philadelphia, PA, USA. pp29-41.
Damir Ćavar, Joshua Herring, Toshikazu Ikuta, Paul Rodrigues, Giancarlo Schrementi. (2004) “On Statistical Parameter Setting.” Proceedings of the First Workshop on Psycho-computational Models of Human Language Acquisition (COLING). Geneva, Switzerland. pp9-16.
Damir Ćavar, Joshua Herring, Toshikazu Ikuta, Paul Rodrigues, Giancarlo Schrementi. (2004) “On Induction of Morphology Grammars and its Role in Bootstrapping.” Proceedings of the 9th Conference on Formal Grammar. ESSLLI. Nancy, France. pp47-62.

REFEREED WORKSHOP PAPERS

C. Anton Rytting, Noah Silbert, Paul Rodrigues, Valerie Novak, Aric Bills, Tim Buckwalter, Mohini Madgavkar. (2014) ArCADE: Arabic Corpus of Auditory Dictation Errors. Ninth Workshop on Innovative Use of NLP for Building Educational Applications. ACL 2014. Baltimore, Maryland.
Paul Rodrigues, Sandra Kuebler. (3/2013). “Part of Speech Tagging Bilingual Speech Transcripts with Intrasentential Model Switching.” Association for Advancement of Artificial Intelligence Spring Symposium. Palo Alto, CA.
Erica Michael, Sergey Blok, Michael Bloodgood, Petra Bradley, Ryan Corbett, Michael Maxwell, Peter Osthus, and Paul Rodrigues. (2012) Evaluating Parallel Corpora: Assessing Utility for Use with Translation Memory Systems in Government Settings.
Michael Bloodgood, Peng Ye, Paul Rodrigues, David Doermann and David Zajic. (2012) “A random forest system combination approach for error detection in digital dictionaries.” European Chapter of the Association for Computational Linguistics 2012 Workshop on Innovative hybrid approaches to the processing of textual data.
Paul Rodrigues and C. Anton Rytting. (2012) Typing Race Games as a Method to Create Spelling Error Corpora. The 8th international conference on Language Resources and Evaluation (LREC).
Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye. “Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling.” Electronic Lexicography in the 21st Century. Nov 2011.
David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, Michael Bloodgood. “Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language.” Electronic Lexicography in the 21st Century. Nov 2011.

TECHNICAL REPORTS

Paul Rodrigues, PhD. Pipeline to bootstrap a baseline multilingual optical character recognition dataset. 2018. 3pp.
Thomas Conners, PhD, Nikki Adams, PhD, Anton Rytting, PhD, Nathaniel Clair, MA, Claudia Brugman, PhD, Anne David, PhD, Amalia Gnanadesikan, PhD, Michelle Morrison, PhD, Paul Rodrigues, PhD. Data Normalization for Speech-to-text: Deliverables. 2018. 2pp.
Paul Rodrigues, PhD, Hadi Amiri, PhD, Wendy Chambers, PhD, Jennifer Boutz, PhD, Susannah Paletz, PhD, Philip Resnik, PhD, Timothy Buckwalter, MA, Valerie Novak, MA, Nikki Adams, PhD, Alan Mishler, BS, Joe Danks, PhD, Marilyn Maines, MA. Entity-targeted Sentiment Analysis: Automatic evaluation of emotional temperature to people, places, and things. Technical Report. University of Maryland Center for Advanced Study of Language. 2016. 73pp.
Alan Mishler, MA, Timothy Buckwalter, MA, Wendy Chambers, PhD, Michael Bloodgood, PhD, Kevin Wonus, BS, Paul Rodrigues, PhD, Marilyn Maines, MA. Computational Cultural Assessment: Machine Classification of social media text along useful sociocultural dimensions. Technical Report. University of Maryland Center for Advanced Study of Language. 2016. 51pp.
Joseph Danks, PhD, Marilyn Maines, MA, Wendy Chambers, PhD, Alan Mishler, BA, Timothy Buckwalter, MA, Michael Bloodgood, PhD, Paul Rodrigues, PhD, Julie Yelle, MA, Katie Kiraly, BA. Best Practices and Lessons Learned: Recommendations from Computational Cultural Assessment. Technical Report. University of Maryland Center for Advanced Study of Language. 2016. 18pp.
Katie Kiraly, BA, Andrew Volpe, MA, Julie Yelle, MA, Salma Bouziani, BA, Marilyn Maines, MA, Joseph Danks, PhD, Wendy Chambers, PhD, Hannah Benninger, MA, Timothy Buckwalter, PhD, Amy Pate, PhD, Susannah Paletz, PhD, Paul Rodrigues, PhD. Cultural assessment of social unrest and political instability in Jordan: Detailed analysis of individual tweets, Twitter-users, and other social media. Technical Report. University of Maryland Center for Advanced Study of Language. 2016. 111pp.
Alan Mishler, MA, Timothy Buckwalter, MA, Wendy Chambers, PhD, Michael Bloodgood, PhD, Kevin Wonus, BS, Paul Rodrigues, PhD, Marilyn Maines, MA, Joseph Danks, PhD. Computational Cultural Assessment: Filtering Tweets for Content Relevant to Social Unrest and Political Instability. 2016. 37pp.
Thomas Conners, PhD, Claudia Brugman, PhD, Paul Rodrigues, PhD, Sean Simpson, MA. Summary of register properties from corpus of Jakarta Indonesian Twitter as contrasted with SMS: Implications for NLP Pre-processing. Technical Report. University of Maryland Center for Advanced Study of Language. 2016. 19pp.
Paul Rodrigues, PhD, Wendy Chambers, PhD, Susannah Paletz, PhD, Valerie Novak, MA. CASL Emotion Dataset of Emoji and Emoticons: Introduction to the CASL Emotion Dataset of Emoji and Emoticons, Version 1.0. Technical Report. University of Maryland Center for Advanced Study of Language. 7pp.
Paul Rodrigues, Erin Smith Crabb, Tim Buckwalter. Computational Cultural Assessment: Introduction to the annotated dataset and research software prototype. Technical Report. University of Maryland Center for Advanced Study of Language. 2015. 4pp.
Paul Rodrigues, Wendy Chambers, Valerie Novak, Susannah Paletz, Jennifer Boutz, Tim Buckwalter, Erin Smith Crabb, Amy Pate, Alan Mishler, Joe Danks, Marilyn Maines, Andrew Volpe, Allyson Slater, Salma Bouziani, Emily Iarocci. Identification and machine classification of thematic and demographic factors related to social unrest and political instability. Technical Report. University of Maryland Center for Advanced Study of Language. 2015. 64pp.
Thomas J. Conners, Claudia M. Brugman, Paul Rodrigues, with Andrew Marty. “Technical Report 1: summary of register properties from corpus of Jakarta Indonesian SMS.” Technical Report. University of Maryland Center for Advanced Study of Language. April 2015. 40pp.
Paul Rodrigues, Philip Resnik, Leonardo Claudino, Jennifer Boutz, Valerie Novak, Michael Bloodgood, Benjamin Strauss. Political Spectrum Identification: Introduction to CASL’s culturally-responsive Political Spectrum Identifier, and its Arabic WMD Issue Module. Technical Report. University of Maryland Center for Advanced Study of Language. 2013. 53pp.
Paul Rodrigues, Leo Claudino. Political Spectrum Identifier: Pipeline to identify sentiment of political text with the enhancement of features from culture and psychology. Technical Report. University of Maryland Center for Advanced Study of Language. 2013. 5pp.
Jennifer Boutz, Valerie Novak, Benjamin Strauss, Paul Rodrigues. “Validation of a domain-specific sentiment and culture-annotated dataset: Annotation Guide and Interannotator Agreement Studies.” Technical Report. University of Maryland Center for Advanced Study of Language. 2013. 46pp.
Paul Rodrigues, Jennifer Boutz, Valerie Novak, Michael Bloodgood, Benjamin Strauss. “Can we improve sentiment analysis with knowledge of culture? Annotating sociocultural knowledge to build better natural language processing systems.” Technical Report. University of Maryland Center for Advanced Study of Language. 2013. 18pp.
C. Anton Rytting, Noah H. Silbert, Paul Rodrigues, Tim Buckwalter, Valerie Novak, Mohini Madgavkar, Aric Bills. “An auditory perception study of non-native Arabic: Validating and Improving the Arabic Did You Mean..? Tool.” Technical Report. University of Maryland Center for Advanced Study of Language. 2013.
David Zajic, David Doermann, Paul Rodrigues, Peng Ye, Elena Zotkina. “Faster, more accurate repair of electronic dictionaries: A progress report on CASL’s research on error detection and correction in electronic dictionaries.” Technical Report. University of Maryland Center for Advanced Study of Language. September 2013.
Michael Bloodgood, Benjamin Strauss, Erica Michael, Paul Rodrigues, Sergey Blok, Petra Bradley, Ryan Corbett, Peter Osthus, Michael Maxwell. “Improving translation through parallel corpora: Optimizing match strength through vault size and quality.” Technical Report. University of Maryland Center for Advanced Study of Language. 2012.
David Zajic, David Doermann, Michael Bloodgood, Paul Rodrigues, Peng Ye, Dustin Foley, Elena Zotkina. “A hybrid system for error detection in electronic dictionaries: A progress report on CASL’s ADALT research program” Technical Report. University of Maryland Center for Advanced Study of Language. July 2012.
David Doermann, David Zajic, Michael Bloodgood, Paul Rodrigues, Peng Ye. “Improving the speed of error correction in digital dictionaries” Technical Report. University of Maryland Center for Advanced Study of Language. April 2011.
Evelyn Browne, Claudia Brugman, Anne David, Melissa Fox, Nathanael Lynn, Michael Maxwell, Corey Miller, Tristan Purvis, Paul Rodrigues, Amalia Gnanadesikan, Alina Twist, Tamara Wehmeir, Nikki Adams, Michael Marlo. Pashto Morphology. Technical Report. University of Maryland Center for Advanced Study of Language. 429pp.
Jennifer H. Boutz, Tim Buckwalter, Mohini Madgavkar, Rebecca W. McGowan, Valerie Novak, C. Anton Rytting, Aaron Freeman, Evelyn Browne, David M. Zajic, Paul Rodrigues. “Expanding Yemeni Arabic resources.” Technical Report. University of Maryland Center for Advanced Study of Language. April 2011.

DISSERTATION

Paul Rodrigues. (2012). Processing Highly Variant Language Using Incremental Model Selection. Indiana University. Doctoral dissertation. 212pp. (Examines natural language processing of mixed-language streams, incorporating short-string language identification and chat alphabet transliteration.)
Dissertation Committee: Sandra Kuebler, Markus Dickinson, Steven Franks, Samuel Obeng
Qualifying Advisory Committee: Damir Ćavar, Kenneth de Jong, Sandra Kuebler 

INVITED TALKS

Paul Rodrigues. (10/26/2016). Entity-Targeted Social Media Sentiment Analysis. Pacific Northwest National Laboratory. Seattle, WA.
Paul Rodrigues. (1/24/2012). Applied Natural Language Processing and Machine Learning for LCTL Dictionary Correction and Search. Indiana University. Bloomington, IN.
Paul Rodrigues. (1/12/2012). Stream processing of highly variant speech transcript and internet messages using incremental model selection. University of Maryland Center for Advanced Study of Language. College Park, MD.

REFEREED PRESENTATIONS

Rytting, C. A., Silbert, N., Rodrigues, P., Novak, V., and Bills, A. Perceptual similarity and phonetic context in native English listeners’ perception of Arabic consonants. The Thirty-second Second Language Research Forum (SLRF), Provo, UT, 1 November 2013.
Paul Rodrigues, C. Anton Rytting, Tim Buckwalter. (2/2013). “Arabic Chat Alphabet: A data-oriented analysis of variation in Latinized Arabic.” The 27th Arabic Linguistics Symposium. Bloomington, IN.
Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne. (2010) “Finding Entries in an Online Arabic Dictionary.” University of Maryland Human Computer Interaction Lab Twenty-seventh Annual Symposium. May 2010.
Damir Ćavar, Paul Rodrigues, Giancarlo Schrementi. (2004) “Syntactic Parsing Using Mutual Information and Relative Entropy.” Extended Abstract. Proceedings of the Midwest Computational Linguistics Colloquium (MCLC) Bloomington, IN, USA.

NON-REFEREED PRESENTATIONS

Paul Rodrigues. Translation of HLT Resources: Balancing Cost Savings and Accuracy. Association for Machine Translation in the Americas, MT Summit XV. Miami, FL. October 2015.
David Zajic and Paul Rodrigues. Using statistical alignment to improve digitized lexical resources. University of Maryland Center for Advanced Study of Language. College Park, MD. January 1 2011.
Paul Rodrigues, David Zajic, Tim Buckwalter, Mike Maxwell, and C.A. Rytting. “Quality control for digitized dictionaries.” Association for Machine Translation in the Americas, The Ninth Conference of the Association for Machine Translation in the Americas, Workshop on Developing Updating and Coordinating Technologies, Dictionaries, and Lexicons for Terminological Consistency. Denver, CO. October 2010.
Corey Miller, Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues, Tristan Purvis. “Creating a dual-use pandialectal Pashto grammar.” Language Education and Resource Network (LEARN) for AF-PAK Languages. Omaha, Nebraska. May 2010

DATASETS

Gnanadesikan, A., David, A., Bills, A., Brugman, C., Golonka, E., Benninger, H., Boutz, J., Yelle, J., Morrison, M., Clair, N., Pandza, N., Rodrigues, P., Jackson, S., Conners, T., Accad, A., Kouloganes, A., Scheff, B., Jimenez, D., Richardson, D., Johnson, E., Lord, E., Yocklin, G., Soyoye, G., David, G., Sin, H. S., Pickens, H., Cho, H. J., Ordonez, J., Prior, J., Duran, J., Chung, J. H., Aghajanian, K., Amano, L., McDay, M., Rao, M., Beck, Q., Perry, T., Chang, W.-N., Ennals, W., & Linck, J. (2018). A dataset containing images of machine-printed text in 33 languages to improve automatic language-identification capabilities (DO104 Obj. 1.1 Technical Data Package). College Park: University of Maryland Center for Advanced Study of Language.
Gnanadesikan, A., David, A., Bills, A., Brugman, C., Golonka, E., Benninger, H., Boutz, J., Yelle, J., Morrison, M., Clair, N., Pandza, N., Rodrigues, P., Jackson, S., Conners, T., Accad, A., Kouloganes, A., Scheff, B., Jimenez, D., Richardson, D., Johnson, E., Lord, E., Yocklin, G., Soyoye, G., David, G., Sin, H. S., Pickens, H., Cho, H. J., Ordonez, J., Prior, J., Duran, J., Chung, J. H., Aghajanian, K., Amano, L., McDay, M., Rao, M., Beck, Q., Perry, T., Chang, W.-N., Ennals, W., & Linck, J. (2018). A dataset of annotated images of machine-printed text in 33 languages to improve OCR capabilities (DO104 Obj. 2.2 Technical Data Package). College Park: University of Maryland Center for Advanced Study of Language.

Rytting, C.A., Rodrigues, P., Buckwalter, T., Novak, V., Bills, A., Silbert, N., and Madgavkar, M. 2014. ArCADE: An Arabic Corpus of Auditory Dictation Errors. Available at http://www.casl.umd.edu/datasets/cade/arcade/index.html

NON-REFEREED POSTERS

“CASL: Dictionaries and Grammars” University of Maryland Language Science Day, Session 1. (9/24/2010)
“CASL: Arabic Group” University of Maryland Language Science Day, Session 2. (9/24/2010)