This project studies computational language models for use in a semantic parser, based on comparative studies of large text collections in English and five Romance languages (including Spanish, Italian, Portuguese, and Romanian). The goal is to devise and evaluate large-scale automatic techniques both for intra-language issues, such as paraphrase acquisition and generation (e.g., "silk dress" and not "silk's dress"), and for inter-linguistic issues, such as machine translation (e.g., "causa de preocupación", which cannot be translated as the noun compound "worry cause"). The models will be tested in three major applications: Multilingual Question Answering, Machine Translation, and Text-to-Image Generation.
- computational and cognitive models designed to foster interdisciplinary research and to make breakthrough predictions about future directions in a predefined field
- tools designed to promote interdisciplinary collaborations by providing novel topic suggestions to professionals who would like to engage in research discussions with other parties, but who are not familiar with those areas
- current fields: linguistics, machine learning, education (including educational psychology), marketing, and the language technology industry
- other fields will be added
- computational models for the acquisition of causal knowledge, an indispensable part of human/machine reasoning
- approach based on computational and psycholinguistic techniques for finding flexible representations of causal information that allow abstraction across various contexts
- first language loss as a result of bilingualism, with particular interest in heritage speakers of Spanish, Hindi, and Romanian living in the United States
- seeks to explain why language loss targets certain linguistic domains but not others, and to identify the linguistic and psycholinguistic causes of the particular patterns of loss found
- in collaboration with S. Montrul and R. Bhat (Linguistics)
[to be added]