PhD: A Generic and Model-Agnostic Evaluation Framework for Decision-Making Tasks (Task-Oriented) F/M

Orange

  • Lannion, Côtes-d'Armor
  • Internship
  • Full-time
  • Posted 2 months ago
About the role
Orange has implemented various bot solutions and records large volumes of human-machine and human-to-human customer-care conversations. These conversations are often not rigorously evaluated.
Large language models (LLMs) have been a breakthrough in many natural language processing (NLP) tasks, including the development of agents capable of solving complex tasks [6]. Chatbot development has also become democratized, so a proliferation of solutions is likely in the near future. The boundaries between NLP tasks, and between domains (e.g. tourism, restaurants, retail, technical support), are blurring.
Evaluating these solutions is therefore becoming a real need: the scope of evaluation must be broadened, and evaluation must be transposable from one task or domain to another.
Previous work studied the correlation between objective and subjective metrics (indicators) for evaluating conversations [3] and text generation [6]. Other work predicted conversation quality [1]. Model-agnostic scores were proposed in [4] to compare the behavior of two dialogue systems. More recently, the Dialogue System Technology Challenge (DSTC) dedicated a track to the interactive evaluation of dialogues [5].
Inspired by game theory [7], interpretability [2] and self-driving cars [8], we will infer the strategy followed by the dialogue system [10]. We will work on both public datasets (WebShop, ALFWorld, ...) and private Orange datasets (technical assistance, commercial bots, etc.).
References:
[1] Rojas-Barahona, Lina M. (2020). Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function. In Proceedings of the NeurIPS Workshop HLDS 2020.
[2] Cafagna, Michele, Lina M. Rojas-Barahona, Kees van Deemter, and Albert Gatt. Interpreting Vision and Language Generative Models with Semantic Visual Priors. Frontiers in AI, Special Issue: Explainable AI in Natural Language.
[3] Walker, Marilyn A., Diane J. Litman, Candace A. Kamm, and Alicia Abella (1997). PARADISE: A Framework for Evaluating Spoken Dialogue Agents. In Proceedings of ACL/EACL, pages 271-280, Madrid, Spain.
[4] Ultes, Stefan, and Wolfgang Maier (2020). Similarity Scoring for Dialogue Behaviour Comparison. In Proceedings of SIGDIAL.
[5] Mehri, Shikib, et al. (2022). Interactive Evaluation of Dialog Track at DSTC9. arXiv preprint arXiv:2207.14403.
[6] Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629.
[7] Yu, Xiaopeng, et al. (2022). Model-Based Opponent Modeling. Advances in Neural Information Processing Systems 35, pages 28208-28221.
[8] Teng, Siyu, et al. (2023). Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives. IEEE Transactions on Intelligent Vehicles.
About you
- You have experience in Artificial Intelligence and Machine Learning, particularly in deep learning.
- You have a good level in mathematics (numerical optimization, statistics, probability, etc.).
- You are proficient in software development.
- You read, write and speak English fluently.
- You are curious, attracted by new technologies, and ready to keep up with their evolution.
- You enjoy working in a team, within multidisciplinary projects, and contributing to a common goal, while being autonomous in your activities.
- You have good analytical and synthesis skills.
- Proficiency in one of the following deep learning tools would be a plus: Torch, PyTorch, TensorFlow, MXNet.
- You like to communicate the results of your work through written reports and oral presentations, preferably in English.
Required training (master's degree, engineering degree, PhD, scientific and technical field, etc.)
Engineering degree and/or Research Master's degree, with knowledge of machine learning and of at least one of the fields listed above.
Desired experience (internships, etc.)
Prior experience implementing deep learning algorithms (for example, during an internship) would be a plus.
Additional information
You will join a team specialized in dialogue and work with researchers, data scientists, architects, developers, PhD students and interns.
Department
Orange Innovation brings together the research and innovation activities and expertise of the Group's entities and countries. We work every day to ensure that Orange is recognized as an innovative operator by its customers, and we create value for the Group and the Brand in each of our projects. With 720 researchers and thousands of marketers, developers, designers and data analysts, it is the expertise of our 6,000 employees that fuels this ambition every day.
Orange Innovation anticipates technological breakthroughs and supports the Group's countries and entities in making the best technological choices to meet the needs of our consumer and business customers.
Big Data and Artificial Intelligence are an important lever for Orange: they make it possible to re-create the customer relationship, optimize network management and enhance the customer experience, to the clear benefit of customers.
Orange is developing products and services based on artificial intelligence that use natural language processing, dialogue and conversational agents, pattern recognition and predictive analytics technologies.
The Data & AI department's main mission is to make Orange a "data-driven" company. It defines the Group's Data & AI standards, facilitates the development of data use cases, products and services, and supports the entire Orange Group.
Contract
Thesis
