Industry Day

29 March 2018

From Research To Production

This coming Spring, the ECIR 2018 Industry Day will focus on the application of IR techniques in production environments and the development of production-ready IR/ML systems, whether they make use of open-source or in-house libraries or involve the development of those libraries. Our goal is to connect researchers, practitioners, and tool and product developers, promoting closer collaboration and closing the gap between research and production systems. We would like to have as many representatives from the whole research-to-product pipeline as possible. Say you invented a new topic model and published it, or you contributed a production-ready implementation of that topic model to an open-source tool like gensim, or you work at a startup and used that new technology in your product stack. Come and exchange your experiences and learn from others’ experiences!

See you in Grenoble!

Gabriella and Miguel

Programme

8.30-8.50	Registrations
8.50-9.00	Intro games: Involves catching and throwing a ball
9.00-10.00	Keynote: Radim Řehůřek (RARE Technologies) Anatomy of an idea: mixing open source, research and business
10.00-10.30	Coffee break
10.30-11.30	Panel: Julia Kiseleva (University of Amsterdam and UserSat), Bhaskar Mitra (Microsoft), Craig Macdonald (University of Glasgow), Agnes van Belle (Textkernel)
11.30-11.50	Fabrizio Silvestri (Facebook) Query Embeddings: From Research to Production and Back!
11.50-12.10	Mihai Lupu (Research Studios) Self Optimizer: Applying State-of-the-Art Research to the Industrial Intellectual Property Domain
12.10-12.30	Tereza Iofciu (mytaxi) Building a demand prediction solution at mytaxi
12.30-13.30	Lunch
13.30-13.50	Erik Graf (cortical.io) AI in the wild
13.50-14.10	Alessandro Benedetti (Sease Ltd) From Academic Papers To Production: A Learning To Rank Story
14.10-14.30	Manos Tsagkias (904Labs) A.I. for Search: Lessons Learned
14.30-14.50	Christoph Glauser (ArgYou AG) Getting Closer to the User or Why Most Content is Never Searched for
14.50-15.00	Pitch madness
15.00-15.30	Coffee break
15.30-15.50	Marc Bron (Schibsted Media Group) Management of Industry Research: Experiences of a Research Scientist
15.50-16.10	Minjie Xu (Bloomberg) Read less learn more: overview extraction from news
16.10-16.30	Manel MEZGHANI (IRIT Laboratory) Integrating new Artificial Intelligence Techniques to Semios for Requirements Software
16.30-17.00	Discussion and closing

Keynote

Radim Řehůřek is the founder and director of RARE Technologies, a leading R&D company focused on machine learning and natural language processing. Radim has been building practical solutions for businesses for over a decade. He is the creator of Gensim, a popular open source Python library for topic modeling and information retrieval.

Talk abstract

Machine learning has become red-hot with hype, but the world of academic research is still worlds apart from the pragmatic needs of the industry. What are some common gotchas on the journey from inception to scoping, research prototype, validation to production? I’ll share our insights from a decade of building practical ML and NLP solutions for some of the largest companies in the world.

Panel

Julia Kiseleva is a postdoctoral researcher at the University of Amsterdam and co-founder of usersat.com, a spin-off company offering search and recommender systems optimization specifically for mobile devices. She has extensive industrial experience at leading IT companies including Microsoft Research, Microsoft Bing, Yandex.ru, Hewlett Packard Research, eBay, and Booking.com. In 2016, she obtained a PhD from the Eindhoven University of Technology for her research on improving the user’s search and browse experience.

Bhaskar Mitra is a Principal Applied Scientist at Microsoft AI & Research, Cambridge. He joined Bing in 2007 (then called Live Search), where his responsibilities included experimentation and shipping new ranking techniques, as well as improving infrastructure for experimentation agility. In 2013, he switched to an applied research role and is currently pursuing a part-time doctorate at University College London under the supervision of Dr. Emine Yilmaz and Dr. David Barber. His current research interests include representation learning and neural networks, and their applications to large scale retrieval systems. As an industry researcher, he is focused on “shippable” research, and has contributed to open source projects such as the Microsoft Cognitive Toolkit (CNTK).

Craig Macdonald is a Lecturer in Information Retrieval within the School of Computing Science at the University of Glasgow. He is interested in information retrieval (IR) in general, for instance in settings such as Web, Enterprise, social media and Smart cities (building upon observations from sensors). He regularly participates in TREC, and jointly co-ordinated the TREC Blog track (2006-2010), the Microblog track (2011-2012), and the Web track (2014-). His thesis addressed access to expertise in enterprise environments, and was titled The Voting Model for People Search. He has since deployed an expert search system for the SICSA alliance of Scottish computing science schools. He is a lead developer of the Terrier IR platform, and also uses Terrier in his research publications. He and the TerrierTeam also blog at TerrierTeam@Blogspot, and he occasionally chats about IR on Twitter as @craig_macdonald.

Agnes van Belle is a research engineer and team lead of the Search R&D team at Textkernel, focusing on improving the retrieval and matching performance of their main search product, which can be used for searching CVs and vacancies as well as automatically matching them. She joined Textkernel in 2014 and has since worked on developing, integrating and leveraging Learning to Rank and Neural IR techniques in the product. Challenges in bridging the gap between literature and practice include how to pair these approaches with a faceted, navigational search approach, and how to get satisfactory user feedback to learn or improve the models, either by integrating explicit feedback possibilities into the search interface or by using implicit feedback from logs. Prior to this, Agnes graduated from the University of Amsterdam in Artificial Intelligence, and worked on several data science projects for municipalities, enterprises and the government. Her current interests are information retrieval, predictive modeling, data mining and reactive systems.

Radim Řehůřek will also join the panel.

About the Speakers

Fabrizio Silvestri

Fabrizio is currently a software engineer at Facebook, working in the search team on various topics related to the science and technology of query recommendation and query analysis in general. Prior to Facebook, Fabrizio was a principal scientist at Yahoo, where he worked on sponsored search and native ads within the Gemini project. Fabrizio holds a Ph.D. in Computer Science from the University of Pisa, Italy, where he studied problems related to Web Information Retrieval, with particular focus on efficiency-related problems like Caching, Collection Partitioning, and Distributed IR in general. He received (together with Ranieri Baraglia) the best Web Intelligence 2004 paper award, and the ECIR 06 best paper award. In 2014 he received the best paper award at the internal Yahoo conference, Tech Pulse. He is the author of more than 130 papers and has patents filed in the area of web advertising.

Erik Graf

Erik has over 12 years’ R&D experience in the fields of natural language processing and information retrieval in multinational companies and startups. His passion lies in the development of scalable artificial-intelligence and machine-learning solutions for real-world problems. Erik obtained his PhD at the University of Glasgow, exploring information retrieval based on human information processing. His general research interest lies at the intersection of cognitive science, IR, and NLP. He has worked or collaborated with several academic and industry research groups, including the HP Information Dynamics Lab, IBM Labs, the Glasgow IR Group, and the Sheffield NLP Research Group. At Cortical.io, he has been responsible for the development of successful commercial NLP solutions for large enterprises. These include the Cortical.io Retina, which enables semantic searching of big text data, and the Cortical.io Contract Intelligence Engine, which automates extraction of key information from large volumes of legal documents. Erik currently leads R&D at Cortical.io.

Manel MEZGHANI

Manel received her PhD degree in computer science in 2015 from the Paul Sabatier University, Toulouse, France and from the Faculty of Economics and Management of Sfax, Tunisia. In 2015, she joined the Toulouse Research Lab in Computer Science (IRIT) as a researcher. Her research areas include user profiling, social network analysis, information retrieval and machine learning. In 2018, she joined the R&D team of PROMETIL, where she leads projects on the development of an intelligent approach to improve quality in technical documents. Her current research is related to requirements engineering combined with artificial intelligence.

Minjie Xu

Minjie is a senior software engineer working in the Machine Learning team at Bloomberg in London, where he works on applying ML/NLP techniques to news wire and social media analytics in order to make such content more easily discoverable by Bloomberg clients. Before joining Bloomberg he obtained his PhD degree in Computer Science from Tsinghua University, working with Prof. Jun Zhu on various Bayesian techniques (max-margin nonparametric Bayesian models, distributed Bayesian posterior sampling, etc.) and published several papers at ICML and NIPS. Minjie is also an active reviewer for multiple leading Machine Learning conferences and he received an ICML Outstanding Reviewer Award in 2016.

Alessandro Benedetti

Alessandro is a Search Consultant and R&D Software Engineer at Sease Ltd. His focus is on information retrieval, information extraction, natural language processing, and machine learning. At Sease, Alessandro works as a freelancer on Search/Machine learning projects and consultancies. Prior to that he designed and developed an end-to-end integration of Learning To Rank technologies in a Solr-powered commercial search engine, and an Enterprise Semantic Search Engine known as Sensefy using approaches such as Named Entity Recognition at indexing time, advanced autocompletion, and document similarity metrics. When he isn’t working for clients, he is actively contributing to the open source community and presenting the applications of leading-edge techniques in real-world scenarios at meetups and conferences such as ECIR, the Lucene/Solr Revolution, Open Source Summit and ApacheCon.

Tereza Iofciu

Tereza works in the Data Science team at mytaxi in Hamburg, focusing on demand prediction. Before that she was part of the Data Engineering team at mytaxi, building the data infrastructure with Hadoop technologies. Prior to moving to the industry sector she was a researcher at the L3S research institute in Hanover, where she focused on user modeling, tag analysis and entity search and obtained her PhD degree.

Marc Bron

Marc received his MSc degree in Artificial Intelligence at the University of Amsterdam in 2009, where he went on to pursue a PhD in Information Retrieval at the Intelligent Systems Lab Amsterdam under the supervision of Prof. dr. Maarten de Rijke. After completing his PhD on exploratory search and complex search tasks in 2013, Marc held two successive post-doc positions at Amsterdam and Utrecht University. Marc joined Yahoo! Labs in 2014 and worked on various projects related to user engagement and advertisement quality. Most recently Marc started as a Senior Data Scientist at Schibsted to work on user engagement and quality aspects of search and advertising related products. Marc’s research interests concern the study and analysis of user behavior in order to develop better tools and algorithms for organizing and filtering information.

Mihai Lupu

Mihai is a computer scientist with a background in Search Technologies, who has been working in the patent domain since finishing his PhD studies in 2008. Currently, he is associate editor of World Patent Information, and Studio Director for the Data Science Studio at Research Studios Austria. He was previously a post-doctoral researcher at TU Wien, where he worked on semantic search technology and the means to objectively quantify improvements in this area. He was an invited speaker at EPO’s Patent Information Conference in 2016 and at the International Conference on Search, Data, Text Mining and Visualization in 2017. Mihai Lupu has over 100 publications, including two editions of the Current Challenges in Patent Information Retrieval book and the Patent Retrieval book, and has been co-organizing patent-related evaluation campaigns at the National Institute of Standards and Technology (NIST) in the US, and at the Conference and Labs of the Evaluation Forum (CLEF) in Europe.

Christoph Glauser

Dr. Christoph Glauser was born on 9th October 1964 in Berne, Switzerland. He has two daughters who both live and study in Berne. Between 1985 and 1992 he studied mass media, political science and history at the universities of Berne and Geneva. He did research in the framework of a national research program about “efficiency and efficacy of government programs” at the University of Geneva while doing his PhD at the University of Berne. In 1994 he founded the “Institute for applied argumentation research IFAAR” in Berne, where he started doing scientific research about national and international election campaigns and content analysis for media and news agencies. From 1996 to 2000 he directed a research program at the Swiss Federal Institute of Technology (ETH) in Zurich about “expert communication on biotechnology in the public sphere and in the media”. In 1998 he lectured on “Publizistik (communications)” at the University of Zurich. Between 1998 and 2007 he was a lecturer at the Communication School of the University of Washington, Seattle, USA. Since 2001 Dr. Glauser has been president of the board of the “IFAAR” institute (NPO) and Managing Director of ArgYou AG. He has developed a new “find-engine” for measuring the effect and impact of online campaigns, PR and e-government. “Computerworld” calls him a GOT (global online tycoon). On LinkedIn he is rated as a “superstar” and “influencer”. He calls himself an “online-researcher” and a simple mass media scientist.

Manos Tsagkias

Manos leads 904Labs, an Amsterdam-based artificial-intelligence company that offers the first commercially available self-learning search engine for e-commerce and content providers. Manos holds a Ph.D. in search engine technology and has more than ten years of experience in search, recommendation, and predictive analytics systems. He has authored more than 50 scientific publications on search engines and predictive analytics, and he has been involved in three startup ventures. Manos’ current mission is to disseminate the importance and the implications of search in today’s online businesses.