Thesis/project proposals

Are you looking for a project for your Master's thesis or for some other project you would like to carry out during your studies, for example for the BSc course Progetto di Ingegneria Informatica? Below you can find some research topics I am working on or would like to work on. All of these can be turned into projects and theses. They are meant to give you a glimpse behind the scenes of a professor's work and to show you that there is more to a university than teaching. These topics are typically approached using a data science and/or web/software engineering methodology.

If you spot something you like, drop me an email or get in touch with me! I'll be happy to discuss possible options to work together. Also, yes, you can work in groups of two and write and present your thesis together.

What you need: programming skills, knowledge of web technologies, data management skills, a brain and passion.

What you get: new competences, a thesis (or tesina), a project, personal satisfaction and recognition of your work.

Writing your thesis: yes, a thesis is also about writing and writing skills. Here are some thoughts on writing and a possible template for your thesis.

In addition to the topics below, here you can find the slides presented during the thesis proposal meeting organized by the Polimi Data Scientists. Perhaps you participated in the meeting and are looking for those slides.

Harmful social bots

Topics: social bots, social networks, data science, ethics

Bots are algorithmically driven entities that behave like humans in online communications. They are increasingly infiltrating social conversations on the Web or in chat apps. If not properly prevented, this presence of bots may cause harm to the humans they interact with, e.g., they may offend or discriminate against people. The goal of this research is to understand which types of harm and which abuses may happen, whether abuses can be considered intentional or not, whether it is possible to prevent them and, if yes, how.

Research questions and possible thesis/project topics

  • Bot detection: Given a specific type of abuse, e.g., discrimination, spamming or mimicking interest, how can we identify bots that may be vulnerable to this type of abuse? Answering this question requires following a data science methodology and may require the use of machine learning and AI algorithms, online/reinforcement learning techniques, network analysis techniques, and similar. Different social networks may be studied. The expected outputs are algorithms or methods able to classify social network accounts based on the harm they may cause.
  • Code analysis: Here the idea is to start from the code of bots shared online, e.g., on GitHub, and to study which abusive code patterns can be identified, which effects they may have, and how to spot bots that implement them. Again, answering these questions requires a data science approach and may imply the use/implementation of code analysis and search algorithms as well as manual inspection of code. Different social networks may be studied. Possible outputs are code patterns or pattern search algorithms and data analyses.
  • BotBuster: As a preliminary result of the research in this area, an online tool named BotBuster has been developed. It allows one to classify accounts on Twitter as bot or not and, in the case of bots, it identifies potential harms. It currently implements an ensemble of bot classification algorithms, which are trained offline and used online. The question is how to turn BotBuster into a tool that is able to learn over time. New algorithms for BotBuster can also be implemented.
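
To make the idea of an ensemble of bot classifiers more concrete, here is a minimal sketch of majority voting over a few rule-based classifiers. It is not BotBuster's actual implementation: the feature names (posts_per_day, followers, following, age_days), the thresholds and the three rules are hypothetical and only serve as an illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# An account is a plain dict of numeric features (hypothetical names).
Account = Dict[str, float]
Classifier = Callable[[Account], str]  # returns "bot" or "human"


def posts_too_often(account: Account) -> str:
    # Hypothetical rule: more than 100 posts per day looks automated.
    return "bot" if account.get("posts_per_day", 0) > 100 else "human"


def follows_many_with_few_followers(account: Account) -> str:
    # Hypothetical rule: follows over 10x more accounts than follow back.
    followers = account.get("followers", 0)
    following = account.get("following", 0)
    return "bot" if following > 10 * max(followers, 1) else "human"


def very_young_account(account: Account) -> str:
    # Hypothetical rule: accounts younger than 7 days are suspicious.
    return "bot" if account.get("age_days", 0) < 7 else "human"


RULES: List[Classifier] = [
    posts_too_often,
    follows_many_with_few_followers,
    very_young_account,
]


def ensemble_vote(account: Account, classifiers: List[Classifier]) -> str:
    """Return the label chosen by most classifiers (ties go to "human")."""
    votes = Counter(clf(account) for clf in classifiers)
    return "bot" if votes["bot"] > votes["human"] else "human"


if __name__ == "__main__":
    suspicious = {"posts_per_day": 250, "followers": 3,
                  "following": 900, "age_days": 400}
    print(ensemble_vote(suspicious, RULES))
```

In a tool that learns over time, the fixed rules would be replaced by trained models that are updated as new labeled accounts arrive; the voting step could stay the same.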

Preliminary results

Conversational screen readers

Topics: personal assistants, web engineering, human-computer interaction

Screen readers are accessibility tools for blind or visually impaired people, or simply for people who are not able to look at the computer screen at a given moment. They are able to read out text and to deliver a voice-based interaction with applications; user input is typically keyword based. Vocal personal assistants like Amazon Alexa, Google Assistant, Microsoft Cortana and Apple Siri are virtual assistants that provide a two-way voice user interface to software. The goal of this research is to study how to turn personal assistants into modern screen readers for the web and to design, develop and test respective prototype systems. The vision is “talking” with websites.

Research questions and possible thesis/project topics

  • Heuristic conversation generation: Given the URL of a website, how can we extract the necessary knowledge and content structure of the respective web page so that a conversational agent is able to maintain a meaningful conversation with users? In a first attempt, answering this question will require an analysis of existing accessibility mechanisms, the identification of which capabilities can reasonably be supported through a conversational interface, and the conception of heuristics that may be used to instruct the conversational agent. Web engineering skills are a must and “speaking” prototypes are targeted.
  • Conversation annotation format: Assuming that it is not possible to generate meaningful conversation skills without any additional input, a new question could be: is it possible to specify an annotation format that allows one to add conversation-specific hints to the website so as to enable new/advanced conversational skills? Answering this question requires web engineering skills and again targets “speaking” prototypes.
  • Automated conversation generation: Forgetting about heuristics and annotations or also using them, is it possible to automatically learn how to extract conversational knowledge from web pages? The approach here is more of a data science approach (it will be necessary to analyze lots of HTML pages), yet the idea is still to implement a working prototype that supports voice interactions.
  • User experience: How effective are conversational screen readers in practical settings? Is the paradigm actually useful? What are the hurdles for adoption, and what are the strong points compared to traditional screen readers? These questions require suitable user studies, to be carried out as soon as the first working prototypes are ready.
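
As a starting point for the heuristic approach, the following sketch extracts headings and links from an HTML page with Python's standard html.parser module and turns them into a first conversational prompt. The class name OutlineParser, the function opening_prompt and the wording of the prompt are assumptions made for illustration; a real prototype would need far richer heuristics and a speech interface.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class OutlineParser(HTMLParser):
    """Collect heading texts and (text, href) pairs of links from a page."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[str] = []
        self.links: List[Tuple[str, str]] = []
        self._heading_buf: Optional[List[str]] = None
        self._link_buf: Optional[List[str]] = None
        self._link_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading_buf = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without a target
                self._link_href = href
                self._link_buf = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading_buf is not None:
            self.headings.append("".join(self._heading_buf).strip())
            self._heading_buf = None
        elif tag == "a" and self._link_buf is not None:
            text = "".join(self._link_buf).strip()
            self.links.append((text, self._link_href))
            self._link_buf = None

    def handle_data(self, data):
        # Text may belong to a heading, a link, or both (link in a heading).
        if self._heading_buf is not None:
            self._heading_buf.append(data)
        if self._link_buf is not None:
            self._link_buf.append(data)


def opening_prompt(html: str) -> str:
    """Build the first sentence an agent could say about the page."""
    parser = OutlineParser()
    parser.feed(html)
    if not parser.headings:
        return "I could not find any sections on this page."
    sections = ", ".join(parser.headings)
    return (f"I found {len(parser.headings)} sections: {sections}. "
            "Which one should I read?")


if __name__ == "__main__":
    print(opening_prompt("<h1>Home</h1><p>Hi</p><h2>News</h2>"))
```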

Preliminary results

  • no results yet

Conversational data exploration

Topics: chatbots, data engineering, human-computer interaction

Data exploration is a data analysis practice that, given a database with concrete data, aims to understand what is in the database (instances) and which characteristics describe the data in the database (schema). It is typically carried out by an expert interested in learning how to use the data and able to write even complex SQL queries. Chatbots are bots (software agents) that provide textual, conversational user interfaces to software, that is, they allow users to interact with software using natural language text messages, like the ones we all use in WhatsApp, Telegram or Messenger. The goal of the research in this context is to develop a chatbot for data exploration that also allows non-experts to explore and understand the content of a given database.

Research questions and possible thesis/project topics

  • Heuristic conversation generation: Given a database endpoint, how do we “teach” the chatbot knowledge about the database? A preliminary analysis has shown that it may be necessary to annotate the schema with conversational clues, so that the chatbot is able to learn the necessary terminology, grammar and user questions. But which features can a chatbot support? What exactly needs to be annotated, and how? The output will be a chatbot able to interact with users and to answer their questions about the database.
  • Automated annotation: Assuming that some form of annotation of the schema is needed in order to help the chatbot create a conversational competence, how much of this annotation can be generated or derived automatically from the schema and/or instances of the database? Do we need suitable heuristics encoding expert knowledge or can AI/machine learning algorithms do the job automatically? The final output will again be a working prototype of a data exploration chatbot.
  • User experience: How effective is conversational data exploration in practice? Is the paradigm actually useful? What are the hurdles for adoption, and what are the strong points compared to SQL-based data exploration? These questions require suitable user studies, to be carried out as soon as the first working prototypes are ready.
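
To illustrate what annotating a schema with conversational clues could mean, here is a minimal sketch that reads the table names of a SQLite database and uses a hand-written annotation to describe the content in everyday words. SQLite, the ANNOTATIONS mapping and the example tables (customer, purchase) are assumptions chosen for illustration; they are not the format used in the preliminary analysis.

```python
import sqlite3
from typing import Dict, List

# Hypothetical annotation: maps table names to the words users would say.
ANNOTATIONS: Dict[str, str] = {"customer": "customers", "purchase": "orders"}


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def describe_database(conn: sqlite3.Connection,
                      annotations: Dict[str, str]) -> str:
    """Summarize the database using the annotated, user-friendly terms."""
    parts = []
    for table in list_tables(conn):
        label = annotations.get(table, table)  # fall back to the raw name
        # Table names come from sqlite_master, not from user input.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        parts.append(f"{count} {label}")
    if not parts:
        return "This database is empty."
    return "This database contains " + ", ".join(parts) + "."


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (name) VALUES ('Ada')")
    print(describe_database(conn, ANNOTATIONS))
```

The automated annotation topic asks how much of a mapping like ANNOTATIONS could be derived from the schema and the instances instead of being written by hand.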

Preliminary results

  • no results yet

Service-oriented computing on blockchain

Topics: blockchain, smart contracts, web services

Service-oriented computing is the development paradigm that leverages web services, e.g., REST APIs or SOAP services, for the implementation of composite applications. Web services are software components that provide programmatic access to application logic via standard interaction protocols over the Internet. Blockchain is a new technology for developing immutable, distributed ledgers (logs) of transactions, e.g., used for the implementation of Bitcoin. Smart contracts are small programs that can be executed inside the blockchain, e.g., to enforce contractual agreements. The goal of the research proposed in this area is to study how smart contracts can be used to implement a service-oriented computing paradigm inside the blockchain, in order to make blockchain functionality accessible and reusable.

Research questions and possible thesis/project topics

  • Service model: What does it actually mean to interpret smart contracts as services? How does this affect the traditional model of web services we know for example from SOAP or REST? What could be a good model for a service implemented using smart contracts? All these questions require a thorough analysis of current service-oriented computing technology and a profound understanding of smart contracts. The goal is to develop a model of a reusable service for the blockchain and to implement suitable demonstrators of viability.
  • Service search and discovery: How do we enable developers or other potential consumers of a smart contract service to search for and discover interesting services? Can we do so in a cross-blockchain fashion (supporting different blockchain technologies)? Currently, no suitable registry or solution exists that helps service consumers in their task. Solving this issue requires knowledge of web engineering, web services and blockchain. The target is a prototype of an openly accessible registry.
  • Composition model: Given a set of candidate smart contract services, what could a possible composition paradigm for smart contracts look like? Do we need a dedicated modeling language or middleware for execution? The goal is to develop a new service by integrating existing ones. This is very challenging, and giving a good answer will require good knowledge of both web services and blockchain/smart contracts. Target outputs are a composition model and a respective notation, perhaps also a middleware for composition execution.
  • Composition editor: Assuming we have some form of composition paradigm, can we develop some form of visual editor to assist developers in their composition task? What could this editor look like? Which are the key features such an editor should have? All these questions have no answers yet, and different solutions may be compared to give an answer. The goal is to provide the composition paradigm of the previous point with a suitable, easy-to-use development environment.
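
To give an idea of what a registry for smart contract services might offer, here is a minimal in-memory sketch with keyword search and an optional blockchain filter. Everything in it is hypothetical: the ServiceEntry fields, the Registry class, the blockchain names and the placeholder addresses are illustrative assumptions, not an existing registry or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    blockchain: str            # e.g. "ethereum" (illustrative value)
    address: str               # contract address (placeholder strings below)
    keywords: Tuple[str, ...]  # free-text tags used for discovery


class Registry:
    """A toy, in-memory registry of smart contract services."""

    def __init__(self) -> None:
        self._entries: List[ServiceEntry] = []

    def register(self, entry: ServiceEntry) -> None:
        self._entries.append(entry)

    def search(self, keyword: str,
               blockchain: Optional[str] = None) -> List[ServiceEntry]:
        """Find services by keyword, optionally on a single blockchain."""
        wanted = keyword.lower()
        return [
            e for e in self._entries
            if wanted in {k.lower() for k in e.keywords}
            and (blockchain is None or e.blockchain == blockchain)
        ]


if __name__ == "__main__":
    registry = Registry()
    registry.register(ServiceEntry("EscrowA", "ethereum", "0xPLACEHOLDER1",
                                   ("escrow", "payment")))
    registry.register(ServiceEntry("EscrowB", "other-chain", "PLACEHOLDER2",
                                   ("escrow",)))
    print([e.name for e in registry.search("escrow")])
```

A cross-blockchain registry would additionally have to describe how each service is invoked on its platform, which is where the service model of the first topic comes in.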

Preliminary results

Blockchain-based service level agreements

Topics: blockchain, smart contracts, web services, service level agreements

Blockchain is a new technology for developing immutable, distributed ledgers (logs) of transactions, e.g., used for the implementation of the cryptocurrency Bitcoin. Smart contracts are small programs that can be executed inside the blockchain, e.g., to enforce contractual agreements. A service level agreement (SLA) is a contract between a provider of a service and its consumer that states which constraints and requirements regarding the quality of the service must hold in order for the delivery of the service to be considered satisfactory by the consumer (e.g., max response time < 10 ms). The typical challenge in the delivery of digital services over a network, e.g., the Internet, is certifying service levels in a manner that both parties involved can trust. The goal of the research proposed in this context is to study if and how blockchain technology can be used as a distributed service level certification authority.

Research questions and possible thesis/project topics

  • Automated payment and compensation: Assuming there are three actors, a provider, a consumer, and a trusted service level monitor, how can we model the services offered and the respective service level agreements in terms of smart contracts able to manage payments for service consumption and compensations for SLA violations? The idea is to establish two channels, one for the actual delivery of the service and one for billing and compensation, to enable automated payments and enforcement of agreements. Answering the question requires knowledge in web services, blockchain and smart contracts. The implementation of a working scenario is expected.
  • Blockchain SLA monitor: Assuming instead that there is no trusted service level monitor we can rely on, is it possible to use features provided by blockchain technology itself, e.g., transactions, payloads, timestamps or smart contracts, to implement this trusted service level monitor? Which quality of service attributes can be supported, and which cannot? Answering these questions requires a profound understanding of blockchain technology and smart contracts. The implementation of a working prototype is expected.
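
The payment and compensation logic can be sketched off-chain before turning it into a smart contract. The following minimal example assumes response times reported by a trusted monitor and an SLA of the form "max response time < 10 ms"; the SLA fields, the settle function, the per-violation penalty scheme and the use of integer cents are illustrative assumptions, not a proposed design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SLA:
    max_response_ms: float      # a call must be strictly faster than this
    price_per_call: int         # in cents, paid by the consumer
    penalty_per_violation: int  # in cents, refunded per violating call


def settle(sla: SLA, response_times_ms: List[float]) -> Dict[str, int]:
    """Compute what the consumer owes after compensations for violations."""
    violations = sum(1 for t in response_times_ms
                     if t >= sla.max_response_ms)
    due = len(response_times_ms) * sla.price_per_call
    # Assumption: the compensation never exceeds the amount due.
    compensation = min(violations * sla.penalty_per_violation, due)
    return {
        "violations": violations,
        "due": due,
        "compensation": compensation,
        "net_payment": due - compensation,
    }


if __name__ == "__main__":
    sla = SLA(max_response_ms=10, price_per_call=5, penalty_per_violation=3)
    print(settle(sla, [4, 12, 9.5, 10, 30]))
```

In the scenario without a trusted monitor, the open question is where the list of response times would come from, since values measured outside the blockchain are exactly what the two parties may not agree on.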

Preliminary results

  • no results yet

Domain-specific content extraction

Topics: web engineering, data extraction, homebrewing

Web content extraction is the collection of structured content (data) from generic web pages. The challenge is understanding which data elements are relevant and how to extract them, taking into account the structure of the page (HTML), its style (CSS), and perhaps the actual data items and/or their semantics. Typical synonyms of the practice are web wrapping or web scraping. While there are some generic tools that typically require user intervention, the goal of this line of research is to study methods for the fully automated extraction of domain-specific content, that is, content for which we already know something (e.g., target instances or typical data structures). The specific domain this research focuses on is homebrewing, i.e., making beer at home, and beer recipes.

Research questions and possible thesis/project topics

  • Automated recipe extraction: Given a web page that describes the recipe of a beer in a textual, colloquial fashion, how can we automatically extract a structured recipe from the page? How can we do so from pages with different structures and presentation styles? Are we able to automatically identify the units of the ingredients, to convert them from the imperial to the metric system (or vice versa), and to import the recipe into a suitable editor for further editing? Answering these questions requires web engineering skills and may call for the development of a browser extension for the extraction of content.
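
The unit identification and conversion step can be illustrated with a small sketch based on a regular expression. It assumes US customary units (lb, oz, US gallon; an imperial gallon would need a different factor), recognizes only three units, and rounds to one decimal; the names TO_METRIC, PATTERN and extract_quantities as well as the sample sentence are made up for illustration.

```python
import re
from typing import List, Tuple

# Conversion to metric, assuming US customary units: (factor, metric unit).
TO_METRIC = {
    "lb": (453.59237, "g"),
    "oz": (28.349523125, "g"),
    "gal": (3.785411784, "L"),  # US gallon, not the imperial gallon
}

# A number followed by one of the supported unit spellings.
PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(lbs?|oz|gal(?:lons?)?)\b", re.IGNORECASE
)


def normalize(unit: str) -> str:
    """Map spelling variants such as "lbs" or "gallons" to a TO_METRIC key."""
    unit = unit.lower()
    if unit.startswith("lb"):
        return "lb"
    if unit.startswith("gal"):
        return "gal"
    return unit


def extract_quantities(text: str) -> List[Tuple[float, str]]:
    """Find quantities in free text and convert them to metric units."""
    results = []
    for amount, unit in PATTERN.findall(text):
        factor, metric_unit = TO_METRIC[normalize(unit)]
        results.append((round(float(amount) * factor, 1), metric_unit))
    return results


if __name__ == "__main__":
    recipe = ("Mash 10 lbs of pale malt, add 1.5 oz Cascade hops, "
              "top up to 5 gallons.")
    print(extract_quantities(recipe))
```

Linking each quantity to its ingredient, and coping with fractions such as "1/2 oz", is part of what makes the extraction problem hard on real pages.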

Preliminary results