# Inrernship programme -Domestic internships

Internships scholarships are granted to young scientists (up to their 35th year) interested in the problems of Information Technologies. The candidates may apply to the Commission with their own project or select one of the internship topics proposed by the Management of the IPhDS.

Internships for PhD students of IPhDS and employees of the Institutes that are carrying out the Project can be implemented at an arbitrary domestic research centre, after approval by the Recruitment Commission.

Internships for other interns are implemented under the supervision of a research worker of one of the Institutes that are implementing the Project.

The candidate’s supervisor should develop, together with the candidate intern, an internship schedule, and submit it to the Recruitment Commission. Besides the internship schedule and project description, interns submit to the Commission other application documents, i.e. application letter, application for internship, motivation letter, CV, list of documented research achievements, recommendations and consent to internship from the institution where the candidate is to undergo the internship.

After obtaining a positive decision from the Recruitment Commission, the intern signs two agreements: a scholarship agreement between the intern and the IPhDS Manager, and a trilateral agreement between the intern, the Directors’ Board of ICS PAS as the party sending the intern to the internship, and the Directors’ Board of the Research Entity where the internship will take place. Both agreements specify in detail the intern’s rights and obligations under the internship.

During the internship, the intern is obliged to maintain an Internship Journal and to sign the attendance list at the workplace. Both the documents (Internship Journal and attendance list), together with a monthly report, signed by the intern’s supervisor, should be delivered to the Dean’s Office, and constitute the basis for payment of the internship scholarship . After completion of the internship , each participant of the internship receives an internship certificate.

#### DOMESTIC INTERNSHIPS – THE INTERNSHIP SCHOLARSHIP IS UP TO PLN 3.500 PER MONTH FOR INTERNS WITH THE M.Sc. DEGREE AND PhD STUDENTS, AND PLN 4.500 FOR YOUNG DOCTORS.

## Proposals for topics of internships:

### Automatic Interior Layout Design (dr S. Chojnacki)

The purpose of this project is to develop a tool that automatically divides an interior into rooms. The problem has been addressed recently by Merrel. It is similar to other layout design problems e.g. designing content of websites, locations of paragraphs and images in documents or furniture in rooms.

Resources:

[1] http://graphics.stanford.edu/~pmerrell/floorplan-final.pdf

[2] http://research.microsoft.com/apps/pubs/?id=69470

[3] http://dl.acm.org/citation.cfm?id=2407773&CFID=164685610&CFTOKEN=54262296

[4] http://www.sweethome3d.com

### Dynamics of cortical-hippocampal interactions during consolidation of remote memory in Rodents

### Pierre Meyrand PhD CNRS

The goal of this project is to elucidate cortical-hippocampal interactions underlying remote memory formation in awaked and behaving animals. To reach this aim the activation patterns of cortical and hippocampal networks involved in the processes of encoding, storage and retrieval of memory in rodents will be studied using multi-electrode arrays (MEA) approaches.

Complex interactions between neocortex and hippocampus are the neural basis of memory formation. Immediately after learning, memories are labile, that is, subject to interference and trauma, but later they are stabilized, such that they are not disrupted by the same interfering events. The identification of cortical and hippocampal structures involved in memory encoding, consolidation and retrieval will be estimated using activity dependent genes c-fos and zif268 classically used as indirect correlate of neural activation. Subsequent multi-channel recordings of neuronal activity from underlying structures will be performed in freely moving animal engaged in remote memory task.

We are looking for a candidate for a PhD study, or post-doc training who would participate in the above project. The main aim of the project will be focus on the electrophysiological data analysis. For that reason the candidate should have knowledge and practical experience in computer science. For a better understanding of the experimental paradigm, the student could participate in behavioural experiments and identification of brain structures involved in memory processing using immunohistological techniques, as well as in acquisition of electrophysiological data.

The project will be realized in collaboration with Prof. Tiaza Bem from IBIB PAS and Profs Pierre MEYRAND and Bruno BONTEMPI from the Institut des Maladies Neurodégénératives (IMN) at Bordeaux2 University and CNRS.

### „New model checking methods for hybrid systems”

### prof. Wojciech Penczek, ICS PAS

Hybrid systems (HS) can be viewed as generalisations of Real Time Systems (RTS). The theoretical analysis of HS as well as the methods of their formal verification follow advances in the approaches for RTS. However, for HS the main obstacle is manifested by high complexity of the problems involved. In 1996, Henzinger established a theory of hybrid automata. Several papers followed the resulting model checking approach, accompanied by implementations. The conclusion was that model checking of hybrid automata is feasible, but suffers heavily from the state space explosion problem and more research is needed. The further research focused on efficient representations for representing equivalence classes of the variable values, combined with encoding of the discrete parts, and using external tools for solving sub-problems such as computing the preimages. In the recent years, SAT solvers have been extended to Satisfiability Modulo Theorem (SMT) solvers, the tools combining traditional SAT solving with decidable domains like Presburger arithmetics. Concerning the representations, several variants of DDs have been developed, capable of representing linear constraints. However, recently an alternative representation of And-Inverter-Graphs (AIGs) has become quite popular, being more compact compared to DDs in many applications. Moreover, several new heuristics and metaheuristics are available for solving hard problems in Artificial Intelligence.

The project consists in examining to which extent the advances in non-hybrid symbolic model checking and in solving hard problems in AI can be generalised to hybrid systems.

Our objectives in the project include the following four tasks:

- Analysis of the existing verification methods for HS,
- Analysis of the existing verification methods for RTS,
- Analysis of the novel heuristics and metaheuristics in AI,
- Application of the selected methods for RTS and from AI to HS.

### “Random Subset Method for Regression”

### prof. Jan Mielniczuk, ICS PAS

Random Subset Method (RSM) is a powerful method of model choice in classification when the problem is high dimensional. In the RSM a random subset of features is chosen having cardinality m smaller than the number of potentially useful features M and the problem is solved in the reduced feature space of the selected predictors. Features under consideration are assigned weights based on their performance in the constructed solution. The procedure is repeated several times, cumulative weights of predictors are calculated and the selection of predictors is then based on them. The Random Forest method is the most known example of such approach.

The project concerns extension of the RSM to regression problems with quantitative response. The following topics will be researched: construction of appropriate measures of variables’ performance tailored to regression problem considered, data dependent choice of cardinality m of the small model chosen and weighted versions of the method when inclusion of variables into the small model depends on a preliminary assessment of their performance.

Literature:

- T.K. Ho. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intell., 832-844, 1998.
- P. Buhlmann, M. Kalisch, and M.H. Maathuis. Variable selection in high-dimensional linear models: partially faithful distributions and the pc-simple algorithm. Biometrika, 261-278, 2010.
- J. Fan and J. Lv. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society B, 849-911, 2008.

### “Monitoring data sets using the methods of Statistical Process Control (SPC)”

### prof. Olgierd Hryniewicz, SRI PAS

Contemporary methods of data acquisition let build dynamically changing data sets with complicated structures. Methods of data mining allow to analyze such structures, and to use the acquired knowledge for practical purposes. In the case of large data sets the change of the structure of newly acquired data may not influence the structure of the whole data set. However, such information may be very important in practice. When we describe the structure of a data set with a certain one- or multivariate index, we may assume that that in case of stable data the values of such an index calculated for new small data sets may vary purely randomly. Therefore, there is a need to discriminate between pure random variation of such an index (indices) and its statistically significant change. The methodology of Statistical Process Control (SPC), that has been developed during the last 80 years, provides simple and easy to understand (very important in practice!) methods for the analysis of the parameters of discrete production processes. One can think that this methodology could be used for the analysis of possible changes in the structure of data sets. When parameters of the monitored production process are changing the SPC procedures generate alarms. In the context of the analysis of the structure of data sets such an alarm may indicate the need for a new detailed analysis of the whole available data.

Problems to be solved:

- Proposal od indices describing the structure of a data set (Hint: to use methods known from the cluster analysis).
- Proposal of statistical methodology to deal with such data (Hint: to use statistical sequential methods).
- Preliminary test of the proposed methodology using real data, e.g. acquired from the Internet.

### “The method of Maximum Likelihood on data streams”

### Prof. Szymon Jaroszewicz, ICS PAS

In recent years we can observe an exponential growth of the amount of data being collected. That growth made it necessary to develop new analytical approaches capable of handling such vast amounts of data. One of those approaches is the so called data stream analysis, where every record is seen exactly once, and immediately discarded; the data are not stored, but treated as a continuous stream. The method of maximum likelihood is one of the most important statistical methods which is used to find the parameters of many popular models such as linear or logistic regression. It thus seems natural, that Maximum Likelihood based methods should be adapted to the data stream setting. Unfortunately, the solutions currently available, based on methods of stochastic approximation, are very inefficient. This project will aim at developing more efficient algorithms for the implementation of the Maximum Likelihood methods on data streams.

Literature:

- S. Pang, S. Ozawa, N. Kasabov, Chunk Incremental LDA Computing on Data Streams, Advances in Neural Networks, Springer 2005
- S. Muthukrishnan, Data Streams and Applications
- H. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, Springer, 1997

### “Implementations and theoretical properties of efficiently computable grammar-based codes”

### Dr Łukasz Dębowski – ICS PAS

Grammar-based coding is a method of lossless data compression originally developed for compression of texts in natural language. The method consists in constructing a possibly smallest context-free grammar that generates the text as its only production. It has been proved that the global grammar minimization is an NP-hard problem [1] but many interesting algorithms have been proposed that consist in local grammar minimization [1,4,5,6]. Some of these codes are universal, i.e., their asymptotic compression rate equals entropy rate for any stationary process [3,4].

The aim of this project is to develop new methods of grammar-based compression, with an emphasis on admissibly minimal codes proposed in [3]. We suppose that admissibly minimal codes are particularly good at compression because the excess length of such a code is bounded by the number of nonterminal symbols in the grammar [3]. An interesting research problem is whether there exists an admissibly minimal grammar-based code which is computable in the polynomial time. If so, this code would be worth implementing and checking its properties in practice.

The successful candidate should be familiar with book [2].

References:

- M. Charikar, E. Lehman, A. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Sahai, and A. Shelat, "The smallest grammar problem," IEEE Transactions on Information Theory, vol. 51, pp. 25542576, 2005.
- T. M. Cover, J. A. Thomas, Elements of Information Theory, 2nd ed., Wiley, 2006.
- Ł. Dębowski, "On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts," IEEE Transactions on Information Theory, vol. 57, pp. 4589-4599, 2011.
- J. C. Kieffer and E. Yang, "Grammar-based codes: A new class of universal lossless source codes," IEEE Transactions on Information Theory, vol. 46, pp. 737754, 2000.
- C. G. de Marcken, "Unsupervised language acquisition," Ph.D. dissertation, Massachussetts Institute of Technology, 1996.
- C. G. Nevill-Manning, "Inferring sequential structure," Ph.D. dissertation, University of Waikato, 1996.