This paper describes the efforts, pitfalls, and successes of applying unsupervised classification techniques to analyze the Trap-2017 dataset. Guided by the informative perspective on the nature of the dataset obtained through a set of purpose-written Perl/Bash scripts, we devised an automated clustering tool implemented in Python on top of openly available scientific libraries. By applying our tool to the original raw data, it is possible to infer a set of trending behaviors for vehicles travelling over a route, yielding an instrument to classify both routes and plates. Our results show that addressing the main goal of the Trap-2017 initiative ("to identify itineraries that could imply a criminal intent") is feasible even in the presence of an unlabelled and noisy dataset, provided that the unique characteristics of the problem are carefully considered. Although several optimizations for the tool are still under investigation, we believe that it may already pave the way to further research on the extraction of high-level travelling behaviors from gate transit records.
Traffic Data
Clustering
Unsupervised Classification
The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network limit the applicability of standard techniques and demand specific algorithms to explore and analyze it. The attention of the research community has focused on assessing the security of the Tor infrastructure (i.e., its ability to actually provide the intended level of anonymity) and on discussing what Tor is currently being used for. Since there are no foolproof techniques for automatically discovering Tor hidden services, little or no information is available about the topology of the Tor Web graph. Even less is known about the relationship between content similarity and topological structure. The present article aims to address this lack of information. Among its contributions: a study of automatic Tor Web exploration/data collection approaches; the adoption of novel representative metrics for evaluating Tor data; a novel in-depth analysis of the hidden services graph; a rich correlation analysis of hidden services' semantics and topology. Finally, a broad set of novel insights and considerations on the organization and content of the Tor Web is provided.
Automatic web exploration
Correlation analysis
Network topology
Web graphs
With black-box access to the cipher being its only requirement, Dinur and Shamir's cube attack is a flexible cryptanalysis technique which can be applied to virtually any cipher. However, gaining a precise understanding of the characteristics that make a cipher vulnerable to the attack is still an open problem, and no implementation of the cube attack has so far succeeded in breaking a real-world strong cipher. In this paper, we present a complete implementation of the cube attack on a GPU/CPU cluster able to improve state-of-the-art results against the Trivium cipher. In particular, our attack allows full key recovery up to 781 initialization rounds without brute-force, and yields the first ever maxterm after 800 initialization rounds. The proposed attack leverages a careful tuning of the available resources, based on an accurate analysis of the offline phase, that has been tailored to the characteristics of GPU computing. We discuss all design choices, detailing their respective advantages and drawbacks. Besides providing remarkable results, this paper shows how the cube attack can significantly benefit from accelerators like GPUs, paving the way for future work in the area.
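As a rough illustration of the black-box principle behind the cube attack, the following minimal sketch sums a toy Boolean function over a cube of public bits and reads off the linear coefficients of the resulting superpoly. The function `toy_cipher`, its 3-bit IV/key sizes, and all helper names are hypothetical and chosen only for this example; they bear no relation to Trivium or to the GPU implementation described above.

```python
from itertools import product

def toy_cipher(v, k):
    """Hypothetical Boolean function of 3 public bits v and 3 key bits k (GF(2))."""
    return ((v[0] & v[1] & (k[0] ^ k[1])) ^ (v[0] & k[2])
            ^ (v[1] & v[2] & k[0]) ^ v[2] ^ (k[1] & k[2]) ^ 1)

def cube_sum(oracle, cube, n_public, key):
    """XOR the oracle output over every assignment of the cube variables.

    Public bits outside the cube are fixed to 0. The result is the value
    of the superpoly of the cube for the given key.
    """
    acc = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = [0] * n_public
        for idx, b in zip(cube, bits):
            v[idx] = b
        acc ^= oracle(v, key)
    return acc

def superpoly_linear_coeffs(oracle, cube, n_public, n_key):
    """Recover the constant and linear coefficients of the superpoly.

    Assumes the superpoly is linear in the key bits; a real attack would
    first verify this with probabilistic linearity tests.
    """
    const = cube_sum(oracle, cube, n_public, [0] * n_key)
    coeffs = []
    for i in range(n_key):
        unit = [0] * n_key
        unit[i] = 1
        coeffs.append(cube_sum(oracle, cube, n_public, unit) ^ const)
    return const, coeffs

# Cube {v0, v1}: the superpoly of the toy function is k0 XOR k1.
print(superpoly_linear_coeffs(toy_cipher, [0, 1], 3, 3))  # (0, [1, 1, 0])
```

Each cube sum needs 2^d oracle queries for a cube of dimension d, which is the kind of embarrassingly parallel workload that maps naturally onto accelerators.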
We present a comprehensive study of concentrated emulsions flowing in microfluidic channels, one wall of which is patterned with micron-size equally spaced grooves oriented perpendicularly to the flow direction. We find a scaling law describing the roughness-induced fluidization as a function of the density of the grooves, so that fluidization can be predicted and quantitatively regulated. This suggests common scenarios for droplet trapping and release, potentially applicable to other jammed systems as well. Numerical simulations confirm these views and provide a direct link between fluidization and the spatial distribution of plastic rearrangements.
Voronoi diagrams are an important tool with theoretical and practical applications in a large number of fields. We present a new procedure, implemented as a set of CUDA kernels, which detects, in a general and efficient way, topological changes in dynamic Voronoi diagrams whose generating points move in time. The solution that we provide was originally developed to identify plastic events during simulations of soft-glassy materials based on a lattice Boltzmann model with frustrated short-range attractive and mid/long-range repulsive interactions. Along with the description of our approach, we also present some preliminary physics results.
We present a solution based on a suitable combination of heuristics and parallel processing techniques for finding the best allocation of the financial assets of a pension fund, taking into account all the specific rules of the fund. We compare the values of an objective function computed with respect to a large set (thousands) of possible scenarios for the evolution of the Net Asset Value (NAV) of the share of each asset class in which the financial capital of the fund is invested. Our approach depends neither on the model used for the evolution of the NAVs nor on the objective function. In particular, it does not require any linearization or similar approximations of the problem. Although we applied it to a situation in which the number of possible asset classes is limited to a few units (six in the specific case), the same approach can also be followed in other cases by grouping asset classes according to their features.
Searching for words or sentences within large sets of textual documents can be very challenging unless an index of the data has been created in advance. However, indexing can be very time-consuming, especially if the text is not readily available and has to be extracted from files stored in different formats. Several solutions, based on the MapReduce paradigm, have been proposed to accelerate the process of index creation. These solutions perform well when data are already distributed across the hosts involved in the computation. On the other hand, the cost of distributing data can introduce noticeable overhead. We propose ISODAC, a new approach aimed at improving efficiency without sacrificing reliability. Our solution reduces the number of I/O operations to the bare minimum by using a stream of in-memory operations to extract and index text. We further improve the performance by using GPUs for the most computationally intensive tasks of the indexing procedure. ISODAC indexes heterogeneous documents up to 10.6x faster than other widely adopted solutions, such as Apache Spark. As a proof of concept, we developed a tool to index forensic disk images that can easily be used by investigators through a web interface.
We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a breadth-first search, of large graphs. This latest version makes the most of the features of the Kepler architecture and relies on a combination of techniques to reduce both the number of communications among the GPUs and the amount of exchanged data. The final result is a code that can visit more than 800 billion edges per second by using a cluster equipped with 4,096 Tesla K20X GPUs.
Breadth First Search
CUDA
GPU
Large graphs
Parallel computing
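For readers unfamiliar with the frontier-based formulation that parallel breadth-first searches usually build on, here is a minimal sequential sketch. It is not the CUDA code described in the abstract above: the adjacency-dictionary representation and the name `bfs_levels` are assumptions made for illustration only.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: return a dict mapping each reached vertex to its depth.

    `adj` maps a vertex to an iterable of its neighbours (hypothetical
    representation; large-scale codes typically use compressed sparse rows).
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel version, the vertices of the current frontier are
        # expanded concurrently and newly discovered vertices are exchanged
        # among the processing units before the next level starts.
        for u in frontier:
            for w in adj.get(u, ()):
                if w not in level:
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

The throughput figure quoted in such benchmarks is normally the number of edges traversed divided by the search time.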
By means of mesoscopic numerical simulations of a model soft-glassy material, we investigate the role of boundary roughness in the flow behaviour of the material, probing the bulk/wall and global/local rheologies. We show that the roughness reduces the wall slip induced by wettability properties and acts as a source of fluidisation for the material. A direct inspection of the plastic events suggests that their rate of occurrence grows with the fluidity field, reconciling our simulations with kinetic elasto-plastic descriptions of jammed materials. Nevertheless, we observe qualitative and quantitative differences in the scaling, depending on the distance from the rough wall and on the imposed shear. The impact of roughness on the orientational statistics is also studied.
Graphics Processing Units (GPUs) exhibit significantly higher peak performance than conventional CPUs. However, in general only highly parallel algorithms can exploit their potential. In this scenario, the iterative solution to sparse linear systems of equations could be carried out quite efficiently on a GPU as it requires only matrix-by-vector products, dot products, and vector updates. However, to be really effective, any iterative solver needs to be properly preconditioned and this represents a major bottleneck for a successful GPU implementation. Due to its inherent parallelism, the factored sparse approximate inverse (FSAI) preconditioner represents an optimal candidate for the conjugate gradient-like solution of sparse linear systems. However, its GPU implementation requires a nontrivial recasting of multiple computational steps. We present our GPU version of the FSAI preconditioner along with a set of results that show how a noticeable speedup with respect to a highly tuned CPU counterpart is obtained.
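The building blocks named in the abstract above (matrix-by-vector products, dot products, and vector updates) can be seen in a minimal preconditioned conjugate gradient sketch. This is an illustrative dense, pure-Python version, not the GPU code of the paper; the function names are hypothetical, and the demo uses a simple diagonal (Jacobi) preconditioner as a stand-in for FSAI, whose application would instead amount to two sparse matrix-by-vector products with the approximate inverse factor and its transpose.

```python
def matvec(A, x):
    """Dense matrix-by-vector product (a sparse format would be used in practice)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    `apply_prec(r)` returns an approximation of A^{-1} r.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # stop on a small residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
jacobi = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]  # stand-in preconditioner
print(pcg(A, b, jacobi))  # approximately [1/11, 7/11]
```

Every step inside the loop is either a matrix-by-vector product, a dot product, a vector update, or a preconditioner application, which is why the preconditioner tends to be the part that is hardest to parallelize.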
Cooperativity effects have been proposed to explain the non-local rheology in the dynamics of soft jammed systems. Based on the analysis of the free-energy model proposed by L. Bocquet, A. Colin and A. Ajdari, Phys. Rev. Lett., 2009, 103, 036001, we show that cooperativity effects resulting from the nonlocal nature of the fluidity (inverse viscosity) are intimately related to the emergence of shear-banding configurations. This connection materializes through the onset of inhomogeneous compact solutions (compactons), wherein the fluidity is confined to finite-support subregions of the flow and strictly zero elsewhere. The compacton coexistence with regions of zero fluidity ("non-flowing vacuum") is shown to be stabilized by the presence of mechanical noise, which ultimately shapes up the equilibrium distribution of the fluidity field, the latter acting as an order parameter for the flow-noflow transitions occurring in the material.
We study the Poiseuille flow of a soft-glassy material above the jamming point, where the material flows like a complex fluid with Herschel-Bulkley rheology. Microscopic plastic rearrangements and the emergence of their spatial correlations induce cooperative flow behavior whose effect is pronounced in the presence of confinement. With the help of lattice Boltzmann numerical simulations of confined dense emulsions, we explore the role of geometrical roughness in providing activation of plastic events close to the boundaries. We also probe the spatial configuration of the fluidity field, a continuum quantity which can be related to the rate of plastic events, thereby allowing us to establish a link between the mesoscopic plastic dynamics of the jammed material and the macroscopic flow behaviour.
Boundary conditions
Fluidity
Lattice Boltzmann models
Soft-glassy systems
We present a highly optimized implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin-glass model with bimodal disorder, i.e., the 3D Edwards-Anderson model, running on CUDA-enabled GPUs. Multi-GPU systems exchange data by means of the Message Passing Interface (MPI). The chosen MC dynamics is the classic Metropolis one, which is purely dissipative, since the aim was the study of the critical off-equilibrium relaxation of the system. We focused on the following issues: (i) the implementation of efficient memory access patterns for nearest neighbours in a cubic stencil and for lagged-Fibonacci-like pseudo-random number generators (PRNGs); (ii) a novel implementation of the asynchronous multispin-coding Metropolis MC step that allows one spin to be stored per bit; and (iii) a multi-GPU version based on a combination of MPI and CUDA streams. Cubic stencils and PRNGs are two subjects of very general interest because of their widespread use in many simulation codes.
GPU
Lattice
Multi-GPU
Multispin coding
Random numbers
Spin glass
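The idea of storing one spin per bit can be illustrated with a small sketch of the bit-packing step only (not of the Metropolis update, and not of the implementation described in the abstract above). It assumes the common encoding in which bit 0 stands for spin or coupling +1 and bit 1 for -1, so that a bond is unsatisfied exactly when the XOR of the two spin bits and the coupling bit equals 1. All function names are hypothetical.

```python
def pack(bits):
    """Pack a sequence of 0/1 values into one integer, one value per bit."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

def unsatisfied_mask(spins, neighbours, couplings):
    """Bit i is 1 when the bond between spin i and its neighbour is unsatisfied.

    With s = 1 - 2*bit and J = 1 - 2*bit, the product J*s*s' equals
    1 - 2*(bit_s XOR bit_s' XOR bit_J), so a single XOR handles all bonds
    stored in the word at once.
    """
    return spins ^ neighbours ^ couplings

def bond_energy(spins, neighbours, couplings, n_bits):
    """Sum of -J*s*s' over the n_bits bonds stored in the packed words."""
    unsatisfied = bin(unsatisfied_mask(spins, neighbours, couplings)).count("1")
    return 2 * unsatisfied - n_bits

s = pack([0, 1, 0, 1])   # spins      +1, -1, +1, -1
n = pack([0, 0, 1, 1])   # neighbours +1, +1, -1, -1
j = pack([0, 0, 0, 1])   # couplings  +1, +1, +1, -1
print(bond_energy(s, n, j, 4))  # 2
```

On a GPU the same XOR would act on 32-bit or 64-bit machine words, so a single instruction evaluates many bonds in parallel.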
Graphics processing units (GPUs) are currently used as a cost-effective platform for computer simulations and big-data processing. Large scale applications require that multiple GPUs work together, but the efficiency obtained with clusters of GPUs is, at times, sub-optimal because the GPU features are not exploited at their best. We describe how it is possible to achieve an excellent efficiency for applications in statistical mechanics, particle dynamics and network analysis by using suitable memory access patterns and mechanisms like CUDA streams, profiling tools, etc. Similar concepts and techniques may also be applied to other problems like the solution of Partial Differential Equations.
Background
Cell organization is governed and maintained via specific interactions among its constituent macromolecules. Comparison of the experimentally determined protein interaction networks in different model organisms has revealed little conservation of the specific edges linking ortholog proteins. Nevertheless, some topological characteristics of the graphs representing the networks - namely non-random degree distribution and high clustering coefficient - are shared by networks of distantly related organisms. Here we investigate the role of the topological features of the protein interaction network in promoting cell organization.
Methods
We have used a stochastic model, dubbed ProtNet, representing a computer-stylized cell, to answer questions about the dynamic consequences of the topological properties of the static graphs representing protein interaction networks.
Results
By using a novel metric of cell organization, we show that natural networks, unlike random networks, can promote cell self-organization. Furthermore, the ensemble of protein complexes that forms in pseudocells, which self-organize according to the interaction rules of natural networks, is more robust to perturbations.
Conclusions
The analysis of the dynamic properties of networks with a variety of topological characteristics leads us to conclude that self-organization is a consequence of the high clustering coefficient, whereas the scale-free degree distribution has little influence on this property.
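Since the conclusion above hinges on the clustering coefficient, a minimal sketch of how this quantity is commonly computed for an undirected graph may help. The adjacency representation (a dictionary of neighbour sets) and the function names are assumptions made for this illustration; they are not taken from ProtNet.

```python
def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    neigh = adj[v]
    k = len(neigh)
    if k < 2:
        return 0.0  # convention: vertices with fewer than two neighbours count as 0
    links = sum(1 for a in neigh for b in neigh if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# A triangle (0, 1, 2) with one pendant vertex (3) attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(graph))
```

For the toy graph, vertices 0 and 1 have coefficient 1, vertex 2 has 1/3, and vertex 3 has 0, giving an average of 7/12.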
We describe a solution for fast indexing and searching within large heterogeneous data sets whose main purpose is to support investigators who need to analyze forensic disk images originating from seizures or created from bodies of evidence. Our approach is based on a combination of techniques aimed at improving the efficiency and reliability of the indexing process. We do not rely on existing frameworks like Hadoop but borrow concepts from different contexts, including High Performance Computing and database management.