Table of Contents

**S V N Vishwanathan**, Departments of Statistics and Computer Science, Purdue University

Reproducible Machine Learning Research: Using Tools Efficiently

We have all faced the frustration of trying to reproduce someone else's results at some point or another. This talk will show how to make your own results reproducible so that others do not face the same frustration. The talk has two parts. In the first, I will describe some tools for making your life easier. In the second, I will walk through a case study of a recent paper of ours. The talk is loosely structured and intended to be interactive. The intended audience includes (graduate) students and anybody interested in reproducible research.

**Risi Kondor**, Center for the Mathematics of Information, Caltech

Non-commutative harmonic analysis in machine learning

Non-commutative harmonic analysis generalizes the notion of the Fourier transform to rotations, permutations, or indeed any compact group of transformations acting on some underlying space. We have found that this theory and the corresponding fast Fourier transforms have a whole range of natural applications in machine learning and other areas of computer science. I will give an overview of these developments, touching on invariant features for computer vision, compact representations of uncertainty in multi-object tracking, graph kernels, and strategies for solving hard optimization problems.

**Alan Qi**, Department of Computer Science, Purdue University

Turning ideas into reality - Scalable Bayesian Learning

In this talk I will present some recent work by our group on scalable Bayesian learning. In particular, I will describe three Bayesian approaches we have developed: Bayesian variable selection over correlated variables for p ≫ n (where p is the number of variables and n is the number of data points), online learning for n ≫ p, and parallel inference for the case where both p and n are large. I will also show experimental results demonstrating the effectiveness of these new approaches.

**Marina Meila**, Department of Statistics, University of Washington

(Joint colloquium with Statistics Department)

Consensus Finding, Exponential Models, and Infinite Rankings

This talk is concerned with summarizing, by means of statistical models, data that express preferences. Such data are typically a set of rankings of n items by a panel of experts; the simplest summary is the “consensus ranking”, or “centroid”, of the set of rankings. Such problems appear in many tasks, ranging from combining voter preferences to boosting of search engines.

We study the problem in its more general form of estimating a parametric model known as the Generalized Mallows (GM) model. I will present an exact estimation algorithm, non-polynomial in theory, but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family, and introduce the conjugate prior for this model class.

Then we introduce the infinite GM model, corresponding to “rankings” over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering.

Joint work with: Harr Chen, Alnur Ali, Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes
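
The “consensus ranking” above can be illustrated with a tiny brute-force search that minimizes the total Kendall tau distance to a set of votes. This is only a sketch with hypothetical vote data: it is exponential in the number of items, unlike the exact algorithm discussed in the talk.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of item pairs on which two rankings disagree."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def consensus_ranking(rankings):
    """Exhaustively find the ranking minimizing total Kendall tau distance."""
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(consensus_ranking(votes))  # ('a', 'b', 'c')
```

Here the centroid agrees with the first voter because it disagrees with the other two on only one pair each, for a total distance of 2.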

**Chuanhai Liu**, Department of Statistics, Purdue University

Uncertain Inference and Artificial Intelligence

It is difficult, perhaps, to believe that artificial intelligence can be made intelligent enough without a valid probabilistic inferential system as a critical module. With a brief review of existing schools of thought on uncertain inference, we introduce a valid probabilistic inferential framework using inferential models (IMs). With several simple and benchmark examples, we discuss potential applications of IMs in artificial intelligence.

**Sofus A. Macskassy**, Director of Fetch Labs, Fetch Technologies

Towards Deeper Analysis of User Generated Content: Adding Semantics and Topics to Profiles, Links and Communities

The amount of data available from user generated content, such as the social media content produced on Twitter and various blog sites, is growing at a phenomenal rate. This dynamic environment presents a large number of unique and very difficult challenges in information management, extraction, aggregation, integration and analysis, and all related sub-fields. Some of the key challenges in this environment center on understanding the dynamics of the information in terms of users and communities: what are the hot topics, what is new, what is of interest to a user or a community, what are the communities, who are the key people or key information providers, which web sites cater to which demographic, …

I argue that the majority of these tasks require that one first identify communities and understand the specific interests of users and communities. I further argue that links in social media are created for a variety of reasons, and that one cannot identify a community by looking only at the links, nor by treating all links the same. In this talk I will explore two tasks: categorizing users, and leveraging those categories to understand information diffusion and community structure. Users generate a great deal of content, and we can learn a lot about them from it. This can help identify the underlying processes of information diffusion as well as new aspects of community structure, both by shared interest and by link structure. In short, we can leverage this information to gain much better insight into social media and how it can be used in a variety of applications.

**John Langford**, Yahoo! Research

Learning with Exploration

The default successful paradigm of machine learning is supervised learning: humans are hired (often through Mechanical Turk these days) to label items, and the labeled data is fed into a learning algorithm to create a learned predictor. A more natural, less expensive, and more accurate approach is to observe what works in practice, and to use this information to learn a predictor.

For people familiar with the first approach, there are several failure modes, ranging from plain inconsistency to mere suboptimality. A core issue here is the need for exploration: if a choice is never explored, we cannot optimize for it. The need for exploration implies the need to use exploration to evaluate learned solutions, to guide the optimization of learned predictors, and to control the process of exploration so as to accomplish it efficiently.
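
The idea of using exploration data to evaluate a learned policy can be sketched with inverse propensity scoring, a standard technique in this setting. Everything here is a hypothetical toy: a 3-armed bandit logged under an epsilon-greedy policy, then used to estimate the value of a different, deterministic policy.

```python
import random

random.seed(0)

ARMS = 3
TRUE_REWARD = [0.2, 0.5, 0.8]   # hypothetical expected reward per action
EPSILON = 0.2                   # exploration rate of the logging policy

def log_with_exploration(n, greedy_action=0):
    """Run an epsilon-greedy logging policy; record (action, propensity, reward)."""
    log = []
    for _ in range(n):
        if random.random() < EPSILON:
            a = random.randrange(ARMS)
        else:
            a = greedy_action
        # propensity: overall probability the logging policy picks action a
        p = EPSILON / ARMS + (1 - EPSILON) * (a == greedy_action)
        r = 1.0 if random.random() < TRUE_REWARD[a] else 0.0
        log.append((a, p, r))
    return log

def ips_value(log, target_action):
    """Inverse-propensity-scored value estimate of a deterministic policy."""
    return sum(r / p for a, p, r in log if a == target_action) / len(log)

log = log_with_exploration(100_000)
print(ips_value(log, 2))  # approaches TRUE_REWARD[2] = 0.8
```

Even though the logging policy mostly pulls a poor arm, dividing each observed reward by the probability of the logged action yields an unbiased estimate of the target policy's value; without exploration (EPSILON = 0), action 2 would never be observed and could not be evaluated at all.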

I will provide an overview of the problems which crop up in this area, and how to solve them. This includes new results on policy evaluation and optimization, with the first ever optimization-based exploration control algorithms for this setting.

**Barnabás Póczos**, Carnegie Mellon University

Nonparametric Divergence Estimation

There are several problems in machine learning and statistics where we need the efficient estimation of the divergence between two distributions. In this presentation we propose new, nonparametric, consistent estimators for the Rényi, Tsallis, and L2 divergences, and demonstrate the applicability of the estimators for feature selection, anomaly detection, independent subspace analysis, and manifold learning of distributions. While a “naive” approach would simply estimate the underlying densities, our method avoids estimating them. The estimators are simple, and under certain conditions we can prove their consistency. If you ever wanted to write E[X/Y]=E[X]/E[Y], then this presentation might be interesting for you.
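
The closing remark alludes to the fact that E[X/Y] ≠ E[X]/E[Y] in general. A quick simulation, with an arbitrary choice of independent X and Y uniform on [1, 2], makes the gap visible:

```python
import random

random.seed(1)

# Draw independent pairs (X, Y), each uniform on [1, 2].
samples = [(random.uniform(1, 2), random.uniform(1, 2)) for _ in range(200_000)]
n = len(samples)

mean_ratio = sum(x / y for x, y in samples) / n                       # E[X/Y]
ratio_of_means = (sum(x for x, _ in samples) / n) / (
    sum(y for _, y in samples) / n                                    # E[X]/E[Y]
)

# E[X/Y] = E[X] * E[1/Y] = 1.5 * ln 2 ≈ 1.04, while E[X]/E[Y] = 1.
print(mean_ratio, ratio_of_means)
```

By Jensen's inequality E[1/Y] > 1/E[Y] for a non-degenerate Y, so the mean of the ratios strictly exceeds the ratio of the means here.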

Bio: Barnabás Póczos (M.Sc. in Applied Mathematics, 2002; Ph.D. in Computer Science, 2007) is a postdoctoral fellow at the Auton Lab, School of Computer Science, Carnegie Mellon University. His research interests lie in theoretical questions of statistics and their applications to machine learning. His ongoing research involves the development of nonparametric estimators for entropy, mutual information, conditional dependence, and divergences. After working as a lecturer at Eötvös Loránd University, Budapest, Hungary, he moved to Edmonton, Canada, where he conducted research from 2007 through spring 2010 as a postdoctoral fellow at the Alberta Ingenuity Centre for Machine Learning.

**Dalton Lunga**, Purdue University

Generating Similar Graphs from Spherical Features

In this talk, we propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches, which compute the features of the graphs in Euclidean space, our approach obtains features on the surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to the popular exponential random graph model (ERGM), unlike ERGMs it does not suffer from degeneracy, a situation in which significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs and on larger real-world social networks.
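
A minimal sketch of the core von Mises-Fisher ingredient, using hypothetical feature vectors: project features onto the unit hypersphere and score them by the unnormalized vMF log-density, kappa·⟨mu, x⟩. The normalizing constant, which involves Bessel functions, is omitted here.

```python
import math

def to_sphere(v):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def vmf_log_density_unnorm(x, mu, kappa):
    """Unnormalized von Mises-Fisher log-density kappa * <mu, x> for unit vectors."""
    return kappa * sum(a * b for a, b in zip(mu, x))

mu = to_sphere([1.0, 1.0, 0.0])       # mean direction (hypothetical graph features)
near = to_sphere([2.0, 1.9, 0.1])     # feature vector close to the mean direction
far = to_sphere([-1.0, 0.5, 2.0])     # feature vector far from the mean direction

# Features closer to the mean direction receive higher density;
# kappa controls how sharply the mass concentrates around mu.
print(vmf_log_density_unnorm(near, mu, 5.0) >
      vmf_log_density_unnorm(far, mu, 5.0))  # True
```

The concentration parameter kappa plays the role that inverse variance plays for a Gaussian: large kappa keeps generated graphs' features tightly clustered around those of the example graph.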

**Paul Bennett**, Microsoft Research

Class-Based Contextualized Search

Information retrieval has made significant progress in returning relevant results for a single query. However, much search activity is conducted within a much richer context of a current task focus, recent search activities, and longer-term preferences. For example, our ability to accurately interpret the current query can be informed by knowledge of the web pages a searcher was viewing when initiating the search, or by recent actions of the searcher such as queries issued, results clicked, and pages viewed. We develop a framework based on classification that enables representation of a broad variety of context, including the searcher's long-term interests, recent activity, and current focus, as a class intent distribution. We then demonstrate how this distribution can be used to improve the quality of search results. In order to make such an approach feasible, we need reasonably accurate classification into a taxonomy, a method of extracting and representing a user's query and context as a distribution over classes, and a method of using this distribution to improve the retrieval of relevant results. We describe recent work addressing each of these challenges.

This talk presents joint work with Nam Nguyen, Krysta Svore, Susan Dumais, and Ryen White.

Bio: Paul Bennett is a Researcher in the Context, Learning & User Experience for Search (CLUES) group at Microsoft Research where he works on using machine learning technology to improve information access and retrieval. His recent research has focused on classification-enhanced information retrieval, pairwise preferences, human computation, and text classification while his previous work focused primarily on ensemble methods, active learning, and obtaining reliable probability estimates, but also extended to machine translation, recommender systems, and knowledge bases. He completed his dissertation on combining text classifiers using reliability indicators in 2006 at Carnegie Mellon where he was advised by Profs. Jaime Carbonell and John Lafferty.

**Cosma Shalizi**, Department of Statistics, Carnegie Mellon University

Markovian (and conceivably causal) representations of stochastic processes

Basically any stochastic process can be represented as a random function of a homogeneous, measure-valued Markov process. The Markovian states are minimal sufficient statistics for predicting the future of the original process. This has been independently discovered multiple times since the 1970s, by researchers in probability, dynamical systems, reinforcement learning and time series analysis, under names like “the measure-theoretic prediction process”, “epsilon machines”, “causal states”, “observable operator models”, and “predictive state representations”. I will review the mathematical construction and information-theoretic properties of these representations, touch briefly on their reconstruction from data, and finally consider under what conditions they may allow us to form causal models of dynamical processes.

Bio: Cosma Shalizi is an assistant professor of statistics at Carnegie Mellon University, and a member of the external faculty of the Santa Fe Institute. He got his doctorate in physics from the University of Wisconsin-Madison in 2001 with a thesis on complexity measures and self-organization, and was a post-doc at the Santa Fe Institute and the University of Michigan's Center for the Study of Complex Systems. He works on statistical learning with dynamical systems, network analysis, heavy-tailed distributions, and inference with simulation models, and still blogs sporadically at http://bactra.org/weblog/.

**Hoda Eldardiry**, Department of Computer Science, Purdue University

Across-Model Collective Ensemble Classification

Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approaches follow the conventional approach to ensemble learning, they improve performance only by reducing the error due to variance in learning. We note, however, that variance in inference can be an additional source of error in relational methods that use collective classification, since inferred values are propagated during inference. We propose a novel ensemble mechanism for collective classification that reduces both learning and inference variance by incorporating prediction averaging into the collective inference process itself. We show that our proposed method significantly outperforms a straightforward relational ensemble baseline on both synthetic and real-world datasets.

**David McAllester**, Toyota Technological Institute at Chicago

Object Detection Grammars

Driven by the PASCAL object detection challenge, researchers have achieved dramatic improvements in object detection over the past four years. This talk will describe the UoCTTI detector, which has been the leading system during this time and recently received a “lifetime achievement award” from the PASCAL VOC organizers. The talk will describe this detector in the context of general grammar models and will also address larger issues concerning the role of machine learning in computer vision and artificial intelligence more generally.

Bio: Professor McAllester received his B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology in 1978, 1979, and 1987 respectively. He served on the faculty of Cornell University for the academic year 1987-1988 and on the faculty of MIT from 1988 to 1995. He was a member of technical staff at AT&T Labs-Research from 1995 to 2002. He has been a fellow of the American Association of Artificial Intelligence (AAAI) since 1997. Since 2002 he has been Chief Academic Officer at the Toyota Technological Institute at Chicago (TTIC). He has authored over 90 refereed publications. Professor McAllester's research areas include machine learning, the theory of programming languages, automated reasoning, AI planning, computer game playing (computer chess), computational linguistics and computer vision. A 1991 paper on AI planning proved to be one of the most influential papers of the decade in that area. A 1993 paper on computer game algorithms influenced the design of the algorithms used in the Deep Blue system that defeated Garry Kasparov. A 1998 paper on machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian and non-Bayesian methods. He is currently part of a team that has scored in the top two places in the PASCAL object detection challenge (computer vision) in 2007, 2008 and 2009.