S V N Vishwanathan, Departments of Statistics and Computer Science, Purdue University
Reproducible Machine Learning Research: Using Tools Efficiently
We have all faced the frustration of trying to reproduce someone else's results at some point or another. This talk will show how to make your own results reproducible so that others do not face the same frustration. The talk has two parts. In the first part I will describe some tools for making your life easier. In the second I will walk through a case study of a recent paper of ours. The talk is loosely structured and intended to be interactive. The intended audience includes (graduate) students and anyone interested in reproducible research.
Risi Kondor, Center for the Mathematics of Information, Caltech
Non-commutative harmonic analysis in machine learning
Non-commutative harmonic analysis generalizes the notion of the Fourier transform to rotations, permutations, or indeed any compact group of transformations acting on some underlying space. We have found that this theory and the corresponding fast Fourier transforms have a whole range of natural applications in machine learning and other areas of computer science. I will give an overview of these developments, touching on invariant features for computer vision, compact representations of uncertainty in multi-object tracking, graph kernels, and strategies for solving hard optimization problems.
Alan Qi, Department of Computer Science, Purdue University
Turning ideas into reality - Scalable Bayesian Learning
In this talk I will present some recent work by our group on scalable Bayesian learning. In particular, I will present three Bayesian approaches developed by our group: Bayesian variable selection of correlated variables for p ≫ n (where p is the number of variables and n is the number of data points), online learning for n ≫ p, and parallel inference for the case where both p and n are large. I will also show experimental results demonstrating the effectiveness of our new approaches.
Marina Meila, Department of Statistics, University of Washington
(Joint colloquium with Statistics Department)
Consensus Finding, Exponential Models, and Infinite Rankings
This talk is concerned with summarizing – by means of statistical models – data that express preferences. Such data are typically a set of rankings of n items by a panel of experts; the simplest summary is the “consensus ranking”, or “centroid”, of the set of rankings. Such problems appear in many tasks, ranging from combining voter preferences to boosting of search engines.
We study the problem in its more general form of estimating a parametric model known as the Generalized Mallows (GM) model. I will present an exact estimation algorithm, non-polynomial in theory, but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family, and introduce the conjugate prior for this model class.
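As a toy illustration of why exact consensus finding is non-polynomial, here is a brute-force Kemeny consensus under the Kendall tau distance; this is a sketch for intuition only, not the estimation algorithm presented in the talk:

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    # Number of item pairs that the two rankings order differently.
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def kemeny_consensus(rankings):
    # Exhaustive search over all n! candidate rankings: exponential in n,
    # feasible only for a handful of items.
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(kemeny_consensus(votes))  # ('a', 'b', 'c')
```

Even this tiny example hints at why exact algorithms that remain effective in practice, as in the talk, are notable: the naive search space grows as n!.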
Then we introduce the infinite GM model, corresponding to “rankings” over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering.
Joint work with: Harr Chen, Alnur Ali, Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes
Chuanhai Liu, Department of Statistics, Purdue University
Uncertain Inference and Artificial Intelligence
It is difficult to believe that artificial intelligence can be made intelligent enough without a valid probabilistic inferential system as a critical module. After a brief review of existing schools of thought on uncertain inference, we introduce a valid probabilistic inferential framework based on inferential models (IMs). Using several simple and benchmark examples, we discuss potential applications of IMs in artificial intelligence.
Sofus A. Macskassy, Director of Fetch Labs, Fetch Technologies
Towards Deeper Analysis of User Generated Content: Adding Semantics and Topics to Profiles, Links and Communities
John Langford, Yahoo! Research
Learning with Exploration
The default successful paradigm of machine learning is supervised learning. Here humans are hired (often through Mechanical Turk these days) to label items, and the labeled information is then fed into a learning algorithm to create a learned predictor. A more natural, less expensive, and accurate approach is to observe what works in practice, and to use this information to learn a predictor.
For people familiar with the first approach, there are several failure modes, ranging from plain inconsistency to mere suboptimality. A core issue here is the need for exploration: if a choice is never explored, we cannot optimize for it. The need for exploration implies a need to use exploration data to evaluate learned solutions, to guide the optimization of learned predictors, and to control the process of exploration so that it is carried out efficiently.
I will provide an overview of the problems which crop up in this area, and how to solve them. This includes new results on policy evaluation and optimization, with the first ever optimization-based exploration control algorithms for this setting.
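As background for readers unfamiliar with the area, here is a minimal sketch of inverse-propensity-scoring (IPS) policy evaluation, a standard building block for evaluating a learned policy from exploration data; the uniform-exploration setup and reward model below are invented for illustration and are not from the talk:

```python
import random

def ips_value(logged, target_policy):
    # logged: list of (context, action, reward, prob) tuples, where prob is
    # the probability the exploration policy chose `action` in `context`.
    # Each logged reward is importance-weighted by
    # 1[target picks the same action] / prob, giving an unbiased estimate
    # of the target policy's expected reward.
    total = 0.0
    for x, a, r, p in logged:
        if target_policy(x) == a:
            total += r / p
    return total / len(logged)

random.seed(0)
# Exploration: pick between actions 0 and 1 uniformly at random (prob 0.5).
# True rewards: the action matching the context bit pays 1, the other pays 0.
logged = []
for _ in range(10000):
    x = random.randint(0, 1)
    a = random.randint(0, 1)
    r = 1.0 if a == x else 0.0
    logged.append((x, a, r, 0.5))

greedy = lambda x: x  # policy that matches the action to the context
print(ips_value(logged, greedy))  # close to 1.0, the true value of `greedy`
```

The key point matches the abstract: the recorded exploration probabilities are exactly what lets us evaluate a policy we never actually ran.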
Barnabás Póczos, Carnegie Mellon University
Nonparametric Divergence Estimation
There are several problems in machine learning and statistics where we need the efficient estimation of the divergence between two distributions. In this presentation we propose new, nonparametric, consistent estimators for the Rényi, Tsallis, and L2-divergences, and demonstrate the applicability of the estimators for feature selection, anomaly detection, independent subspace analysis, and manifold learning of distributions. While a “naive” approach would simply estimate the underlying densities, our method can avoid estimating them. The estimators are simple, and under certain conditions we can prove their consistency. If you ever wanted to write E[X/Y]=E[X]/E[Y], then this presentation might be interesting for you.
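The closing remark refers to the temptation to treat an expectation of a ratio and a ratio of expectations as interchangeable; a quick Monte Carlo check (mine, not the speaker's) shows they differ:

```python
import random

random.seed(0)

# X and Y independent, each uniform on [1, 2]; compare E[X/Y] with E[X]/E[Y].
n = 200_000
xs = [random.uniform(1, 2) for _ in range(n)]
ys = [random.uniform(1, 2) for _ in range(n)]

mean = lambda v: sum(v) / len(v)
ratio_of_means = mean(xs) / mean(ys)                    # about 1.0
mean_of_ratios = mean([x / y for x, y in zip(xs, ys)])  # about 1.04 (= 1.5 ln 2)

# The two disagree: by independence E[X/Y] = E[X] E[1/Y], and Jensen's
# inequality gives E[1/Y] > 1/E[Y], so E[X/Y] > E[X]/E[Y] here.
```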
Bio: Barnabas Poczos (Applied Mathematics M.Sc. 2002, Computer Science Ph.D. 2007) is a postdoctoral fellow at the Autonlab, School of Computer Science, Carnegie Mellon University. His research interests lie in the theoretical questions of statistics and their applications to machine learning. His ongoing research involves the development of nonparametric estimators for entropy, mutual information, conditional dependence, and divergences. After working as a lecturer at Eötvös Loránd University, Budapest, Hungary, he moved to Edmonton, Canada, where he conducted research as a postdoctoral fellow at the Alberta Ingenuity Centre for Machine Learning from 2007 through spring 2010.
Dalton Lunga, Purdue University
Generating Similar Graphs from Spherical Features
In this talk, we propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches, which compute the features of the graphs in Euclidean space, our approach obtains features on the surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to the popular exponential random graph models (ERGMs), unlike ERGMs it does not suffer from degeneracy, a situation in which significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs and on larger real-world social networks.
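A minimal sketch of the modeling ingredient described above: graph features projected onto the unit hypersphere and scored under a von Mises-Fisher density. The density is shown only up to its normalizing constant (which involves modified Bessel functions), and the feature values below are made up:

```python
import math

def normalize(v):
    # Project a raw feature vector onto the unit hypersphere.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def vmf_log_density_unnorm(x, mu, kappa):
    # von Mises-Fisher log-density up to the normalizing constant:
    # log f(x; mu, kappa) = kappa * <mu, x> + const.
    return kappa * sum(a * b for a, b in zip(mu, x))

# Hypothetical graph features (say, edge count and triangle count) for the
# example graph and two candidates, all mapped to the unit sphere.
mu = normalize([3.0, 1.0])          # mean direction from the example graph
similar = normalize([2.9, 1.1])
different = normalize([0.5, 4.0])
kappa = 10.0  # concentration: larger kappa clusters samples tighter around mu
print(vmf_log_density_unnorm(similar, mu, kappa) >
      vmf_log_density_unnorm(different, mu, kappa))  # True
```

Because the density depends on features only through the inner product with the mean direction, graphs whose normalized features point near the example graph's direction receive higher probability.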
Paul Bennett, Microsoft Research
Class-Based Contextualized Search
Abstract: Information retrieval has made significant progress in returning relevant results for a single query. However, much search activity is conducted within a richer context: a current task focus, recent search activities, and longer-term preferences. For example, our ability to accurately interpret the current query can be informed by knowledge of the web pages a searcher was viewing when initiating the search, or by recent actions of the searcher such as queries issued, results clicked, and pages viewed. We develop a classification-based framework that represents a broad variety of context – including the searcher's long-term interests, recent activity, and current focus – as a class intent distribution, and we demonstrate how this distribution can be used to improve the quality of search results. Making such an approach feasible requires reasonably accurate classification into a taxonomy, a method for extracting and representing a user's query and context as a distribution over classes, and a method for using this distribution to improve the retrieval of relevant results. We describe recent work addressing each of these challenges.
This talk presents joint work with Nam Nguyen, Krysta Svore, Susan Dumais, and Ryen White.
Bio: Paul Bennett is a Researcher in the Context, Learning & User Experience for Search (CLUES) group at Microsoft Research where he works on using machine learning technology to improve information access and retrieval. His recent research has focused on classification-enhanced information retrieval, pairwise preferences, human computation, and text classification while his previous work focused primarily on ensemble methods, active learning, and obtaining reliable probability estimates, but also extended to machine translation, recommender systems, and knowledge bases. He completed his dissertation on combining text classifiers using reliability indicators in 2006 at Carnegie Mellon where he was advised by Profs. Jaime Carbonell and John Lafferty.
Cosma Shalizi, Department of Statistics, Carnegie Mellon University
Markovian (and conceivably causal) representations of stochastic processes
Basically any stochastic process can be represented as a random function of a homogeneous, measure-valued Markov process. The Markovian states are minimal sufficient statistics for predicting the future of the original process. This has been independently discovered multiple times since the 1970s, by researchers in probability, dynamical systems, reinforcement learning and time series analysis, under names like “the measure-theoretic prediction process”, “epsilon machines”, “causal states”, “observable operator models”, and “predictive state representations”. I will review the mathematical construction and information-theoretic properties of these representations, touch briefly on their reconstruction from data, and finally consider under what conditions they may allow us to form causal models of dynamical processes.
Bio: Cosma Shalizi is an assistant professor of statistics at Carnegie Mellon University, and a member of the external faculty of the Santa Fe Institute. He got his doctorate in physics from the University of Wisconsin-Madison in 2001 with a thesis on complexity measures and self-organization, and was a post-doc at the Santa Fe Institute and the University of Michigan's Center for the Study of Complex Systems. He works on statistical learning with dynamical systems, network analysis, heavy-tailed distributions, and inference with simulation models, and still blogs sporadically at http://bactra.org/weblog/.
Hoda Eldardiry, Department of Computer Science, Purdue University
Across-Model Collective Ensemble Classification
Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approaches follow the conventional ensemble-learning paradigm, they improve performance only by reducing the error due to variance in learning. We note, however, that variance in inference can be an additional source of error in relational methods that use collective classification, since inferred values are propagated during inference. We propose a novel ensemble mechanism for collective classification that reduces both learning and inference variance by incorporating prediction averaging into the collective inference process itself. We show that our proposed method significantly outperforms a straightforward relational ensemble baseline on both synthetic and real-world datasets.
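A toy sketch of the idea of folding prediction averaging into collective inference: several component models run collective inference on the same graph, and their node predictions are averaged after every step, so the shared averages rather than each model's own noisy estimates are propagated to neighbors. The graph, mixing weight, and noise model below are invented for illustration; this is not the paper's actual algorithm:

```python
import random

random.seed(1)
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
true_label = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0}

n_models = 5
# Each component model starts from its own noisy local prediction,
# mimicking independently constructed ensemble members (learning variance).
priors = [
    {v: min(1.0, max(0.0, true_label[v] + random.gauss(0, 0.3))) for v in edges}
    for _ in range(n_models)
]

preds = [dict(p) for p in priors]
for _ in range(20):
    # Across-model step: average predictions over the ensemble...
    avg = {v: sum(p[v] for p in preds) / n_models for v in edges}
    # ...and propagate the averaged values during collective inference,
    # mixing each model's local prior with its neighbors' averaged scores.
    for m in range(n_models):
        preds[m] = {
            v: 0.5 * priors[m][v]
            + 0.5 * sum(avg[u] for u in edges[v]) / len(edges[v])
            for v in edges
        }

final = {v: sum(p[v] for p in preds) / n_models for v in edges}
```

Averaging inside the inference loop damps the propagation of any single model's noisy estimates, which is the intuition behind reducing inference variance in addition to learning variance.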
David McAllester, Toyota Technological Institute at Chicago
Object Detection Grammars
Driven by the PASCAL object detection challenge, researchers have achieved dramatic improvements in object detection over the past four years. This talk will describe the UoCTTI detector, which has been the leading system during this time and recently received a “lifetime achievement award” from the PASCAL VOC organizers. The talk will describe this detector in the context of general grammar models and will also address larger issues of the role of machine learning in computer vision and artificial intelligence more generally.
Bio: Professor McAllester received his B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology in 1978, 1979, and 1987 respectively. He served on the faculty of Cornell University for the academic year 1987-1988 and on the faculty of MIT from 1988 to 1995. He was a member of technical staff at AT&T Labs-Research from 1995 to 2002. He has been a fellow of the American Association of Artificial Intelligence (AAAI) since 1997. Since 2002 he has been Chief Academic Officer at the Toyota Technological Institute at Chicago (TTIC). He has authored over 90 refereed publications. Professor McAllester's research areas include machine learning, the theory of programming languages, automated reasoning, AI planning, computer game playing (computer chess), computational linguistics and computer vision. A 1991 paper on AI planning proved to be one of the most influential papers of the decade in that area. A 1993 paper on computer game algorithms influenced the design of the algorithms used in the Deep Blue system that defeated Garry Kasparov. A 1998 paper on machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian and non-Bayesian methods. He is currently part of a team that has scored in the top two places in the PASCAL object detection challenge (computer vision) in 2007, 2008 and 2009.