Abstract

Workshop Part 1
Big Data: Fundamentals and Theory

Patrick Wolfe  (University College London)

Title:

Understanding the Behaviour of Large Networks

Abstract:

In this talk – which will be accessible to a general audience – we show how the asymptotic behavior of random networks gives rise to universal statistical summaries. These summaries are related to concepts that are well understood in other contexts outside of Big Data – such as stationarity and ergodicity – but whose extension to networks requires recent developments from the theory of graph limits and the corresponding analog of de Finetti’s theorem. We introduce a new tool based on these summaries, which we call a network histogram, obtained by fitting a statistical model called a blockmodel to a large network. Blocks of edges play the role of histogram bins, and so-called network community sizes that of histogram bandwidths or bin sizes. For more details, see recent work in the Proceedings of the National Academy of Sciences (doi:10.1073/pnas.1400374111, with Sofia Olhede) and the Annals of Statistics (doi:10.1214/13-AOS1173, with David Choi).
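
As a rough illustration of how blocks of edges can act like histogram bins, the Python sketch below computes block-wise edge densities for a simple undirected graph. The function name `network_histogram`, the use of NumPy, and the toy graph are assumptions made for this example. The community labels are taken as given, so the sketch does not fit a blockmodel and is not the procedure from the cited papers.

```python
import numpy as np

def network_histogram(adjacency, labels):
    """Return a k-by-k matrix of edge densities between communities.

    adjacency: symmetric 0/1 matrix with a zero diagonal (assumed).
    labels:    community label for each node (assumed to be given).
    """
    adjacency = np.asarray(adjacency)
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    heights = np.zeros((len(blocks), len(blocks)))
    for a, block_a in enumerate(blocks):
        in_a = labels == block_a
        for b, block_b in enumerate(blocks):
            in_b = labels == block_b
            # Sum of adjacency entries in this block of the matrix.
            edges = adjacency[np.ix_(in_a, in_b)].sum()
            if a == b:
                # Within a block the symmetric matrix counts each edge
                # twice, so divide by the number of ordered node pairs.
                n = in_a.sum()
                pairs = n * (n - 1)
            else:
                pairs = in_a.sum() * in_b.sum()
            heights[a, b] = edges / pairs if pairs > 0 else 0.0
    return heights

# Toy graph: edges 0-1, 2-3 and 0-2, with two communities of two nodes.
toy_adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])
toy_labels = np.array([0, 0, 1, 1])
toy_heights = network_histogram(toy_adjacency, toy_labels)
```

Each entry of `toy_heights` is the fraction of possible edges present between two communities, which plays the role of a bin height. Larger communities correspond to wider bins.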

CV:

Patrick J. Wolfe is Professor of Statistics and Honorary Professor of Computer Science at University College London, where he is a member of the Department’s Senior Management Team and a Royal Society and EPSRC Established Career Research Fellow in the Mathematical Sciences.

From 2001 to 2004 he held a Fellowship and College Lectureship in Engineering and Computer Science at Cambridge University, where he completed his PhD in 2003 following a National Science Foundation Graduate Fellowship. Prior to joining UCL he was Assistant (2004-2008) and Associate (2008-2011) Professor at Harvard University, where he received the Presidential Early Career Award for Scientists and Engineers from the White House.

Professor Wolfe currently serves as Executive Director of the UCL Big Data Institute. Externally to UCL, he serves on the editorial board of the Proceedings of the Royal Society A (Mathematical, Physical & Engineering Sciences), the Research Section Committee of the Royal Statistical Society, the Program Committee of the 2015 Joint Statistical Meetings, and as an organizer of the 2016 Newton Institute program on Theoretical Foundations for Statistical Network Analysis.

Kenji Fukumizu
(The Institute of Statistical Mathematics)

Title:

Machine learning approach to data science

Abstract:

I will introduce the activities of the Research Center for Statistical Machine Learning at The Institute of Statistical Mathematics; the center aims to develop the machine learning research community and carries out original research projects.
Among the projects, I will talk about recent work on kernel methods for nonlinear data analysis and on sparse modeling for big data.

CV:

Kenji Fukumizu obtained a BS in 1989 and a Ph.D. (Science) in 1996 from Kyoto University.
He worked as a researcher at Ricoh Co., Ltd. until 1997. After working at the RIKEN Brain Science Institute for two years, he joined The Institute of Statistical Mathematics in 2000, and has been a professor since 2009.
His research interests include machine learning and mathematical statistics.

Daichi Mochihashi
(The Institute of Statistical Mathematics)

Title:

Nonparametric Bayesian Methods in Audio and Language processing

Abstract:

Unsupervised statistical methods are indispensable for dealing with increasingly complex audio and language data.
In this talk, I will discuss three such tasks: unsupervised extraction of audio events (audio analysis), modeling of singing styles (music analysis), and unsupervised joint induction of words and parts of speech from raw strings (natural language processing).
All of these tasks are becoming important, and can be addressed with nonparametric Bayesian methods: Indian buffet processes and Gaussian processes.

CV:

Daichi Mochihashi obtained a BS from The University of Tokyo and a PhD from Nara Institute of Science and Technology in 1998 and 2005, respectively. Prior to joining ISM, he was a research associate at ATR Spoken Language Communication Research Laboratories and NTT Communication Science Laboratories, both in Kyoto, Japan. His research interests focus on statistical natural language processing and Bayesian statistics.

Ken-ichi Kawarabayashi
(National Institute of Informatics)

Title:

Large Graphs: Analysis and Efficient Algorithm

Abstract:

Large-scale networks are now everywhere. Such large networks include the web structures of the Internet, and social networks like Facebook and Twitter. All these networks are expanding rapidly and are expected to grow to a scale of more than 10 billion users in the near future.

Since 2013, I have been leading a five-year JST ERATO project (> $15M) whose main focus is “Large Graphs”. The purpose of this project is to tackle large graphs and provide efficient algorithms based on theoretical research. Theoretical research includes Discrete Math, Combinatorial Optimization, Graph Algorithms, Theoretical Computer Science, Statistics, Machine Learning, Data Mining, Statistical Physics, etc.

In this talk, we introduce this project in detail and present some of its successes.

CV:

Ken-ichi Kawarabayashi is a full professor at the National Institute of Informatics. His main research interests include Mathematics (Discrete Mathematics, Combinatorics, Graph Theory), Computer Science (Theoretical Computer Science, Algorithms, Machine Learning, Data Mining, Graph Databases, Networking, Natural Language) and Operations Research (Scheduling, Combinatorial Optimization). He serves as an editor of several outstanding international journals, including SIAM J. Discrete Mathematics, Algorithmica, and J. Graph Theory. He is currently leading JST’s ERATO program on large graphs (> $15M), starting in 2012. He has received many awards, including the Best Paper Award at the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2013, the IBM Japan Prize, the Japan Society for the Promotion of Science Prize in 2013, and the Japan Academy Medal in 2013.

Kohei Hayashi
(National Institute of Informatics)

Title:

Factorized information criterion for sparse model selection.

Abstract:

Latent variable models such as mixture models and hidden Markov models represent high-dimensional observed data by low-dimensional latent variables. The factorized information criterion (FIC) was recently developed for determining the dimensionality of the latent variables. FIC has two attractive properties: it is asymptotically equivalent to the marginal log-likelihood, and it provides a tractable optimization algorithm. In this talk, we introduce FICs for latent feature models and Bayesian principal component analysis.

CV:

Kohei Hayashi is a project researcher at the Global Research Center for Big Data Mathematics, National Institute of Informatics. He received the B.Eng. degree from Ritsumeikan University in 2007, and the M.Eng. and Ph.D. degrees from Nara Institute of Science and Technology in 2009 and 2012, respectively. From 2012 to 2013, he was a JSPS postdoc at the University of Tokyo.
His research interests are in machine learning, especially relational data analysis and Bayesian probabilistic modeling.

 

Workshop Part 2
Big Data: Application and Utilisation

Masayuki Hirafuji
(National Agriculture and Food Research Organization / University of Tsukuba)

Title:

Agricultural Big Data and Applications

Abstract:

Decision making in agriculture is very difficult, since agricultural production is related to many complex phenomena. Decision support systems should assist users based on environmental conditions, crop conditions, farm work, and production costs. Agricultural big data covering these factors can be created automatically by IoT/M2M devices such as field sensor networks, UAVs, ECUs embedded in agricultural machines, and mobile/wearable gadgets. In addition, a recent fabrication approach called personal/digital fabrication is enabling the rapid development of new sensing devices that meet various demands in plant phenotyping. Agricultural big data that includes omics data such as the phenome and genome can improve farming and breeding methods. We are developing a platform (CLOP: CLoud Open Platform) to create agricultural big data. CLOP also provides functions such as recommendation and prediction services based on machine learning using open source software (Hadoop and Mahout). Cloud service vendors can use it to build advanced agricultural cloud services at low cost.

CV:

Masayuki Hirafuji (Japan, 1956), Ph.D., is a director at the Hokkaido Agricultural Research Center (HARC) of NARO and a professor at the University of Tsukuba. After graduating from a postgraduate course (biological environmental control engineering) at the University of Tokyo, he has worked as a research scientist at the Institute of MAFF (Ministry of Agriculture, Forestry and Fisheries). He has investigated applications and theories in computational modeling of biological/agricultural/evolutionary systems, neural networks, environmental control systems, space agriculture, and so on. Recently he has been studying data integration methods for CLOP (CLoud Open Platform) and applications based on big data for smart agriculture.

Naoki Nakashima  (Kyushu University Hospital)

Title:

A Big Data Analysis Made a Comprehensive eHealth-Telemedicine Program Cost Effective

Abstract:

The prevalence of non-communicable diseases is increasing throughout the world, including in developing countries. Our aim was to study a preventive medical service in a developing country that combines eHealth checkups and teleconsultation, and to assess stratification rules and the short-term effects of intervention.
We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects and we provided teleprescription for these subjects as required.
The first checkup was provided to 16,741 subjects. After one year, 2,361 subjects participated in the second checkup, and the systolic blood pressure of these subjects had decreased significantly from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that uses a machine learning technique (random forest) with the medical interview, subject profiles, and checkup results as predictors, in order to avoid costly measurements of blood sugar and to ensure the sustainability of the program in developing countries.
The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.

CV:

Naoki Nakashima MD PhD is the Director/Professor (2014-) of the Medical Information Center of Kyushu University Hospital, and also a visiting professor at the National Institute of Informatics, Japan. He has been a specialist in diabetes mellitus for 25 years and has simultaneously worked as a specialist in medical informatics for 13 years. He is a councillor of the Japanese Society of Diabetes Mellitus and the vice-president of the Japan Association for Medical Informatics (JAMI). He focuses on the disease management methodology of chronic diseases from primary to tertiary prevention with sensor networking technology.
He is also a founding member (2003-)/vice director (2012-) of the “Telemedicine Development Center of Asia (TEMDEC)” at Kyushu University. TEMDEC is the most active institute for international telemedicine in the Asia-Pacific area. He has recently been conducting cost-effective health checkups using sensor networks and telemedicine with the Grameen group in Bangladesh (2012-).

Yuzuru Tanaka  (Hokkaido University)

Title:

Exploratory Visual Analytics for Winter Road Management using Statistically Preprocessed Probe-Car Data

Abstract:

Social CPSs (Cyber-Physical Systems) extend the idea of CPSs to the monitoring and control of urban-scale social infrastructure systems. They utilize both cyber data stored in databases and physical data coming from sensor networks in the target physical world for the analysis and optimized control of urban infrastructure systems such as traffic, energy, and water services. This talk will focus on winter road management in Sapporo, which has the world’s largest annual snowfall among cities with a population of more than 1 million. For monitoring road conditions over the whole city, the use of probe-car data without violating personal data protection is fundamental. This talk will first show that probe-car data statistically preprocessed over road links for an urban-scale area still allow us to visualize the dynamic change of the traffic flow in terms of the divergence and flow vector field. These give us sufficient information about the dynamic change of traffic hotspots, main traffic streams, and route selection preference. The talk will also show more complex and advanced analyses of such data, especially for better winter road management in Sapporo. We will extend the well-known multiple coordinated views framework for exploratory visual analytics to multiple coordinated views and analyses by integrating analysis tools and the views that visualize their results into the same environment. These newly added views may also coordinate with others, and allow users to directly select clusters or mined patterns calculated at runtime to further quantify the underlying database view. Exploratory visual analytics with such an environment enables us to detect road links for effective pinpoint snow removal.

CV:

Yuzuru Tanaka has been a full professor of computer architecture at the Department of Electrical Engineering (1990-2003), then of knowledge media architecture at the Department of Computer Science, Graduate School of Information Science and Technology (2004- ), Hokkaido University, and was the founding director of the Meme Media Laboratory (1995-2013), Hokkaido University. He was also a full professor of Digital Library, Graduate School of Informatics, Kyoto University (1998-2000) in parallel, and has been an adjunct professor at the National Institute of Informatics (2004- ). His research areas have covered multiprocessor architectures, database schema-design theory, database machine architectures, full text search of document image files, and automatic cut detection in movies and full video search. His current research areas cover meme media architectures, knowledge federation frameworks, proximity-based federation of smart objects, and their application to digital libraries, e-Science, clinical trials, and social cyber-physical systems for the optimization or improvement of social system services such as snow plowing and removal in Sapporo City. He has been a visiting research fellow at IBM T.J. Watson Research Center (1985-1986), a fellow of the Information Processing Society of Japan and the Japanese Society of Software Science, an affiliated scientist of FORTH in Crete (2010- ), a series editor of Springer’s LNAI (Lecture Notes in Artificial Intelligence), and the program officer of JST’s eight-year CREST Program on Big Data Application Technologies (2013- ). He has been involved in the EU’s FP6 Integrated Project ACGT (Advancing Clinico-Genomic Trials on Cancer), FP7 Best Practice Network Project ASSETS (Advanced Search Services and Enhanced Technological Solutions for the European Digital Library), and FP7 Large Integration Project p-medicine (personalized medicine).

Yi-ke Guo
(Imperial College London, Data Science Institute)

Title:

Big Data for Better Science: An Introduction to the Data Science Institute of Imperial College

Abstract:

We live in a world where billions of gigabytes of data are generated every day around us. Data is a major asset in the search for solutions across Science, Engineering, Business and Medicine, and finding new methods to store, mine and visualise this data is becoming increasingly important.
In April 2014, Imperial College London launched the Data Science Institute (DSI) as its fifth cross-Faculty Institute tackling grand challenges. Its mission is to provide a focal point for Imperial College’s capabilities in multidisciplinary data-driven research by coordinating advanced data research for College scientists and partners, alongside educating the next generation of scientists. The DSI conducts research on core data science to develop advanced theory, technology, and systems that will contribute to the state of the art in data science and support world-class research at Imperial and beyond.
Located in London, the DSI is very much a global Institute, developing international partnerships and collaborations to empower engagement between Institutions and Industry in pursuit of data-driven innovation. The Institute aims to generate significant intellectual property and, through strategic partnerships, to translate this into social and economic impacts. In this talk, we will present the missions and the vision of the DSI. We will also review the DSI’s first year of progress, the experience of building the institute, and its future development.

CV:

Director, Data Science Institute (DSI), Imperial College London (ICL), UK.
He is the founding Director of the Data Science Institute at Imperial College and also leads the Discovery Science Group in the department. Professor Guo also holds the position of CTO of the tranSMART Foundation, a global open source community using and developing data sharing and analytics technology for translational medicine.

 

Workshop Part 3
Big Data: Data Disclosure and Privacy

Ichiro Satoh  (National Institute of Informatics)

Title:

Trends in Japan’s Legal Reforms for Big Data and Personal Data

Abstract:

The utilization of personal data with big data technologies may offer unprecedented opportunities to create social and economic value, but may also result in privacy problems. The growing volumes of data and the lack of institutional capacities have outstripped existing policy frameworks. To solve this problem, the Japanese government plans to reform the institutions for protecting personal data. I will briefly introduce trends in the reform of personal data law in Japan.

CV:

Ichiro Satoh received his B.E., M.E., and Ph.D. degrees in Computer Science from Keio University, Japan in 1996. From 2001 to 2005, he was an associate professor at the National Institute of Informatics (NII), Japan. Since 2006, he has been a professor at NII. His current research interests include distributed and ubiquitous computing. He is a member of the Cabinet Secretariat’s study group on personal data and the chairman of its working group for technical issues.

Sir Nigel Shadbolt  (Open Data Institute)

Title:

Privacy in an Age of Data

Abstract:

We live in an age of superabundant information. The Internet and World Wide Web have been the agents of this revolution. A deluge of information and data has led to a range of scientific discoveries and engineering innovations. Data published on the Web has enabled the mobilisation of hundreds of thousands of humans to solve problems beyond any individual or single organisation. Open data published on the Web is improving the efficiency of our public services and giving rise to open innovation. Data science is emerging as an area of competitive advantage for individuals, companies, universities, public and private sector organisations and nation states. But data collected at scale by public and private agencies also gives rise to concerns about its use and abuse. How are we to retain our concept of privacy in this age of data? This talk will examine this challenge.

CV:

Sir Nigel Shadbolt is Professor of Artificial Intelligence at the University of Southampton. He is also the Chairman and Co-Founder of the Open Data Institute (ODI). Since 2009, Sir Nigel has acted as an Information Adviser to the UK Government, helping transform public access to Government information, including the widely acclaimed data.gov.uk site. With over 400 publications he researches and publishes on computer science, artificial intelligence, open data and web science. During his career, he has also worked in philosophy, psychology and linguistics.

Today, Nigel draws together this multidisciplinary expertise to focus on understanding how the web is evolving and changing society. He is passionate about how humans and computers can solve problems together at web scale. He is currently Principal Investigator on a £6.14M EPSRC funded Programme Grant researching the theory of social machines – Web scale problem solving systems comprising large numbers of humans and computers. Throughout his career he has been involved in translating research into commercial products and services. In 2006 he was one of three founding Directors of Garlik Ltd, which in 2008 was awarded Technology Pioneer status by the Davos World Economic Forum and won the prestigious UK national BT Flagship Award. Garlik was acquired by Experian Ltd in 2011.

In 2013 he was awarded a Knighthood for services to science and engineering. In August 2015 he will move to Oxford to become Principal of Jesus College and will join the Department of Computer Science as a Professorial Research Fellow in Computer Science.

Copyright © National Institute of Informatics Kawarabayashi Large Graph Project.